From da@ski.org Tue Aug 3 00:01:26 1999
From: da@ski.org (David Ascher)
Date: Mon, 2 Aug 1999 16:01:26 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Pickling w/ low overhead
Message-ID:

An issue which has dogged the NumPy project is that there is (to my
knowledge) no way to pickle very large arrays without creating strings
which contain all of the data.  This can be a problem given that NumPy
arrays tend to be very large -- often several megabytes, sometimes much
bigger.  This slows things down, sometimes a lot, depending on the
platform.  It seems that it should be possible to do something more
efficient.

Two alternatives come to mind:

-- define a new pickling protocol which passes a file-like object to the
   instance and have the instance write itself to that file, being as
   efficient or inefficient as it cares to.  This protocol is used only
   if the instance/type defines the appropriate slot.  Alternatively,
   enrich the semantics of the getstate interaction, so that an object
   can return partial data and tell the pickling mechanism to come back
   for more.

-- make pickling of objects which support the buffer interface use that
   interface's notion of segments and use that 'chunk' size to do
   something more efficient, if not necessarily most efficient.  (Oh,
   and make NumPy arrays support the buffer interface =).

Thoughts?  Alternatives?

--david

From mhammond@skippinet.com.au Tue Aug 3 01:41:23 1999
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Tue, 3 Aug 1999 10:41:23 +1000
Subject: [Python-Dev] Buffer interface in abstract.c?
Message-ID: <001001bedd48$ea796280$1101a8c0@bobcat>

Hi all,
I'm trying to slowly wean myself over to the buffer interfaces.
My exploration so far indicates that, for most cases, simply replacing
"PyString_FromStringAndSize" with "PyBuffer_FromMemory" handles the vast
majority of cases, and is preferred when the data contains arbitrary
bytes.  PyArg_ParseTuple("s#", ...) still works correctly, as we would
hope.

However, performing this explicitly is a pain.  Looking at getargs.c,
the code to achieve this is a little too convoluted to cut-and-paste
each time.  Therefore, I would like to propose these functions to be
added to abstract.c:

int PyObject_GetBufferSize();
void *PyObject_GetReadWriteBuffer(); /* or "char *"? */
const void *PyObject_GetReadOnlyBuffer();

Although equivalent functions exist for the buffer object, I can't see
the equivalent abstract implementations - ie, ones that work with any
object supporting the protocol.

I'm willing to provide a patch if there is agreement that a) the general
idea is good, and b) my specific spelling of the idea is OK (less
likely - PyBuffer_* seems better, but loses any implication of being
abstract?).

Thoughts?

Mark.

From gstein@lyra.org Tue Aug 3 02:51:43 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 02 Aug 1999 18:51:43 -0700
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat>
Message-ID: <37A64B2F.3386F0A9@lyra.org>

Mark Hammond wrote:
> ...
> Therefore, I would like to propose these functions to be added to
> abstract.c:
>
> int PyObject_GetBufferSize();
> void *PyObject_GetReadWriteBuffer(); /* or "char *"? */
> const void *PyObject_GetReadOnlyBuffer();
>
> Although equivalent functions exist for the buffer object, I can't see
> the equivalent abstract implementations - ie, ones that work with any
> object supporting the protocol.
>
> I'm willing to provide a patch if there is agreement that a) the
> general idea is good, and b) my specific spelling of the idea is OK
> (less likely - PyBuffer_* seems better, but loses any implication of
> being abstract?).
Marc-Andre proposed exactly the same thing back at the end of March (to
me and Guido).  The two of us hashed out some of the stuff and M.A. came
up with a full patch for it.  Guido was relatively non-committal at that
point one way or another, but said it seemed fine.  It appears the stuff
never made it into source control.

If Marc-Andre can resurface the final proposal/patch, then we'd be set.

Until then: use the bufferprocs :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From mal@lemburg.com Tue Aug 3 10:11:11 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 11:11:11 +0200
Subject: [Python-Dev] Pickling w/ low overhead
References:
Message-ID: <37A6B22F.7A14BA2C@lemburg.com>

David Ascher wrote:
>
> An issue which has dogged the NumPy project is that there is (to my
> knowledge) no way to pickle very large arrays without creating strings
> which contain all of the data.  This can be a problem given that NumPy
> arrays tend to be very large -- often several megabytes, sometimes
> much bigger.  This slows things down, sometimes a lot, depending on
> the platform.  It seems that it should be possible to do something
> more efficient.
>
> Two alternatives come to mind:
>
> -- define a new pickling protocol which passes a file-like object to
>    the instance and have the instance write itself to that file, being
>    as efficient or inefficient as it cares to.  This protocol is used
>    only if the instance/type defines the appropriate slot.
>    Alternatively, enrich the semantics of the getstate interaction, so
>    that an object can return partial data and tell the pickling
>    mechanism to come back for more.
>
> -- make pickling of objects which support the buffer interface use
>    that interface's notion of segments and use that 'chunk' size to do
>    something more efficient, if not necessarily most efficient.  (Oh,
>    and make NumPy arrays support the buffer interface =).
> This is simple
> for NumPy arrays since we want to pickle "everything", but may not be
> what other buffer-supporting objects want.
>
> Thoughts?  Alternatives?

Hmm, types can register their own pickling/unpickling functions via
copy_reg, so they can access the self.write method in pickle.py to
implement the write-to-file interface.  Don't know how this would be
done for cPickle.c though.

For instances the situation is different, since there is no dispatching
done on a per-class basis.  I guess an optional argument could help
here.

Perhaps some lazy pickling wrapper would help fix this in general: an
object which calls back into the to-be-pickled object to access the
data rather than storing the data in a huge string.

Yet another idea would be using memory-mapped files instead of strings
as temporary storage (but this is probably hard to implement right and
not as portable).

Dunno... just some thoughts.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal@lemburg.com Tue Aug 3 08:50:33 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 09:50:33 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org>
Message-ID: <37A69F49.3575AE85@lemburg.com>

Greg Stein wrote:
>
> Mark Hammond wrote:
> > ...
> > Therefore, I would like to propose these functions to be added to
> > abstract.c:
> >
> > int PyObject_GetBufferSize();
> > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */
> > const void *PyObject_GetReadOnlyBuffer();
> >
> > Although equivalent functions exist for the buffer object, I can't
> > see the equivalent abstract implementations - ie, ones that work
> > with any object supporting the protocol.
> >
> > I'm willing to provide a patch if there is agreement that a) the
> > general idea is good, and b) my specific spelling of the idea is OK
> > (less likely - PyBuffer_* seems better, but loses any implication of
> > being abstract?).
>
> Marc-Andre proposed exactly the same thing back at the end of March
> (to me and Guido).  The two of us hashed out some of the stuff and
> M.A. came up with a full patch for it.  Guido was relatively
> non-committal at that point one way or another, but said it seemed
> fine.  It appears the stuff never made it into source control.
>
> If Marc-Andre can resurface the final proposal/patch, then we'd be
> set.

Below is the code I currently use.  I don't really remember if this is
what Greg and I discussed a while back, but I'm sure he'll correct me
;-)  Note that the buffer length is implicitly returned by these APIs.

/* Takes an arbitrary object which must support the character (single
   segment) buffer interface and returns a pointer to a read-only
   memory location usable as character based input for subsequent
   processing.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsCharBuffer(PyObject *obj,
                      const char **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    const char *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getcharbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a character buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getcharbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Same as PyObject_AsCharBuffer() except that this API expects a
   readable (single segment) buffer interface and returns a pointer to
   a read-only memory location which can contain arbitrary data.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsReadBuffer(PyObject *obj,
                      const void **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getreadbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a readable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getreadbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Takes an arbitrary object which must support the writeable (single
   segment) buffer interface and returns a pointer to a writeable
   memory location in buffer of size buffer_len.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsWriteBuffer(PyObject *obj,
                       void **buffer,
                       int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getwritebuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a writeable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getwritebuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack@oratrix.nl Tue Aug 3 10:53:39 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Tue, 03 Aug 1999 11:53:39 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: Message by "M.-A.
Lemburg" , Tue, 03 Aug 1999 09:50:33 +0200 , <37A69F49.3575AE85@lemburg.com>
Message-ID: <19990803095339.E02CE303120@snelboot.oratrix.nl>

Why not pass the index to the As*Buffer routines as well and make
getsegcount available too?  Then you could code things like

    for(i=0; i < segcount; i++) {
        if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
            return -1;
        write(fp, buf, count);
    }

From gstein@lyra.org Tue Aug 3 11:25:11 1999
From: gstein@lyra.org (Greg Stein)
Date: Tue, 03 Aug 1999 03:25:11 -0700
Subject: [Python-Dev] Buffer interface in abstract.c?
Message-ID: <37A6C387.7360D792@lyra.org>

Jack Jansen wrote:
>
> Why not pass the index to the As*Buffer routines as well and make
> getsegcount available too?  Then you could code things like
>
>     for(i=0; i < segcount; i++) {
>         if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>             return -1;
>         write(fp, buf, count);
>     }

Simply because multiple segments hasn't been seen.  All objects
supporting the buffer interface have a single segment.  IMO, it is best
to drop the argument to make typical usage easier.  For handling
multiple segments, a caller can use the raw interface rather than the
handy functions.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From jim@digicool.com Tue Aug 3 11:58:54 1999
From: jim@digicool.com (Jim Fulton)
Date: Tue, 03 Aug 1999 06:58:54 -0400
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat>
Message-ID: <37A6CB6E.C990F561@digicool.com>

Mark Hammond wrote:
>
> Hi all,
> I'm trying to slowly wean myself over to the buffer interfaces.

OK, I'll bite.  Where is the buffer interface documented?  I found
references to it in various places (e.g. built-in buffer()) but didn't
find the interface itself.

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.

From mal@lemburg.com Tue Aug 3 12:06:46 1999
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Tue, 03 Aug 1999 13:06:46 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <19990803095339.E02CE303120@snelboot.oratrix.nl>
Message-ID: <37A6CD46.642A9C6D@lemburg.com>

Jack Jansen wrote:
>
> Why not pass the index to the As*Buffer routines as well and make
> getsegcount available too?  Then you could code things like
>
>     for(i=0; i < segcount; i++) {
>         if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>             return -1;
>         write(fp, buf, count);
>     }

Well, just like Greg said, this is not much different than using the
buffer interface directly.  While the above would be a handy
PyObject_WriteAsBuffer() kind of helper, I don't think that this is
really used all that much.

E.g. in mxODBC I use the APIs for accessing the raw char data in a
buffer: the pointer is passed directly to the ODBC APIs without
copying, which makes things quite fast.  IMHO, this is the greatest
advantage of the buffer interface.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From: "Fred L. Drake, Jr."
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org>
Message-ID: <14246.59808.561395.761772@weyr.cnri.reston.va.us>

Greg Stein writes:
> Until then: use the bufferprocs :-)

Greg,
On the topic of the buffer interface: Have you written documentation
for this that I can include in the API reference?  Bugging you about
this is on my to-do list. ;-)

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From mal@lemburg.com Tue Aug 3 12:29:43 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 13:29:43 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A6CB6E.C990F561@digicool.com>
Message-ID: <37A6D2A7.27F27554@lemburg.com>

Jim Fulton wrote:
>
> Mark Hammond wrote:
> >
> > Hi all,
> > I'm trying to slowly wean myself over to the buffer interfaces.
>
> OK, I'll bite.  Where is the buffer interface documented?  I found
> references to it in various places (e.g. built-in buffer()) but didn't
> find the interface itself.

I guess it's a read-the-source feature :-)  Objects/bufferobject.c and
Include/object.h provide a start.  Objects/stringobject.c has a
"sample" implementation.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack@oratrix.nl Tue Aug 3 15:45:25 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Tue, 03 Aug 1999 16:45:25 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: Message by Greg Stein , Tue, 03 Aug 1999 03:25:11 -0700 , <37A6C387.7360D792@lyra.org>
Message-ID: <19990803144526.6B796303120@snelboot.oratrix.nl>

> > Why not pass the index to the As*Buffer routines as well and make
> > getsegcount available too?
>
> Simply because multiple segments hasn't been seen.  All objects
> supporting the buffer interface have a single segment.

Hmm.  And I went out of my way to include this stupid multi-buffer
stuff because the NumPy folks said they couldn't live without it (and
one of the reasons for the buffer stuff was to allow NumPy arrays,
which may be discontiguous, to be written efficiently).

Can someone confirm that the Numeric stuff indeed doesn't use this?
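[Editorial aside: the buffer interface being discussed can also be
exercised from Python code.  In this era the entry point was the
built-in buffer(); its modern descendant is memoryview, used in the
sketch below, but the idea is the same: zero-copy access to an object's
underlying bytes.]

```python
# Minimal sketch of the buffer protocol from the Python side, using the
# modern memoryview spelling (the 1.5-era buffer() built-in is gone).
data = b"hello world"
view = memoryview(data)          # no copy of the underlying bytes is made

assert view[0:5].tobytes() == b"hello"
assert len(view) == len(data)

# A mutable object such as bytearray exposes a writable, single-segment
# buffer -- the common case discussed in this thread.
buf = bytearray(b"xxxxx")
memoryview(buf)[0:5] = b"abcde"  # writes through to the original object
assert buf == bytearray(b"abcde")
```

The single-segment restriction that MAL's helper functions enforce
corresponds to the contiguous case here; a discontiguous NumPy-style
array is what the multi-segment part of the protocol was meant for.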
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From da@ski.org Tue Aug 3 17:19:19 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 09:19:19 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Pickling w/ low overhead
In-Reply-To: <37A6B22F.7A14BA2C@lemburg.com>
Message-ID:

On Tue, 3 Aug 1999, M.-A. Lemburg wrote:

> Hmm, types can register their own pickling/unpickling functions
> via copy_reg, so they can access the self.write method in pickle.py
> to implement the write to file interface.

Are you sure?  My understanding of copy_reg is, as stated in the doc:

    pickle(type, function[, constructor])
        Declares that function should be used as a "reduction" function
        for objects of type or class type.  function should return
        either a string or a tuple.  The optional constructor
        parameter, if provided, is a callable object which can be used
        to reconstruct the object when called with the tuple of
        arguments returned by function at pickling time.

How does one access the 'self.write method in pickle.py'?

> Perhaps some lazy pickling wrapper would help fix this in general:
> an object which calls back into the to-be-pickled object to
> access the data rather than store the data in a huge string.

Right.  That's an idea.

> Yet another idea would be using memory mapped files instead
> of strings as temporary storage (but this is probably hard to
> implement right and not as portable).

That's a very interesting idea!  I'll try that -- it might just be the
easiest way to do this.  I think that portability isn't a huge concern
-- the folks who are coming up with the speed issue are on platforms
which have mmap support.

Thanks for the suggestions.
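[Editorial aside: the copy_reg protocol quoted above can be sketched as
follows.  The module is spelled copyreg in modern Python; BigArray and
its reduction function are hypothetical stand-ins for a NumPy-style
array, not real NumPy code.  Note how the entire payload still funnels
through a single bytes/string object returned by the reduction
function, which is exactly the overhead this thread is about.]

```python
import copyreg   # spelled copy_reg in 1.5-era Python
import pickle

class BigArray:
    """Hypothetical stand-in for a large NumPy-style array."""
    def __init__(self, data):
        self.data = data

def reduce_big_array(a):
    # The "reduction" function from the copy_reg docs: return a tuple
    # of (callable, args) used to reconstruct the object on unpickling.
    # All of the array data passes through one bytes object here.
    return BigArray, (a.data,)

# Register the reduction for the type, per the quoted documentation.
copyreg.pickle(BigArray, reduce_big_array)

original = BigArray(b"\x00" * (1 << 20))        # ~1 MB of data
restored = pickle.loads(pickle.dumps(original))
assert restored.data == original.data
```

As the doc quote says, nothing here ever sees the pickler's write
method, which is why MAL retracts the suggestion in his follow-up.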
--david

From da@ski.org Tue Aug 3 17:20:37 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 09:20:37 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: <37A6C387.7360D792@lyra.org>
Message-ID:

On Tue, 3 Aug 1999, Greg Stein wrote:

> Simply because multiple segments hasn't been seen.  All objects
> supporting the buffer interface have a single segment.  IMO, it is
> best

FYI, if/when NumPy objects support the buffer API, they will require
multiple segments.

From da@ski.org Tue Aug 3 17:23:31 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 09:23:31 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: <19990803144526.6B796303120@snelboot.oratrix.nl>
Message-ID:

On Tue, 3 Aug 1999, Jack Jansen wrote:

> > > Why not pass the index to the As*Buffer routines as well and make
> > > getsegcount available too?
> >
> > Simply because multiple segments hasn't been seen.  All objects
> > supporting the buffer interface have a single segment.
>
> Hmm.  And I went out of my way to include this stupid multi-buffer
> stuff because the NumPy folks said they couldn't live without it (and
> one of the reasons for the buffer stuff was to allow NumPy arrays,
> which may be discontiguous, to be written efficiently).
>
> Can someone confirm that the Numeric stuff indeed doesn't use this?

/usr/LLNLDistribution/Numerical/Include$ grep buffer *.h
/usr/LLNLDistribution/Numerical/Include$

Yes. =)  See the other thread on low-overhead pickling.  But again,
*if* multiarrays supported the buffer interface, they'd have to use
the multi-segment feature (repeating myself).

--david

From mal@lemburg.com Tue Aug 3 20:17:16 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 21:17:16 +0200
Subject: [Python-Dev] Pickling w/ low overhead
References:
Message-ID: <37A7403C.3BC05D02@lemburg.com>

David Ascher wrote:
>
> On Tue, 3 Aug 1999, M.-A.
Lemburg wrote:
>
> > Hmm, types can register their own pickling/unpickling functions
> > via copy_reg, so they can access the self.write method in pickle.py
> > to implement the write to file interface.
>
> Are you sure?  My understanding of copy_reg is, as stated in the doc:
>
>     pickle(type, function[, constructor])
>         Declares that function should be used as a "reduction"
>         function for objects of type or class type.  function should
>         return either a string or a tuple.  The optional constructor
>         parameter, if provided, is a callable object which can be
>         used to reconstruct the object when called with the tuple of
>         arguments returned by function at pickling time.
>
> How does one access the 'self.write method in pickle.py'?

Ooops.  Sorry, that doesn't work... well, at least not using "normal"
Python ;-)  You could of course simply go up one stack frame and then
grab the self object and then... well, you know...

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From skip@mojam.com Tue Aug 3 21:47:04 1999
From: skip@mojam.com (Skip Montanaro)
Date: Tue, 3 Aug 1999 15:47:04 -0500 (CDT)
Subject: [Python-Dev] Pickling w/ low overhead
In-Reply-To:
References:
Message-ID: <14247.21628.225029.392711@dolphin.mojam.com>

    David> An issue which has dogged the NumPy project is that there is
    David> (to my knowledge) no way to pickle very large arrays without
    David> creating strings which contain all of the data.  This can be
    David> a problem given that NumPy arrays tend to be very large --
    David> often several megabytes, sometimes much bigger.  This slows
    David> things down, sometimes a lot, depending on the platform.  It
    David> seems that it should be possible to do something more
    David> efficient.

David,

Using __getstate__/__setstate__, could you create a compressed
representation using zlib or some other scheme?
I don't know how well numeric data compresses in general, but that
might help.  Also, I trust you use cPickle when it's available, yes?

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-475-3758

From da@ski.org Tue Aug 3 21:58:23 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 13:58:23 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Pickling w/ low overhead
In-Reply-To: <14247.21628.225029.392711@dolphin.mojam.com>
Message-ID:

On Tue, 3 Aug 1999, Skip Montanaro wrote:

> Using __getstate__/__setstate__, could you create a compressed
> representation using zlib or some other scheme?  I don't know how
> well numeric data compresses in general, but that might help.  Also,
> I trust you use cPickle when it's available, yes?

I *really* hate to admit it, but I've found the source of the most
massive problem in the pickling process that I was using.  I didn't
use binary mode, which meant that the huge strings were written & read
one character at a time.  I think I'll put a big fat note in the NumPy
doc to that effect.  (Note that luckily this just affected my usage,
not all NumPy users.)

--da

From gstein@lyra.org Wed Aug 4 20:15:27 1999
From: gstein@lyra.org (Greg Stein)
Date: Wed, 04 Aug 1999 12:15:27 -0700
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex
References: <199908041313.JAA26344@weyr.cnri.reston.va.us>
Message-ID: <37A8914F.6F5B9971@lyra.org>

Fred L. Drake wrote:
>
> Update of /projects/cvsroot/python/dist/src/Doc/api
> In directory weyr:/home/fdrake/projects/python/Doc/api
>
> Modified Files:
>       api.tex
> Log Message:
>
> Started documentation on buffer objects & types.  Very preliminary.
>
> Greg Stein: Please help with this; it's your baby!
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://www.python.org/mailman/listinfo/python-checkins

All righty.  I'll send some doc on this stuff.
Somebody else did the initial buffer interface, but it seems that it
has fallen to me now :-)

Please give me a little while to get to this, though.  I'm in and out
of town for the next four weeks.  I'm in the process of moving into a
new house in Palo Alto, CA, and I'm travelling back and forth until
Anni and I move for real in September.

I should be able to get to this by the weekend, or possibly in a
couple weeks.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From: "Fred L. Drake, Jr."
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex
References: <199908041313.JAA26344@weyr.cnri.reston.va.us> <37A8914F.6F5B9971@lyra.org>
Message-ID: <14248.43498.664539.597656@weyr.cnri.reston.va.us>

Greg Stein writes:
> All righty.  I'll send some doc on this stuff.  Somebody else did the
> initial buffer interface, but it seems that it has fallen to me now
> :-)

I was not aware that you were not the origin of this work; feel free
to pass it to the right person.

> Please give me a little while to get to this, though.  I'm in and out
> of town for the next four weeks.  I'm in the process of moving into a
> new house in Palo Alto, CA, and I'm travelling back and forth until
> Anni and I move for real in September.

Cool!

> I should be able to get to this by the weekend, or possibly in a
> couple weeks.

That's good enough for me.  I expect it may be a couple of months or
more before I try and get another release out with various fixes and
additions.  There's not a huge need to update the released doc set,
other than a few embarrassing editorial... er, "oversights" (!).

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From jack@oratrix.nl Thu Aug 5 10:57:33 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Thu, 05 Aug 1999 11:57:33 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex
In-Reply-To: Message by Greg Stein , Wed, 04 Aug 1999 12:15:27 -0700 , <37A8914F.6F5B9971@lyra.org>
Message-ID: <19990805095733.69D90303120@snelboot.oratrix.nl>

> All righty.  I'll send some doc on this stuff.
> Somebody else did the
> initial buffer interface, but it seems that it has fallen to me now
> :-)

I think I did, but I gladly bequeath it to you.  (Hmm, that's the
first time I typed "bequeath", I think.)

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From fredrik@pythonware.com Thu Aug 5 16:46:43 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 5 Aug 1999 17:46:43 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References:
Message-ID: <009801bedf59$b8150020$f29b12c2@secret.pythonware.com>

> > Simply because multiple segments hasn't been seen.  All objects
> > supporting the buffer interface have a single segment.  IMO, it is
> > best
>
> FYI, if/when NumPy objects support the buffer API, they will require
> multiple segments.

same goes for PIL.  in the worst case, there's one segment per line.

...

on the other hand, I think something is missing from the buffer
design; I definitely don't like that people can write and marshal
objects that happen to implement the buffer interface, only to find
that Python didn't do what they expected...

>>> import unicode
>>> import marshal
>>> u = unicode.unicode
>>> s = u("foo")
>>> data = marshal.dumps(s)
>>> marshal.loads(data)
'f\000o\000o\000'
>>> type(marshal.loads(data))

as for PIL, I would also prefer if the exported buffer corresponded
to what you get from im.tostring().  iirc, that cannot be done -- I
cannot export via a temporary memory buffer, since there's no way to
know when to get rid of it...

From jack@oratrix.nl Thu Aug 5 21:59:46 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Thu, 05 Aug 1999 22:59:46 +0200
Subject: [Python-Dev] marshal (was: Buffer interface in abstract.c?
)
In-Reply-To: Message by "Fredrik Lundh" , Thu, 5 Aug 1999 17:46:43 +0200 , <009801bedf59$b8150020$f29b12c2@secret.pythonware.com>
Message-ID: <19990805205952.531B9E267A@oratrix.oratrix.nl>

Recently, "Fredrik Lundh" said:
> on the other hand, I think something is missing from
> the buffer design; I definitely don't like that people
> can write and marshal objects that happen to
> implement the buffer interface, only to find that
> Python didn't do what they expected...
>
> >>> import unicode
> >>> import marshal
> >>> u = unicode.unicode
> >>> s = u("foo")
> >>> data = marshal.dumps(s)
> >>> marshal.loads(data)
> 'f\000o\000o\000'
> >>> type(marshal.loads(data))

Hmm.  Looking at the code there is a catchall at the end, with a
comment explicitly saying "Write unknown buffer-style objects as a
string".

IMHO this is an incorrect design, but that's a bit philosophical (so
I'll gladly defer to Our Great Philosopher if he has anything to say
on the matter :-).  Unless, of course, there are buffer-style
non-string objects around that are better read back as strings than
not read back at all.

Hmm again, I think I'd like it better if marshal.dumps() would barf on
attempts to write unrepresentable data.  Currently unrepresentable
objects are written as TYPE_UNKNOWN (unless they have bufferness (or
should I call that "a buffer-aspect"? :-)), which means you think you
are writing correctly marshalled data but you'll be in for an
exception when you try to read it back...

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From akuchlin@mems-exchange.org Thu Aug 5 23:24:03 1999
From: akuchlin@mems-exchange.org (Andrew M.
Kuchling)
Date: Thu, 5 Aug 1999 18:24:03 -0400 (EDT)
Subject: [Python-Dev] mmapfile module
Message-ID: <199908052224.SAA24159@amarok.cnri.reston.va.us>

A while back the suggestion was made that the mmapfile module be added
to the core distribution, and there was a guardedly positive reaction.
Should I go ahead and do that?  No one reported any problems when I
asked for bug reports, but that was probably because no one tried it;
putting it in the core would cause more people to try it.

I suppose this leads to a more important question: at what point
should we start checking 1.6-only things into the CVS tree?  For
example, once the current alphas of the re module are up to it
(they're not yet), when should they be checked in?

--
A.M. Kuchling                   http://starship.python.net/crew/amk/
Kids!  Bringing about Armageddon can be dangerous.  Do not attempt it
in your home.
    -- Terry Pratchett & Neil Gaiman, _Good Omens_

From bwarsaw@cnri.reston.va.us Fri Aug 6 03:10:18 1999
From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Date: Thu, 5 Aug 1999 22:10:18 -0400 (EDT)
Subject: [Python-Dev] mmapfile module
References: <199908052224.SAA24159@amarok.cnri.reston.va.us>
Message-ID: <14250.17418.781127.684009@anthem.cnri.reston.va.us>

>>>>> "AMK" == Andrew M Kuchling writes:

    AMK> I suppose this leads to a more important question: at what
    AMK> point should we start checking 1.6-only things into the CVS
    AMK> tree?  For example, once the current alphas of the re module
    AMK> are up to it (they're not yet), when should they be checked
    AMK> in?

Good question.  I've had a bunch of people ask about the string
methods branch, which I'm assuming will be a 1.6 feature, and I'd like
to get that checked in at some point too.  I think what's holding this
up is that Guido hasn't decided whether there will be a patch release
to 1.5.2 or not.
-Barry From tim_one@email.msn.com Fri Aug 6 03:26:06 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 5 Aug 1999 22:26:06 -0400 Subject: [Python-Dev] mmapfile module In-Reply-To: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <000201bedfb3$09a99000$98a22299@tim> [Andrew M. Kuchling] > ... > I suppose this leads to a more important question: at what point > should we start checking 1.6-only things into the CVS tree? For > example, once the current alphas of the re module are up to it > (they're not yet), when should they be checked in? I'd like to see a bugfix release of 1.5.2 put out first, then have at it. There are several bugfixes that ought to go out ASAP. Thread tstate races, the cpickle/cookie.py snafu, and playing nice with current Tcl/Tk pop to mind immediately. I'm skeptical that anyone other than Guido could decide what *needs* to go out, so it's a good thing he's got nothing to do . one-boy's-opinion-ly y'rs - tim From mhammond@skippinet.com.au Fri Aug 6 04:30:55 1999 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 6 Aug 1999 13:30:55 +1000 Subject: [Python-Dev] mmapfile module In-Reply-To: <000201bedfb3$09a99000$98a22299@tim> Message-ID: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> [Tim laments] > mind immediately. I'm skeptical that anyone other than Guido > could decide > what *needs* to go out, so it's a good thing he's got nothing > to do . He has been very quiet recently - where are you hiding, Guido? > one-boy's-opinion-ly y'rs - tim Here is another. Let's take a different tack - what has been checked in since 1.5.2 that should _not_ go out - ie, is too controversial? If nothing else, it makes a good starting point, and may help Guido out: Below is a summary of the CVS diff I just did, categorized by my opinion. It turns out that most of the changes would appear to be candidates. While not actually "bug-fixes", many have better documentation, removal of unused imports etc, so it would definitely not hurt to get them out.
Looks like some build issues have been fixed too. Apart from possibly Tim's recent "UnboundLocalError" (which is the only serious behaviour change) I can't see anything that should obviously be omitted. Hopefully this is of interest... [Disclaimer - lots of files here - it is quite possible I missed something...] Mark. UNCONTROVERSIAL: ---------------- RCS file: /projects/cvsroot/python/dist/src/README,v RCS file: /projects/cvsroot/python/dist/src/Lib/cgi.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ftplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/poplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/re.py,v RCS file: /projects/cvsroot/python/dist/src/Tools/audiopy/README,v Doc changes. RCS file: /projects/cvsroot/python/dist/src/Lib/SimpleHTTPServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmd.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/htmllib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/netrc.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pipes.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pty.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/shlex.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/urlparse.py,v Remove unused imports RCS file: /projects/cvsroot/python/dist/src/Lib/pdb.py,v Remove unused globals RCS file: /projects/cvsroot/python/dist/src/Lib/popen2.py,v Change to cleanup RCS file: /projects/cvsroot/python/dist/src/Lib/profile.py,v Remove unused imports and changes to comments. RCS file: /projects/cvsroot/python/dist/src/Lib/pyclbr.py,v Better doc, and support for module level functions. RCS file: /projects/cvsroot/python/dist/src/Lib/repr.py,v self.maxlist changed to self.maxdict RCS file: /projects/cvsroot/python/dist/src/Lib/rfc822.py,v Doc changes, and better date handling. RCS file: /projects/cvsroot/python/dist/src/configure,v RCS file: /projects/cvsroot/python/dist/src/configure.in,v Looks like FreeBSD build flag changes.
RCS file: /projects/cvsroot/python/dist/src/Demo/classes/bitvec.py,v RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v Whitespace fixes. RCS file: /projects/cvsroot/python/dist/src/Demo/scripts/makedir.py,v Check we have passed a non empty string RCS file: /projects/cvsroot/python/dist/src/Include/patchlevel.h,v 1.5.2+ RCS file: /projects/cvsroot/python/dist/src/Lib/BaseHTTPServer.py,v Remove import rfc822 and more robust errors. RCS file: /projects/cvsroot/python/dist/src/Lib/CGIHTTPServer.py,v Support for HTTP_COOKIE RCS file: /projects/cvsroot/python/dist/src/Lib/fpformat.py,v NotANumber supports class exceptions. RCS file: /projects/cvsroot/python/dist/src/Lib/macpath.py,v Use constants from stat module RCS file: /projects/cvsroot/python/dist/src/Lib/macurl2path.py,v Minor changes to path parsing RCS file: /projects/cvsroot/python/dist/src/Lib/mimetypes.py,v Recognise '.js': 'application/x-javascript', RCS file: /projects/cvsroot/python/dist/src/Lib/sunau.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/wave.py,v Support for binary files. RCS file: /projects/cvsroot/python/dist/src/Lib/whichdb.py,v Reads file header to check for bsddb format. RCS file: /projects/cvsroot/python/dist/src/Lib/xmllib.py,v XML may be at the start of the string, instead of the whole string. RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/tkSimpleDialog.py,v Destroy method added. RCS file: /projects/cvsroot/python/dist/src/Modules/cPickle.c,v As in the log :-) RCS file: /projects/cvsroot/python/dist/src/Modules/cStringIO.c,v No longer a Py_FatalError on module init failure. 
RCS file: /projects/cvsroot/python/dist/src/Modules/fpectlmodule.c,v Support for OSF in #ifdefs RCS file: /projects/cvsroot/python/dist/src/Modules/makesetup,v # to handle backslashes for sh's that don't automatically # continue a read when the last char is a backslash RCS file: /projects/cvsroot/python/dist/src/Modules/posixmodule.c,v Better error handling RCS file: /projects/cvsroot/python/dist/src/Modules/timemodule.c,v #ifdef changes for __GNU_LIBRARY__/_GLIBC_ RCS file: /projects/cvsroot/python/dist/src/Python/errors.c,v Better error messages on Win32 RCS file: /projects/cvsroot/python/dist/src/Python/getversion.c,v Bigger buffer and strings. RCS file: /projects/cvsroot/python/dist/src/Python/pystate.c,v Threading bug RCS file: /projects/cvsroot/python/dist/src/Objects/floatobject.c,v Tim Peters writes: 1. Fixes float divmod etc. RCS file: /projects/cvsroot/python/dist/src/Objects/listobject.c,v Doc changes, and: when deallocating a list, DECREF the items from the end back to the start. RCS file: /projects/cvsroot/python/dist/src/Objects/stringobject.c,v Bug fix to do with the width of a format specifier RCS file: /projects/cvsroot/python/dist/src/Objects/tupleobject.c,v Appropriate overflow checks so that things like sys.maxint*(1,) can't dump core. RCS file: /projects/cvsroot/python/dist/src/Lib/tempfile.py,v don't cache attributes of type int RCS file: /projects/cvsroot/python/dist/src/Lib/urllib.py,v Number of revisions. RCS file: /projects/cvsroot/python/dist/src/Lib/aifc.py,v Chunk moved to new module. RCS file: /projects/cvsroot/python/dist/src/Lib/audiodev.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dbhash.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dis.py,v Changes in comments. RCS file: /projects/cvsroot/python/dist/src/Lib/cmpcache.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmp.py,v New "shallow" arg. RCS file: /projects/cvsroot/python/dist/src/Lib/dumbdbm.py,v Coerce f.tell() to int.
RCS file: /projects/cvsroot/python/dist/src/Modules/main.c,v Fix to tracebacks off by a line with -x RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/Tkinter.py,v Number of changes you can review! OTHERS: -------- RCS file: /projects/cvsroot/python/dist/src/Lib/asynchat.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/asyncore.py,v Latest versions from Sam??? RCS file: /projects/cvsroot/python/dist/src/Lib/smtplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/sched.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/SocketServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/calendar.py,v Sorry - out of time to detail RCS file: /projects/cvsroot/python/dist/src/Python/bltinmodule.c,v Unbound local, docstring, and better support for ExtensionClasses. Freeze: Few changes IDLE: Lotsa changes :-) Number of .h files have #ifdef changes for CE, which I won't detail (but it would be great to get a few of these in - and I have more :-) Tools directory: Number of changes - out of time to detail From mal@lemburg.com Fri Aug 6 09:54:20 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 06 Aug 1999 10:54:20 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> Message-ID: <37AAA2BC.466750B5@lemburg.com> Jack Jansen wrote: > > Recently, "Fredrik Lundh" said: > > on the other hand, I think something is missing from > > the buffer design; I definitely don't like that people > > can write and marshal objects that happen to > > implement the buffer interface, only to find that > > Python didn't do what they expected... > > > > >>> import unicode > > >>> import marshal > > >>> u = unicode.unicode > > >>> s = u("foo") > > >>> data = marshal.dumps(s) > > >>> marshal.loads(data) > > 'f\000o\000o\000' > > >>> type(marshal.loads(data)) > > Why do Unicode objects implement the bf_getcharbuffer slot?
I thought that unicode objects use a two-byte character representation. Note that implementing the char buffer interface will also give you strange results with other code that uses PyArg_ParseTuple(...,"s#",...), e.g. you could search through Unicode strings as if they were normal 1-byte/char strings (and most certainly not find what you're looking for, I guess). > Hmm again, I think I'd like it better if marshal.dumps() would barf on > attempts to write unrepresentable data. Currently unrepresentable > objects are written as TYPE_UNKNOWN (unless they have bufferness (or > should I call that "a buffer-aspect"? :-)), which means you think you > are writing correctly marshalled data but you'll be in for an > exception when you try to read it back... I'd prefer an exception on write too. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 147 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fred L. Drake, Jr." References: <000201bedfb3$09a99000$98a22299@tim> <00a801bedfbc$1871a7e0$1101a8c0@bobcat> Message-ID: <14250.62675.807129.878242@weyr.cnri.reston.va.us> Mark Hammond writes: > Apart from possibly Tim's recent "UnboundLocalError" (which is the only > serious behaviour change) I can't see anything that should obviously be Since UnboundLocalError is a subclass of NameError (what you got before) normally, and they are the same string when -X is used, this only represents a new name in the __builtin__ module for legacy code. This should not be a problem; the only real difference is that, using class exceptions for built-in exceptions, you get more useful information in your tracebacks. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik@pythonware.com Sat Aug 7 11:51:56 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 7 Aug 1999 12:51:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> Message-ID: <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> > > > >>> import unicode > > > >>> import marshal > > > >>> u = unicode.unicode > > > >>> s = u("foo") > > > >>> data = marshal.dumps(s) > > > >>> marshal.loads(data) > > > 'f\000o\000o\000' > > > >>> type(marshal.loads(data)) > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > that unicode objects use a two-byte character representation. >>> import array >>> import marshal >>> a = array.array >>> s = a("f", [1, 2, 3]) >>> data = marshal.dumps(s) >>> marshal.loads(data) '\000\000\200?\000\000\000@\000\000@@' looks like the various implementors haven't really understood the intentions of whoever designed the buffer interface... From mal@lemburg.com Sat Aug 7 17:14:56 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 07 Aug 1999 18:14:56 +0200 Subject: [Python-Dev] Some more constants for the socket module Message-ID: <37AC5B80.56F740DD@lemburg.com> Following the recent discussion on c.l.p about socket options, I found that the socket module does not define all constants defined in the (Linux) socket header file. Below is a patch that adds a few more (note that the SOL_* constants should be used for the setsockopt() level, not the IPPROTO_* constants). 
--- socketmodule.c~ Sat Aug 7 17:56:05 1999 +++ socketmodule.c Sat Aug 7 18:10:07 1999 @@ -2005,14 +2005,48 @@ initsocket() PySocketSock_Type.tp_doc = sockettype_doc; Py_INCREF(&PySocketSock_Type); if (PyDict_SetItemString(d, "SocketType", (PyObject *)&PySocketSock_Type) != 0) return; + + /* Address families (we only support AF_INET and AF_UNIX) */ +#ifdef AF_UNSPEC + insint(moddict, "AF_UNSPEC", AF_UNSPEC); +#endif insint(d, "AF_INET", AF_INET); #ifdef AF_UNIX insint(d, "AF_UNIX", AF_UNIX); #endif /* AF_UNIX */ +#ifdef AF_AX25 + insint(moddict, "AF_AX25", AF_AX25); /* Amateur Radio AX.25 */ +#endif +#ifdef AF_IPX + insint(moddict, "AF_IPX", AF_IPX); /* Novell IPX */ +#endif +#ifdef AF_APPLETALK + insint(moddict, "AF_APPLETALK", AF_APPLETALK); /* Appletalk DDP */ +#endif +#ifdef AF_NETROM + insint(moddict, "AF_NETROM", AF_NETROM); /* Amateur radio NetROM */ +#endif +#ifdef AF_BRIDGE + insint(moddict, "AF_BRIDGE", AF_BRIDGE); /* Multiprotocol bridge */ +#endif +#ifdef AF_AAL5 + insint(moddict, "AF_AAL5", AF_AAL5); /* Reserved for Werner's ATM */ +#endif +#ifdef AF_X25 + insint(moddict, "AF_X25", AF_X25); /* Reserved for X.25 project */ +#endif +#ifdef AF_INET6 + insint(moddict, "AF_INET6", AF_INET6); /* IP version 6 */ +#endif +#ifdef AF_ROSE + insint(moddict, "AF_ROSE", AF_ROSE); /* Amateur Radio X.25 PLP */ +#endif + + /* Socket types */ insint(d, "SOCK_STREAM", SOCK_STREAM); insint(d, "SOCK_DGRAM", SOCK_DGRAM); #ifndef __BEOS__ /* We have incomplete socket support. */ insint(d, "SOCK_RAW", SOCK_RAW); @@ -2048,11 +2082,10 @@ initsocket() insint(d, "SO_OOBINLINE", SO_OOBINLINE); #endif #ifdef SO_REUSEPORT insint(d, "SO_REUSEPORT", SO_REUSEPORT); #endif - #ifdef SO_SNDBUF insint(d, "SO_SNDBUF", SO_SNDBUF); #endif #ifdef SO_RCVBUF insint(d, "SO_RCVBUF", SO_RCVBUF); @@ -2111,14 +2144,43 @@ initsocket() #ifdef MSG_ETAG insint(d, "MSG_ETAG", MSG_ETAG); #endif /* Protocol level and numbers, usable for [gs]etsockopt */ -/* Sigh -- some systems (e.g. 
Linux) use enums for these. */ #ifdef SOL_SOCKET insint(d, "SOL_SOCKET", SOL_SOCKET); #endif +#ifdef SOL_IP + insint(moddict, "SOL_IP", SOL_IP); +#else + insint(moddict, "SOL_IP", 0); +#endif +#ifdef SOL_IPX + insint(moddict, "SOL_IPX", SOL_IPX); +#endif +#ifdef SOL_AX25 + insint(moddict, "SOL_AX25", SOL_AX25); +#endif +#ifdef SOL_ATALK + insint(moddict, "SOL_ATALK", SOL_ATALK); +#endif +#ifdef SOL_NETROM + insint(moddict, "SOL_NETROM", SOL_NETROM); +#endif +#ifdef SOL_ROSE + insint(moddict, "SOL_ROSE", SOL_ROSE); +#endif +#ifdef SOL_TCP + insint(moddict, "SOL_TCP", SOL_TCP); +#else + insint(moddict, "SOL_TCP", 6); +#endif +#ifdef SOL_UDP + insint(moddict, "SOL_UDP", SOL_UDP); +#else + insint(moddict, "SOL_UDP", 17); +#endif #ifdef IPPROTO_IP insint(d, "IPPROTO_IP", IPPROTO_IP); #else insint(d, "IPPROTO_IP", 0); #endif @@ -2266,10 +2328,32 @@ initsocket() #ifdef IP_ADD_MEMBERSHIP insint(d, "IP_ADD_MEMBERSHIP", IP_ADD_MEMBERSHIP); #endif #ifdef IP_DROP_MEMBERSHIP insint(d, "IP_DROP_MEMBERSHIP", IP_DROP_MEMBERSHIP); +#endif +#ifdef IP_DEFAULT_MULTICAST_TTL + insint(moddict, "IP_DEFAULT_MULTICAST_TTL", IP_DEFAULT_MULTICAST_TTL); +#endif +#ifdef IP_DEFAULT_MULTICAST_LOOP + insint(moddict, "IP_DEFAULT_MULTICAST_LOOP", IP_DEFAULT_MULTICAST_LOOP); +#endif +#ifdef IP_MAX_MEMBERSHIPS + insint(moddict, "IP_MAX_MEMBERSHIPS", IP_MAX_MEMBERSHIPS); +#endif + + /* TCP options */ +#ifdef TCP_NODELAY + insint(moddict, "TCP_NODELAY", TCP_NODELAY); +#endif +#ifdef TCP_MAXSEG + insint(moddict, "TCP_MAXSEG", TCP_MAXSEG); +#endif + + /* IPX options */ +#ifdef IPX_TYPE + insint(moddict, "IPX_TYPE", IPX_TYPE); #endif /* Initialize gethostbyname lock */ #ifdef USE_GETHOSTBYNAME_LOCK gethostbyname_lock = PyThread_allocate_lock(); -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 146 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sat Aug 7 21:15:08 1999 From: gstein@lyra.org (Greg 
Stein) Date: Sat, 07 Aug 1999 13:15:08 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> Message-ID: <37AC93CC.53982F3F@lyra.org> Fredrik Lundh wrote: > > > > > >>> import unicode > > > > >>> import marshal > > > > >>> u = unicode.unicode > > > > >>> s = u("foo") > > > > >>> data = marshal.dumps(s) > > > > >>> marshal.loads(data) > > > > 'f\000o\000o\000' > > > > >>> type(marshal.loads(data)) > > > > This was a "nicety" that was put in during a round of patches that I submitted to Guido. We both had questions about it but figured that it couldn't hurt since it at least let some things be marshalled out that couldn't be marshalled before. I would suggest backing out the marshalling of buffer-interface objects and adding a mechanism for arbitrary type objects to marshal themselves. Without the second part, arrays and Unicode objects aren't marshallable at all (seems bad). > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > that unicode objects use a two-byte character representation. Unicode objects should *not* implement the getcharbuffer slot. Only read, write, and segcount. > >>> import array > >>> import marshal > >>> a = array.array > >>> s = a("f", [1, 2, 3]) > >>> data = marshal.dumps(s) > >>> marshal.loads(data) > '\000\000\200?\000\000\000@\000\000@@' > > looks like the various implementors haven't > really understood the intentions of whoever > designed the buffer interface... Arrays can/should support both the getreadbuffer and getcharbuffer interface. The former: definitely. The latter: only if the contents are byte-sized. The loading back as a string is a different matter, as pointed out above.
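Both garbled outputs quoted in this thread, and Greg's byte-sized distinction, can be reproduced with the modern array module and encoded text; a sketch (bytes/str stand in for the old string/unicode split, and the float byte layout assumes a little-endian machine):

```python
import array

# A byte-sized array's raw buffer is sensible character data...
assert array.array("b", [102, 111, 111]).tobytes() == b"foo"

# ...while a float array's raw buffer is just IEEE-754 bits: the
# '\000\000\200?\000\000\000@\000\000@@' gibberish from the session above.
floats = array.array("f", [1, 2, 3])
assert floats.tobytes() == b"\x00\x00\x80?\x00\x00\x00@\x00\x00@@"
assert memoryview(floats).itemsize == 4  # 4-byte items, not chars

# The unicode case: two bytes per character, so the raw buffer reads as
# 'f\000o\000o\000' and a byte-wise search for b"foo" finds nothing.
two_byte = "foo".encode("utf-16-le")
assert two_byte == b"f\x00o\x00o\x00"
assert two_byte.find(b"foo") == -1
```

The raw buffer is well-defined in every case; what differs is whether treating it as a run of 1-byte characters means anything, which is exactly the getreadbuffer/getcharbuffer split.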
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack@oratrix.nl Sun Aug 8 21:20:52 1999 From: jack@oratrix.nl (Jack Jansen) Date: Sun, 08 Aug 1999 22:20:52 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by Greg Stein , Sat, 07 Aug 1999 13:15:08 -0700 , <37AC93CC.53982F3F@lyra.org> Message-ID: <19990808202057.DB803E267A@oratrix.oratrix.nl> Recently, Greg Stein said: > I would suggest backing out the marshalling of buffer-interface objects > and adding a mechanism for arbitrary type objects to marshal themselves. > Without the second part, arrays and Unicode objects aren't marshallable > at all (seems bad). This sounds like the right approach. It would require 2 slots in the tp_ structure and a little extra glue for the typecodes (currently marshal knows all the 1-letter typecodes for all object types it can handle), but types marshalling their own objects would require a centralized registry of object types. For the time being it would probably suffice to have the mapping of type<->letter be hardcoded in marshal.h, but eventually you probably want a more extensible scheme, where Joe R. Extension-Writer could add a marshaller to his objects and know it won't collide with someone else's. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Mon Aug 9 09:56:30 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 10:56:30 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990808202057.DB803E267A@oratrix.oratrix.nl> Message-ID: <37AE97BE.2CADF48E@lemburg.com> Jack Jansen wrote: > > Recently, Greg Stein said: > > I would suggest backing out the marshalling of buffer-interface objects > > and adding a mechanism for arbitrary type objects to marshal themselves.
> > Without the second part, arrays and Unicode objects aren't marshallable > > at all (seems bad). > > This sounds like the right approach. It would require 2 slots in the > tp_ structure and a little extra glue for the typecodes (currently > marshal knows all the 1-letter typecodes for all object types it can > handle), but types marshalling their own objects would require a > centralized registry of object types. For the time being it would > probably suffice to have the mapping of type<->letter be hardcoded in > marshal.h, but eventually you probably want a more extensible scheme, > where Joe R. Extension-Writer could add a marshaller to his objects > and know it won't collide with someone else's. This registry should ideally be reachable via C APIs. Then a module writer could call these APIs in the init function of his module and he'd be set. Since marshal won't be able to handle imports on the fly (like pickle et al.), these modules will have to be imported before unmarshalling. Aside: wouldn't it make sense to move from marshal to pickle and deprecate marshal altogether ? cPickle is quite fast and much more flexible than marshal, plus it already provides mechanisms for registering new types. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Mon Aug 9 14:49:44 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 09 Aug 1999 15:49:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "M.-A. Lemburg" , Mon, 09 Aug 1999 10:56:30 +0200 , <37AE97BE.2CADF48E@lemburg.com> Message-ID: <19990809134944.BB2FC303120@snelboot.oratrix.nl> > Aside: wouldn't it make sense to move from marshal to pickle and > deprecate marshal altogether ? cPickle is quite fast and much more > flexible than marshal, plus it already provides mechanisms for > registering new types.
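The registration mechanism Marc-Andre alludes to survives in today's pickle as the `copyreg` module; a sketch, with a hypothetical `Point` class standing in for an extension type:

```python
import copyreg
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def reduce_point(p):
    # Tell pickle how to rebuild a Point: a callable plus its arguments.
    return (Point, (p.x, p.y))

# Register the reduction; pickle consults this table for Point instances.
copyreg.pickle(Point, reduce_point)

data = pickle.dumps(Point(3, 4))
q = pickle.loads(data)
assert (q.x, q.y) == (3, 4)
```

This is the "centralized registry" idea from the previous messages, reachable from Python rather than only from C.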
This is probably the best idea so far. Just remove the buffer-workaround in marshal, keep it functioning for the things it is used for now (like pyc files) and refer people to (c)Pickle for new development. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido@CNRI.Reston.VA.US Mon Aug 9 15:50:46 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 10:50:46 -0400 Subject: [Python-Dev] Some more constants for the socket module In-Reply-To: Your message of "Sat, 07 Aug 1999 18:14:56 +0200." <37AC5B80.56F740DD@lemburg.com> References: <37AC5B80.56F740DD@lemburg.com> Message-ID: <199908091450.KAA29179@eric.cnri.reston.va.us> Thanks for the socketmodule patch, Marc. This was on my mental TO-DO list for a long time! I've checked it in. (One note: I had a bit of trouble applying the patch; apparently your mailer expanded all tabs to spaces. Perhaps you could use attachments to mail diffs? Also, you seem to have renamed 'd' to 'moddict' but you didn't send the patch for that...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Aug 9 17:26:28 1999 From: guido@python.org (Guido van Rossum) Date: Mon, 09 Aug 1999 12:26:28 -0400 Subject: [Python-Dev] preferred conference date? Message-ID: <199908091626.MAA29411@eric.cnri.reston.va.us> I need your input about the date of the next Python conference. Foretec is close to a deal for a Python conference in January 2000 at the Alexandria Old Town Hilton hotel. Given our requirement of a good location in the DC area, this is a very good deal (it's a brand new hotel). The prices are high (they tell me that the whole conference will cost $900, with a room rate of $129) but it's a class A location (metro, tons of restaurants, close to National Airport, etc.)
and we have found no cheaper DC hotel suitable for our purposes (even in drab suburban locations). I'm worried that I'll be flamed to hell for this by the PSA members, but I don't think we can get the price any lower without starting all over in a different location, probably causing several months of delay. If people won't come, Foretec (and I) will have learned a valuable lesson and we'll rethink the issue for the 2001 conference. Anyway, given that Foretec is likely to go with this hotel, we have a choice of two dates: January 16-19, or 23-26 (both starting on a Sunday with the tutorials). This is where I need your help: which date would you prefer? Please mail me personally. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Mon Aug 9 17:31:43 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 9 Aug 1999 11:31:43 -0500 (CDT) Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> References: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <14255.557.474160.824877@dolphin.mojam.com> Guido> The prices are high (they tell me that the whole conference will Guido> cost $900, with a room rate of $129) but it's a class A location No way I (or my company) can afford to plunk down $900 for me to attend... Skip From mal@lemburg.com Mon Aug 9 17:40:45 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 18:40:45 +0200 Subject: [Python-Dev] Some more constants for the socket module References: <37AC5B80.56F740DD@lemburg.com> <199908091450.KAA29179@eric.cnri.reston.va.us> Message-ID: <37AF048D.FC0A540@lemburg.com> Guido van Rossum wrote: > > Thanks for the socketmodule patch, Marc. This was on my mental TO-DO > list for a long time! I've checked it in. Cool, thanks. > (One note: I had a bit of trouble applying the patch; apparently your > mailer expanded all tabs to spaces. Perhaps you could use attachments > to mail diffs? Ok. 
> Also, you seem to have renamed 'd' to 'moddict' but > you didn't send the patch for that...) Oops, sorry... my "#define to insint" script uses 'd' as moddict, that's the reason why. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@CNRI.Reston.VA.US Mon Aug 9 18:30:36 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 13:30:36 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: Your message of "Mon, 09 Aug 1999 11:31:43 CDT." <14255.557.474160.824877@dolphin.mojam.com> References: <199908091626.MAA29411@eric.cnri.reston.va.us> <14255.557.474160.824877@dolphin.mojam.com> Message-ID: <199908091730.NAA29559@eric.cnri.reston.va.us> > Guido> The prices are high (they tell me that the whole conference will > Guido> cost $900, with a room rate of $129) but it's a class A location > > No way I (or my company) can afford to plunk down $900 for me to attend... Let me clarify this. The $900 is for the whole 4-day conference, including a day of tutorials and developers' day. I don't know what the exact price breakdown will be, but the tutorials will probably be $300. Last year the total price was $700, with $250 for tutorials. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Tue Aug 10 13:04:27 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 13:04:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts Message-ID: <199908101204.NAA29572@pukapuka.inrialpes.fr> Currently, dictionaries always grow until they are deallocated from memory. 
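That grow-only behavior can still be observed from pure Python; a sketch against a modern CPython (an implementation detail, and exact sizes vary by version):

```python
import sys

d = {i: i for i in range(10000)}
peak = sys.getsizeof(d)  # footprint after growing to 10000 entries

for i in range(10000):
    del d[i]

# Empty again, but deletion alone never hands the table back:
# the dict still occupies its peak footprint.
assert len(d) == 0
assert sys.getsizeof(d) == peak
assert sys.getsizeof(d) > sys.getsizeof({})
```

Resizing happens only on insertion, so a dict that once held many items keeps its large table until it is rebuilt or deallocated.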
This happens in PyDict_SetItem according to the following code (before inserting the new item into the dict): /* if fill >= 2/3 size, double in size */ if (mp->ma_fill*3 >= mp->ma_size*2) { if (dictresize(mp, mp->ma_used*2) != 0) { if (mp->ma_fill+1 > mp->ma_size) return -1; } } The symmetric case is missing and this has intrigued me for a long time, but I've never had the courage to look deeply into this portion of code and try to propose a solution, which is: reduce the size of the dict by half when the number of used items is <= 1/6 of the size. This situation occurs far less frequently than dict growing, but anyway, it seems useful for the degenerate cases where a dict has a peak usage, then most of the items are deleted. This is usually the case for global dicts holding dynamic object collections, etc. A bonus effect of shrinking big dicts with deleted items is that the lookup speed may be improved, because of the cleaned entries and the reduced overall size (resulting in a better hit ratio). The (only) solution I could come up with for this problem is the appended patch. It is not immediately obvious, but in practice, it seems to work fine. (inserting a print statement after the condition, showing the dict size and current usage helps in monitoring what's going on). Any other ideas on how to deal with this? Thoughts, comments? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999 --- dictobject.c Tue Aug 10 12:21:15 1999 *************** *** 417,423 **** ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); return 0; } --- 417,430 ---- ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); ! /* For bigger dictionaries, if used <= 1/6 size, half the size */ ! if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) { !
if (dictresize(mp, mp->ma_used*2) != 0) { ! if (mp->ma_fill > mp->ma_size) ! return -1; ! } ! } return 0; } From Vladimir.Marangozov@inrialpes.fr Tue Aug 10 14:20:36 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 14:20:36 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 10, 99 01:04:27 pm" Message-ID: <199908101320.OAA21986@pukapuka.inrialpes.fr> I wrote: > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? > To clarify a bit what the patch does "as is", here's a short description: The code is triggered in PyDict_DelItem only for sizes which are > MINSIZE*4, i.e. greater than 4*4 = 16. Therefore, resizing will occur for a min size of 32 items. one third 32 / 3 = 10 two thirds 32 * 2/3 = 21 one sixth 32 / 6 = 5 So the shrinking will happen for a dict size of 32, of which 5 items are used (the sixth was just deleted). After the dictresize, the size will be 16, of which 5 items are used, i.e. one third. The threshold is fixed by the first condition of the patch. It could be made 64, instead of 32. This is subject to discussion... Obviously, this is most useful for bigger dicts, not for small ones. A threshold of 32 items seemed to me to be a reasonable compromise. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik@pythonware.com Tue Aug 10 13:35:33 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 10 Aug 1999 14:35:33 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> Message-ID: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > > > > >>> import unicode > > > > > >>> import marshal > > > > > >>> u = unicode.unicode > > > > > >>> s = u("foo") > > > > > >>> data = marshal.dumps(s) > > > > > >>> marshal.loads(data) > > > > > 'f\000o\000o\000' > > > > > >>> type(marshal.loads(data)) > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > that unicode objects use a two-byte character representation. > > Unicode objects should *not* implement the getcharbuffer slot. Only > read, write, and segcount. unicode objects do not implement the getcharbuffer slot. here's the relevant descriptor:

    static PyBufferProcs unicode_as_buffer = {
        (getreadbufferproc) unicode_buffer_getreadbuf,
        (getwritebufferproc) unicode_buffer_getwritebuf,
        (getsegcountproc) unicode_buffer_getsegcount
    };

the array module uses a similar descriptor. maybe the unicode class shouldn't implement the buffer interface at all? sure looks like the best way to avoid trivial mistakes (the current behaviour of fp.write(unicodeobj) is even more serious than the marshal glitch...) or maybe the buffer design needs an overhaul? From guido@CNRI.Reston.VA.US Tue Aug 10 15:12:23 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Tue, 10 Aug 1999 10:12:23 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Tue, 10 Aug 1999 14:35:33 +0200."
<000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <199908101412.KAA02065@eric.cnri.reston.va.us> > Greg Stein wrote: > > > > > > >>> import unicode > > > > > > >>> import marshal > > > > > > >>> u = unicode.unicode > > > > > > >>> s = u("foo") > > > > > > >>> data = marshal.dumps(s) > > > > > > >>> marshal.loads(data) > > > > > > 'f\000o\000o\000' > > > > > > >>> type(marshal.loads(data)) > > > > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > > that unicode objects use a two-byte character representation. > > > > Unicode objects should *not* implement the getcharbuffer slot. Only > > read, write, and segcount. > > unicode objects do not implement the getcharbuffer slot. > here's the relevant descriptor: > > static PyBufferProcs unicode_as_buffer = { > (getreadbufferproc) unicode_buffer_getreadbuf, > (getwritebufferproc) unicode_buffer_getwritebuf, > (getsegcountproc) unicode_buffer_getsegcount > }; > > the array module uses a similar descriptor. > > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) > > or maybe the buffer design needs an overhaul? I think most places that should use the charbuffer interface actually use the readbuffer interface. This is what should be fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Tue Aug 10 18:53:56 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:53:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <37B06734.4339D3BF@lemburg.com> Fredrik Lundh wrote: > > unicode objects do not implement the getcharbuffer slot. >... > or maybe the buffer design needs an overhaul? I think its usage does. The character slot should be used whenever character data is needed, not the read buffer slot. The latter one is for passing around raw binary data (without reinterpretation !), if I understood Greg correctly back when I gave those abstract APIs a try. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Aug 10 18:39:29 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:39:29 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <37B063D1.29F3106A@lemburg.com> Vladimir Marangozov wrote: > > Currently, dictionaries always grow until they are deallocated from > memory. This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } > > The symmetric case is missing and this has intrigued me for a long time, > but I've never had the courage to look deeply into this portion of code > and try to propose a solution. Which is: reduce the size of the dict by > half when the nb of used items <= 1/6 the size. 
> > This situation occurs far less frequently than dict growing, but anyway, > it seems useful for the degenerate cases where a dict has a peak usage, > then most of the items are deleted. This is usually the case for global > dicts holding dynamic object collections, etc. > > A bonus effect of shrinking big dicts with deleted items is that > the lookup speed may be improved, because of the cleaned entries > and the reduced overall size (resulting in a better hit ratio). > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (Inserting a print statement after the condition, showing the dict size > and current usage, helps in monitoring what's going on.) > > Any other ideas on how to deal with this? Thoughts, comments? I think that integrating this into the C code is not really that effective since the situation will not occur that often, and then it is often better to let the programmer decide rather than integrate an automatic downsize. You can call dict.update({}) to force an internal resize (the empty dictionary can be made global since it is not manipulated in any way and thus does not cause creation overhead). Perhaps a new method .resize(approx_size) would make this even clearer. This would also have the benefit of allowing a programmer to force allocation of the wanted size, e.g.

    d = {}
    d.resize(10000)
    # Insert 10000 items in a batch insert

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Tue Aug 10 20:58:27 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 20:58:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B063D1.29F3106A@lemburg.com> from "M.-A.
Lemburg" at "Aug 10, 99 07:39:29 pm" Message-ID: <199908101958.UAA22028@pukapuka.inrialpes.fr> M.-A. Lemburg wrote: > > [me] > > Any other ideas on how to deal with this? Thoughts, comments? > > I think that integrating this into the C code is not really that > effective since the situation will not occur that often and then > it is often better to let the programmer decide rather than integrate > an automatic downsize. Agreed that the situation is rare. But if it occurs, it's Python's responsibility to manage its data structures (and system resources) efficiently. As a programmer, I really don't want to be bothered with internals -- I trust the interpreter for that. Moreover, how could I decide that at some point, some dict needs to be resized in my fairly big app, say IDLE? > > You can call dict.update({}) to force an internal > resize (the empty dictionary can be made global since it is not > manipulated in any way and thus does not cause creation overhead). I know that I can force the resize in other ways, but this is not the point. I'm usually against the idea of changing the programming logic because of my advanced knowledge of the internals. > > Perhaps a new method .resize(approx_size) would make this even > clearer. This would also have the benefit of allowing a programmer > to force allocation of the wanted size, e.g. > > d = {} > d.resize(10000) > # Insert 10000 items in a batch insert This is interesting, but the two ideas are not mutually exclusive. Python has to downsize dicts automatically (just the same way it doubles the size automatically). Offering more through an API is a plus for hackers. ;-) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Tue Aug 10 21:19:46 1999 From: mal@lemburg.com (M.-A.
Lemburg) Date: Tue, 10 Aug 1999 22:19:46 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101958.UAA22028@pukapuka.inrialpes.fr> Message-ID: <37B08962.6DFB3F0@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > [me] > > > Any other ideas on how to deal with this? Thoughts, comments? > > > > I think that integrating this into the C code is not really that > > effective since the situation will not occur that often and then > > it is often better to let the programmer decide rather than integrate > > an automatic downsize. > > Agreed that the situation is rare. But if it occurs, it's Python's > responsibility to manage its data structures (and system resources) > efficiently. As a programmer, I really don't want to be bothered with > internals -- I trust the interpreter for that. Moreover, how could > I decide that at some point, some dict needs to be resized in my > fairly big app, say IDLE? You usually don't ;-) because "normal" dicts only grow (well, more or less). The downsizing thing will only become a problem if you use dictionaries in certain algorithms, and there you handle the problem manually. My stack implementation uses the same trick, BTW. Memory is cheap and with an extra resize method (which the mxStack implementation has), problems can be dealt with explicitly for everyone to see in the code. > > You can call dict.update({}) to force an internal > > resize (the empty dictionary can be made global since it is not > > manipulated in any way and thus does not cause creation overhead). > > I know that I can force the resize in other ways, but this is not > the point. I'm usually against the idea of changing the programming > logic because of my advanced knowledge of the internals. True, that's why I mentioned... > > > > Perhaps a new method .resize(approx_size) would make this even > > clearer. This would also have the benefit of allowing a programmer > > to force allocation of the wanted size, e.g.
> > > > d = {} > > d.resize(10000) > > # Insert 10000 items in a batch insert > > This is interesting, but the two ideas are not mutually exclusive. > Python has to downsize dicts automatically (just the same way it doubles > the size automatically). Offering more through an API is a plus for > hackers. ;-) It's not really for hackers: the point is that it makes the technique visible and understandable (as opposed to the hack above). The same could be useful for lists too (the hack there is l = [None] * size, which I find rather difficult to understand at first sight...). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond@skippinet.com.au Tue Aug 10 23:39:30 1999 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 11 Aug 1999 08:39:30 +1000 Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B08962.6DFB3F0@lemburg.com> Message-ID: <010901bee381$36ee5d30$1101a8c0@bobcat> Looking over the messages from Marc and Vladimir, I'm going to add my 2c worth. IMO, Marc's position is untenable iff it can be demonstrated that the "average" program is likely to see "sparse" dictionaries, and such dictionaries have an adverse effect on either speed or memory. The analogy is quite simple - you don't need to manually resize lists or dicts before inserting (to allocate more storage - an internal implementation issue) so neither should you need to manually resize when deleting (to reclaim that storage - still internal implementation). Suggesting that the allocation of resources should be automatic, but the recycling of them not be automatic flies in the face of everything else - e.g., you don't need to delete each object - when it is no longer referenced, its memory is reclaimed automatically.
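The grow-but-never-shrink behaviour under discussion is easy to observe empirically. A small sketch (behaviour observed on a modern CPython 3.x; the exact byte counts are implementation details, not a language guarantee) showing that deleting every key leaves the dict's table at its high-water size, while rebuilding the dict reclaims it:

```python
import sys

# Grow a dict, then delete every key one at a time.
d = {i: None for i in range(10_000)}
grown = sys.getsizeof(d)

for k in list(d):
    del d[k]

after_deletes = sys.getsizeof(d)   # table keeps its high-water allocation
rebuilt = sys.getsizeof(dict(d))   # a copy is sized for the 0 live items

print(grown, after_deletes, rebuilt)
```

So the "copy it into a fresh dict" trick mentioned elsewhere in this thread is still the way to give the memory back.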
Marc's position is only reasonable if the specific case we are talking about is very very rare, and unlikely to be hit by anyone with normal, real-world requirements or programs. In this case, exposing the implementation detail is reasonable. So, the question comes down to: "What is the benefit of Vladimir's patch?" Maybe we need some metrics on some dictionaries. For example, maybe a doctored Python that kept stats for each dictionary and logged this info. The output of this should be able to tell you what savings you could possibly expect. If you find that the average program really would not benefit at all (say only a few K from a small number of dicts) then the horse was probably dead well before we started flogging it. If however you can demonstrate serious benefits could be achieved, then interest may pick up and I too would lobby for automatic downsizing. Mark. From tim_one@email.msn.com Wed Aug 11 06:30:20 1999 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 01:30:20 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <000001bee3ba$9b226f60$8d2d2399@tim> [Vladimir] > Currently, dictionaries always grow until they are deallocated from > memory. It's more accurate to say they never shrink <0.9 wink>. Even that has exceptions, though, starting with: > This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } This code can shrink the dict too. The load factor computation is based on "fill", but the resize is based on "used". If you grow a huge dict, then delete all the entries one by one, "used" falls to 0 but "fill" stays at its high-water mark.
At least 1/3rd of the entries are NULL, so "fill" continues to climb as keys are added again: when the load factor computation triggers again, "used" may be as small as 1, and dictresize can shrink the dict dramatically. The only clear a priori return I see in your patch is that I might save memory if I delete gobs of stuff from a dict and then neither get rid of it nor add keys to it again. But my programs generally grow dicts forever, grow then delete them entirely, or cycle through fat and lean times (in which case the code above already shrinks them from time to time). So I don't expect that your patch would buy me anything I want, but would cost me more on every delete. > ... > Any other ideas on how to deal with this? Thoughts, comments? Just that slowing the expected case to prevent theoretical bad cases is usually a net loss -- I think the onus is on you to demonstrate that this change is an exception to that rule. I do recall one real-life complaint about it on c.l.py a couple years ago: the poster had a huge dict, eventually deleted most of the items, and then kept it around purely for lookups. They were happy enough to copy the dict into a fresh one a key+value pair at a time; today they could just do d = d.copy() or even d.update({}) to shrink the dict. It would certainly be good to document these tricks! if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim From tim_one@email.msn.com Wed Aug 11 07:45:49 1999 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 02:45:49 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <000201bee3c5$25b47b00$8d2d2399@tim> [Guido] > ... > The prices are high (they tell me that the whole conference will cost > $900, with a room rate of $129) Is room rental in addition to, or included in, that $900? > ...
> I'm worried that I'll be flamed to hell for this by the PSA members, So have JulieK announce it . > ... > Anyway, given that Foretec is likely to go with this hotel, we have a > choice of two dates: January 16-19, or 23-26 (both starting on a > Sunday with the tutorials). This is where I need your help: which > date would you prefer? 23-26 for me; 16-19 may not be doable. or-everyone-can-switch-to-windows-and-we'll-do-the-conference-via-netmeeting-ly y'rs - tim From Vladimir.Marangozov@inrialpes.fr Wed Aug 11 15:33:17 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 11 Aug 1999 15:33:17 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000001bee3ba$9b226f60$8d2d2399@tim> from "Tim Peters" at "Aug 11, 99 01:30:20 am" Message-ID: <199908111433.PAA31842@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir] > > Currently, dictionaries always grow until they are deallocated from > > memory. > > It's more accurate to say they never shrink <0.9 wink>. Even that has > exceptions, though, starting with: > > > This happens in PyDict_SetItem according to the following > > code (before inserting the new item into the dict): > > > > /* if fill >= 2/3 size, double in size */ > > if (mp->ma_fill*3 >= mp->ma_size*2) { > > if (dictresize(mp, mp->ma_used*2) != 0) { > > if (mp->ma_fill+1 > mp->ma_size) > > return -1; > > } > > } > > This code can shrink the dict too. The load factor computation is based on > "fill", but the resize is based on "used". If you grow a huge dict, then > delete all the entries one by one, "used" falls to 0 but "fill" stays at its > high-water mark. Thanks for clarifying this! > [snip] > > > ... > > Any other ideas on how to deal with this? Thoughts, comments?
> > Just that slowing the expected case to prevent theoretical bad cases is > usually a net loss -- I think the onus is on you to demonstrate that this > change is an exception to that rule. I won't, because this case is rare in practice, classifying it already as an exception. A real exception. I'll have to think a bit more about all this. Adding 1/3 new entries to trigger the next resize sounds suboptimal (if it happens at all). > I do recall one real-life complaint > about it on c.l.py a couple years ago: the poster had a huge dict, > eventually deleted most of the items, and then kept it around purely for > lookups. They were happy enough to copy the dict into a fresh one a > key+value pair at a time; today they could just do > > d = d.copy() > > or even > > d.update({}) > > to shrink the dict. > > It would certainly be good to document these tricks! I think that officializing these tricks in the documentation is a bad idea. > > if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to- > see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, but don't forget that you've been around, teaching people various tricks. And 1999 is special -- we just had a solar eclipse today, the next being scheduled for 2081. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik@pythonware.com Wed Aug 11 15:07:44 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 11 Aug 1999 16:07:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> > > or maybe the buffer design needs an overhaul? 
> > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. ok. btw, how about adding support for buffer access to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible (like "virtual" and "abstract" image buffers in PIL 1.1). something like:

    int initbuffer(PyObject* obj, void** context);
    int exitbuffer(PyObject* obj, void* context);

and corresponding context arguments to the rest of the functions... From guido@CNRI.Reston.VA.US Wed Aug 11 15:42:10 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 11 Aug 1999 10:42:10 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Wed, 11 Aug 1999 16:07:44 +0200." <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> Message-ID: <199908111442.KAA04423@eric.cnri.reston.va.us> > btw, how about adding support for buffer access > to data that have strange internal formats (like certain > PIL image memories) or isn't directly accessible > (like "virtual" and "abstract" image buffers in PIL 1.1). > something like: > > int initbuffer(PyObject* obj, void** context); > int exitbuffer(PyObject* obj, void* context); > > and corresponding context arguments to the > rest of the functions... Can you explain this idea more? Without more understanding of PIL I have no idea what you're talking about...
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Thu Aug 12 06:15:39 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 01:15:39 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908111433.PAA31842@pukapuka.inrialpes.fr> Message-ID: <000301bee481$b78ae5c0$4e2d2399@tim> [Tim] >> ...slowing the expected case to prevent theoretical bad cases is >> usually a net loss -- I think the onus is on you to demonstrate >> that this change is an exception to that rule. [Vladimir Marangozov] > I won't, because this case is rare in practice, classifying it already > as an exception. A real exception. I'll have to think a bit more about > all this. Adding 1/3 new entries to trigger the next resize sounds > suboptimal (if it happens at all). "Suboptimal" with respect to which specific cost model? Exhibiting a specific bad case isn't compelling, and especially not when it's considered to be "a real exception". Adding new expense to every delete is an obvious new burden -- where's the payback, and is the expected net effect amortized across all dict usage a win or loss? Offhand it sounds like a small loss to me, although I haven't worked up a formal cost model either . > ... > I think that officializing these tricks in the documentation is a > bad idea. It's rarely a good idea to keep truths secret, although implementation-du-jour tricks don't belong in the current doc set. Probably in a HowTo. >> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard- >> to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, I actually hate that kind of argument -- it's one of *Guido's* favorites, and in his current silent state I'm simply channeling him . > but don't forget that you've been around, teaching people various > tricks. As I said, this particular trick has come up only once in real life in my experience; it's never come up in my own code; it's an anti-FAQ. 
People are 100x more likely to whine about theoretical quadratic-time list growth nobody has ever encountered (although it looks like they may finally get it under an out-of-the-box BDW collector!). > And 1999 is special -- we just had a solar eclipse today, the next being > scheduled for 2081. Ya, like any of us will survive Y2K to see it . 1999-is-special-cuz-it's-the-end-of-civilization-ly y'rs - tim From Vladimir.Marangozov@inrialpes.fr Thu Aug 12 19:22:06 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 12 Aug 1999 19:22:06 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908121822.TAA40444@pukapuka.inrialpes.fr> Just curious: Is python with vs. without "-O" equivalent today regarding line numbers? Are SET_LINENO opcodes a plus in some situations or not? Next, I see quite often several SET_LINENO in a row in the beginning of code objects due to doc strings, etc. Since I don't think that folding them into one SET_LINENO would be an optimisation (it would rather be avoiding the redundancy), is it possible and/or reasonable to do something in this direction? A trivial example:

>>> def f():
...     "This is a comment about f"
...     a = 1
...
>>> import dis
>>> dis.dis(f)
          0 SET_LINENO          1
          3 SET_LINENO          2
          6 SET_LINENO          3
          9 LOAD_CONST          1 (1)
         12 STORE_FAST          0 (a)
         15 LOAD_CONST          2 (None)
         18 RETURN_VALUE
>>>

Can the above become something like this instead:

          0 SET_LINENO          3
          3 LOAD_CONST          1 (1)
          6 STORE_FAST          0 (a)
          9 LOAD_CONST          2 (None)
         12 RETURN_VALUE

-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack@oratrix.nl Thu Aug 12 23:02:06 1999 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 00:02:06 +0200 Subject: [Python-Dev] about line numbers In-Reply-To: Message by Vladimir Marangozov , Thu, 12 Aug 1999 19:22:06 +0100 (NFT) , <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <19990812220211.B3CED993@oratrix.oratrix.nl> The only possible problem I can see with folding line numbers is if someone sets a breakpoint on such a line. And I think it'll be difficult to explain the missing line numbers to pdb, so there isn't an easy workaround (at least, it takes more than my 30 seconds of brainpower to come up with one:-). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov@inrialpes.fr Fri Aug 13 00:10:26 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:10:26 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000301bee481$b78ae5c0$4e2d2399@tim> from "Tim Peters" at "Aug 12, 99 01:15:39 am" Message-ID: <199908122310.AAA29618@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Tim] > >> ...slowing the expected case to prevent theoretical bad cases is > >> usually a net loss -- I think the onus is on you to demonstrate > >> that this change is an exception to that rule. > > [Vladimir Marangozov] > > I won't, because this case is rare in practice, classifying it already > > as an exception.
A real exception. I'll have to think a bit more about > > all this. Adding 1/3 new entries to trigger the next resize sounds > > suboptimal (if it happens at all). > > "Suboptimal" with respect to which specific cost model? Exhibiting a > specific bad case isn't compelling, and especially not when it's considered > to be "a real exception". Adding new expense to every delete is an obvious > new burden -- where's the payback, and is the expected net effect amortized > across all dict usage a win or loss? Offhand it sounds like a small loss to > me, although I haven't worked up a formal cost model either . C'mon Tim, don't try to impress me with cost models. I'm already impressed :-) Anyway, I've looked at some traces. As expected, the conclusion is that this case is extremely rare wrt the average dict usage. There are 3 reasons: (1) dicts are usually deleted entirely, (2) del d[key] is rare in practice, and (3) often d[key] = None is used instead of (2). There is, however, a small percentage of dicts which are used below 1/3 of their size. I must say, below 1/3 of their peak size, because downsizing is also rare. To trigger a downsize, 1/3 new entries of the peak size must be inserted. Besides these observations, after looking at the code one more time, I can't really understand why the resize logic is based on the "fill" watermark and not on "used". fill = used + dummy, but since lookdict returns the first free slot (null or dummy), I don't really see what's the point of using a fill watermark... Perhaps you can enlighten me on this. Using only the "used" metrics seems fine to me. I even deactivated "fill" and replaced it with "used" to see what happens -- no visible changes, except a tiny speedup I'm willing to neglect.
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov@inrialpes.fr Fri Aug 13 00:21:48 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:21:48 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <19990812220211.B3CED993@oratrix.oratrix.nl> from "Jack Jansen" at "Aug 13, 99 00:02:06 am" Message-ID: <199908122321.AAA29572@pukapuka.inrialpes.fr> Jack Jansen wrote: > > > The only possible problem I can see with folding linenumbers is if > someone sets a breakpoint on such a line. And I think it'll be > difficult to explain the missing line numbers to pdb, so there isn't > an easy workaround (at least, it takes more than my 30 seconds of > brainpoewr to come up with one:-). > Eek! We can set a breakpoint on a doc string? :-) There's no code in there. It should be treated as a comment by pdb. I can't set a breakpoint on a comment line even in C ;-) There must be something deeper about it... -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one@email.msn.com Fri Aug 13 01:07:32 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 20:07:32 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <000101bee51f$d7601de0$fb2d2399@tim> [Vladimir Marangozov] > Is python with vs. without "-O" equivalent today regarding > line numbers? > > Are SET_LINENO opcodes a plus in some situations or not? In theory it should make no difference, except that the trace mechanism makes a callback on each SET_LINENO, and that's how the debugger implements line-number breakpoints. Under -O, there are no SET_LINENOs, so debugger line-number breakpoints don't work under -O. 
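As a point of comparison, later CPython releases resolved exactly this tension by dropping SET_LINENO altogether: line numbers moved into a table attached to the code object, so tracing and debuggers consult the table instead of paying for an opcode per line. A minimal sketch of inspecting that table on a modern (3.x) interpreter:

```python
import dis

def f():
    "This is a comment about f"
    a = 1
    return a

# No SET_LINENO opcodes exist anymore; the bytecode-offset -> line-number
# mapping is stored on the code object and exposed by dis.findlinestarts().
starts = list(dis.findlinestarts(f.__code__))
print(starts)   # [(offset, lineno), ...]
```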
I think there's also a sporadic buglet, which I've never bothered to track down: sometimes a line number reported in a traceback under -O (&, IIRC, it's always the topmost line number) comes out as a senseless negative value. > Next, I see quite often several SET_LINENO in a row in the beginning > of code objects due to doc strings, etc. Since I don't think that > folding them into one SET_LINENO would be an optimisation (it would > rather be avoiding the redundancy), is it possible and/or reasonable > to do something in this direction? All opcodes consume time, although a wasted trip or two around the eval loop at the start of a function isn't worth much effort to avoid. Still, it's a legitimate opportunity for provable speedup, even if unmeasurable speedup . Would be more valuable to rethink the debugger's breakpoint approach so that SET_LINENO is never needed (line-triggered callbacks are expensive because called so frequently, turning each dynamic SET_LINENO into a full-blown Python call; if I used the debugger often enough to care , I'd think about munging in a new opcode to make breakpoint sites explicit). immutability-is-made-to-be-violated-ly y'rs - tim From tim_one@email.msn.com Fri Aug 13 05:53:38 1999 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 13 Aug 1999 00:53:38 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908122307.AAA06018@pukapuka.inrialpes.fr> Message-ID: <000101bee547$cffaa020$992d2399@tim> [Vladimir Marangozov, *almost* seems ready to give up on a counterproductive dict pessimization ] > ... > There is, however, a small percentage of dicts which are used > below 1/3 of their size. I must say, below 1/3 of their peak size, > because downsizing is also rare. To trigger a downsize, 1/3 new > entries of the peak size must be inserted. Not so, although "on average" 1/6 may be correct. Look at an extreme: Say a dict has size 333 (it can't, but it makes the math obvious ...). Say it contains 221 items.
Now someone deletes them all, one at a time. used==0 and fill==221 at this point. They insert one new key that happens to hit one of the 333-221 = 112 remaining NULL keys. Then used==1 and fill==222. They insert a 2nd key, and before the dict is searched the new fill of 222 triggers the 2/3rds load-factor resizing -- which asks for a new size of 1*2 == 2. For the minority of dicts that go up and down in size wildly many times, the current behavior is fine. > Besides these observations, after looking at the code one more > time, I can't really understand why the resize logic is based on > the "fill" watermark and not on "used". fill = used + dummy, but > since lookdict returns the first free slot (null or dummy), I don't > really see what's the point of using a fill watermark... Let's just consider an unsuccessful search. Then it does return "the first" free slot, but not necessarily at the time it *sees* the first free slot. So long as it sees a dummy, it has to keep searching; the search doesn't end until it finds a NULL. So consider this, assuming the resize triggered only on "used":

    d = {}
    for i in xrange(50000):
        d[random.randrange(1000000)] = 1
    for k in d.keys():
        del d[k]
    # now there are 50000 dummy dict keys, and some number of NULLs
    # loop invariant: used == 0
    for i in xrange(sys.maxint):
        j = random.randrange(10000000)
        d[j] = 1
        del d[j]
        assert not d.has_key(i)

However many NULL slots remained, the last loop eventually transforms them *all* into dummies. The dummies act exactly like "real keys" with respect to expected time for an unsuccessful search, which is why it's thoroughly appropriate to include dummies in the load factor computation. The loop will run slower and slower as the percentage of dummies approaches 100%, and each failing has_key approaches O(N) time.
In most hash table implementations that's the worst that can happen (and it's a disaster), but under Python's implementation it's worse: Python never checks to see whether the probe sequence "wraps around", so the first search after the last NULL is changed to a dummy never ends. Counting the dummies in the load-factor computation prevents all that: no matter how many inserts and deletes are intermixed, the "effective load factor" stays under 2/3rds so gives excellent expected-case behavior; and it also protects against an all-dummy dict, making the lack of an expensive inner-loop "wrapped around?" check safe. > Perhaps you can enlighten me on this. Using only the "used" metrics > seems fine to me. I even deactivated "fill" and replaced it with "used" > to see what happens -- no visible changes, except a tiny speedup I'm > willing to neglect. You need a mix of deletes and inserts for the dummies to make a difference; dicts that always grow don't have dummies, so they're not likely to have any dummy-related problems either. Try this (untested):

    import time
    from random import randrange
    N = 1000
    thatmany = [None] * N
    d = {}
    while 1:
        start = time.clock()
        for i in thatmany:
            d[randrange(10000000)] = 1
        for i in d.keys():
            del d[i]
        finish = time.clock()
        print round(finish - start, 3)

Succeeding iterations of the outer loop should grow dramatically slower, and finally get into an infinite loop, even though "used" never exceeds N. Short course rewording: for purposes of predicting expected search time, a dummy is the same as a live key, because finding a dummy doesn't end a search -- it has to press on until either finding the key it was looking for, or finding a NULL. And with a mix of insertions and deletions, and if the hash function is doing a good job, then over time all the slots in the table will become either live or dummy, even if "used" stays within a very small range. So, that's why.
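[A present-day aside: Tim's argument can be made concrete with a toy open-addressing table. This is NOT CPython's lookdict -- it is a minimal sketch with linear probing and invented names, kept only to show why a miss must walk past every dummy, and why "fill" rather than "used" must drive resizing.]

    # An empty slot (NULL) is the only thing that ends a search;
    # a deleted slot (DUMMY) must be probed past, just like a live key.
    NULL = None
    DUMMY = object()

    class ToyDict:
        def __init__(self, size=8):
            self.slots = [NULL] * size
            self.used = 0    # live keys
            self.fill = 0    # live keys + dummies (the "fill" watermark)

        def _probe(self, key):
            i = hash(key) % len(self.slots)
            while True:
                yield i
                i = (i + 1) % len(self.slots)

        def insert(self, key):
            target = None
            for i in self._probe(key):
                s = self.slots[i]
                if s == key:
                    return                   # already present
                if s is DUMMY and target is None:
                    target = i               # first free slot, but keep searching
                if s is NULL:
                    if target is None:
                        target = i
                        self.fill += 1       # only NULL -> key raises fill
                    break
            self.slots[target] = key
            self.used += 1

        def delete(self, key):
            for i in self._probe(key):
                if self.slots[i] == key:
                    self.slots[i] = DUMMY    # leaves a dummy; fill stays put
                    self.used -= 1
                    return
                if self.slots[i] is NULL:
                    raise KeyError(key)

        def probes_for_miss(self, key):
            n = 0
            for i in self._probe(key):
                n += 1
                if self.slots[i] is NULL:
                    return n
                if n > len(self.slots):
                    raise RuntimeError("no NULL left: the search would never end")

    t = ToyDict(8)
    for k in range(6):
        t.insert(k)
    for k in range(6):
        t.delete(k)
    print(t.used, t.fill)        # 0 6: no live keys, but six dummies
    print(t.probes_for_miss(0))  # 7: six dummies plus the NULL at slot 6

With eight slots, six inserts followed by six deletes leave used == 0 but fill == 6, and an unsuccessful search then costs seven probes; a table with no NULLs left would loop forever (this sketch raises RuntimeError instead). That is the behavior resizing on "fill" exists to prevent.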
dictobject-may-be-the-subtlest-object-there-is-ly y'rs - tim From gstein@lyra.org Fri Aug 13 10:13:55 1999 From: gstein@lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 02:13:55 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: On Tue, 10 Aug 1999, Fredrik Lundh wrote: >... > unicode objects do not implement the getcharbuffer slot. This is Goodness. All righty. >... > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way It is needed for fp.write(unicodeobj) ... It is also very handy for C functions to deal with Unicode strings. > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) What's wrong with fp.write(unicodeobj)? It should write the unicode value to the file. Are you suggesting that it will need to be done differently? Icky. > or maybe the buffer design needs an overhaul? Not that I know of. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Aug 13 11:36:13 1999 From: gstein@lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 03:36:13 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: On Tue, 10 Aug 1999, Guido van Rossum wrote: >... > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. I believe that I properly changed all of these within the core distribution. Per your requested design, third-party extensions must switch from "s#" to "t#" to move to the charbuffer interface, as needed. 
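[A present-day aside: the readbuffer/charbuffer split Greg describes survives in modern Python as the split between bytes-like objects and text. A rough analogue of the "s#" vs "t#" rules using today's io API -- not the 1.5.2 C-level format codes:]

    import array
    import io

    f = io.BytesIO()

    a = array.array("b", [104, 105])  # sample object exporting the buffer protocol
    f.write(a)                        # accepted: any bytes-like object ("s#" territory)

    try:
        f.write("hi")                 # text: today's analogue of the "t#" distinction
    except TypeError:
        pass                          # a byte stream refuses text; encode first

    print(f.getvalue())               # b'hi' -- the array's two bytes, 104 and 105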
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov@inrialpes.fr Fri Aug 13 14:47:05 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 14:47:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908131347.OAA30740@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir Marangozov, *almost* seems ready to give up on a counter- > productive dict pessimization ] > Of course I will! Now everything is perfectly clear. Thanks. > ... > So, that's why. > Now, *this* one explanation of yours should go into a HowTo/BecauseOf for developers. I timed your scripts and a couple of mine which attest (again) to the validity of the current implementation. My patch is out of bounds. It even disturbs the existing harmony in the results from time to time ;-) because of early resizing. All in all, for performance reasons, dicts remain an exception to the rule of releasing memory ASAP. They have been designed to tolerate caching because of their dynamics, which is the main reason for the rare case addressed by my patch. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Fri Aug 13 18:27:19 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 19:27:19 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B45577.7772CAA1@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Guido van Rossum wrote: > >... > > > or maybe the buffer design needs an overhaul? > > > > I think most places that should use the charbuffer interface actually > > use the readbuffer interface. This is what should be fixed. > > I believe that I properly changed all of these within the core > distribution.
Per your requested design, third-party extensions must > switch from "s#" to "t#" to move to the charbuffer interface, as needed. Shouldn't this be the other way around ? After all, extensions using "s#" do expect character data and not arbitrary binary encodings of information. IMHO, the latter should be special cased, not the former. E.g. it doesn't make sense to use the re module to scan over 2-byte Unicode with single character based search patterns. Aside: Is the buffer interface reachable in any way from within Python ? Why isn't the interface exposed via __XXX__ methods on normal Python instances (could be implemented by returning a buffer object) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fred L. Drake, Jr." References: <37B45577.7772CAA1@lemburg.com> Message-ID: <14260.15000.398399.840716@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Aside: Is the buffer interface reachable in any way from within > Python ? Why isn't the interface exposed via __XXX__ methods > on normal Python instances (could be implemented by returning a > buffer object) ? Would it even make sense? I thought a large part of the intent was for performance, avoiding memory copies. Perhaps there should be an .__as_buffer__() which returned an object that supports the C buffer interface. I'm not sure how useful it would be; perhaps for classes that represent image data? They could return a buffer object created from a string/array/NumPy array. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik@pythonware.com Fri Aug 13 16:59:12 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 17:59:12 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies. looks like there's some confusion here over what the buffer interface is all about. time for a new GvR essay, perhaps? From Fred L. Drake, Jr." References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> Message-ID: <14260.17969.497916.382752@weyr.cnri.reston.va.us> Fredrik Lundh writes: > looks like there's some confusion here over > what the buffer interface is all about. time > for a new GvR essay, perhaps? If he'll write something about it, I'll be glad to adapt it to the extending & embedding manual. It seems important that it be included in the standard documentation since it will be important for extension writers to understand when they should implement it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik@pythonware.com Fri Aug 13 17:34:46 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 18:34:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> Message-ID: <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Guido van Rossum wrote: > > btw, how about adding support for buffer access > > to data that have strange internal formats (like cer- > > tain PIL image memories) or isn't directly accessible > > (like "virtual" and "abstract" image buffers in PIL 1.1). > > something like: > > > > int initbuffer(PyObject* obj, void** context); > > int exitbuffer(PyObject* obj, void* context); > > > > and corresponding context arguments to the > > rest of the functions... > > Can you explain this idea more? Without more understanding of PIL I > have no idea what you're talking about... in code:

    void* context;

    // this can be done at any time
    segments = pb->getsegcount(obj, NULL, context);

    if (!pb->bf_initbuffer(obj, &context))
        ... failed to initialise buffer api ...

    ... allocate segment size buffer ...
    pb->getsegcount(obj, &bytes, context);
    ... calculate total buffer size and allocate buffer ...

    for (i = offset = 0; i < segments; i++) {
        n = pb->getreadbuffer(obj, i, &p, context);
        if (n < 0)
            ... failed to fetch a given segment ...
        memcpy(buf + offset, p, n); // or write to file, or whatever
        offset = offset + n;
    }

    pb->bf_exitbuffer(obj, context);

in other words, this would give the target object a chance to keep some local context (like a temporary buffer) during a sequence of buffer operations... for PIL, this would make it possible to 1) store required metadata (size, mode, palette) along with the actual buffer contents.
2) possibly pack formats that use extra internal storage for performance reasons -- RGB pixels are stored as 32-bit integers, for example. 3) access virtual image memories (that can only be accessed via a buffer-like interface in themselves -- given an image object, you acquire an access handle, and use a getdata method to access the actual data. without initbuffer, there's no way to do two buffer accesses in parallel. without exitbuffer, there's no way to release the access handle. without the context variable, there's nowhere to keep the access handle between calls.) 4) access abstract image memories (like virtual memories, but they reside outside PIL, like on a remote server, or inside another image processing library, or on a hardware device). 5) convert to external formats on the fly: fp.write(im.buffer("JPEG")) and probably a lot more. as far as I can tell, nothing of this can be done using the current design... ... besides, what about buffers and threads? if you return a pointer from getreadbuf, wouldn't it be good to know exactly when Python doesn't need that pointer any more? explicit initbuffer/exitbuffer calls around each sequence of buffer operations would make that a lot safer... From mal@lemburg.com Fri Aug 13 20:16:44 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 21:16:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <37B46F1C.1A513F33@lemburg.com> Fred L. Drake, Jr. wrote: > > M.-A. Lemburg writes: > > Aside: Is the buffer interface reachable in any way from within > > Python ? Why isn't the interface exposed via __XXX__ methods > > on normal Python instances (could be implemented by returning a > > buffer object) ? > > Would it even make sense? I thought a large part of the intent was > for performance, avoiding memory copies.
Perhaps there should be > an .__as_buffer__() which returned an object that supports the C > buffer interface. I'm not sure how useful it would be; perhaps for > classes that represent image data? They could return a buffer object > created from a string/array/NumPy array. That's what I had in mind.

    def __getreadbuffer__(self):
        return buffer(self.data)

    def __getcharbuffer__(self):
        return buffer(self.string_data)

    def __getwritebuffer__(self):
        return buffer(self.mmaped_file)

Note that buffer() does not copy the data, it only adds a reference to the object being used. Hmm, how about adding a writeable binary object to the core ? This would be useful for the __getwritebuffer__() API because currently, I think, only mmap'ed files are useable as write buffers -- no other in-memory type. Perhaps buffer objects could be used for this purpose too, e.g. by having them allocate the needed memory chunk in case you pass None as object. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Fri Aug 13 22:48:12 1999 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 23:48:12 +0200 Subject: [Python-Dev] Quick-and-dirty weak references Message-ID: <19990813214817.5393C1C4742@oratrix.oratrix.nl> This week again I was bitten by the fact that Python doesn't have any form of weak references, and while I was toying with some ideas I came up with the following quick-and-dirty scheme that I thought I'd bounce off this list. I might even volunteer to implement it, if people agree it is worth it:-) We add a new builtin function (or a module with that function) weak(). This returns a weak reference to the object passed as a parameter. A weak object has one method: strong(), which returns the corresponding real object or raises an exception if the object doesn't exist anymore.
For convenience we could add a method exists() that returns true if the real object still exists. Now comes the bit that I'm unsure about: to implement this I need to add a pointer to every object. This pointer is either NULL or points to the corresponding weak object (so for every object there is either no weak reference object or exactly one). But, for the price of 4 bytes extra in every object we get the nicety that there is little cpu-overhead: refcounting macros work identically to the way they do now, the only thing to take care of is that during object deallocation we have to zero the weak pointer. (actually: we could make do with a single bit in every object, with the bit meaning "this object has an associated weak object". We could then use a global dictionary indexed by object address to find the weak object) From here on life is easy: the weak object is a normal refcounted object with a pointer to the real object as its only data. weak() creates the weak object if it doesn't exist and returns the existing (and INCREFfed) weak object if it does. Strong() checks that self->object->weak == self and returns self->object (INCREFfed) if it is. This works on all platforms that I'm aware of, but it could break if there are any (Python) platforms that can have objects at VM addresses that later, when the object has been free()d, become invalid addresses. And even then a vmaddrvalid() function, only needed in the strong() method, could solve this. The weak object isn't transparent, because you have to call strong() before you can do anything with it, but this is an advantage (says he, aspiring to a career in politics or sales:-): with a transparent weak object the object could disappear at unexpected moments and with this scheme it can't, because when you have the object itself in hand you have a refcount too.
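[A present-day aside: Jack's proposed API maps almost one-to-one onto the weakref module that later landed in the core. A sketch of his weak()/strong()/exists() interface on top of it -- the method names are his, the wrapper class shown here is hypothetical:]

    import weakref

    class Weak:
        def __init__(self, obj):
            self._ref = weakref.ref(obj)   # does not raise the refcount
        def exists(self):
            return self._ref() is not None
        def strong(self):
            obj = self._ref()
            if obj is None:
                raise ReferenceError("object no longer exists")
            return obj                     # a real (strong) reference

    class Thing:
        pass

    t = Thing()
    w = Weak(t)
    assert w.strong() is t
    del t                # CPython's refcounting frees the object immediately
    print(w.exists())    # False

Note how the non-transparency Jack argues for is preserved: you must go through strong() to touch the object, and once strong() has returned, you hold a refcount and the object cannot vanish underneath you.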
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Sat Aug 14 00:15:39 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 01:15:39 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B4A71B.2073875F@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > maybe the unicode class shouldn't implement the > > buffer interface at all? sure looks like the best way > > It is needed for fp.write(unicodeobj) ... > > It is also very handy for C functions to deal with Unicode strings. Wouldn't a special C API be (even) more convenient ? > > to avoid trivial mistakes (the current behaviour of > > fp.write(unicodeobj) is even more serious than the > > marshal glitch...) > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > to the file. Are you suggesting that it will need to be done differently? > Icky. Would this also write some kind of Unicode encoding header ? [Sorry, this is my Unicode ignorance shining through... I only remember lots of talk about these things on the string-sig.] Since fp.write() uses "s#" this would use the getreadbuffer slot in 1.5.2... I think what it *should* do is use the getcharbuffer slot instead (see my other post), since dumping the raw unicode data would lose too much information. Again, such things should be handled by extra methods, e.g. fp.rawwrite(). Hmm, I guess the philosophy behind the interface is not really clear. Binary data is fetched via getreadbuffer and then interpreted as character data... I always thought that the getcharbuffer should be used for such an interpretation.
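[A present-day aside: the information-loss worry is easiest to see with explicit encodings. In today's Python, where the text/bytes split is final, the difference between dumping raw data and encoding it deliberately looks like this -- a hedged sketch, not 1.5.2 behavior:]

    import codecs

    u = "pyth\u00f6n"                 # sample Unicode text

    raw = u.encode("utf-16-le")       # the raw 2-byte values -- roughly what a
                                      # bare getreadbuffer dump would amount to
    with_bom = u.encode("utf-16")     # codec-level encoding: BOM, then the data

    print(len(raw), len(with_bom))            # 12 14
    print(with_bom.startswith(codecs.BOM))    # True -- the BOM is explicit
    print(raw.decode("utf-16-le") == u)       # True -- nothing is lost when the
                                              # caller picks the encoding both ways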
Or maybe, we should dump the getcharbuffer slot again and use the getreadbuffer information just as we would a void* pointer in C: with no explicit or implicit type information. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sat Aug 14 09:53:04 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 01:53:04 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> Message-ID: <37B52E70.2D957546@lyra.org> M.-A. Lemburg wrote: > > Greg Stein wrote: > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > maybe the unicode class shouldn't implement the > > > buffer interface at all? sure looks like the best way > > > > It is needed for fp.write(unicodeobj) ... > > > > It is also very handy for C functions to deal with Unicode strings. > > Wouldn't a special C API be (even) more convenient ? Why? Accessing the Unicode values as a series of bytes matches exactly to the semantics of the buffer interface. Why throw in Yet Another Function? Your abstract.c functions make it quite simple. > > > to avoid trivial mistakes (the current behaviour of > > > fp.write(unicodeobj) is even more serious than the > > > marshal glitch...) > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > to the file. Are you suggesting that it will need to be done differently? > > Icky. > > Would this also write some kind of Unicode encoding header ? > [Sorry, this is my Unicode ignorance shining through... I only > remember lots of talk about these things on the string-sig.] Absolutely not. Placing the Byte Order Mark (BOM) into an output stream is an application-level task. It should never be done by any subsystem. There are no other "encoding headers" that would go into the output stream.
The output would simply be UTF-16 (2-byte values in host byte order). > Since fp.write() uses "s#" this would use the getreadbuffer > slot in 1.5.2... I think what it *should* do is use the > getcharbuffer slot instead (see my other post), since dumping > the raw unicode data would loose too much information. Again, I very much disagree. To me, fp.write() is not about writing characters to a stream. I think it makes much more sense as "writing bytes to a stream" and the buffer interface fits that perfectly. There is no loss of data. You could argue that the byte order is lost, but I think that is incorrect. The application defines the semantics: the file might be defined as using host-order, or the application may be writing a BOM at the head of the file. > such things should be handled by extra methods, e.g. fp.rawwrite(). I believe this would be a needless complication of the interface. > Hmm, I guess the philosophy behind the interface is not > really clear. I didn't design or implement it initially, but (as you may have guessed) I am a proponent of its existence. > Binary data is fetched via getreadbuffer and then > interpreted as character data... I always thought that the > getcharbuffer should be used for such an interpretation. The former is bad behavior. That is why getcharbuffer was added (by me, for 1.5.2). It was a preventative measure for the introduction of Unicode strings. Using getreadbuffer for characters would break badly given a Unicode string. Therefore, "clients" that want (8-bit) characters from an object supporting the buffer interface should use getcharbuffer. The Unicode object doesn't implement it, implying that it cannot provide 8-bit characters. You can get the raw bytes thru getreadbuffer. > Or maybe, we should dump the getcharbufer slot again and > use the getreadbuffer information just as we would a > void* pointer in C: with no explicit or implicit type information. Nope. 
That path is fraught with failure :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Sat Aug 14 11:21:51 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 12:21:51 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <37B5433F.61CE6F76@lemburg.com> Jack Jansen wrote: > > This week again I was bitten by the fact that Python doesn't have any > form of weak references, and while I was toying with some ideas I came > up with the following quick-and-dirty scheme that I thought I'd bounce > off this list. I might even volunteer to implement it, if people agree > it is worth it:-) Have you checked the weak reference dictionary implementation by Dieter Maurer ? It's at: http://www.handshake.de/~dieter/weakdict.html While I like the idea of having weak references in the core, I think 4 extra bytes for *every* object is just a little too much. The flag bit idea (with the added global dictionary of weak referenced objects) looks promising though. BTW, how would this be done in JPython ? I guess it doesn't make much sense there because cycles are no problem for the Java VM GC. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sat Aug 14 13:30:45 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 14:30:45 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> Message-ID: <37B56175.23ABB350@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > > > > Greg Stein wrote: > > > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > > maybe the unicode class shouldn't implement the > > > > buffer interface at all? sure looks like the best way > > > > > > It is needed for fp.write(unicodeobj) ...
> > > > > > It is also very handy for C functions to deal with Unicode strings. > > > > Wouldn't a special C API be (even) more convenient ? > > Why? Accessing the Unicode values as a series of bytes matches exactly > to the semantics of the buffer interface. Why throw in Yet Another > Function? I meant PyUnicode_* style APIs for dealing with all the aspects of Unicode objects -- much like the PyString_* APIs available. > Your abstract.c functions make it quite simple. BTW, do we need an extra set of those with buffer index or not ? Those would really be one-liners for the sake of hiding the type slots from applications. > > > > to avoid trivial mistakes (the current behaviour of > > > > fp.write(unicodeobj) is even more serious than the > > > > marshal glitch...) > > > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > > to the file. Are you suggesting that it will need to be done differently? > > > Icky. > > > > Would this also write some kind of Unicode encoding header ? > > [Sorry, this is my Unicode ignorance shining through... I only > > remember lots of talk about these things on the string-sig.] > > Absolutely not. Placing the Byte Order Mark (BOM) into an output stream > is an application-level task. It should never by done by any subsystem. > > There are no other "encoding headers" that would go into the output > stream. The output would simply be UTF-16 (2-byte values in host byte > order). Ok. > > Since fp.write() uses "s#" this would use the getreadbuffer > > slot in 1.5.2... I think what it *should* do is use the > > getcharbuffer slot instead (see my other post), since dumping > > the raw unicode data would loose too much information. Again, > > I very much disagree. To me, fp.write() is not about writing characters > to a stream. I think it makes much more sense as "writing bytes to a > stream" and the buffer interface fits that perfectly. 
This is perfectly ok, but shouldn't the behaviour of fp.write() mimic that of previous Python versions ? How does JPython write the data ? Inlined different subject: I think the internal semantics of "s#" using the getreadbuffer slot and "t#" the getcharbuffer slot should be switched; see my other post. In previous Python versions "s#" had the semantics of string data with possibly embedded NULL bytes. Now it suddenly has the meaning of binary data and you can't simply change extensions to use the new "t#" because people are still using them with older Python versions. > There is no loss of data. You could argue that the byte order is lost, > but I think that is incorrect. The application defines the semantics: > the file might be defined as using host-order, or the application may be > writing a BOM at the head of the file. The problem here is that many applications were not written to handle these kinds of objects. Previously they could only handle strings, now they can suddenly handle any object having the buffer interface and then fail when the data gets read back in. > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > I believe this would be a needless complication of the interface. It would clarify things and make the interface 100% backward compatible again. > > Hmm, I guess the philosophy behind the interface is not > > really clear. > > I didn't design or implement it initially, but (as you may have guessed) > I am a proponent of its existence. > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > The former is bad behavior. That is why getcharbuffer was added (by me, > for 1.5.2).
Therefore, "clients" that want (8-bit) > characters from an object supporting the buffer interface should use > getcharbuffer. The Unicode object doesn't implement it, implying that it > cannot provide 8-bit characters. You can get the raw bytes thru > getreadbuffer. I agree 100%, but did you add the "t#" instead of having "s#" use the getcharbuffer interface ? E.g. my mxTextTools package uses "s#" on many APIs. Now someone could stick in a Unicode object and get pretty strange results without any notice about mxTextTools and Unicode being incompatible. You could argue that I change to "t#", but that doesn't work since many people out there still use Python versions <1.5.2 and those didn't have "t#", so mxTextTools would then fail completely for them. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sat Aug 14 12:34:17 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 04:34:17 -0700 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <37B55439.683272D2@lyra.org> M.-A. Lemburg wrote: >... > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. Sure, these could be added as necessary. For raw access to the bytes, I would refer people to the abstract buffer functions, tho. > > Your abstract.c functions make it quite simple. > > BTW, do we need an extra set of those with buffer index or not ? > Those would really be one-liners for the sake of hiding the > type slots from applications. It sounds like NumPy and PIL would need it, which makes the landscape quite a bit different from the last time we discussed this (when we didn't imagine anybody needing those). >... 
> > > Since fp.write() uses "s#" this would use the getreadbuffer > > > slot in 1.5.2... I think what it *should* do is use the > > > getcharbuffer slot instead (see my other post), since dumping > > > the raw unicode data would loose too much information. Again, > > > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? fp.write() had no semantics for writing Unicode objects since they didn't exist. Therefore, we are not breaking or changing any behavior. > Inlined different subject: > I think the internal semantics of "s#" using the getreadbuffer slot > and "t#" the getcharbuffer slot should be switched; see my other post. 1) Too late 2) The use of "t#" ("text") for the getcharbuffer slot was decided by the Benevolent Dictator. 3) see (2) > In previous Python versions "s#" had the semantics of string data > with possibly embedded NULL bytes. Now it suddenly has the meaning > of binary data and you can't simply change extensions to use the > new "t#" because people are still using them with older Python > versions. Guido and I had a pretty long discussion on what the best approach here was. I think we even pulled in Tim as a final arbiter, as I recall. I believe "s#" remained getreadbuffer simply because it *also* meant "give me the bytes of that object". If it changed to getcharbuffer, then you could see exceptions in code that didn't raise exceptions beforehand. (more below) > > There is no loss of data. You could argue that the byte order is lost, > > but I think that is incorrect. The application defines the semantics: > > the file might be defined as using host-order, or the application may be > > writing a BOM at the head of the file. 
> > The problem here is that many applications were not written > to handle these kinds of objects. Previously they could only > handle strings, now they can suddenly handle any object > having the buffer interface and then fail when the data > gets read back in. An application is a complete unit. How are you suddenly going to manifest Unicode objects within that application? The only way is if the developer goes in and changes things; let them deal with the issues and fallout of their change. The other way is external changes such as an upgrade to the interpreter or a module. Again, (IMO) if you're perturbing a system, then you are responsible for also correcting any problems you introduce. In any case, Guido's position was that things can easily switch over to the "t#" interface to prevent the class of error where you pass a Unicode string to a function that expects a standard string. > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > I believe this would be a needless complication of the interface. > > It would clarify things and make the interface 100% backward > compatible again. No. "s#" used to pull bytes from any buffer-capable object. Your suggestion for "s#" to use the getcharbuffer could introduce exceptions into currently-working code. (this was probably Guido's prime motivation for the current meaning of "t#"... I can dig up the mail thread if people need an authoritative commentary on the decision that was made) > > > Hmm, I guess the philosophy behind the interface is not > > > really clear. > > > > I didn't design or implement it initially, but (as you may have guessed) > > I am a proponent of its existence. > > > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > for 1.5.2).
It was a preventative measure for the introduction of > > Unicode strings. Using getreadbuffer for characters would break badly > > given a Unicode string. Therefore, "clients" that want (8-bit) > > characters from an object supporting the buffer interface should use > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > cannot provide 8-bit characters. You can get the raw bytes thru > > getreadbuffer. > > I agree 100%, but did you add the "t#" instead of having > "s#" use the getcharbuffer interface ? Yes. For reasons detailed above. > E.g. my mxTextTools > package uses "s#" on many APIs. Now someone could stick > in a Unicode object and get pretty strange results without > any notice about mxTextTools and Unicode being incompatible. They could also stick in an array of integers. That supports the buffer interface, meaning the "s#" in your code would extract the bytes from it. In other words, people can already stick bogus stuff into your code. This seems to be a moot argument. > You could argue that I change to "t#", but that doesn't > work since many people out there still use Python versions > <1.5.2 and those didn't have "t#", so mxTextTools would then > fail completely for them. If support for the older versions is needed, then use an #ifdef to set up the appropriate macro in some header. Use that throughout your code. In any case: yes -- I would argue that you should absolutely be using "t#". Cheers, -g -- Greg Stein, http://www.lyra.org/ From fredrik@pythonware.com Sat Aug 14 14:19:07 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 15:19:07 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <003101bee657$972d1550$f29b12c2@secret.pythonware.com> M.-A. 
Lemburg wrote: > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. it's already there, of course. see unicode.h in the unicode distribution (Mark is hopefully adding this to 1.6 in this very moment...) > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? the crucial point is how an average user expects things to work. the current design is quite asymmetric -- you can easily *write* things that implement the buffer interface to a stream, but how the heck do you get them back? (as illustrated by the marshal buglet...) From fredrik@pythonware.com Sat Aug 14 16:21:48 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 17:21:48 +0200 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Except that people may expect unicode strings to work just like any other kind of string, while arrays are surely a different thing.
I'm beginning to suspect that the current buffer design is partially broken; it tries to work around at least two problems at once: a) the current use of "string" objects for two purposes: as strings of 8-bit characters, and as buffers containing arbitrary binary data. b) performance issues when reading/writing certain kinds of data to/from streams. and fails to fully address either of them. From mal@lemburg.com Sat Aug 14 17:30:21 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:30:21 +0200 Subject: [Python-Dev] Re: buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <37B5999D.201EA88C@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > >... > > I meant PyUnicode_* style APIs for dealing with all the aspects > > of Unicode objects -- much like the PyString_* APIs available. > > Sure, these could be added as necessary. For raw access to the bytes, I > would refer people to the abstract buffer functions, tho. I guess that's up to them... PyUnicode_AS_WCHAR() could also be exposed I guess (are C's wchar strings useable as Unicode basis ?). > > > Your abstract.c functions make it quite simple. > > > > BTW, do we need an extra set of those with buffer index or not ? > > Those would really be one-liners for the sake of hiding the > > type slots from applications. > > It sounds like NumPy and PIL would need it, which makes the landscape > quite a bit different from the last time we discussed this (when we > didn't imagine anybody needing those). Ok, then I'll add them and post the new set next week. > >... > > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > > slot in 1.5.2... I think what it *should* do is use the > > > > getcharbuffer slot instead (see my other post), since dumping > > > > the raw unicode data would loose too much information. Again, > > > > > > I very much disagree. 
To me, fp.write() is not about writing characters > > > to a stream. I think it makes much more sense as "writing bytes to a > > > stream" and the buffer interface fits that perfectly. > > > > This is perfectly ok, but shouldn't the behaviour of fp.write() > > mimic that of previous Python versions ? How does JPython > > write the data ? > > fp.write() had no semantics for writing Unicode objects since they > didn't exist. Therefore, we are not breaking or changing any behavior. The problem is hidden in polymorphic functions and tools: previously they could not handle anything but strings, now they also work on arbitrary buffers without raising exceptions. That's what I'm concerned about. > > Inlined different subject: > > I think the internal semantics of "s#" using the getreadbuffer slot > > and "t#" the getcharbuffer slot should be switched; see my other post. > > 1) Too late > 2) The use of "t#" ("text") for the getcharbuffer slot was decided by > the Benevolent Dictator. > 3) see (2) 1) It's not too late: most people aren't even aware of the buffer interface (except maybe the small crowd on this list). 2) A mistake in a patchlevel release of Python can easily be undone in the next minor release. No big deal. 3) To remain compatible with 1.5.2 even in future revisions, a new explicit marker, e.g. "r#" for raw data, could be added to hold the code for getreadbuffer. "s#" and "z#" should then switch to using getcharbuffer. > > In previous Python versions "s#" had the semantics of string data > > with possibly embedded NULL bytes. Now it suddenly has the meaning > > of binary data and you can't simply change extensions to use the > > new "t#" because people are still using them with older Python > > versions. > > Guido and I had a pretty long discussion on what the best approach here > was. I think we even pulled in Tim as a final arbiter, as I recall. What was the final argument then ? (I guess the discussion was held *before* the addition of getcharbuffer, right ?)
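The dispatch being argued over can be modelled in a few lines of present-day Python. The class and function names below are hypothetical stand-ins for the C-level PyBufferProcs slots and the getargs.c logic, not real APIs; the point is only to show why "s#" accepts any buffer-capable object while "t#" rejects objects that cannot hand out 8-bit characters:

```python
# Toy model of the "s#" vs "t#" dispatch under discussion.
# getreadbuffer/getcharbuffer stand in for the C buffer slots.

class StringLike:
    # 8-bit string: raw bytes and characters coincide, so the
    # object can sensibly fill in both slots.
    def getreadbuffer(self):
        return b"hello"
    def getcharbuffer(self):
        return b"hello"

class UnicodeLike:
    # Wide characters: raw bytes exist (with a byte order!), but
    # there is no honest way to hand out 8-bit characters, so the
    # char-buffer slot is left unimplemented.
    def getreadbuffer(self):
        return "hi".encode("utf-16-le")

def parse_s_hash(obj):
    # "s#": any buffer-capable object yields its raw bytes.
    return obj.getreadbuffer()

def parse_t_hash(obj):
    # "t#": only objects that can present 8-bit text qualify.
    getchar = getattr(obj, "getcharbuffer", None)
    if getchar is None:
        raise TypeError("object cannot provide 8-bit characters")
    return getchar()

print(parse_s_hash(UnicodeLike()))  # raw bytes, byte order and all
print(parse_t_hash(StringLike()))
try:
    parse_t_hash(UnicodeLike())
except TypeError as exc:
    print("t# rejected:", exc)
```

Under this model, switching "s#" to the char slot (as proposed above) would turn the first call into an exception for every buffer-capable, non-text object, which is the compatibility break Greg is objecting to.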
> I believe "s#" remained getreadbuffer simply because it *also* meant > "give me the bytes of that object". If it changed to getcharbuffer, then > you could see exceptions in code that didn't raise exceptions > beforehand. > > (more below) "s#" historically always meant "give me char* data with length". It did not mean: "give me a pointer to the data area and its length". That interpretation is new in 1.5.2. Even integers and lists could provide buffer access with the new interpretation... (sounds evil ;-) > > > There is no loss of data. You could argue that the byte order is lost, > > > but I think that is incorrect. The application defines the semantics: > > > the file might be defined as using host-order, or the application may be > > > writing a BOM at the head of the file. > > > > The problem here is that many applications were not written > > to handle these kinds of objects. Previously they could only > > handle strings, now they can suddenly handle any object > > having the buffer interface and then fail when the data > > gets read back in. > > An application is a complete unit. How are you suddenly going to > manifest Unicode objects within that application? The only way is if the > developer goes in and changes things; let them deal with the issues and > fallout of their change. The other way is external changes such as an > upgrade to the interpreter or a module. Again, (IMO) if you're > perturbing a system, then you are responsible for also correcting any > problems you introduce. Well, ok, if you're talking about standalone apps. I was referring to applications which interact with other applications, e.g. via files or sockets. You could pass a Unicode obj to a socket and have it transfer the data to the other end without getting an exception on the sending part of the connection. The receiver would read the data as string and most probably fail.
The whole application sitting in between and dealing with the protocol and connection management wouldn't even notice that you've just tried to extend its capabilities. > In any case, Guido's position was that things can easily switch over to > the "t#" interface to prevent the class of error where you pass a > Unicode string to a function that expects a standard string. Strange, why should code that relies on 8-bit character data be changed because a new unsupported object type pops up ? Code supporting the new type will have to be rewritten anyway, but why break existing extensions in unpredictable ways ? > > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > > > I believe this would be a needless complication of the interface. > > > > It would clarify things and make the interface 100% backward > > compatible again. > > No. "s#" used to pull bytes from any buffer-capable object. Your > suggestion for "s#" to use the getcharbuffer could introduce exceptions > into currently-working code. The buffer objects were introduced in 1.5.1, AFAIR. Changing the semantics back to the original ones would only break extensions relying on the behaviour you describe -- the distribution can easily be adapted to use some other marker, such as "r#". > (this was probably Guido's prime motivation for the currently meaning of > "t#"... I can dig up the mail thread if people need an authoritative > commentary on the decision that was made) > > > > > Hmm, I guess the philosophy behind the interface is not > > > > really clear. > > > > > > I didn't design or implement it initially, but (as you may have guessed) > > > I am a proponent of its existence. > > > > > > > Binary data is fetched via getreadbuffer and then > > > > interpreted as character data... I always thought that the > > > > getcharbuffer should be used for such an interpretation. > > > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > > for 1.5.2).
It was a preventative measure for the introduction of > > > Unicode strings. Using getreadbuffer for characters would break badly > > > given a Unicode string. Therefore, "clients" that want (8-bit) > > > characters from an object supporting the buffer interface should use > > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > > cannot provide 8-bit characters. You can get the raw bytes thru > > > getreadbuffer. > > > > I agree 100%, but did you add the "t#" instead of having > > "s#" use the getcharbuffer interface ? > > Yes. For reasons detailed above. > > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Right now they can with 1.5.1 and 1.5.2, which is unfortunate. I'd rather have the parsing function raise an exception. > This seems to be a moot argument. Not really when you have to support extensions across three different patchlevels of Python. > > You could argue that I change to "t#", but that doesn't > > work since many people out there still use Python versions > > <1.5.2 and those didn't have "t#", so mxTextTools would then > > fail completely for them. > > If support for the older versions is needed, then use an #ifdef to set > up the appropriate macro in some header. Use that throughout your code. > > In any case: yes -- I would argue that you should absolutely be using > "t#". I can easily change my code, no big deal, but what about the dozens of other extensions I don't want to bother diving into ? I'd rather see an exception than complete garbage written to a file or a socket.
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sat Aug 14 17:53:45 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:53:45 +0200 Subject: [Python-Dev] buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Message-ID: <37B59F19.45C1D23B@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > > E.g. my mxTextTools > > > package uses "s#" on many APIs. Now someone could stick > > > in a Unicode object and get pretty strange results without > > > any notice about mxTextTools and Unicode being incompatible. > > > > They could also stick in an array of integers. That supports the buffer > > interface, meaning the "s#" in your code would extract the bytes from > > it. In other words, people can already stick bogus stuff into your code. > > Except that people may expect unicode strings > to work just like any other kind of string, while > arrays are surely a different thing. > > I'm beginning to suspect that the current buffer > design is partially broken; it tries to work around > at least two problems at once: > > a) the current use of "string" objects for two purposes: > as strings of 8-bit characters, and as buffers containing > arbitrary binary data. > > b) performance issues when reading/writing certain kinds > of data to/from streams. > > and fails to fully address either of them. True, a higher level interface for those two objectives would certainly address them much better than what we are trying to do at bit level. Buffers should probably only be treated as pointers to abstract memory areas and nothing more. BTW, what about my suggestion to extend buffers to also allocate memory (in case you pass None as object) ? 
Or should array be used for that purpose (its an undocumented feature of arrays) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sun Aug 15 03:59:25 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 19:59:25 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Message-ID: <37B62D0D.6EC24240@lyra.org> Fredrik Lundh wrote: >... > besides, what about buffers and threads? if you > return a pointer from getreadbuf, wouldn't it be > good to know exactly when Python doesn't need > that pointer any more? explicit initbuffer/exitbuffer > calls around each sequence of buffer operations > would make that a lot safer... This is a pretty obvious one, I think: it lasts only as long as the object. PyString_AS_STRING is similar. Nothing new or funny here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Aug 15 04:09:19 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 20:09:19 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> Message-ID: <37B62F5E.30C62070@lyra.org> M.-A. Lemburg wrote: > > Fred L. Drake, Jr. wrote: > > > > M.-A. Lemburg writes: > > > Aside: Is the buffer interface reachable in any way from within > > > Python ? 
Why isn't the interface exposed via __XXX__ methods > > > on normal Python instances (could be implemented by returning a > > > buffer object) ? > > > > Would it even make sense? I though a large part of the intent was > > to for performance, avoiding memory copies. Perhaps there should be > > an .__as_buffer__() which returned an object that supports the C > > buffer interface. I'm not sure how useful it would be; perhaps for > > classes that represent image data? They could return a buffer object > > created from a string/array/NumPy array. There is no way to do this. The buffer interface only returns pointers to memory. There would be no place to return an intermediary object, nor a way to retain the reference for it. For example, your class instance quickly sets up a PyBufferObject with the relevant data and returns that. The underlying C code must now hold that reference *and* return a pointer to the calling code. Impossible. Fredrik's open/close concept for buffer accesses would make this possible, as long as clients are aware that any returned pointer is valid only until the buffer_close call. The context argument he proposes would hold the object reference. Having class instances respond to the buffer interface is interesting, but until more code attempts to *use* the interface, I'm not quite sure of the utility... >... > Hmm, how about adding a writeable binary object to the core ? > This would be useful for the __getwritebbuffer__() API because > currently, I think, only mmap'ed files are useable as write > buffers -- no other in-memory type. Perhaps buffer objects > could be used for this purpose too, e.g. by having them > allocate the needed memory chunk in case you pass None as > object. Yes, this would be very good. I would recommend that you pass an integer, however, rather than None. You need to tell it the size of the buffer to allocate. 
Since buffer(5) has no meaning at the moment, altering the semantics to include this form would not be a problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From da@ski.org Sun Aug 15 07:10:59 1999 From: da@ski.org (David Ascher) Date: Sat, 14 Aug 1999 23:10:59 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B62F5E.30C62070@lyra.org> Message-ID: On Sat, 14 Aug 1999, Greg Stein wrote: > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, here's an example from my work today. Maybe someone can suggest an alternative that I haven't seen. I'm using buffer objects to pass pointers to structs back and forth between Python and Windows (Win32's GUI scheme involves sending messages to functions with, oftentimes, addresses of structs as arguments, and expects the called function to modify the struct directly -- similarly, I must call Win32 functions w/ pointers to memory that Windows will modify, and be able to read the modified memory). With 'raw' buffer object manipulation (after exposing the PyBuffer_FromReadWriteMemory call to Python), this works fine [*]. So far, no instances. I also have a class which allows the user to describe the buffer memory layout in a natural way given the C struct, and manipulate the buffer layout w/ getattr/setattr. For example:

class Win32MenuItemStruct(AutoStruct):
    #
    # for each slot, specify type (maps to a struct.pack specifier),
    # name (for setattr/getattr behavior) and optional defaults.
    #
    table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
             (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
             (UINT, 'fType', MFT_STRING),
             (UINT, 'fState', MFS_ENABLED),
             (UINT, 'wID', None),
             (HANDLE, 'hSubMenu', 0),
             (HANDLE, 'hbmpChecked', 0),
             (HANDLE, 'hbmpUnchecked', 0),
             (DWORD, 'dwItemData', 0),
             (LPSTR, 'name', None),
             (UINT, 'cch', 0)]

AutoStruct has machinery which allows setting of buffer slices by slot name, conversion of numeric types, etc. This is working well. The only hitch is that to send the buffer to the SWIG'ed function call, I have three options, none ideal:

1) define a __str__ method which makes a string of the buffer and pass that to the function which expects an "s#" argument. This sends a copy of the data, not the address. As a result, this works well for structs which I create from scratch as long as I don't need to see any changes that Windows might have performed on the memory.

2) send the instance but make up my own 'get-the-instance-as-buffer' API -- complicates extension module code.

3) send the buffer attribute of the instance instead of the instance -- complicates Python code, and the C code isn't trivial because there is no 'buffer' typecode for PyArg_ParseTuple().

If I could define an

def __aswritebuffer__

and if there was a PyArg_ParseTuple() typecode associated with read/write buffers (I nominate 'w'!), I believe things would be simpler -- I could then send the instance, specify in the PyArg_ParseTuple that I want a pointer to memory, and I'd be golden. What did I miss? --david [*] I feel naughty modifying random bits of memory from Python, but Bill Gates made me do it! From mal@lemburg.com Sun Aug 15 09:47:00 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:47:00 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> <37B62F5E.30C62070@lyra.org> Message-ID: <37B67E84.6BBC8136@lemburg.com> Greg Stein wrote: > > [me suggesting new __XXX__ methods on Python instances to provide > the buffer slots to Python programmers] > > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, there already is lots of code supporting the interface, e.g. fp.write(), socket.write() etc. Basically all streaming interfaces I guess. So these APIs could be used to "write" the object directly into a file. > >... > > Hmm, how about adding a writeable binary object to the core ? > > This would be useful for the __getwritebuffer__() API because > > currently, I think, only mmap'ed files are useable as write > > buffers -- no other in-memory type. Perhaps buffer objects > > could be used for this purpose too, e.g. by having them > > allocate the needed memory chunk in case you pass None as > > object. > > Yes, this would be very good. I would recommend that you pass an > integer, however, rather than None. You need to tell it the size of the > buffer to allocate. Since buffer(5) has no meaning at the moment, > altering the semantics to include this form would not be a problem. I was thinking of using the existing buffer(object,offset,size) constructor... that's why I took None as object. offset would then always be 0 and size gives the size of the memory chunk to allocate. Of course, buffer(size) would look nicer, but it seems a rather peculiar interface definition to say: ok, if you pass a real Python integer, we'll take that as size. Who knows, maybe at some point in the future, you want to "write" integers via the buffer interface too... then you'd probably also want to write None... so how about a new builtin writebuffer(size) ?
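A minimal sketch of how the proposed writebuffer(size) could behave, modelled in present-day Python with a bytearray standing in for the malloc'ed chunk. The name and the semantics are the proposal's only; nothing like this shipped in 1.5.x:

```python
class writebuffer:
    """Sketch of the proposed builtin: a buffer that allocates its
    own writeable memory instead of borrowing another object's --
    i.e. the buffer(None, 0, size) idea with a nicer spelling."""

    def __init__(self, size):
        self._mem = bytearray(size)  # stands in for malloc(size)
        self.writeable = True        # unlike a buffer over a string

    def __len__(self):
        return len(self._mem)

    def __setitem__(self, index, value):
        # writes go straight into the owned memory chunk
        self._mem[index] = value

    def raw(self):
        # a string-style copy of the current contents
        return bytes(self._mem)

buf = writebuffer(4)
buf[0:2] = b"OK"
print(len(buf), buf.raw())
```

The key property being asked for is the middle one: the object owns a fixed-size, mutable region that C code could hand to the operating system to fill in.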
Also, I think it would make sense to extend buffers to have methods and attributes:

.writeable - attribute that tells whether the buffer is writeable
.chardata - true iff the getcharbuffer slot is available
.asstring() - return the buffer as Python string object

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sun Aug 15 09:59:21 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:59:21 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B68169.73E03C84@lemburg.com> David Ascher wrote: > > On Sat, 14 Aug 1999, Greg Stein wrote: > > > Having class instances respond to the buffer interface is interesting, > > but until more code attempts to *use* the interface, I'm not quite sure > > of the utility... > > Well, here's an example from my work today. Maybe someone can suggest an > alternative that I haven't seen. > > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expects the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromReadWriteMemory call to > Python), this works fine [*]. So far, no instances. So that's why you were suggesting that struct.pack returns a buffer rather than a string ;-) Actually, I think you could use arrays to do the trick right now, because they are writeable (unlike strings). Until creating writeable buffer objects becomes possible that is...
> I also have a class which allows the user to describe the buffer memory
> layout in a natural way given the C struct, and manipulate the buffer
> layout w/ getattr/setattr. For example:
>
> class Win32MenuItemStruct(AutoStruct):
>     #
>     # for each slot, specify type (maps to a struct.pack specifier),
>     # name (for setattr/getattr behavior) and optional defaults.
>     #
>     table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
>              (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
>              (UINT, 'fType', MFT_STRING),
>              (UINT, 'fState', MFS_ENABLED),
>              (UINT, 'wID', None),
>              (HANDLE, 'hSubMenu', 0),
>              (HANDLE, 'hbmpChecked', 0),
>              (HANDLE, 'hbmpUnchecked', 0),
>              (DWORD, 'dwItemData', 0),
>              (LPSTR, 'name', None),
>              (UINT, 'cch', 0)]
>
> AutoStruct has machinery which allows setting of buffer slices by slot
> name, conversion of numeric types, etc. This is working well.
>
> The only hitch is that to send the buffer to the SWIG'ed function call, I
> have three options, none ideal:
>
> 1) define a __str__ method which makes a string of the buffer and pass
>    that to the function which expects an "s#" argument. This sends
>    a copy of the data, not the address. As a result, this works
>    well for structs which I create from scratch as long as I don't need
>    to see any changes that Windows might have performed on the memory.
>
> 2) send the instance but make up my own 'get-the-instance-as-buffer'
>    API -- complicates extension module code.
>
> 3) send the buffer attribute of the instance instead of the instance --
>    complicates Python code, and the C code isn't trivial because there
>    is no 'buffer' typecode for PyArg_ParseTuple().
>
> If I could define an
>
> def __aswritebuffer__
>
> and if there was a PyArg_ParseTuple() typecode associated with read/write
> buffers (I nominate 'w'!), I believe things would be simpler -- I could
> then send the instance, specify in the PyArg_ParseTuple that I want a
> pointer to memory, and I'd be golden.
>
> What did I miss?
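For readers without the original package, the table-driven layout David quotes can be approximated with the standard struct module. Everything below (the class, the three-field table, the plain "I"/"i" type letters in place of the Win32 type names) is a hypothetical reconstruction for illustration, not his actual AutoStruct code:

```python
import struct

class AutoStructSketch:
    # (format, name, default) triples drive both attribute access
    # and packing into one contiguous C-struct-like layout.
    # "I" plays the role of UINT; real code would map Win32 types.
    table = [("I", "cbSize", 0),
             ("I", "fMask", 0),
             ("i", "wID", -1)]

    def __init__(self, **fields):
        for fmt, name, default in self.table:
            setattr(self, name, fields.get(name, default))
        # by Win32 convention, cbSize holds the struct's own size
        self.cbSize = struct.calcsize(self.format())

    @classmethod
    def format(cls):
        return "".join(fmt for fmt, _, _ in cls.table)

    def pack(self):
        # option 1 above: hand out a *copy* of the bytes ("s#"),
        # which is exactly why changes made by Windows to the
        # original memory cannot be seen afterwards.
        return struct.pack(self.format(),
                           *(getattr(self, name)
                             for _, name, _ in self.table))

item = AutoStructSketch(wID=42)
print(struct.unpack(item.format(), item.pack()))
```

The sketch reproduces option 1's limitation on purpose: pack() returns an immutable copy, so only a writeable-buffer typecode of the kind proposed here would let the extension see in-place modifications.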
Just a naming thingie: __getwritebuffer__ et al. would map to the C interfaces more directly. The new typecode "w#" for writeable buffer style objects is a good idea (it should only work on single segment buffers). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@pythonware.com Sun Aug 15 11:32:59 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 15 Aug 1999 12:32:59 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> Message-ID: <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> > Fredrik Lundh wrote: > >... > > besides, what about buffers and threads? if you > > return a pointer from getreadbuf, wouldn't it be > > good to know exactly when Python doesn't need > > that pointer any more? explicit initbuffer/exitbuffer > > calls around each sequence of buffer operations > > would make that a lot safer... > > This is a pretty obvious one, I think: it lasts only as long as the > object. PyString_AS_STRING is similar. Nothing new or funny here. well, I think the buffer behaviour is both new and pretty funny:

from array import array

a = array("f", [0]*8192)

b = buffer(a)

for i in range(1000):
    a.append(1234)

print b

in other words, the buffer interface should be redesigned, or removed. (though I'm sure AOL would find some interesting use for this ;-) "Confusing? Yes, but this is a lot better than allowing arbitrary pointers!"
-- GvR on assignment operators, November 91 From da@ski.org Sun Aug 15 17:54:23 1999 From: da@ski.org (David Ascher) Date: Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B68169.73E03C84@lemburg.com> Message-ID: On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > Actually, I think you could use arrays to do the trick right now, > because they are writeable (unlike strings). Until creating > writeable buffer objects becomes possible that is... No, because I can't make an array around existing memory which Win32 allocates before I get to it. > Just a naming thingie: __getwritebuffer__ et al. would map to the > C interfaces more directly. Whatever. > The new typecode "w#" for writeable buffer style objects is a good idea > (it should only work on single segment buffers). Indeed. --david From gstein@lyra.org Sun Aug 15 21:27:57 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:27:57 -0700 Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: Message-ID: <37B722CD.383A2A9E@lyra.org> David Ascher wrote: > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > ... > > The new typecode "w#" for writeable buffer style objects is a good idea > > (it should only work on single segment buffers). > > Indeed. I just borrowed Guido's time machine. That typecode is already in 1.5.2. 
:-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Aug 15 21:35:25 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:35:25 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B7248D.31E5D2BF@lyra.org> Fredrik Lundh wrote: >... > well, I think the buffer behaviour is both > new and pretty funny: I think the buffer interface was introduced in 1.5 (by Jack?). I added the 8-bit character buffer slot and buffer objects in 1.5.2. > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. I don't understand what you believe is weird here. Also, are you saying the buffer *interface* is weird, or the buffer *object* ? thx, -g -- Greg Stein, http://www.lyra.org/ From da@ski.org Sun Aug 15 21:49:23 1999 From: da@ski.org (David Ascher) Date: Sun, 15 Aug 1999 13:49:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) In-Reply-To: <37B722CD.383A2A9E@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. 
That typecode is already in 1.5.2. Ha. Cool. --da From gstein@lyra.org Sun Aug 15 21:53:51 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:53:51 -0700 Subject: [Python-Dev] instances as buffers References: Message-ID: <37B728DF.2CA2A20A@lyra.org> David Ascher wrote: >... > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances. How do you manage the lifetimes of the memory and objects? PyBuffer_FromReadWriteMemory() creates a buffer object that points to memory. You need to ensure that the memory exists as long as the buffer does. Would it make more sense to use PyBuffer_New(size)? Note: PyBuffer_FromMemory() (read-only) was built primarily for the case where you have static constants in an extension module (strings, code objects, etc) and want to expose them to Python without copying them into the heap. Currently, stuff like this must be copied into a dynamic string object to be exposed to Python. The PyBuffer_FromReadWriteMemory() is there for symmetry, but it can be very dangerous to use because of the lifetime problem. PyBuffer_New() allocates its own memory, so the lifetimes are managed properly. PyBuffer_From*Object maintains a reference to the target object so that the target object can be kept around at least as long as the buffer. > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. For example: This is a very cool class. 
Mark and I had discussed doing something just like this (a while back) for
some of the COM stuff. Basically, we'd want to generate these structures
from type libraries.

>...
> The only hitch is that to send the buffer to the SWIG'ed function call, I
> have three options, none ideal:
>
> 1) define a __str__ method which makes a string of the buffer and pass
>    that to the function which expects an "s#" argument. This sends
>    a copy of the data, not the address. As a result, this works
>    well for structs which I create from scratch as long as I don't need
>    to see any changes that Windows might have performed on the memory.

Note that "s#" can be used directly against the buffer object. You could
pass it directly rather than via __str__.

> 2) send the instance but make up my own 'get-the-instance-as-buffer'
>    API -- complicates extension module code.
>
> 3) send the buffer attribute of the instance instead of the instance --
>    complicates Python code, and the C code isn't trivial because there
>    is no 'buffer' typecode for PyArg_ParseTuple().
>
> If I could define an
>
>     def __aswritebuffer__
>
> and if there was a PyArg_ParseTuple() typecode associated with read/write
> buffers (I nominate 'w'!), I believe things would be simpler -- I could
> then send the instance, specify in the PyArg_ParseTuple that I want a
> pointer to memory, and I'd be golden.
>
> What did I miss?

You can do #3 today since there is a buffer typecode present ("w" or
"w#"). It will complicate Python code a bit since you need to pass the
buffer, but it is the simplest of the three options.

Allowing instances to return buffers does seem to make sense, although it
exposes a lot of underlying machinery at the Python level. It might be
nicer to find a better semantic for this than just exposing the buffer
interface slots.
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From da@ski.org Sun Aug 15 22:07:35 1999
From: da@ski.org (David Ascher)
Date: Sun, 15 Aug 1999 14:07:35 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Re: instances as buffers
In-Reply-To: <37B728DF.2CA2A20A@lyra.org>
Message-ID: 

On Sun, 15 Aug 1999, Greg Stein wrote:

> How do you manage the lifetimes of the memory and objects?
> PyBuffer_FromReadWriteMemory() creates a buffer object that points to
> memory. You need to ensure that the memory exists as long as the buffer
> does.

For those cases where I use PyBuffer_FromReadWriteMemory, I have no
control over the memory involved. Windows allocates the memory, lets me
use it for a little while, and it cleans it up whenever it feels like it.
It hasn't been a problem yet, but I agree that it's possibly a problem.
I'd call it a problem w/ the win32 API, though.

> Would it make more sense to use PyBuffer_New(size)?

Again, I can't because I am given a pointer and am expected to modify
e.g. bytes 10-12 starting from that memory location.

> This is a very cool class. Mark and I had discussed doing something just
> like this (a while back) for some of the COM stuff. Basically, we'd want
> to generate these structures from type libraries.

I know zilch about type libraries. This is for CE work, although nothing
about this class is CE-specific. Do type libraries give the same kind of
info?

> You can do #3 today since there is a buffer typecode present ("w" or
> "w#"). It will complicate Python code a bit since you need to pass the
> buffer, but it is the simplest of the three options.

Ok. Time to patch SWIG again!
--david From Vladimir.Marangozov@inrialpes.fr Mon Aug 16 00:35:10 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Mon, 16 Aug 1999 00:35:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908152335.AAA55842@pukapuka.inrialpes.fr> Tim Peters wrote: > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; if I used the debugger often enough to care , I'd think > about munging in a new opcode to make breakpoint sites explicit). > > immutability-is-made-to-be-violated-ly y'rs - tim > Could you elaborate a bit more on this? Do you mean setting breakpoints on a per opcode basis (for example by exchanging the original opcode with a new BREAKPOINT opcode in the code object) and use the lineno tab for breakpoints based on the source listing? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one@email.msn.com Mon Aug 16 03:31:16 1999 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 15 Aug 1999 22:31:16 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908152335.AAA55842@pukapuka.inrialpes.fr> Message-ID: <000101bee78f$6aa217e0$f22d2399@tim> [Vladimir Marangozov] > Could you elaborate a bit more on this? No time for this now -- sorry. > Do you mean setting breakpoints on a per opcode basis (for example > by exchanging the original opcode with a new BREAKPOINT opcode in > the code object) and use the lineno tab for breakpoints based on > the source listing? Something like that. The classic way to implement positional breakpoints is to perturb the code; the classic problem is how to get back the effect of the code that was overwritten. 
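[The classic perturb-the-code breakpoint scheme Tim alludes to -- overwrite an instruction with a BREAKPOINT opcode, then re-execute the displaced instruction after the callback -- can be sketched with a toy bytecode interpreter. Everything here (the opcode names, the (opcode, arg) code format) is invented for illustration; it is not CPython's eval loop:]

```python
# Toy VM: a code object is just a list of (opcode, arg) pairs.
def run(code, breakpoints=(), on_break=None):
    # Perturb the code: save each displaced instruction, overwrite
    # it with a BREAKPOINT opcode.
    saved = {}
    code = list(code)
    for i in breakpoints:
        saved[i] = code[i]
        code[i] = ('BREAKPOINT', None)

    acc, pc = 0, 0
    while pc < len(code):
        op, arg = code[pc]
        if op == 'BREAKPOINT':
            if on_break:
                on_break(pc)          # the debugger hook fires here
            op, arg = saved[pc]       # recover the overwritten instruction
        if op == 'LOAD':
            acc = arg
        elif op == 'ADD':
            acc += arg
        elif op == 'RETURN':
            return acc
        pc += 1

hits = []
result = run([('LOAD', 1), ('ADD', 2), ('RETURN', None)],
             breakpoints=[1], on_break=hits.append)
```

[Unpatched instructions run at full speed -- the point of avoiding a SET_LINENO-style per-line callback -- and the "how to get back the effect of the overwritten code" problem is solved here by the saved-instruction table.]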
From gstein@lyra.org Mon Aug 16 05:42:19 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 21:42:19 -0700 Subject: [Python-Dev] Re: why References: Message-ID: <37B796AB.34F6F93@lyra.org> David Ascher wrote: > > Why does buffer(array('c', 'test')) return a read-only buffer? Simply because the buffer() builtin always creates a read-only object, rather than selecting read/write when possible. Shouldn't be hard to alter the semantics of buffer() to do so. Maybe do this at the same time as updating it to create read/write buffers out of the blue. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Mon Aug 16 07:42:17 1999 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 16 Aug 1999 02:42:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <000b01bee7b2$7c62d780$f22d2399@tim> [Jack Jansen] > ... A long time ago, Dianne Hackborn actually implemented a scheme like this, under the name VREF (for "virtual reference", or some such). IIRC, differences from your scheme were mainly that: 1) There was an elaborate proxy mechanism to avoid having to explicitly strengthen the weak. 2) Each object contained a pointer to a linked list of associated weak refs. This predates DejaNews, so may be a pain to find. > ... > We add a new builtin function (or a module with that function) > weak(). This returns a weak reference to the object passed as a > parameter. A weak object has one method: strong(), which returns the > corresponding real object or raises an exception if the object doesn't > exist anymore. 
This interface appears nearly isomorphic to MIT Scheme's "hash" and "unhash" functions, except that their hash returns an (unbounded) int and guarantees that hash(o1) != hash(o2) for any distinct objects o1 and o2 (this is a stronger guarantee than Python's "id", which may return the same int for objects with disjoint lifetimes; the other reason object address isn't appropriate for them is that objects can be moved by garbage collection, but hash is an object invariant). Of course unhash(hash(o)) is o, unless o has been gc'ed; then unhash raises an exception. By most accounts (I haven't used it seriously myself), it's a usable interface. > ... > to implement this I need to add a pointer to every object. That's unattractive, of course. > ... > (actually: we could make do with a single bit in every object, with > the bit meaning "this object has an associated weak object". We could > then use a global dictionary indexed by object address to find the > weak object) Is a single bit actually smaller than a pointer? For example, on most machines these days #define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; is two 4-byte fields packed solid already, and structure padding prevents adding anything less than a 4-byte increment in reality. I guess on Alpha there's a 4-byte hole here, but I don't want weak pointers enough to switch machines . OTOH, sooner or later Guido is going to want a mark bit too, so the other way to view this is that 32 new flag bits are as cheap as one . There's one other thing I like about this: it can get rid of the dicey > Strong() checks that self->object->weak == self and returns > self->object (INCREFfed) if it is. check. If object has gone away, you're worried that self->object may (on some systems) point to a newly-invalid address. But worse than that, its memory may get reused, and then self->object may point into the *middle* of some other object where the bit pattern at the "weak" offset just happens to equal self. 
Let's try a sketch in pseudo-Python, where __xxx are secret functions that
do the obvious things (and glossing over thread safety since these are
presumably really implemented in C):

# invariant: __is_weak_bit_set(obj) == id2weak.has_key(id(obj))
# So "the weak bit" is simply an optimization, sparing most objects
# from a dict lookup when they die.
# The invariant is delicate in the presence of threads.

id2weak = {}

class _Weak:
    def __init__(self, obj):
        self.id = id(obj)   # obj's refcount not bumped
        __set_weak_bit(obj)
        id2weak[self.id] = self
        # note that "the system" (see below) sets self.id
        # to None if obj dies

    def strong(self):
        if self.id is None:
            raise DeadManWalkingError(self.id)
        return __id2obj(self.id)   # will bump obj's refcount

    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

def weak(obj):
    return id2weak.get(id(obj), None) or _Weak(obj)

and then whenever an object of any kind is deleted the system does:

    if __is_weak_bit_set(obj):
        objid = id(obj)
        id2weak[objid].id = None
        del id2weak[objid]

In my current over-tired state, I think that's safe (modulo threads),
portable and reasonably fast; I do think the extra bit costs 4 bytes,
though.

> ...
> The weak object isn't transparent, because you have to call strong()
> before you can do anything with it, but this is an advantage (says he,
> aspiring to a career in politics or sales:-): with a transparent weak
> object the object could disappear at unexpected moments and with this
> scheme it can't, because when you have the object itself in hand you
> have a refcount too.

Explicit is better than implicit for me.

[M.-A. Lemburg]
> Have you checked the weak reference dictionary implementation
> by Dieter Maurer ? It's at:
>
>     http://www.handshake.de/~dieter/weakdict.html

A project where I work is using it; it blows up a lot .
While some form of weak dict is what most people want in the end, I'm not sure Dieter's decision to support weak dicts with only weak values (not weak keys) is sufficient. For example, the aforementioned project wants to associate various computed long strings with certain hashable objects, and for some reason or other (ain't my project ...) these objects can't be changed. So they can't store the strings in the objects. So they'd like to map the objects to the strings via assorted dicts. But using the object as a dict key keeps it (and, via the dicts, also its associated strings) artificially alive; they really want a weakdict with weak *keys*. I'm not sure I know of a clear & fast way to implement a weakdict building only on the weak() function. Jack? Using weak objects as values (or keys) with an ordinary dict can prevent their referents from being kept artificially alive, but that doesn't get the dict itself cleaned up by magic. Perhaps "the system" should notify a weak object when its referent goes away; that would at least give the WO a chance to purge itself from structures it knows it's in ... > ... > BTW, how would this be done in JPython ? I guess it doesn't > make much sense there because cycles are no problem for the > Java VM GC. Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of "hard", "soft", "weak", and "phantom" references. See java.lang.ref for details. I stopped paying attention to Java, so it's up to you to tell us what you learn about it . 
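[For the record, an interface very close to the weak()/strong() sketch in this thread -- including the weak-*key* mapping Tim asks for -- later shipped as the standard weakref module (Python 2.1, PEP 205). A minimal demonstration, noting that the immediate collection below relies on CPython's reference counting; other implementations may delay it:]

```python
import weakref

class Target:
    """Plain class; its instances support weak references."""

target = Target()
ref = weakref.ref(target)    # roughly Jack's weak()
assert ref() is target       # roughly strong(); returns None once target dies

# The weak-keys dict: the entry does not keep the key alive.
cache = weakref.WeakKeyDictionary()
cache[target] = "computed long string"

del target                   # drop the last strong reference
collected = ref() is None    # True under CPython's refcounting
remaining = len(cache)       # the weak-key entry vanishes with the key
```

[Instead of strong() raising an exception for a dead referent, the shipped design has the ref call return None; both make the "is it still there?" check explicit, as Jack wanted.]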
From fredrik@pythonware.com Mon Aug 16 08:06:43 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 16 Aug 1999 09:06:43 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> Message-ID: <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > from array import array > > > > a = array("f", [0]*8192) > > > > b = buffer(a) > > > > for i in range(1000): > > a.append(1234) > > > > print b > > > > in other words, the buffer interface should > > be redesigned, or removed. > > I don't understand what you believe is weird here. did you run that code? it may work, it may bomb, or it may generate bogus output. all depending on your memory allocator, the phase of the moon, etc. just like back in the C/C++ days... imo, that's not good enough for a core feature. 
From gstein@lyra.org Mon Aug 16 08:15:54 1999 From: gstein@lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 00:15:54 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> Message-ID: <37B7BAAA.1E6EE4CA@lyra.org> Fredrik Lundh wrote: > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > from array import array > > > > > > a = array("f", [0]*8192) > > > > > > b = buffer(a) > > > > > > for i in range(1000): > > > a.append(1234) > > > > > > print b > > > > > > in other words, the buffer interface should > > > be redesigned, or removed. > > > > I don't understand what you believe is weird here. > > did you run that code? Yup. It printed nothing. > it may work, it may bomb, or it may generate bogus > output. all depending on your memory allocator, the > phase of the moon, etc. just like back in the C/C++ > days... It probably appeared as an empty string because the construction of the array filled it with zeroes (at least the first byte). Regardless, I'd be surprised if it crashed the interpreter. The print command is supposed to do a str() on the object, which creates a PyStringObject from the buffer contents. Shouldn't be a crash there. > imo, that's not good enough for a core feature. If it crashed, then sure. But I'd say that indicates a bug rather than a design problem. 
Do you have a stack trace from a crash? Ah. I just worked through, in my head, what is happening here. The buffer object caches the pointer returned by the array object. The append on the array does a realloc() somewhere, thereby invalidating the pointer inside the buffer object. Icky. Gotta think on this one... As an initial thought, it would seem that the buffer would have to re-query the pointer for each operation. There are performance implications there, of course, but that would certainly fix the problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack@oratrix.nl Mon Aug 16 10:42:42 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:42:42 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by David Ascher , Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) , Message-ID: <19990816094243.3CE83303120@snelboot.oratrix.nl> > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > > Actually, I think you could use arrays to do the trick right now, > > because they are writeable (unlike strings). Until creating > > writeable buffer objects becomes possible that is... > > No, because I can't make an array around existing memory which Win32 > allocates before I get to it. Would adding a buffer interface to cobject solve your problem? Cobject is described as being used for passing C objects between Python modules, but I've always thought of it as passing C objects from one C routine to another C routine through Python, which doesn't necessarily understand what the object is all about. That latter description seems to fit your bill quite nicely. 
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From jack@oratrix.nl Mon Aug 16 10:49:41 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 16 Aug 1999 11:49:41 +0200
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: Message by Greg Stein , Sun, 15 Aug 1999 13:35:25 -0700 , <37B7248D.31E5D2BF@lyra.org>
Message-ID: <19990816094941.83BE2303120@snelboot.oratrix.nl>

> >...
> > well, I think the buffer behaviour is both
> > new and pretty funny:
>
> I think the buffer interface was introduced in 1.5 (by Jack?). I added
> the 8-bit character buffer slot and buffer objects in 1.5.2.

Ah, now I understand why I didn't understand some of the previous
conversation: I had never come across the buffer *objects* (as opposed to
the buffer *interface*) until Fredrik's example.

I've just looked at it, and I'm not sure I understand the full intentions
of the buffer object. Buffer objects can either behave as the
"buffer-aspect" of the object behind them (without the rest of their
functionality) or as array objects, and if they start out life as the
first they can evolve into the second, is that right?

Is there a rationale behind this design, or is it just something that
happened?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From gstein@lyra.org Mon Aug 16 10:56:31 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 02:56:31 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990816094941.83BE2303120@snelboot.oratrix.nl>
Message-ID: <37B7E04F.3843004@lyra.org>

Jack Jansen wrote:
>...
> I've just looked at it, and I'm not sure I understand the full intentions of the
Buffer objects can either behave as the "buffer-aspect" of the
> object behind them (without the rest of their functionality) or as array
> objects, and if they start out life as the first they can evolve into the
> second, is that right?
>
> Is there a rationale behind this design, or is it just something that
> happened?

The object doesn't change. You create it as a reference to an existing
object's buffer (as exported via the buffer interface), or you create it
as a reference to some arbitrary memory.

The buffer object provides (optionally read/write) string-like behavior
to any object that supports buffer behavior. It can also be used to make
lightweight slices of another object. For example:

>>> a = "abcdefghi"
>>> b = buffer(a, 3, 3)
>>> print b
def
>>>

In the above example, there is only one copy of "def" (the portion inside
of the string object referenced by a).

The string-like behavior can be quite nice for memory-mapped files.
Andrew's mmapfile module's file objects export the buffer interface. This
means that you can open a file, wrap a buffer around it, and perform
quick and easy random-access on the thing. You could even select slices
of the file and pass them around as if they were strings, without loading
anything into the process heap.

(I want to try mmap'ing a .pyc and create code objects that have
buffer-based bytecode streams; it will be interesting to see if this
significantly reduces memory consumption (in terms of the heap size; the
mmap'd .pyc can be shared across processes)).
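[The lightweight-slice behavior Greg describes survives today in memoryview, the eventual successor to the buffer object (the buffer() builtin itself was removed in Python 3). A sketch of the same "one copy of the data" point in modern terms:]

```python
data = b"abcdefghi"
view = memoryview(data)[3:6]   # a slice of the view; no bytes are copied

# The slice still references the original object rather than owning a copy.
base_is_original = view.obj is data
```

[bytes(view) materializes a copy only on demand, which is the same trade the 1999 buffer object offered for mmap'd files: pass slices around cheaply, pay for a string only when you need one.]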
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim@digicool.com Mon Aug 16 13:30:41 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 08:30:41 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B80471.F0F467C9@digicool.com> Fredrik Lundh wrote: > > > Fredrik Lundh wrote: > > >... > > > besides, what about buffers and threads? if you > > > return a pointer from getreadbuf, wouldn't it be > > > good to know exactly when Python doesn't need > > > that pointer any more? explicit initbuffer/exitbuffer > > > calls around each sequence of buffer operations > > > would make that a lot safer... > > > > This is a pretty obvious one, I think: it lasts only as long as the > > object. PyString_AS_STRING is similar. Nothing new or funny here. > > well, I think the buffer behaviour is both > new and pretty funny: > > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. A while ago I asked for some documentation on the Buffer interface. I basically got silence. At this point, I don't have a good idea what buffers are for and I don't see alot of evidence that there *is* a design. I assume that there was a design, but I can't see it. This whole discussion makes me very queasy. I'm probably just out of it, since I don't have time to read the Python list anymore. 
Presumably the buffer interface was proposed and discussed there at some
distant point in the past.

(I can't pay as much attention to this discussion as I suspect I should,
due to time constraints and due to a basic lack of understanding of the
rationale for the buffer interface. Just now I caught a sniff of
something I find kinda repulsive. I think I hear you all talking about
beasties that hold a reference to some object's internal storage and that
have write operations so you can write directly to the object's storage,
bypassing the object interfaces. I probably just imagined it.)

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.

From gstein@lyra.org Mon Aug 16 13:41:23 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 05:41:23 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com>
Message-ID: <37B806F3.2C5EDC44@lyra.org>

Jim Fulton wrote:
>...
> A while ago I asked for some documentation on the Buffer
> interface. I basically got silence.
At this point, I I think the silence was caused by the simple fact that the documentation does not (yet) exist. That's all... nothing nefarious. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Mon Aug 16 13:05:35 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:05:35 +0200 Subject: [Python-Dev] Re: w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: <37B722CD.383A2A9E@lyra.org> Message-ID: <37B7FE8F.30C35284@lemburg.com> Greg Stein wrote: > > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. That typecode is already in 1.5.2. > > :-) Ah, cool :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Aug 16 13:29:31 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:29:31 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000b01bee7b2$7c62d780$f22d2399@tim> Message-ID: <37B8042B.21DE6053@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Have you checked the weak reference dictionary implementation > > by Dieter Maurer ? It's at: > > > > http://www.handshake.de/~dieter/weakdict.html > > A project where I work is using it; it blows up a lot . > > While some form of weak dict is what most people want in the end, I'm not > sure Dieter's decision to support weak dicts with only weak values (not weak > keys) is sufficient. For example, the aforementioned project wants to > associate various computed long strings with certain hashable objects, and > for some reason or other (ain't my project ...) these objects can't be > changed. So they can't store the strings in the objects. 
So they'd like to > map the objects to the strings via assorted dicts. But using the object as > a dict key keeps it (and, via the dicts, also its associated strings) > artificially alive; they really want a weakdict with weak *keys*. > > I'm not sure I know of a clear & fast way to implement a weakdict building > only on the weak() function. Jack? > > Using weak objects as values (or keys) with an ordinary dict can prevent > their referents from being kept artificially alive, but that doesn't get the > dict itself cleaned up by magic. Perhaps "the system" should notify a weak > object when its referent goes away; that would at least give the WO a chance > to purge itself from structures it knows it's in ... Perhaps one could fiddle something out of the Proxy objects in mxProxy (you know where...). These support a special __cleanup__ protocol that I use a lot to work around circular garbage: the __cleanup__ method of the referenced object is called prior to destroying the proxy; even if the reference count on the object has not yet gone down to 0. This makes direct circles possible without problems: the parent can reference a child through the proxy and the child can reference the parent directly. As soon as the parent is cleaned up, the reference to the proxy is deleted which then automagically makes the back reference in the child disappear, allowing the parent to be deallocated after cleanup without leaving a circular reference around. > > ... > > BTW, how would this be done in JPython ? I guess it doesn't > > make much sense there because cycles are no problem for the > > Java VM GC. > > Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of > "hard", "soft", "weak", and "phantom" references. See java.lang.ref for > details. I stopped paying attention to Java, so it's up to you to tell us > what you learn about it . Thanks for the reference... 
but I guess this will remain a weak one for some time since the latter is currently a limited resource :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Aug 16 13:41:51 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:41:51 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> <37B7BAAA.1E6EE4CA@lyra.org> Message-ID: <37B8070F.763C3FF8@lemburg.com> Greg Stein wrote: > > Fredrik Lundh wrote: > > > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > > > from array import array > > > > > > > > a = array("f", [0]*8192) > > > > > > > > b = buffer(a) > > > > > > > > for i in range(1000): > > > > a.append(1234) > > > > > > > > print b > > > > > > > > in other words, the buffer interface should > > > > be redesigned, or removed. > > > > > > I don't understand what you believe is weird here. > > > > did you run that code? > > Yup. It printed nothing. > > > it may work, it may bomb, or it may generate bogus > > output. all depending on your memory allocator, the > > phase of the moon, etc. just like back in the C/C++ > > days... 
> > It probably appeared as an empty string because the construction of the > array filled it with zeroes (at least the first byte). > > Regardless, I'd be surprised if it crashed the interpreter. The print > command is supposed to do a str() on the object, which creates a > PyStringObject from the buffer contents. Shouldn't be a crash there. > > > imo, that's not good enough for a core feature. > > If it crashed, then sure. But I'd say that indicates a bug rather than a > design problem. Do you have a stack trace from a crash? > > Ah. I just worked through, in my head, what is happening here. The > buffer object caches the pointer returned by the array object. The > append on the array does a realloc() somewhere, thereby invalidating the > pointer inside the buffer object. > > Icky. Gotta think on this one... As an initial thought, it would seem > that the buffer would have to re-query the pointer for each operation. > There are performance implications there, of course, but that would > certainly fix the problem. I guess that's the way to go. I wouldn't want to think about those details when using buffer objects and a function call is still better than a copy... it would do the init/exit wrapping implicitly: init at the time the getreadbuffer call is made and exit next time a thread switch is done - provided that the functions using the memory pointer also keep a reference to the buffer object alive (but that should be natural as this is always done when dealing with references in a safe way). 
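The realloc hazard Greg works through above is easy to demonstrate with today's Python, which grew a safeguard for exactly this case: an exported view pins the array's storage, so the resize that would have invalidated the cached pointer is refused outright. A sketch in modern Python (memoryview stands in for the 1.5.2 buffer object; this is illustration, not 1.5.2 code):

```python
from array import array

# Fredrik's example: a buffer over an array, then an append that
# reallocates the array's storage. In 1.5.2, buffer(a) cached the raw
# pointer and the append left it dangling. Modern Python instead
# refuses to resize an array while a view is exported.
a = array("f", [0.0] * 8192)
m = memoryview(a)

try:
    a.append(1234.0)
except BufferError as e:
    print("resize refused:", e)

m.release()
a.append(1234.0)  # safe again once the view is gone
print(len(a))     # 8193
```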
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim@digicool.com Mon Aug 16 14:26:40 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 09:26:40 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> <37B806F3.2C5EDC44@lyra.org> Message-ID: <37B81190.165C373E@digicool.com> Greg Stein wrote: > > Jim Fulton wrote: > >... > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > I think the silence was caused by the simple fact that the documentation > does not (yet) exist. That's all... nothing nefarious. I didn't mean to suggest anything nefarious. I do think that a change that affects something as basic as the standard object type layout and that generates this much discussion really should be documented before it becomes part of the core. I'd especially like to see some kind of document that includes information like: - A problem statement that describes the problem the change is solving, - How does the solution solve the problem, - When and how should people writing new types support the new interfaces? We're not talking about a new library module here. There's been a change to the core object interface. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jack@oratrix.nl Mon Aug 16 14:45:31 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 15:45:31 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Jim Fulton , Mon, 16 Aug 1999 08:30:41 -0400 , <37B80471.F0F467C9@digicool.com> Message-ID: <19990816134531.C30B5303120@snelboot.oratrix.nl> > A while ago I asked for some documentation on the Buffer > interface. I basically got silence. At this point, I > don't have a good idea what buffers are for and I don't see alot > of evidence that there *is* a design. I assume that there was > a design, but I can't see it. This whole discussion makes me > very queasy. Okay, as I'm apparently not the only one who is queasy, let's start from scratch. First, there is the old buffer _interface_. This is a C interface that allows extension (and builtin) modules and functions a unified way to access objects if they want to write the object to file and similar things. It is also what the PyArg_ParseTuple "s#" returns. This is, in C, the getreadbuffer/getwritebuffer interface. Second, there's the extension to the buffer interface as of 1.5.2. This is again only available in C, and it allows C programmers to get an object _as an ASCII string_. This is meant for things like regexp modules, to access any "textual" object as an ASCII string. This is the getcharbuffer interface, and bound to the "t#" specifier in PyArg_ParseTuple. Third, there is the buffer _object_, also new in 1.5.2.
This sort-of exports the functionality of the buffer interface to Python, but it does a bit more as well, because the buffer objects have a sort of copy-on-write semantics that means they may or may not be "attached" to a python object through the buffer interface. I think that the C interface and the object should be treated completely separately. I definitely want the C interface, but I personally don't use the Python buffer objects, so I don't really care all that much about those. Also, I think that the buffer objects might become easier to understand if we don't think of it as "the buffer interface exported to python", but as "Python buffer objects, that may share memory with other Python objects as an optimization". -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jim@digicool.com Mon Aug 16 17:03:54 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 12:03:54 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990816134531.C30B5303120@snelboot.oratrix.nl> Message-ID: <37B8366A.82B305C7@digicool.com> Jack Jansen wrote: > > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > don't have a good idea what buffers are for and I don't see alot > > of evidence that there *is* a design. I assume that there was > > a design, but I can't see it. This whole discussion makes me > > very queasy. > > Okay, as I'm apparently not the only one who is queasy let's start from > scratch. Yee ha! > First, there is the old buffer _interface_. This is a C interface that allows > extension (and builtin) modules and functions a unified way to access objects > if they want to write the object to file and similar things. Is this serialization? 
What does this achieve that, say, the pickling protocols don't achieve? What other problems does it solve? > It is also what > the PyArg_ParseTuple "s#" returns. This is, in C, the > getreadbuffer/getwritebuffer interface. Huh? "s#" doesn't return a string? Or are you saying that you can pass a non-string object to a C function that uses "s#" and have it bufferized and then stringized? In either case, this is not consistent with the documentation (interface) of PyArg_ParseTuple. > Second, there's the extension to the buffer interface as of 1.5.2. This is > again only available in C, and it allows C programmers to get an object _as an > ASCII string_. This is meant for things like regexp modules, to access any > "textual" object as an ASCII string. This is the getcharbuffer interface, and > bound to the "t#" specifier in PyArg_ParseTuple. Hm. So this is making a little more sense. So, there is a notion that there are "textual" objects that want to provide a method for getting their "text". How does this text differ from what you get from __str__ or __repr__? > Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports > the functionality of the buffer interface to Python, How so? Maybe I'm at sea because I still don't get what the C buffer interface is for. > but it does a bit more as > well, because the buffer objects have a sort of copy-on-write semantics that > means they may or may not be "attached" to a python object through the buffer > interface. What is this thing used for? Where does the slot in tp_as_buffer come into all of this? Why does this need to be a slot in the first place? Are these "textual" objects really common? Is the presence of this slot a flag for "textualness"? It would help a lot, at least for me, if there was a clearer description of what motivates these things. What problems are they trying to solve? Jim -- Jim Fulton mailto:jim@digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From da@ski.org Mon Aug 16 17:45:47 1999 From: da@ski.org (David Ascher) Date: Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8366A.82B305C7@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: > > Second, there's the extension the the buffer interface as of 1.5.2. This is > > again only available in C, and it allows C programmers to get an object _as an > > ASCII string_. This is meant for things like regexp modules, to access any > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > bound to the "t#" specifier in PyArg_ParseTuple. > > Hm. So this is making a little more sense. So, there is a notion that > there are "textual" objects that want to provide a method for getting > their "text". How does this text differ from what you get from __str__ > or __repr__? I'll let others give a well thought out rationale. Here are some examples of use which I think worthwile: * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile objects fit this aspect of the buffer interface allows you to do regexp searches on it w/o ever building a twelve gigabyte PyString. * Consider a non-contiguous NumPy array. If the array type supported the multi-segment buffer interface, extension module writers could manipulate the data within this array w/o having to worry about the non-contiguous nature of the data. They'd still have to worry about the multi-byte nature of the data, but it's still a win. 
In other words, I think that the buffer interface could be useful even w/ non-textual data. * If NumPy was modified to have arrays with data stored in buffer objects as opposed to the current "char *", and if PIL was modified to have images stored in buffer objects as opposed to whatever it uses, one could have arrays and images which shared data. I think all of these provide examples of motivations which are appealing to at least some Python users. I make no claim that they motivate the specific interface. In all the cases I can think of, one or both of two features are the key asset: - access to subset of huge data regions w/o creation of huge temporary variables. - sharing of data space. Yes, it's a power tool, and as a such should come with safety goggles. But then again, the same is true for ExtensionClasses =). leaving-out-the-regexp-on-NumPy-arrays-example, --david PS: I take back the implicit suggestion that buffer() return read-write buffers when possible. From jim@digicool.com Mon Aug 16 18:06:19 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:06:19 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B8450B.C5D308E4@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > > > Second, there's the extension the the buffer interface as of 1.5.2. This is > > > again only available in C, and it allows C programmers to get an object _as an > > > ASCII string_. This is meant for things like regexp modules, to access any > > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > > bound to the "t#" specifier in PyArg_ParseTuple. > > > > Hm. So this is making a little more sense. So, there is a notion that > > there are "textual" objects that want to provide a method for getting > > their "text". How does this text differ from what you get from __str__ > > or __repr__? > > I'll let others give a well thought out rationale. I eagerly await this. 
:) > Here are some examples > of use which I think worthwile: > > * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile > objects fit this aspect of the buffer interface allows you to do regexp > searches on it w/o ever building a twelve gigabyte PyString. This seems reasonable, if a bit exotic. :) > * Consider a non-contiguous NumPy array. If the array type supported the > multi-segment buffer interface, extension module writers could > manipulate the data within this array w/o having to worry about the > non-contiguous nature of the data. They'd still have to worry about > the multi-byte nature of the data, but it's still a win. In other > words, I think that the buffer interface could be useful even w/ > non-textual data. Why is this a good thing? Why should extension module writers worry about the non-contiguous nature of the data now? Does the NumPy C API somehow expose this now? Will multi-segment buffers make it go away somehow? > * If NumPy was modified to have arrays with data stored in buffer objects > as opposed to the current "char *", and if PIL was modified to have > images stored in buffer objects as opposed to whatever it uses, one > could have arrays and images which shared data. Uh, and this would be a good thing? Maybe PIL should just be modified to use NumPy arrays. > I think all of these provide examples of motivations which are appealing > to at least some Python users. Perhaps, although Guido knows how they'd find out about them. ;) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
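David's twelve-gigabyte example is precisely what the mmap module that later joined the standard library enables: a mapped file exposes the buffer interface, so the regexp engine scans the mapped pages directly instead of a giant in-memory string. A sketch in modern Python (a small file stands in for the twelve gigabytes):

```python
import mmap
import os
import re
import tempfile

# Create a sample file; the point is that re searches the mapped
# pages directly -- no file-sized PyString is ever built.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 100_000 + b"NEEDLE-42\n")

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        match = re.search(rb"NEEDLE-(\d+)", m)
        print(match.group(1))  # b'42'

os.remove(path)
```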
From da@ski.org Mon Aug 16 18:18:46 1999 From: da@ski.org (David Ascher) Date: Mon, 16 Aug 1999 10:18:46 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8450B.C5D308E4@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: >> [regexps on gigabyte files] > > This seems reasonable, if a bit exotic. :) In the bioinformatics world, I think it's everyday stuff. > Why is this a good thing? Why should extension module writes worry > abot the non-contiguous nature of the data now? Does the NumPy C API > somehow expose this now? Will multi-segment buffers make it go away > somehow? A NumPy extension module writer needs to create and modify NumPy arrays. These arrays may be non-contiguous (if e.g. they are the result of slicing). The NumPy C API exposes the non-contiguous nature, but it's hard enough to deal with it that I suspect most extension writers require contiguous arrays, which means unnecessary copies. Multi-segment buffers won't make the API go away necessarily (backwards compatibility and all that), but it could make it unnecessary for many extension writers. > > * If NumPy was modified to have arrays with data stored in buffer objects > > as opposed to the current "char *", and if PIL was modified to have > > images stored in buffer objects as opposed to whatever it uses, one > > could have arrays and images which shared data. > > Uh, and this would be a good thing? Maybe PIL should just be modified > to use NumPy arrays. Why? PIL was designed for image processing, and made design decisions appropriate to that domain. NumPy was designed for multidimensional numeric array processing, and made design decisions appropriate to that domain. The intersection of interests exists (e.g. in the medical imaging world), and I know people who spend a lot of their CPU time moving data between images and arrays with "stupid" tostring/fromstring operations. 
Given the size of the images, it's a prodigious waste of time, and kills the use of Python in many a project. > Perhaps, although Guido knows how they'd find out about them. ;) Uh? These issues have been discussed in the NumPy/PIL world for a while, with no solution in sight. Recently, I and others saw mentions of buffers in the source, and they seemed like a reasonable approach, which could be done w/o a rewrite of either PIL or NumPy. Don't get me wrong -- I'm all for better documentation of the buffer stuff, design guidelines, warnings and protocols. I stated as much on June 15: http://www.python.org/pipermail/python-dev/1999-June/000338.html --david From jim@digicool.com Mon Aug 16 18:38:22 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:38:22 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B84C8E.46885C8E@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > >> [regexps on gigabyte files] > > > > This seems reasonable, if a bit exotic. :) > > In the bioinformatics world, I think it's everyday stuff. Right, in some (exotic ;) domains it's not exotic at all. > > Why is this a good thing? Why should extension module writes worry > > abot the non-contiguous nature of the data now? Does the NumPy C API > > somehow expose this now? Will multi-segment buffers make it go away > > somehow? > > A NumPy extension module writer needs to create and modify NumPy arrays. > These arrays may be non-contiguous (if e.g. they are the result of > slicing). The NumPy C API exposes the non-contiguous nature, but it's > hard enough to deal with it that I suspect most extension writers require > contiguous arrays, which means unnecessary copies. Hm. This sounds like an API problem to me. > Multi-segment buffers won't make the API go away necessarily (backwards > compatibility and all that), but it could make it unnecessary for many > extension writers. 
Multi-segment buffers don't make the multi-segmented nature of the memory go away. Do they really simplify the API that much? They seem to strip away an awful lot of information hiding. > > > * If NumPy was modified to have arrays with data stored in buffer objects > > > as opposed to the current "char *", and if PIL was modified to have > > > images stored in buffer objects as opposed to whatever it uses, one > > > could have arrays and images which shared data. > > > > Uh, and this would be a good thing? Maybe PIL should just be modified > > to use NumPy arrays. > > Why? PIL was designed for image processing, and made design decisions > appropriate to that domain. NumPy was designed for multidimensional > numeric array processing, and made design decisions appropriate to that > domain. The intersection of interests exists (e.g. in the medical imaging > world), and I know people who spend a lot of their CPU time moving data > between images and arrays with "stupid" tostring/fromstring operations. > Given the size of the images, it's a prodigious waste of time, and kills > the use of Python in many a project. It seems to me that NumPy is sufficiently broad to encompass image processing. My main concern is having two systems rely on some low-level "shared memory" mechanism to achieve efficient communication. > > Perhaps, although Guido knows how they'd find out about them. ;) > > Uh? These issues have been discussed in the NumPy/PIL world for a while, > with no solution in sight. Recently, I and others saw mentions of buffers > in the source, and they seemed like a reasonable approach, which could be > done w/o a rewrite of either PIL or NumPy. My point was that people would be lucky to find out about buffers or about how to use them as things stand. > Don't get me wrong -- I'm all for better documentation of the buffer > stuff, design guidelines, warnings and protocols.
I stated as much on > June 15: > > http://www.python.org/pipermail/python-dev/1999-June/000338.html Yes, that was quite a jihad you launched. ;) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From da@ski.org Mon Aug 16 19:25:54 1999 From: da@ski.org (David Ascher) Date: Mon, 16 Aug 1999 11:25:54 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B84C8E.46885C8E@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: [ Aside: > It seems to me that NumPy is sufficiently broad enogh to encompass > image processing. Well, I'll just say that you could have been right, but w/ the current NumPy, I don't blame F/ for having developed his own data structures. NumPy is messy, and some of its design decisions are wrong for image things (memory handling, casting rules, etc.). It's all water under the bridge at this point. ] Back to the main topic: You say: > [Multi-segment buffers] seem to strip away an awful lot of information > hiding. My impression of the buffer notion was that it is intended to *provide* information hiding, by giving a simple API to byte arrays which could be stored in various ways. I do agree that whether those bytes should be shared or not is a decision which should be weighted carefully. > My main concern is having two systems rely on some low-level "shared > memory" mechanism to achiev effiecient communication. I don't particularly care about the specific buffer interface (the low-level nature of which is what I think you object to). 
I do care about having a well-defined mechanism for sharing memory between objects, and I think there is value in defining such an interface generically. Maybe the notion of segmented arrays of bytes is too low-level, and instead we should think of the data spaces as segmented arrays of chunks, where a chunk can be one or more bytes? Or do you object to any 'generic' interface? Just for fun, here's the list of things which either currently do or have been talked about possibly in the future supporting some sort of buffer interface, and my guesses as to chunk size, segmented status and writeability): - strings (1 byte, single-segment, r/o) - unicode strings (2 bytes, single-segment, r/o) - struct.pack() things (1 byte, single-segment,r/o) - arrays (1-4? bytes, single-segment, r/w) - NumPy arrays (1-8 bytes, multi-segment, r/w) - PIL images (1-? bytes, multi-segment, r/w) - CObjects (1-byte, single-segment, r/?) - mmapfiles (1-byte, multi-segment?, r/w) - non-python-owned memory (1-byte, single-segment, r/w) --david From jack@oratrix.nl Mon Aug 16 20:36:40 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 21:36:40 +0200 Subject: [Python-Dev] Buffer interface and multiple threads Message-ID: <19990816193645.9E5B5CF320@oratrix.oratrix.nl> Hmm, something that just struck me: the buffer _interface_ (i.e. the C routines, not the buffer object stuff) is potentially thread-unsafe. In the "old world", where "s#" only worked on string objects, you could be sure that the C pointer returned remained valid as long as you had a reference to the python string object in hand, as strings are immutable. In the "new world", where "s#" also works on, say, array objects, this doesn't hold anymore. So, potentially, while one thread is in a write() system call writing the contents of the array to a file another thread could come in and change the data. Is this a problem? 
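Jack's worry really splits into two hazards: the pointer moving (a realloc under your feet) and the bytes changing in place. Modern Python's buffer protocol pins the storage against the first, but, as this sketch shows, in-place mutation is still visible through a live view, which is exactly the window a concurrent writer would see (modern Python for illustration; the 1.5.2 C interface had no pinning at all):

```python
from array import array

a = array("b", [1, 2, 3, 4])
m = memoryview(a)   # pins the storage: a cannot be resized now

# The *location* is safe (no realloc while the view is exported),
# but the *contents* are not: an in-place store shows through the
# view, so a thread blocked in write() could still see data change.
a[0] = 99
print(bytes(m))     # b'c\x02\x03\x04'  (0x63 == 99)
```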
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Mon Aug 16 21:22:12 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 22:22:12 +0200 Subject: [Python-Dev] New htmlentitydefs.py file Message-ID: <37B872F4.1C3F5D39@lemburg.com> This is a multi-part message in MIME format. --------------3B4AC9E96FE0666068F893B2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Attached you find a new HTML entity definitions file taken and parsed from: http://www.w3.org/TR/1998/REC-html40-19980424/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLspecial.ent The latter two contain Unicode charcodes which obviously cannot (yet) be mapped to Unicode strings... perhaps Fredrik wants to include a spiced up version in with his Unicode type. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------3B4AC9E96FE0666068F893B2 Content-Type: text/plain; charset=us-ascii; name="htmlentitydefs.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="htmlentitydefs.py" """ Entity definitions for HTML4.0. 
Taken and parsed from:

http://www.w3.org/TR/1998/REC-html40/HTMLlat1.ent
http://www.w3.org/TR/1998/REC-html40/HTMLsymbol.ent
http://www.w3.org/TR/1998/REC-html40/HTMLspecial.ent

"""

entitydefs = {
    'AElig': chr(198),    # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1
    'Aacute': chr(193),   # latin capital letter A with acute, U+00C1 ISOlat1
    'Acirc': chr(194),    # latin capital letter A with circumflex, U+00C2 ISOlat1
    'Agrave': chr(192),   # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1
    'Alpha': '&#913;',    # greek capital letter alpha, U+0391
    'Aring': chr(197),    # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1
    'Atilde': chr(195),   # latin capital letter A with tilde, U+00C3 ISOlat1
    'Auml': chr(196),     # latin capital letter A with diaeresis, U+00C4 ISOlat1
    'Beta': '&#914;',     # greek capital letter beta, U+0392
    'Ccedil': chr(199),   # latin capital letter C with cedilla, U+00C7 ISOlat1
    'Chi': '&#935;',      # greek capital letter chi, U+03A7
    'Dagger': '&#8225;',  # double dagger, U+2021 ISOpub
    'Delta': '&#916;',    # greek capital letter delta, U+0394 ISOgrk3
    'ETH': chr(208),      # latin capital letter ETH, U+00D0 ISOlat1
    'Eacute': chr(201),   # latin capital letter E with acute, U+00C9 ISOlat1
    'Ecirc': chr(202),    # latin capital letter E with circumflex, U+00CA ISOlat1
    'Egrave': chr(200),   # latin capital letter E with grave, U+00C8 ISOlat1
    'Epsilon': '&#917;',  # greek capital letter epsilon, U+0395
    'Eta': '&#919;',      # greek capital letter eta, U+0397
    'Euml': chr(203),     # latin capital letter E with diaeresis, U+00CB ISOlat1
    'Gamma': '&#915;',    # greek capital letter gamma, U+0393 ISOgrk3
    'Iacute': chr(205),   # latin capital letter I with acute, U+00CD ISOlat1
    'Icirc': chr(206),    # latin capital letter I with circumflex, U+00CE ISOlat1
    'Igrave': chr(204),   # latin capital letter I with grave, U+00CC ISOlat1
    'Iota': '&#921;',     # greek capital letter iota, U+0399
    'Iuml': chr(207),     # latin capital letter I with diaeresis, U+00CF ISOlat1
    'Kappa': '&#922;',    # greek capital letter kappa, U+039A
    'Lambda': '&#923;',   # greek capital letter lambda, U+039B ISOgrk3
    'Mu': '&#924;',       # greek capital letter mu, U+039C
    'Ntilde': chr(209),   # latin capital letter N with tilde, U+00D1 ISOlat1
    'Nu': '&#925;',       # greek capital letter nu, U+039D
    'Oacute': chr(211),   # latin capital letter O with acute, U+00D3 ISOlat1
    'Ocirc': chr(212),    # latin capital letter O with circumflex, U+00D4 ISOlat1
    'Ograve': chr(210),   # latin capital letter O with grave, U+00D2 ISOlat1
    'Omega': '&#937;',    # greek capital letter omega, U+03A9 ISOgrk3
    'Omicron': '&#927;',  # greek capital letter omicron, U+039F
    'Oslash': chr(216),   # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1
    'Otilde': chr(213),   # latin capital letter O with tilde, U+00D5 ISOlat1
    'Ouml': chr(214),     # latin capital letter O with diaeresis, U+00D6 ISOlat1
    'Phi': '&#934;',      # greek capital letter phi, U+03A6 ISOgrk3
    'Pi': '&#928;',       # greek capital letter pi, U+03A0 ISOgrk3
    'Prime': '&#8243;',   # double prime = seconds = inches, U+2033 ISOtech
    'Psi': '&#936;',      # greek capital letter psi, U+03A8 ISOgrk3
    'Rho': '&#929;',      # greek capital letter rho, U+03A1
    'Sigma': '&#931;',    # greek capital letter sigma, U+03A3 ISOgrk3
    'THORN': chr(222),    # latin capital letter THORN, U+00DE ISOlat1
    'Tau': '&#932;',      # greek capital letter tau, U+03A4
    'Theta': '&#920;',    # greek capital letter theta, U+0398 ISOgrk3
    'Uacute': chr(218),   # latin capital letter U with acute, U+00DA ISOlat1
    'Ucirc': chr(219),    # latin capital letter U with circumflex, U+00DB ISOlat1
    'Ugrave': chr(217),   # latin capital letter U with grave, U+00D9 ISOlat1
    'Upsilon': '&#933;',  # greek capital letter upsilon, U+03A5 ISOgrk3
    'Uuml': chr(220),     # latin capital letter U with diaeresis, U+00DC ISOlat1
    'Xi': '&#926;',       # greek capital letter xi, U+039E ISOgrk3
    'Yacute': chr(221),   # latin capital letter Y with acute, U+00DD ISOlat1
    'Zeta': '&#918;',     # greek capital letter zeta, U+0396
    'aacute': chr(225),   # latin small letter a with acute, U+00E1 ISOlat1
    'acirc': chr(226),    # latin small letter a with circumflex,
U+00E2 ISOlat1 'acute': chr(180), # acute accent = spacing acute, U+00B4 ISOdia 'aelig': chr(230), # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 'agrave': chr(224), # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 'alefsym': 'ℵ', # alef symbol = first transfinite cardinal, U+2135 NEW 'alpha': 'α', # greek small letter alpha, U+03B1 ISOgrk3 'and': '∧', # logical and = wedge, U+2227 ISOtech 'ang': '∠', # angle, U+2220 ISOamso 'aring': chr(229), # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 'asymp': '≈', # almost equal to = asymptotic to, U+2248 ISOamsr 'atilde': chr(227), # latin small letter a with tilde, U+00E3 ISOlat1 'auml': chr(228), # latin small letter a with diaeresis, U+00E4 ISOlat1 'bdquo': '„', # double low-9 quotation mark, U+201E NEW 'beta': 'β', # greek small letter beta, U+03B2 ISOgrk3 'brvbar': chr(166), # broken bar = broken vertical bar, U+00A6 ISOnum 'bull': '•', # bullet = black small circle, U+2022 ISOpub 'cap': '∩', # intersection = cap, U+2229 ISOtech 'ccedil': chr(231), # latin small letter c with cedilla, U+00E7 ISOlat1 'cedil': chr(184), # cedilla = spacing cedilla, U+00B8 ISOdia 'cent': chr(162), # cent sign, U+00A2 ISOnum 'chi': 'χ', # greek small letter chi, U+03C7 ISOgrk3 'clubs': '♣', # black club suit = shamrock, U+2663 ISOpub 'cong': '≅', # approximately equal to, U+2245 ISOtech 'copy': chr(169), # copyright sign, U+00A9 ISOnum 'crarr': '↵', # downwards arrow with corner leftwards = carriage return, U+21B5 NEW 'cup': '∪', # union = cup, U+222A ISOtech 'curren': chr(164), # currency sign, U+00A4 ISOnum 'dArr': '⇓', # downwards double arrow, U+21D3 ISOamsa 'dagger': '†', # dagger, U+2020 ISOpub 'darr': '↓', # downwards arrow, U+2193 ISOnum 'deg': chr(176), # degree sign, U+00B0 ISOnum 'delta': 'δ', # greek small letter delta, U+03B4 ISOgrk3 'diams': '♦', # black diamond suit, U+2666 ISOpub 'divide': chr(247), # division sign, U+00F7 ISOnum 'eacute': 
chr(233), # latin small letter e with acute, U+00E9 ISOlat1 'ecirc': chr(234), # latin small letter e with circumflex, U+00EA ISOlat1 'egrave': chr(232), # latin small letter e with grave, U+00E8 ISOlat1 'empty': '∅', # empty set = null set = diameter, U+2205 ISOamso 'emsp': ' ', # em space, U+2003 ISOpub 'ensp': ' ', # en space, U+2002 ISOpub 'epsilon': 'ε', # greek small letter epsilon, U+03B5 ISOgrk3 'equiv': '≡', # identical to, U+2261 ISOtech 'eta': 'η', # greek small letter eta, U+03B7 ISOgrk3 'eth': chr(240), # latin small letter eth, U+00F0 ISOlat1 'euml': chr(235), # latin small letter e with diaeresis, U+00EB ISOlat1 'exist': '∃', # there exists, U+2203 ISOtech 'fnof': 'ƒ', # latin small f with hook = function = florin, U+0192 ISOtech 'forall': '∀', # for all, U+2200 ISOtech 'frac12': chr(189), # vulgar fraction one half = fraction one half, U+00BD ISOnum 'frac14': chr(188), # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum 'frac34': chr(190), # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum 'frasl': '⁄', # fraction slash, U+2044 NEW 'gamma': 'γ', # greek small letter gamma, U+03B3 ISOgrk3 'ge': '≥', # greater-than or equal to, U+2265 ISOtech 'hArr': '⇔', # left right double arrow, U+21D4 ISOamsa 'harr': '↔', # left right arrow, U+2194 ISOamsa 'hearts': '♥', # black heart suit = valentine, U+2665 ISOpub 'hellip': '…', # horizontal ellipsis = three dot leader, U+2026 ISOpub 'iacute': chr(237), # latin small letter i with acute, U+00ED ISOlat1 'icirc': chr(238), # latin small letter i with circumflex, U+00EE ISOlat1 'iexcl': chr(161), # inverted exclamation mark, U+00A1 ISOnum 'igrave': chr(236), # latin small letter i with grave, U+00EC ISOlat1 'image': 'ℑ', # blackletter capital I = imaginary part, U+2111 ISOamso 'infin': '∞', # infinity, U+221E ISOtech 'int': '∫', # integral, U+222B ISOtech 'iota': 'ι', # greek small letter iota, U+03B9 ISOgrk3 'iquest': chr(191), # inverted question mark = turned question mark, 
U+00BF ISOnum 'isin': '∈', # element of, U+2208 ISOtech 'iuml': chr(239), # latin small letter i with diaeresis, U+00EF ISOlat1 'kappa': 'κ', # greek small letter kappa, U+03BA ISOgrk3 'lArr': '⇐', # leftwards double arrow, U+21D0 ISOtech 'lambda': 'λ', # greek small letter lambda, U+03BB ISOgrk3 'lang': '〈', # left-pointing angle bracket = bra, U+2329 ISOtech 'laquo': chr(171), # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum 'larr': '←', # leftwards arrow, U+2190 ISOnum 'lceil': '⌈', # left ceiling = apl upstile, U+2308 ISOamsc 'ldquo': '“', # left double quotation mark, U+201C ISOnum 'le': '≤', # less-than or equal to, U+2264 ISOtech 'lfloor': '⌊', # left floor = apl downstile, U+230A ISOamsc 'lowast': '∗', # asterisk operator, U+2217 ISOtech 'loz': '◊', # lozenge, U+25CA ISOpub 'lrm': '‎', # left-to-right mark, U+200E NEW RFC 2070 'lsaquo': '‹', # single left-pointing angle quotation mark, U+2039 ISO proposed 'lsquo': '‘', # left single quotation mark, U+2018 ISOnum 'macr': chr(175), # macron = spacing macron = overline = APL overbar, U+00AF ISOdia 'mdash': '—', # em dash, U+2014 ISOpub 'micro': chr(181), # micro sign, U+00B5 ISOnum 'middot': chr(183), # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum 'minus': '−', # minus sign, U+2212 ISOtech 'mu': 'μ', # greek small letter mu, U+03BC ISOgrk3 'nabla': '∇', # nabla = backward difference, U+2207 ISOtech 'nbsp': chr(160), # no-break space = non-breaking space, U+00A0 ISOnum 'ndash': '–', # en dash, U+2013 ISOpub 'ne': '≠', # not equal to, U+2260 ISOtech 'ni': '∋', # contains as member, U+220B ISOtech 'not': chr(172), # not sign, U+00AC ISOnum 'notin': '∉', # not an element of, U+2209 ISOtech 'nsub': '⊄', # not a subset of, U+2284 ISOamsn 'ntilde': chr(241), # latin small letter n with tilde, U+00F1 ISOlat1 'nu': 'ν', # greek small letter nu, U+03BD ISOgrk3 'oacute': chr(243), # latin small letter o with acute, U+00F3 ISOlat1 'ocirc': chr(244), # latin small letter 
o with circumflex, U+00F4 ISOlat1 'ograve': chr(242), # latin small letter o with grave, U+00F2 ISOlat1 'oline': '‾', # overline = spacing overscore, U+203E NEW 'omega': 'ω', # greek small letter omega, U+03C9 ISOgrk3 'omicron': 'ο', # greek small letter omicron, U+03BF NEW 'oplus': '⊕', # circled plus = direct sum, U+2295 ISOamsb 'or': '∨', # logical or = vee, U+2228 ISOtech 'ordf': chr(170), # feminine ordinal indicator, U+00AA ISOnum 'ordm': chr(186), # masculine ordinal indicator, U+00BA ISOnum 'oslash': chr(248), # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1 'otilde': chr(245), # latin small letter o with tilde, U+00F5 ISOlat1 'otimes': '⊗', # circled times = vector product, U+2297 ISOamsb 'ouml': chr(246), # latin small letter o with diaeresis, U+00F6 ISOlat1 'para': chr(182), # pilcrow sign = paragraph sign, U+00B6 ISOnum 'part': '∂', # partial differential, U+2202 ISOtech 'permil': '‰', # per mille sign, U+2030 ISOtech 'perp': '⊥', # up tack = orthogonal to = perpendicular, U+22A5 ISOtech 'phi': 'φ', # greek small letter phi, U+03C6 ISOgrk3 'pi': 'π', # greek small letter pi, U+03C0 ISOgrk3 'piv': 'ϖ', # greek pi symbol, U+03D6 ISOgrk3 'plusmn': chr(177), # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum 'pound': chr(163), # pound sign, U+00A3 ISOnum 'prime': '′', # prime = minutes = feet, U+2032 ISOtech 'prod': '∏', # n-ary product = product sign, U+220F ISOamsb 'prop': '∝', # proportional to, U+221D ISOtech 'psi': 'ψ', # greek small letter psi, U+03C8 ISOgrk3 'rArr': '⇒', # rightwards double arrow, U+21D2 ISOtech 'radic': '√', # square root = radical sign, U+221A ISOtech 'rang': '〉', # right-pointing angle bracket = ket, U+232A ISOtech 'raquo': chr(187), # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum 'rarr': '→', # rightwards arrow, U+2192 ISOnum 'rceil': '⌉', # right ceiling, U+2309 ISOamsc 'rdquo': '”', # right double quotation mark, U+201D ISOnum 'real': 'ℜ', # blackletter 
capital R = real part symbol, U+211C ISOamso 'reg': chr(174), # registered sign = registered trade mark sign, U+00AE ISOnum 'rfloor': '⌋', # right floor, U+230B ISOamsc 'rho': 'ρ', # greek small letter rho, U+03C1 ISOgrk3 'rlm': '‏', # right-to-left mark, U+200F NEW RFC 2070 'rsaquo': '›', # single right-pointing angle quotation mark, U+203A ISO proposed 'rsquo': '’', # right single quotation mark, U+2019 ISOnum 'sbquo': '‚', # single low-9 quotation mark, U+201A NEW 'sdot': '⋅', # dot operator, U+22C5 ISOamsb 'sect': chr(167), # section sign, U+00A7 ISOnum 'shy': chr(173), # soft hyphen = discretionary hyphen, U+00AD ISOnum 'sigma': 'σ', # greek small letter sigma, U+03C3 ISOgrk3 'sigmaf': 'ς', # greek small letter final sigma, U+03C2 ISOgrk3 'sim': '∼', # tilde operator = varies with = similar to, U+223C ISOtech 'spades': '♠', # black spade suit, U+2660 ISOpub 'sub': '⊂', # subset of, U+2282 ISOtech 'sube': '⊆', # subset of or equal to, U+2286 ISOtech 'sum': '∑', # n-ary sumation, U+2211 ISOamsb 'sup': '⊃', # superset of, U+2283 ISOtech 'sup1': chr(185), # superscript one = superscript digit one, U+00B9 ISOnum 'sup2': chr(178), # superscript two = superscript digit two = squared, U+00B2 ISOnum 'sup3': chr(179), # superscript three = superscript digit three = cubed, U+00B3 ISOnum 'supe': '⊇', # superset of or equal to, U+2287 ISOtech 'szlig': chr(223), # latin small letter sharp s = ess-zed, U+00DF ISOlat1 'tau': 'τ', # greek small letter tau, U+03C4 ISOgrk3 'there4': '∴', # therefore, U+2234 ISOtech 'theta': 'θ', # greek small letter theta, U+03B8 ISOgrk3 'thetasym': 'ϑ', # greek small letter theta symbol, U+03D1 NEW 'thinsp': ' ', # thin space, U+2009 ISOpub 'thorn': chr(254), # latin small letter thorn with, U+00FE ISOlat1 'times': chr(215), # multiplication sign, U+00D7 ISOnum 'trade': '™', # trade mark sign, U+2122 ISOnum 'uArr': '⇑', # upwards double arrow, U+21D1 ISOamsa 'uacute': chr(250), # latin small letter u with acute, U+00FA ISOlat1 'uarr': '↑', # 
upwards arrow, U+2191 ISOnum 'ucirc': chr(251), # latin small letter u with circumflex, U+00FB ISOlat1 'ugrave': chr(249), # latin small letter u with grave, U+00F9 ISOlat1 'uml': chr(168), # diaeresis = spacing diaeresis, U+00A8 ISOdia 'upsih': 'ϒ', # greek upsilon with hook symbol, U+03D2 NEW 'upsilon': 'υ', # greek small letter upsilon, U+03C5 ISOgrk3 'uuml': chr(252), # latin small letter u with diaeresis, U+00FC ISOlat1 'weierp': '℘', # script capital P = power set = Weierstrass p, U+2118 ISOamso 'xi': 'ξ', # greek small letter xi, U+03BE ISOgrk3 'yacute': chr(253), # latin small letter y with acute, U+00FD ISOlat1 'yen': chr(165), # yen sign = yuan sign, U+00A5 ISOnum 'yuml': chr(255), # latin small letter y with diaeresis, U+00FF ISOlat1 'zeta': 'ζ', # greek small letter zeta, U+03B6 ISOgrk3 'zwj': '‍', # zero width joiner, U+200D NEW RFC 2070 'zwnj': '‌', # zero width non-joiner, U+200C NEW RFC 2070 } --------------3B4AC9E96FE0666068F893B2-- From tim_one@email.msn.com Tue Aug 17 08:30:17 1999 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 17 Aug 1999 03:30:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37B8042B.21DE6053@lemburg.com> Message-ID: <000001bee882$5b7d8da0$112d2399@tim> [about weakdicts and the possibility of building them on weak references; the obvious way doesn't clean up the dict itself by magic; maybe a weak object should be notified when its referent goes away ] [M.-A. Lemburg] > Perhaps one could fiddle something out of the Proxy objects > in mxProxy (you know where...). These support a special __cleanup__ > protocol that I use a lot to work around circular garbage: > the __cleanup__ method of the referenced object is called prior > to destroying the proxy; even if the reference count on the > object has not yet gone down to 0. > > This makes direct circles possible without problems: the parent > can reference a child through the proxy and the child can reference the > parent directly. 
What you just wrote is:

    parent --> proxy --> child -->+
      ^                           v
      +<--------------------------+

Looks like a plain old cycle to me!

> As soon as the parent is cleaned up, the reference to
> the proxy is deleted which then automagically makes the
> back reference in the child disappear, allowing the parent
> to be deallocated after cleanup without leaving a circular
> reference around.

M-A, this is making less sense by the paragraph : skipping the middle, this says "as soon as the parent is cleaned up ... allowing the parent to be deallocated after cleanup". If we presume that the parent gets cleaned up explicitly (since the reference from the child is keeping it alive, it's not going to get cleaned up by magic, right?), then the parent could just as well call the __cleanup__ methods of the things it references directly without bothering with a proxy. For that matter, if it's the straightforward

    parent <-> child

kind of cycle, the parent's cleanup method can just do

    self.__dict__.clear()

and the cycle is broken without writing a __cleanup__ method anywhere (that's what I usually do, and in this kind of cycle that clears the last reference to the child, which then goes away, which in turn automagically clears its back reference to the parent).

So, offhand, I don't see that the proxy protocol could help here. In a sense, what's really needed is the opposite: notifying the *proxy* when the *real* object goes away (which makes no sense in the context of what your proxy objects were designed to do).

[about Java and its four reference strengths]

Found a good introductory writeup at (sorry, my mailer will break this URL, so I'll break it myself at a sensible place):

http://developer.java.sun.com/developer/
technicalArticles//ALT/RefObj/index.html

They have a class for each of the three "not strong" flavors of references. For all three you pass the referenced object to the constructor, and all three accept (optional in two of the flavors) a second ReferenceQueue argument.
In the latter case, when the referenced object goes away the weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn, is a thread-safe queue with various put, get, and timeout-limited polling functions. So you have to write code to look at the queue from time to time, to find the proxies whose referents have gone away.

The three flavors may (or may not ...) have these motivations:

soft: an object reachable at strongest by soft references can go away at any time, but the garbage collector strives to keep it intact until it can't find any other way to get enough memory

weak: an object reachable at strongest by weak references can go away at any time, and the collector makes no attempt to delay its death

phantom: an object reachable at strongest by phantom references can get *finalized* at any time, but won't get *deallocated* before its phantom proxy does something or other (goes away? wasn't clear). This is the flavor that requires passing a queue argument to the constructor. Seems to be a major hack to worm around Java's notorious problems with order of finalization -- along the lines that you give phantom referents trivial finalizers, and put the real cleanup logic in the phantom proxy. This lets your program take responsibility for running the real cleanup code in the order-- and in the thread! --where it makes sense.

Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with under-the-cover weak keys (unlike Dieter's flavor with weak values), and where the key+value pairs vanish by magic when the key object goes away. The details and the implementation of these guys weren't clear to me, but then I didn't download the code, just scanned the online docs.

Ah, a correction to my last post:

    class _Weak:
        ...
        def __del__(self):
            # this is purely an optimization: if self gets nuked,
            # exempt its referent from greater expense when *it*
            # dies
            if self.id is not None:
                __clear_weak_bit(__id2obj(self.id))
                del id2weak[self.id]

Root of all evil: this method is useless, since the id2weak dict keeps each _Weak object alive until its referent goes away (at which time self.id gets set to None, so _Weak.__del__ doesn't do anything). Even if it did do something, it's no cheaper to do it here than in the system cleanup code ("greater expense" was wrong).

weakly y'rs - tim

PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and called them "limp dicts". I'm not entirely sure it was an innocent Freudian slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds like American slang for "flaccid one-eyed trouser snake" ...).

From fredrik@pythonware.com Tue Aug 17 08:23:03 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 09:23:03 +0200
Subject: [Python-Dev] buffer interface considered harmful
References:
Message-ID: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>

David Ascher wrote:
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

as an aside, PIL 1.1 (*) introduces "virtual image memories" which are, as I mentioned in an earlier post, accessed via an API rather than via direct pointers. it'll also include an adapter allowing you to use NumPy objects as image memories.
unfortunately, the buffer interface is not good enough to use on top of the virtual image memory interface...

*) 1.1 is our current development thread, which will be released to plus customers in a number of weeks...

From mal@lemburg.com Tue Aug 17 09:50:01 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 17 Aug 1999 10:50:01 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim>
Message-ID: <37B92239.4076841E@lemburg.com>

Tim Peters wrote:
>
> [about weakdicts and the possibility of building them on weak
> references; the obvious way doesn't clean up the dict itself by
> magic; maybe a weak object should be notified when its referent
> goes away
> ]
>
> [M.-A. Lemburg]
> > Perhaps one could fiddle something out of the Proxy objects
> > in mxProxy (you know where...). These support a special __cleanup__
> > protocol that I use a lot to work around circular garbage:
> > the __cleanup__ method of the referenced object is called prior
> > to destroying the proxy; even if the reference count on the
> > object has not yet gone down to 0.
> >
> > This makes direct circles possible without problems: the parent
> > can reference a child through the proxy and the child can reference the
> > parent directly.
>
> What you just wrote is:
>
>     parent --> proxy --> child -->+
>       ^                           v
>       +<--------------------------+
>
> Looks like a plain old cycle to me!

Sure :-) That was the intention. I'm using this to implement acquisition without turning to ExtensionClasses. [Nice picture, BTW]

> > As soon as the parent is cleaned up, the reference to
> > the proxy is deleted which then automagically makes the
> > back reference in the child disappear, allowing the parent
> > to be deallocated after cleanup without leaving a circular
> > reference around.
>
> M-A, this is making less sense by the paragraph : skipping the
> middle, this says "as soon as the parent is cleaned up ...
allowing the
> parent to be deallocated after cleanup". If we presume that the parent gets
> cleaned up explicitly (since the reference from the child is keeping it
> alive, it's not going to get cleaned up by magic, right?), then the parent
> could just as well call the __cleanup__ methods of the things it references
> directly without bothering with a proxy. For that matter, if it's the
> straightforward
>
>     parent <-> child
>
> kind of cycle, the parent's cleanup method can just do
>
>     self.__dict__.clear()
>
> and the cycle is broken without writing a __cleanup__ method anywhere
> (that's what I usually do, and in this kind of cycle that clears the last
> reference to the child, which then goes away, which in turn automagically
> clears its back reference to the parent).
>
> So, offhand, I don't see that the proxy protocol could help here. In a
> sense, what's really needed is the opposite: notifying the *proxy* when the
> *real* object goes away (which makes no sense in the context of what your
> proxy objects were designed to do).

All true :-). The nice thing about the proxy is that it takes care of the process automagically. And yes, the parent is used via a proxy too. So the picture looks like this:

    --> proxy --> parent --> proxy --> child -->+
         ^                                      v
         +<-------------------------------------+

Since the proxy isn't noticed by the referencing objects (well, at least if they don't fiddle with internals), the picture for the objects looks like this:

    --> parent --> child -->+
         ^                  v
         +<-----------------+

You could of course do the same via explicit invocation of the __cleanup__ method, but the object references involved could be hidden in some other structure, so they might be hard to find.

And there's another feature about Proxies (as defined in mxProxy): they allow you to control access in a much more strict way than Python does.
You can actually hide attributes and methods you don't want exposed in a way that doesn't even let you access them via some dict or pass-me-the-frame-object trick. This is very useful when you program multi-user application host servers where you don't want users to access internal structures of the server.

> [about Java and its four reference strengths]
>
> Found a good introductory writeup at (sorry, my mailer will break this URL,
> so I'll break it myself at a sensible place):
>
> http://developer.java.sun.com/developer/
> technicalArticles//ALT/RefObj/index.html

Thanks for the reference... and for the summary ;-)

> They have a class for each of the three "not strong" flavors of references.
> For all three you pass the referenced object to the constructor, and all
> three accept (optional in two of the flavors) a second ReferenceQueue
> argument. In the latter case, when the referenced object goes away the
> weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn,
> is a thread-safe queue with various put, get, and timeout-limited polling
> functions. So you have to write code to look at the queue from time to
> time, to find the proxies whose referents have gone away.
>
> The three flavors may (or may not ...) have these motivations:
>
> soft: an object reachable at strongest by soft references can go away at
> any time, but the garbage collector strives to keep it intact until it can't
> find any other way to get enough memory

So there is a possibility of reviving these objects, right ?

I've just recently added a hackish function to my mxTools which allows me to regain access to objects via their address (no, not thread safe, not even necessarily correct).

sys.makeref(id)
    Provided that id is a valid address of a Python object (id(object)
    returns this address), this function returns a new reference to it.
    Only objects that are "alive" can be referenced this way, ones with
    zero reference count cause an exception to be raised.
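[Editorial aside: the same address-to-object trick can be sketched in today's CPython with ctypes. This is an illustration, not the mxTools implementation, and MAL's "expert-only" warnings apply in full -- passing a stale address crashes the interpreter.]

```python
import ctypes

def makeref(addr):
    # Reinterpret a raw address (as returned by id()) as an object
    # reference.  Safe only while the object at that address is still
    # alive; a dangling address causes undefined behavior.
    return ctypes.cast(addr, ctypes.py_object).value

obj = ["tracked"]
assert makeref(id(obj)) is obj  # regain a reference from the bare address
```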
You can use this function to re-access objects lost during garbage collection.

USE WITH CARE: this is an expert-only function since it can cause instant core dumps and many other strange things -- even ruin your system if you don't know what you're doing !

SECURITY WARNING: This function can provide you with access to objects that are otherwise not visible, e.g. in restricted mode, and thus be a potential security hole.

I use it for tracking objects via an id-keyed dictionary and hooks in the create/del mechanisms of Python instances. It helps find those memory-eating cycles.

> weak: an object reachable at strongest by weak references can go away at
> any time, and the collector makes no attempt to delay its death
>
> phantom: an object reachable at strongest by phantom references can get
> *finalized* at any time, but won't get *deallocated* before its phantom
> proxy does something or other (goes away? wasn't clear). This is the flavor
> that requires passing a queue argument to the constructor. Seems to be a
> major hack to worm around Java's notorious problems with order of
> finalization -- along the lines that you give phantom referents trivial
> finalizers, and put the real cleanup logic in the phantom proxy. This lets
> your program take responsibility for running the real cleanup code in the
> order-- and in the thread! --where it makes sense.

Wouldn't these flavors be possible using the following setup ? Note that it's quite similar to your _Weak class except that I use a proxy without the need to first get a strong reference for the object and that it doesn't use a weak bit.

    --> proxy --> object
                    ^
                    |
         all_managed_objects

all_managed_objects is a dictionary indexed by address (its id) and keeps a strong reference to the objects. The proxy does not keep a strong reference to the object, but only the address as integer and checks the ref-count on the object in the all_managed_objects dictionary prior to every dereferencing action.
In case this refcount falls down to 1 (only the all_managed_objects dict references it), the proxy takes appropriate action, e.g. raises an exception and deletes the reference in all_managed_objects to mimic a weak reference. The same check is done prior to garbage collection of the proxy.

Add to this some queues, pepper and salt and place it in an oven at 220° for 20 minutes... plus take a look every 10 seconds or so...

The downside is obvious: the zombified object will not get inspected (and then GCed) until the next time a weak reference to it is used.

> Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
> under-the-cover weak keys (unlike Dieter's flavor with weak values), and
> where the key+value pairs vanish by magic when the key object goes away.
> The details and the implementation of these guys weren't clear to me, but
> then I didn't download the code, just scanned the online docs.

Would the above help in creating such beasts ?

> Ah, a correction to my last post:
>
>     class _Weak:
>         ...
>         def __del__(self):
>             # this is purely an optimization: if self gets nuked,
>             # exempt its referent from greater expense when *it*
>             # dies
>             if self.id is not None:
>                 __clear_weak_bit(__id2obj(self.id))
>                 del id2weak[self.id]
>
> Root of all evil: this method is useless, since the id2weak dict keeps each
> _Weak object alive until its referent goes away (at which time self.id gets
> set to None, so _Weak.__del__ doesn't do anything). Even if it did do
> something, it's no cheaper to do it here than in the system cleanup code
> ("greater expense" was wrong).
>
> weakly y'rs - tim
>
> PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and
> called them "limp dicts". I'm not entirely sure it was an innocent Freudian
> slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds
> like American slang for "flaccid one-eyed trouser snake" ...).
:-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 136 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mhammond@skippinet.com.au Tue Aug 17 17:05:40 1999
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 18 Aug 1999 02:05:40 +1000
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>
Message-ID: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>

Fredrik,

Care to elaborate? Statements like "buffer interface needs a redesign" or "the buffer interface is not good enough to use on top of the virtual image memory interface" really only give me the impression you have a bee in your bonnet over these buffer interfaces. If you could actually stretch these statements out to provide even _some_ background, problem statement or potential solution it would help. All I know is "Fredrik doesn't like it for some unexplained reason".

You found an issue with array reallocation - great - but that's a bug rather than a design flaw. Can you tell us why it's not good enough, and give an off-the-cuff design that would solve it? Or are you suggesting it is unsolvable? I really don't have a clue what your issue is.

Jim (for example) has made his position and reasoning clear. You have only made your position clear, but your reasoning is still a mystery.

Mark.

> > unfortunately, the buffer interface is not good enough to use
> on top of the virtual image memory interface...

From fredrik@pythonware.com Tue Aug 17 17:48:31 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 18:48:31 +0200
Subject: [Python-Dev] buffer interface considered harmful
References: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>
Message-ID: <005201bee8d0$9b4737d0$f29b12c2@secret.pythonware.com>

> Care to elaborate?
> Statements like "buffer interface needs a redesign" or
> "the buffer interface is not good enough to use on top of the virtual image
> memory interface" really only give me the impression you have a bee in your
> bonnet over these buffer interfaces.

re "good enough":
http://www.python.org/pipermail/python-dev/1999-August/000650.html

re "needs a redesign":
http://www.python.org/pipermail/python-dev/1999-August/000659.html

and to some extent:
http://www.python.org/pipermail/python-dev/1999-August/000658.html

> Jim (for example) has made his position and reasoning clear.

among other things, Jim said: "At this point, I don't have a good idea what buffers are for and I don't see a lot of evidence that there *is* a design. I assume that there was a design, but I can't see it". which pretty much echoes my concerns in:

http://www.python.org/pipermail/python-dev/1999-August/000612.html
http://www.python.org/pipermail/python-dev/1999-August/000648.html

> You found an issue with array reallocation - great - but that's
> a bug rather than a design flaw.

for me, that bug (and the marshal glitch) indicates that the design isn't as crystal-clear as it needs to be, for such a fundamental feature. otherwise, Greg would never have made that mistake, and Guido would have spotted it when he added the "buffer" built-in...

so what are you folks waiting for? could someone who thinks he understands exactly what this thing is spend an hour on writing that design document, so me and Jim can put this entire thing behind us?

PS. btw, was it luck or careful analysis behind the decision to make buffer() always return read-only buffers, also for objects implementing the read/write protocol?

From da@ski.org Tue Aug 17 23:41:14 1999
From: da@ski.org (David Ascher)
Date: Tue, 17 Aug 1999 15:41:14 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) In-Reply-To: <19990816094243.3CE83303120@snelboot.oratrix.nl> Message-ID: On Mon, 16 Aug 1999, Jack Jansen wrote: > Would adding a buffer interface to cobject solve your problem? Cobject is > described as being used for passing C objects between Python modules, but I've > always thought of it as passing C objects from one C routine to another C > routine through Python, which doesn't necessarily understand what the object > is all about. > > That latter description seems to fit your bill quite nicely. It's an interesting idea, but it wouldn't do as it is, as I'd need the ability to create a CObject given a memory location and a size. Also, I am not expected to free() the memory, which would happen when the CObject got GC'ed. (BTW: I am *not* arguing that PyBuffer_FromReadWriteMemory() should be exposed by default. I'm happy with exposing it in my little extension module for my exotic needs.) --david From mal@lemburg.com Wed Aug 18 10:02:02 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 11:02:02 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000001bee882$5b7d8da0$112d2399@tim> <37B92239.4076841E@lemburg.com> Message-ID: <37BA768A.50DF5574@lemburg.com> [about weakdicts and the possibility of building them on weak references; the obvious way doesn't clean up the dict itself by magic; maybe a weak object should be notified when its referent goes away ] Here is a new version of my Proxy package which includes a self managing weak reference mechanism without the need to add extra bits or bytes to all Python objects: http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip The docs and an explanation of how the thingie works are included in the archive's Doc subdir. 
Basically it builds upon the idea I posted earlier on in this thread -- with a few extra kicks to get it right in the end ;-)

Usage is pretty simple:

from Proxy import WeakProxy
object = []
wr = WeakProxy(object)
wr.append(8)
del object

>>> wr[0]
Traceback (innermost last):
  File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected

I have checked the ref counts pretty thoroughly, but before going public I would like the Python-Dev crowd to run some tests as well: after all, the point is for the weak references to be weak and that's sometimes a bit hard to check.

Hope you have as much fun with it as I had writing it ;-)

Ah yes, for the raw details have a look at the code. The code uses a list of back references to the weak Proxies and notifies them when the object goes away... would it be useful to add a hook to the Proxies so that they can apply some other action as well ?

-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Vladimir.Marangozov@inrialpes.fr Wed Aug 18 12:42:08 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Wed, 18 Aug 1999 12:42:08 +0100 (NFT)
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37BA768A.50DF5574@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 11:02:02 am"
Message-ID: <199908181142.MAA22596@pukapuka.inrialpes.fr>

M.-A. Lemburg wrote:
>
> Usage is pretty simple:
>
> from Proxy import WeakProxy
> object = []
> wr = WeakProxy(object)
> wr.append(8)
> del object
>
> >>> wr[0]
> Traceback (innermost last):
>   File "", line 1, in ?
> mxProxy.LostReferenceError: object already garbage collected > > I have checked the ref counts pretty thoroughly, but before > going public I would like the Python-Dev crowd to run some > tests as well: after all, the point is for the weak references > to be weak and that's sometimes a bit hard to check. It's even harder to implement them without side effects. I used the same hack for the __heirs__ class attribute some time ago. But I knew that a parent class cannot be garbage collected before all of its descendants. That allowed me to keep weak refs in the parent class, and preserve the existing strong refs in the subclasses. On every dealloc of a subclass, the corresponding weak ref in the parent class' __heirs__ is removed. In your case, the lifetime of the objects cannot be predicted, so implementing weak refs by messing with refcounts or checking mem pointers is a dead end. I don't know whether this is the case with mxProxy as I just browsed the code quickly, but here's a scenario where your scheme (or implementation) is not working: >>> from Proxy import WeakProxy >>> o = [] >>> p = WeakProxy(o) >>> d = WeakProxy(o) >>> p >>> d >>> print p [] >>> print d [] >>> del o >>> p >>> d >>> print p Illegal instruction (core dumped) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack@oratrix.nl Wed Aug 18 12:02:13 1999 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 13:02:13 +0200 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <19990818110213.A558F303120@snelboot.oratrix.nl> The one thing I'm not thrilled by in mxProxy is that a call to CheckWeakReferences() is needed before an object is cleaned up. 
I guess this boils down to the same problem I had with my weak reference scheme: you somehow want the Python core to tell the proxy stuff that the object can be cleaned up (although the details are different: in my scheme this would be triggered by refcount==0 and in mxProxy by refcount==1). And because objects are created and destroyed in Python at a tremendous rate you don't want to do this call for every object, only if you have a hint that the object has a weak reference (or a proxy).

-- Jack Jansen          | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal@lemburg.com Wed Aug 18 12:46:45 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 13:46:45 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <19990818110213.A558F303120@snelboot.oratrix.nl>
Message-ID: <37BA9D25.95E46EA@lemburg.com>

Jack Jansen wrote:
>
> The one thing I'm not thrilled by in mxProxy is that a call to
> CheckWeakReferences() is needed before an object is cleaned up. I guess this
> boils down to the same problem I had with my weak reference scheme: you
> somehow want the Python core to tell the proxy stuff that the object can be
> cleaned up (although the details are different: in my scheme this would be
> triggered by refcount==0 and in mxProxy by refcount==1). And because objects
> are created and destroyed in Python at a tremendous rate you don't want to do
> this call for every object, only if you have a hint that the object has a weak
> reference (or a proxy).

Well, the check is done prior to every action using a proxy to the object and also when a proxy to it is deallocated. The additional checkweakrefs() API is only included to enable additional explicit checking of the whole weak refs dictionary, e.g. every 10 seconds or so (just like you would with a mark&sweep GC).
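For context: the notification scheme being discussed here - the core telling each weak reference when its referent goes away, instead of requiring explicit CheckWeakReferences() sweeps - is essentially what the standard weakref module later provided (it was added in Python 2.1). A minimal sketch of that pattern, relying on CPython's deterministic reference counting:

```python
import weakref

class Referent:
    # Plain lists can't be weakly referenced, so use a tiny class.
    pass

notified = []

def on_collect(ref):
    # Invoked by the core while the referent is being finalized.
    notified.append(ref)

obj = Referent()
ref = weakref.ref(obj, on_collect)

assert ref() is obj        # referent still alive
del obj                    # last strong reference gone; CPython collects now
assert ref() is None       # weak reference cleared automatically
assert notified == [ref]   # the notification hook fired exactly once
```

The deallocation path itself fires the callback, so no periodic sweep of a weak-reference dictionary is needed.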
But yes, GC of the phantom object is delayed a bit depending on how you set up the proxies. Still, I think most usages won't have this problem, since the proxies themselves are usually temporary objects. It may sometimes even make sense to have the phantom object around as long as possible, e.g. to implement the soft references Tim quoted from the Java paper. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Aug 18 12:33:18 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 13:33:18 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181142.MAA22596@pukapuka.inrialpes.fr> Message-ID: <37BA99FE.45D582AD@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > I have checked the ref counts pretty thoroughly, but before > > going public I would like the Python-Dev crowd to run some > > tests as well: after all, the point is for the weak references > > to be weak and that's sometimes a bit hard to check. > > It's even harder to implement them without side effects. I used > the same hack for the __heirs__ class attribute some time ago. > But I knew that a parent class cannot be garbage collected before > all of its descendants. That allowed me to keep weak refs in > the parent class, and preserve the existing strong refs in the > subclasses. On every dealloc of a subclass, the corresponding > weak ref in the parent class' __heirs__ is removed. > > In your case, the lifetime of the objects cannot be predicted, > so implementing weak refs by messing with refcounts or checking > mem pointers is a dead end. 
> I don't know whether this is the > case with mxProxy as I just browsed the code quickly, but here's > a scenario where your scheme (or implementation) is not working: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = WeakProxy(o) > >>> d = WeakProxy(o) > >>> p > > >>> d > > >>> print p > [] > >>> print d > [] > >>> del o > >>> p > > >>> d > > >>> print p > Illegal instruction (core dumped) Could you tell me where the core dump originates ? Also, it would help to compile the package with the -DMAL_DEBUG switch turned on (edit Setup) and then run the same things using 'python -d'. The package will then print a pretty complete list of things it is doing to mxProxy.log, which would help track down errors like these. BTW, I get: >>> print p Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> [Don't know why the print statement prints an empty line, though.] Thanks for trying it, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Wed Aug 18 14:12:14 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 14:12:14 +0100 (NFT) Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37BA99FE.45D582AD@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 01:33:18 pm" Message-ID: <199908181312.OAA20542@pukapuka.inrialpes.fr> [about mxProxy, WeakProxy] M.-A. Lemburg wrote: > > Could you tell me where the core dump originates ? Also, it would > help to compile the package with the -DMAL_DEBUG switch turned > on (edit Setup) and then run the same things using 'python -d'. > The package will then print a pretty complete list of things it > is doing to mxProxy.log, which would help track down errors like > these. 
> > BTW, I get: > >>> print p > > Traceback (innermost last): > File "", line 1, in ? > mxProxy.LostReferenceError: object already garbage collected > >>> > > [Don't know why the print statement prints an empty line, though.] > The previous example now *seems* to work fine in a freshly launched interpreter, so it's not a good example, but this shorter one definitely doesn't: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = q = WeakProxy(o) >>> del o >>> print p or q Illegal instruction (core dumped) Or even shorter: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = WeakProxy(o) >>> del o >>> print p Illegal instruction (core dumped) It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). I can mail you a complete trace in private, if you still need it. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Wed Aug 18 13:50:08 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 14:50:08 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> Message-ID: <37BAAC00.27A34FF7@lemburg.com> Vladimir Marangozov wrote: > > [about mxProxy, WeakProxy] > > M.-A. Lemburg wrote: > > > > Could you tell me where the core dump originates ? Also, it would > > help to compile the package with the -DMAL_DEBUG switch turned > > on (edit Setup) and then run the same things using 'python -d'. > > The package will then print a pretty complete list of things it > > is doing to mxProxy.log, which would help track down errors like > > these. > > > > BTW, I get: > > >>> print p > > > > Traceback (innermost last): > > File "", line 1, in ? > > mxProxy.LostReferenceError: object already garbage collected > > >>> > > > > [Don't know why the print statement prints an empty line, though.] 
>
> The previous example now *seems* to work fine in a freshly launched
> interpreter, so it's not a good example, but this shorter one
> definitely doesn't:
>
> >>> from Proxy import WeakProxy
> >>> o = []
> >>> p = q = WeakProxy(o)
> >>> p = q = WeakProxy(o)
> >>> del o
> >>> print p or q
> Illegal instruction (core dumped)
>
> It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference().
> I can mail you a complete trace in private, if you still need it.

That would be nice (please also include the log-file), because I get:

>>> print p or q
Traceback (innermost last):
  File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected
>>>

Thank you,
-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From skip@mojam.com Wed Aug 18 15:47:23 1999
From: skip@mojam.com (Skip Montanaro)
Date: Wed, 18 Aug 1999 09:47:23 -0500
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
Message-ID: <199908181447.JAA05151@dolphin.mojam.com>

I posted a note to the main list yesterday in response to Dan Connolly's complaint that the os module isn't very portable. I saw no followups (it's amazing how fast a thread can die out :-), but I think it's a reasonable idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback from people more interested in long-term Python developments.

The basic premise is that for each platform on which Python runs there are portable and nonportable interfaces to the underlying operating system. The term POSIX has some portability connotations, so let's assume that the posix module exposes the portable subset of the OS interface. To keep things simple, let's also assume there are only three supported general OS platforms: unix, nt and mac.
The proposal then is that importing the platform's module by name will import both the portable and non-portable interface elements. Importing the posix module will import just that portion of the interface that is truly portable across all platforms. To add new functionality to the posix interface it would have to be added to all three platforms. The posix module will be able to ferret out the platform it is running on and import the correct OS-independent posix implementation:

    import sys
    _plat = sys.platform
    del sys

    if _plat == "mac": from posixmac import *
    elif _plat == "nt": from posixnt import *
    else: from posixunix import * # some unix variant

The platform-dependent module would simply import everything it could, e.g.:

    from posixunix import *
    from nonposixunix import *

The os module would vanish or be deprecated with its current behavior intact. The documentation would be modified so that the posix module documents the portable interface and the OS-dependent module's documentation documents the rest and just refers users to the posix module docs for the portable stuff.

In theory, this could be done for 1.6, however as I've proposed it, the semantics of importing the posix module would change. Dan Connolly probably isn't going to have a problem with that, though I suppose Guido might... If this idea is good enough for 1.6, perhaps we leave os and posix module semantics alone and add a module named "portable", "portableos" or "portableposix" or something equally arcane.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098

From guido@CNRI.Reston.VA.US Wed Aug 18 15:54:28 1999
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Wed, 18 Aug 1999 10:54:28 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: Your message of "Wed, 18 Aug 1999 09:47:23 CDT."
<199908181447.JAA05151@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <199908181454.KAA07692@eric.cnri.reston.va.us> > I posted a note to the main list yesterday in response to Dan Connolly's > complaint that the os module isn't very portable. I saw no followups (it's > amazing how fast a thread can die out :-), but I think it's a reasonable > idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback > from people more interesting in long-term Python developments. > > The basic premise is that for each platform on which Python runs there are > portable and nonportable interfaces to the underlying operating system. The > term POSIX has some portability connotations, so let's assume that the posix > module exposes the portable subset of the OS interface. To keep things > simple, let's also assume there are only three supported general OS > platforms: unix, nt and mac. The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. To > add new functionality to the posix interface it would have to be added to > all three platforms. The posix module will be able to ferret out the > platform it is running on and import the correct OS-independent posix > implementation: > > import sys > _plat = sys.platform > del sys > > if _plat == "mac": from posixmac import * > elif _plat == "nt": from posixnt import * > else: from posixunix import * # some unix variant > > The platform-dependent module would simply import everything it could, e.g.: > > from posixunix import * > from nonposixunix import * > > The os module would vanish or be deprecated with its current behavior > intact. 
The documentation would be modified so that the posix module > documents the portable interface and the OS-dependent module's documentation > documents the rest and just refers users to the posix module docs for the > portable stuff. > > In theory, this could be done for 1.6, however as I've proposed it, the > semantics of importing the posix module would change. Dan Connolly probably > isn't going to have a problem with that, though I suppose Guido might... If > this idea is good enough for 1.6, perhaps we leave os and posix module > semantics alone and add a module named "portable", "portableos" or > "portableposix" or something equally arcane. And the advantage of this would be...? Basically, it seems you're just renaming the functionality of os to posix. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Wed Aug 18 16:10:41 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 18 Aug 1999 10:10:41 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <199908181454.KAA07692@eric.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> Message-ID: <14266.51743.904066.470431@dolphin.mojam.com> Guido> And the advantage of this would be...? Guido> Basically, it seems you're just renaming the functionality of os Guido> to posix. I see a few advantages. 1. We will get the meaning of the noun "posix" more or less right. Programmers coming from other languages are used to thinking of programming to a POSIX API or the "POSIX subset of the OS API". Witness all the "#ifdef _POSIX" in the header files on my Linux box In Python, the exact opposite is true. Importing the posix module is documented to be the non-portable way to interface to Unix platforms. 2. 
You would make it clear on all platforms when you expect to be programming in a non-portable fashion, by importing the platform-specific os (unix, nt, mac). "import unix" would mean I expect this code to only run on Unix machines. You could argue that you are declaring your non-portability by importing the posix module today, but to the casual user or to a new Python programmer with a C or C++ background, that won't be obvious. 3. If Dan Connolly's contention is correct, importing the os module today is not all that portable. I can't really say one way or the other, because I'm lucky enough to be able to confine my serious programming to Unix. I'm sure there's someone out there that can try the following on a few platforms: import os dir(os) and compare the output. Skip From jack@oratrix.nl Wed Aug 18 16:33:20 1999 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 17:33:20 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <19990818153320.D61F6303120@snelboot.oratrix.nl> > The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. There's one slight problem with this: when you use functionality that is partially portable, i.e. a call that is available on Windows and Unix but not on the Mac. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin@mems-exchange.org Wed Aug 18 16:39:30 1999 From: akuchlin@mems-exchange.org (Andrew M. 
Kuchling) Date: Wed, 18 Aug 1999 11:39:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <14266.54194.715887.808096@amarok.cnri.reston.va.us> Skip Montanaro writes: > 2. You would make it clear on all platforms when you expect to be > programming in a non-portable fashion, by importing the > platform-specific os (unix, nt, mac). "import unix" would mean I To my mind, POSIX == Unix; other platforms may have bits of POSIX-ish functionality, but most POSIX functions will only be found on Unix systems. One of my projects for 1.6 is to go through the O'Reilly POSIX book and add all the missing calls to the posix modules. Practically none of those functions would exist on Windows or Mac. Perhaps it's really a documentation fix: the os module should document only those features common to all of the big 3 platforms (Unix, Windows, Mac), and have pointers to a section for each of the platform-specific modules, listing the platform-specific functions. -- A.M. Kuchling http://starship.python.net/crew/amk/ Setting loose on the battlefield weapons that are able to learn may be one of the biggest mistakes mankind has ever made. It could also be one of the last. 
-- Richard Forsyth, "Machine Learning for Expert Systems" From skip@mojam.com (Skip Montanaro) Wed Aug 18 16:52:20 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 18 Aug 1999 10:52:20 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54194.715887.808096@amarok.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> Message-ID: <14266.54907.143970.101594@dolphin.mojam.com> Andrew> Perhaps it's really a documentation fix: the os module should Andrew> document only those features common to all of the big 3 Andrew> platforms (Unix, Windows, Mac), and have pointers to a section Andrew> for each of the platform-specific modules, listing the Andrew> platform-specific functions. Perhaps. Should that read ... the os module should *expose* only those features common to all of the big 3 platforms ... ? Skip From skip@mojam.com (Skip Montanaro) Wed Aug 18 16:54:11 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 18 Aug 1999 10:54:11 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> Message-ID: <14266.54991.27912.12075@dolphin.mojam.com> >>>>> "Jack" == Jack Jansen writes: >> The proposal then is that importing the >> platform's module by name will import both the portable and non-portable >> interface elements. Importing the posix module will import just that >> portion of the interface that is truly portable across all platforms. Jack> There's one slight problem with this: when you use functionality that is Jack> partially portable, i.e. a call that is available on Windows and Unix but not Jack> on the Mac. 
Agreed. I'm not sure what to do there. Is the intersection of the common OS calls on Unix, Windows and Mac so small as to be useless (or are there some really gotta have functions not in the intersection because they are missing only on the Mac)? Skip From guido@CNRI.Reston.VA.US Wed Aug 18 17:16:27 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 12:16:27 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 10:52:20 CDT." <14266.54907.143970.101594@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> <14266.54907.143970.101594@dolphin.mojam.com> Message-ID: <199908181616.MAA07901@eric.cnri.reston.va.us> > ... the os module should *expose* only those features common to all of > the big 3 platforms ... Why? My experience has been that functionality that was thought to be Unix specific has gradually become available on other platforms, which makes it hard to decide in which module a function should be placed. The proper test for portability of a program is not whether it imports certain module names, but whether it uses certain functions from those modules (and whether it uses them in a portable fashion). As platforms evolve, a program that was previously thought to be non-portable might become more portable. 
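Guido's "uses certain functions" test is usually applied by probing for the function at runtime rather than switching on a module name. A small sketch of that style; the helper name and the fallback behaviour here are made up for illustration, not part of any proposal in this thread:

```python
import os

def current_user_id():
    """Best-effort identity of the current user.

    os.getuid() only exists on Unix-like platforms; on systems that
    lack it (e.g. Windows or the Mac), fall back to an environment
    variable instead of refusing to run at all.
    """
    if hasattr(os, "getuid"):
        return "uid:%d" % os.getuid()
    return "name:" + os.environ.get("USERNAME", "unknown")
```

Code written this way keeps working unchanged if a platform later grows the call, which is exactly the kind of evolution Guido describes.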
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Wed Aug 18 18:33:44 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 18:33:44 +0100 (NFT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54991.27912.12075@dolphin.mojam.com> from "Skip Montanaro" at "Aug 18, 99 10:54:11 am" Message-ID: <199908181733.SAA08434@pukapuka.inrialpes.fr> Everybody's right in this debate. I have to type a lot to express objectively my opinion, but better filter my reasoning and just say the conclusion. Having in mind: - what POSIX is - what an OS is - that an OS may or may not comply w/ the POSIX standard, and if it doesn't, it may do so in a couple of years (Windows 3K and PyOS come to mind ;-) - that the os module claims portability amongst the different OSes, mainly regarding their filesystem & process management services, hence it's exposing only a *subset* of the os specific services - the current state of Python It would be nice: - to leave the os module as a common denominator - to have a "unix" module (which could further incorporate the different brands of unix) - to have the posix module capture the fraction of posix functionality, exported from a particular OS specific module, and add the appropriate POSIX propaganda in the docs - to manage to do this, or argue what's wrong with the above -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Thu Aug 19 11:02:26 1999 From: mal@lemburg.com (M.-A. 
Lemburg)
Date: Thu, 19 Aug 1999 12:02:26 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <199908181312.OAA20542@pukapuka.inrialpes.fr> <37BAAC00.27A34FF7@lemburg.com>
Message-ID: <37BBD632.3F66419C@lemburg.com>

[about weak references and a sample implementation in mxProxy]

With the help of Vladimir, I have solved the problem and uploaded a modified version of the prerelease:

http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip

The archive now also contains a precompiled Win32 PYD file for those on WinXX platforms.

Please give it a try and tell me what you think.

Cheers,
-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 134 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack@oratrix.nl Thu Aug 19 15:06:01 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Thu, 19 Aug 1999 16:06:01 +0200
Subject: [Python-Dev] Optimization idea
Message-ID: <19990819140602.433BC303120@snelboot.oratrix.nl>

I just had yet another idea for optimizing Python that looks so plausible that I guess someone else must have looked into it already (and, hence, probably rejected it:-):

We add to the type structure a "type identifier" number, a small integer for the common types (int=1, float=2, string=3, etc) and 0 for everything else.

When eval_code2 sees, for instance, a MULTIPLY operation it does something like the following:

    case BINARY_MULTIPLY:
        w = POP();
        v = POP();
        code = (BINARY_MULTIPLY << 8) |
               ((v->ob_type->tp_typeid) << 4) |
               (w->ob_type->tp_typeid);
        x = (binopfuncs[code])(v, w);
        .... etc ...

The idea is that all the 256 BINARY_MULTIPLY entries would be filled with PyNumber_Multiply, except for a few common cases. The int*int field could point straight to int_mul(), etc.
Assuming the common cases are really more common than the uncommon cases, the fact that they jump straight out to the implementation function instead of mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset the added overhead of shifts, ors and indexing.

Any thoughts?

-- Jack Jansen          | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From guido@CNRI.Reston.VA.US Thu Aug 19 15:05:28 1999
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Thu, 19 Aug 1999 10:05:28 -0400
Subject: [Python-Dev] Localization expert needed
Message-ID: <199908191405.KAA10401@eric.cnri.reston.va.us>

My contact at HP is asking for expert advice on localization and multi-byte characters. I have little to share except pointing to Martin von Loewis and Pythonware. Does anyone on this list have a suggestion besides those? Don't hesitate to recommend yourself -- there's money in it!

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Wed, 18 Aug 1999 23:15:55 -0700
From: JOE_ELLSWORTH
To: guido@CNRI.Reston.VA.US
Subject: Localization efforts and state in Python.

Hi Guido. Can you give me some references to the best references currently available for using Python in CGI applications when multi-byte localization is known to be needed? Who is the expert in this in the Python area? Can you recommend that they work with us in this area?
Thanks, Joe E.

------- End of Forwarded Message

From guido@CNRI.Reston.VA.US Thu Aug 19 15:15:28 1999
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Thu, 19 Aug 1999 10:15:28 -0400
Subject: [Python-Dev] Optimization idea
In-Reply-To: Your message of "Thu, 19 Aug 1999 16:06:01 +0200."
<19990819140602.433BC303120@snelboot.oratrix.nl> References: <19990819140602.433BC303120@snelboot.oratrix.nl> Message-ID: <199908191415.KAA10432@eric.cnri.reston.va.us> > I just had yet another idea for optimizing Python that looks so > plausible that I guess someone else must have looked into it already > (and, hence, probably rejected it:-): > > We add to the type structure a "type identifier" number, a small integer for > the common types (int=1, float=2, string=3, etc) and 0 for everything else. > > When eval_code2 sees, for instance, a MULTIPLY operation it does something > like the following: > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > code = (BINARY_MULTIPLY << 8) | > ((v->ob_type->tp_typeid) << 4) | > (w->ob_type->tp_typeid); > x = (binopfuncs[code])(v, w); > .... etc ... > > The idea is that all the 256 BINARY_MULTIPLY entries would be filled with > PyNumber_Multiply, except for a few common cases. The int*int field could > point straight to int_mul(), etc. > > Assuming the common cases are really more common than the uncommon cases the > fact that they jump straight out to the implementation function instead of > mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset > the added overhead of shifts, ors and indexing. You're assuming that arithmetic operations are a major time sink. I doubt that; much of my code contains hardly any arithmetic these days. Of course, if you *do* have a piece of code that does a lot of basic arithmetic, it might pay off -- but even then I would guess that the majority of opcodes are things like list accessors and variable access. But we needn't speculate. It's easy enough to measure the speedup: you can use tp_xxx5 in the type structure and plug a typecode into it for the int and float types. (Note that you would need a separate table of binopfuncs per operator.)
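[Editor's note: to make the dispatch-table idea concrete, here is a sketch of it in Python rather than C. The names TYPEID, generic_multiply and int_mul are hypothetical stand-ins for tp_typeid, PyNumber_Multiply and the int*int fast path from Jack's pseudo-code; this models the lookup only, not the interpreter loop.]

```python
# Model of the typecode-dispatch idea: 16 possible left type ids times
# 16 right ids gives 256 slots per operator, all pointing at the generic
# (coercing) implementation except a few specialized fast paths.

TYPEID = {int: 1, float: 2, str: 3}   # 0 is reserved for "everything else"

def generic_multiply(v, w):
    # Stand-in for PyNumber_Multiply: handles any type, coercion and all.
    return v * w

def int_mul(v, w):
    # Stand-in for the specialized int*int path.
    return v * w

binopfuncs = [generic_multiply] * 256
binopfuncs[(1 << 4) | 1] = int_mul    # int * int jumps straight to int_mul

def binary_multiply(v, w):
    # What the BINARY_MULTIPLY case in eval_code2 would compute.
    code = (TYPEID.get(type(v), 0) << 4) | TYPEID.get(type(w), 0)
    return binopfuncs[code](v, w)
```

Whether this pays off is exactly Guido's question above: the shifts and the table index are cheap, but they are only a win if the specialized slots are hit often enough to beat the generic path's coercion overhead.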
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Thu Aug 19 20:09:26 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 19 Aug 1999 20:09:26 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908191909.UAA20618@pukapuka.inrialpes.fr> [Tim, in an earlier msg] > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; Ok. In the meantime I think that folding the redundant SET_LINENO doesn't hurt. I ended up with a patchlet that seems to have no side effects, that updates the lnotab as it should and that even makes pdb a bit more clever, IMHO. Consider an extreme case for the function f (listed below). Currently, we get the following: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 SET_LINENO 4 12 SET_LINENO 5 15 LOAD_CONST 1 (1) 18 STORE_FAST 0 (a) 21 SET_LINENO 6 24 SET_LINENO 7 27 SET_LINENO 8 30 LOAD_CONST 2 (None) 33 RETURN_VALUE >>> pdb.runcall(f) > test.py(1)f() -> def f(): (Pdb) list 1, 20 1 -> def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(2)f() -> """Comment about f""" (Pdb) step > test.py(3)f() -> """Another one""" (Pdb) step > test.py(4)f() -> """A third one""" (Pdb) step > test.py(5)f() -> a = 1 (Pdb) step > test.py(6)f() -> """Forth""" (Pdb) step > test.py(7)f() -> "and pdb can set a breakpoint on this one (simple quotes)" (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple 
quotes...""" (Pdb) >>> ------------------------------------------- With folded SET_LINENO, we have this: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 5 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 SET_LINENO 8 12 LOAD_CONST 2 (None) 15 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 20 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) break 7 Breakpoint 1 at test.py:7 (Pdb) break 8 *** Blank or comment (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> ------------------------------------------- i.e., pdb stops at (points to) the first real instruction and doesn't step through the doc strings. Or is there something I'm missing here? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** compile.c-orig Thu Aug 19 19:27:13 1999 --- compile.c Thu Aug 19 19:00:31 1999 *************** *** 615,620 **** --- 615,623 ---- int arg; { if (op == SET_LINENO) { + if (!Py_OptimizeFlag && c->c_last_addr == c->c_nexti - 3) + /* Hack for folding several SET_LINENO in a row. */ + c->c_nexti -= 3; com_set_lineno(c, arg); if (Py_OptimizeFlag) return; From guido@CNRI.Reston.VA.US Thu Aug 19 22:10:33 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 17:10:33 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Thu, 19 Aug 1999 20:09:26 BST."
<199908191909.UAA20618@pukapuka.inrialpes.fr> References: <199908191909.UAA20618@pukapuka.inrialpes.fr> Message-ID: <199908192110.RAA12755@eric.cnri.reston.va.us> Earlier, you argued that this is "not an optimization," but rather avoiding redundancy. I should have responded right then that I disagree, or at least I'm lukewarm about your patch. Either you're not using -O, and then you don't care much about this; or you care, and then you should be using -O. Rather than encrusting the code with more and more ad-hoc micro optimizations, I'd prefer to have someone look into Tim's suggestion of supporting more efficient breakpoints... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Fri Aug 20 13:45:46 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 13:45:46 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908192110.RAA12755@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 19, 99 05:10:33 pm" Message-ID: <199908201245.NAA27098@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I haven't argued so much; I asked whether this would be reasonable. Probably I should have said that I don't see the purpose of emitting SET_LINENO instructions for those nodes for which the compiler generates no code, mainly because (as I learned subsequently) SET_LINENO serves no other purpose than debugging. As I haven't paid much attention to this aspect of the code, I thought that they might still be used for tracebacks. But I couldn't have said that because I didn't know it. > I should have responded right then that I disagree, ... Although I agree this is a minor issue, I'm interested in your argument here, if it's something other than the dialectic: "we're more interested in long term improvements" which is also my opinion. > ... or at least I'm lukewarm about your patch.
No surprise here :-) But I haven't found another way of not generating SET_LINENO for doc strings other than backpatching. > Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. Neither of those. I don't really care, frankly. I was just intrigued by the consecutive SET_LINENO in my disassemblies, so I started to think and ask questions about it. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... This is *the* real issue with the real potential solution. I'm willing to have a look at this (although I don't know pdb/bdb in its finest details). All suggestions and thoughts are welcome. We would probably leave the SET_LINENO opcode as is and (eventually) introduce a new opcode (instead of transforming/renaming it) for compatibility reasons, methinks. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm@hypernet.com Fri Aug 20 17:04:22 1999 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990818110213.A558F303120@snelboot.oratrix.nl> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <1276961301-70195@hypernet.com> In reply to no one in particular: I've often wished that the instance type object had an (optimized) __decref__ slot. With nothing but hand-waving to support it, I'll claim that would enable all these games. 
- Gordon From gmcm@hypernet.com Fri Aug 20 17:04:22 1999 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <1276961295-70552@hypernet.com> Jack Jansen wrote: > There's one slight problem with this: when you use functionality > that is partially portable, i.e. a call that is available on Windows > and Unix but not on the Mac. It gets worse, I think. How about the inconsistencies in POSIX support among *nixes? How about NT being a superset of Win9x? How about NTFS having capabilities that FAT does not? I'd guess there are inconsistencies between Mac flavors, too. The Java approach (if you can't do it everywhere, you can't do it) sucks. In some cases you could probably have the missing functionality (in os) fail silently, but in other cases that would be a disaster. "Least-worst"-is-not-necessarily-"good"-ly y'rs - Gordon From tismer@appliedbiometrics.com Fri Aug 20 16:05:47 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Fri, 20 Aug 1999 17:05:47 +0200 Subject: [Python-Dev] about line numbers References: <199908191909.UAA20618@pukapuka.inrialpes.fr> <199908192110.RAA12755@eric.cnri.reston.va.us> Message-ID: <37BD6ECB.9DD17460@appliedbiometrics.com> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I should have responded right then that I > disagree, or at least I'm lukewarm about your patch. Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... 
I didn't think of this before, but I just realized that I have something like that already in Stackless Python. It is possible to set a breakpoint at every opcode, for every frame. Adding an extra opcode for breakpoints is a good thing as well. The former are good for tracing, conditional breakpoints and such, and cost a little more time since there is always one extra function call. The latter would be a quick, less versatile thing. Inserting extra breakpoint opcodes into running code turns out to be easy to implement, if the running frame gets a local extra copy of its code object, with the breakpoints replacing the original opcodes. The breakpoint handler would then simply look into the original code object. Inserting breakpoints on the source level gives us breakpoints per procedure. Doing it in a running frame gives "instance" level debugging of code. Checking a monitor function on every opcode is slightly more expensive but most general. We can have it all; what do you think? I'm going to finish and publish the stackless/continuous package and submit a paper by the end of September. Should I include this debugging feature? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido@CNRI.Reston.VA.US Fri Aug 20 16:09:32 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:09:32 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> References: Message by "M.-A.
Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> <1276961301-70195@hypernet.com> Message-ID: <199908201509.LAA14726@eric.cnri.reston.va.us> > In reply to no one in particular: > > I've often wished that the instance type object had an (optimized) > __decref__ slot. With nothing but hand-waving to support it, I'll > claim that would enable all these games. Without context, I don't know when this would be called. If you want this called on all DECREFs (regardless of the refcount value), realize that this is a huge slowdown because it would mean the DECREF macro has to inspect the type object, which means several indirections. This would slow down *every* DECREF operation, not just those on instances with a __decref__ slot, because the DECREF macro doesn't know the type of the object! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@CNRI.Reston.VA.US Fri Aug 20 16:13:16 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:13:16 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961295-70552@hypernet.com> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> <1276961295-70552@hypernet.com> Message-ID: <199908201513.LAA14741@eric.cnri.reston.va.us> From: "Gordon McMillan" > Jack Jansen wrote: > > > There's one slight problem with this: when you use functionality > > that is partially portable, i.e. a call that is available on Windows > > and Unix but not on the Mac. > > It gets worse, I think. How about the inconsistencies in POSIX > support among *nixes? How about NT being a superset of Win9x? How > about NTFS having capabilities that FAT does not? I'd guess there are > inconsistencies between Mac flavors, too. > > The Java approach (if you can't do it everywhere, you can't do it) > sucks. 
In some cases you could probably have the missing > functionality (in os) fail silently, but in other cases that would > be a disaster. The Python policy has always been "if it's available, there's a standard name and API for it; if it's not available, the function is not defined or will raise an exception; you can use hasattr(os, ...) or catch exceptions to cope if you can live without it." There are a few cases where unavailable calls are emulated, a few where they are made no-ops, and a few where they are made to raise an exception unconditionally, but in most cases the function will simply not exist, so it's easy to test. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Fri Aug 20 21:54:10 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 21:54:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BD6ECB.9DD17460@appliedbiometrics.com> from "Christian Tismer" at "Aug 20, 99 05:05:47 pm" Message-ID: <199908202054.VAA26970@pukapuka.inrialpes.fr> I'll try to sketch here the scheme I'm thinking of for the callback/breakpoint issue (without SET_LINENO), although some technical details are still missing. I'm assuming the following, in this order: 1) No radical changes in the current behavior, i.e. preserve the current architecture / strategy as much as possible. 2) We don't have breakpoints per opcode, but per source line. For that matter, we have sys.settrace (and for now, we don't aim to have sys.settracei that would be called on every opcode, although we might want this in the future) 3) SET_LINENO disappears. Actually, SET_LINENO opcodes are conditional breakpoints, used for callbacks from C to Python. So the basic problem is to generate these callbacks. If any of the above is not an appropriate assumption and we want a radical change in the strategy of setting breakpoints / generating callbacks, then this post is invalid.
The solution I'm thinking of: a) Currently, we have a function PyCode_Addr2Line which computes the source line from the opcode's address. I hereby assume that we can write the reverse function PyCode_Line2Addr that returns the address from a given source line number. I don't have the implementation, but it should be doable. Furthermore, we can compute, having the co_lnotab table and co_firstlineno, the source line range for a code object. As a consequence, even with the dumbest of all algorithms, by looping through this source line range, we can enumerate with PyCode_Line2Addr the sequence of addresses for the source lines of this code object. b) As Chris pointed out, in case sys.settrace is defined, we can allocate and keep a copy of the original code string per frame. We can further dynamically overwrite the original code string with a new (internal, one byte) CALL_TRACE opcode at the addresses we have enumerated in a). The CALL_TRACE opcodes will trigger the callbacks from C to Python, just as the current SET_LINENO does. c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger the callback and if it returns successfully, we'll fetch the original opcode for the current location from the copy of the original co_code. Then we directly jump to the arg fetch code (or in case we fetch the entire original opcode in CALL_TRACE - we jump to the dispatch code). Hmm. I think that's all. At the heart of this scheme is the PyCode_Line2Addr function, which is the only blob in my head, for now. Christian Tismer wrote: > > I didn't think of this before, but I just realized that > I have something like that already in Stackless Python. > It is possible to set a breakpoint at every opcode, for every > frame. Adding an extra opcode for breakpoints is a good thing > as well. The former are good for tracing, conditional breakpoints > and such, and cost a little more time since there is always one extra > function call.
The latter would be a quick, less versatile thing. I don't think I understand clearly the difference you're talking about, and why the one thing is better than the other, probably because I'm a bit far from stackless python. > I'm going to finish and publish the stackless/continuous package > and submit a paper by the end of September. Should I include this debugging > feature? Write the paper first; you have more than enough material to talk about already ;-). Then if you have time to implement some debugging support, you could always add another section, but it won't be a central point of your paper. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido@CNRI.Reston.VA.US Fri Aug 20 20:59:24 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 15:59:24 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Fri, 20 Aug 1999 21:54:10 BST." <199908202054.VAA26970@pukapuka.inrialpes.fr> References: <199908202054.VAA26970@pukapuka.inrialpes.fr> Message-ID: <199908201959.PAA16105@eric.cnri.reston.va.us> > I'll try to sketch here the scheme I'm thinking of for the > callback/breakpoint issue (without SET_LINENO), although some > technical details are still missing. > > I'm assuming the following, in this order: > > 1) No radical changes in the current behavior, i.e. preserve the > current architecture / strategy as much as possible. > > 2) We don't have breakpoints per opcode, but per source line. For that > matter, we have sys.settrace (and for now, we don't aim to have > sys.settracei that would be called on every opcode, although we might > want this in the future) > > 3) SET_LINENO disappears. Actually, SET_LINENO opcodes are conditional breakpoints, > used for callbacks from C to Python. So the basic problem is to generate > these callbacks.
They used to be the only mechanism by which the traceback code knew the current line number (long before the debugger hooks existed), but with the lnotab, that's no longer necessary. > If any of the above is not an appropriate assumption and we want a radical > change in the strategy of setting breakpoints/ generating callbacks, then > this post is invalid. Sounds reasonable. > The solution I'm thinking of: > > a) Currently, we have a function PyCode_Addr2Line which computes the source > line from the opcode's address. I hereby assume that we can write the > reverse function PyCode_Line2Addr that returns the address from a given > source line number. I don't have the implementation, but it should be > doable. Furthermore, we can compute, having the co_lnotab table and > co_firstlineno, the source line range for a code object. > > As a consequence, even with the dumbiest of all algorithms, by looping > trough this source line range, we can enumerate with PyCode_Line2Addr > the sequence of addresses for the source lines of this code object. > > b) As Chris pointed out, in case sys.settrace is defined, we can allocate > and keep a copy of the original code string per frame. We can further > dynamically overwrite the original code string with a new (internal, > one byte) CALL_TRACE opcode at the addresses we have enumerated in a). > > The CALL_TRACE opcodes will trigger the callbacks from C to Python, > just as the current SET_LINENO does. > > c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger > the callback and if it returns successfully, we'll fetch the original > opcode for the current location from the copy of the original co_code. > Then we directly jump to the arg fetch code (or in case we fetch the > entire original opcode in CALL_TRACE - we jump to the dispatch code). Tricky, but doable. > Hmm. I think that's all. > > At the heart of this scheme is the PyCode_Line2Addr function, which is > the only blob in my head, for now. 
I'm pretty sure that this would be straightforward. I'm a little anxious about modifying the code, and was thinking myself of a way to specify a bitvector of addresses where to break. But that would still cause some overhead for code without breakpoints, so I guess you're right (and it's certainly a long-standing tradition in breakpoint-setting!) --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Fri Aug 20 22:22:12 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 22:22:12 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908201959.PAA16105@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 20, 99 03:59:24 pm" Message-ID: <199908202122.WAA26956@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > > I'm a little anxious about modifying the code, and was thinking myself > of a way to specify a bitvector of addresses where to break. But that > would still cause some overhead for code without breakpoints, so I > guess you're right (and it's certainly a long-standing tradition in > breakpoint-setting!) > Hm. You're probably right, especially if someone wants to inspect a code object from the debugger or something. But I believe that we can manage to redirect the instruction pointer at the beginning of eval_code2 to the *copy* of co_code, and modify the copy with CALL_TRACE, preserving the original intact.
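[Editor's note: the copy-and-patch scheme being discussed can be modeled in a few lines of Python. This is a sketch with made-up values: the CALL_TRACE byte and the opcode stream below are hypothetical, and the real work would of course happen in C inside eval_code2.]

```python
# Model of the CALL_TRACE scheme: execute from a patched *copy* of the
# opcode string, and keep the original bytes intact so the breakpoint
# handler can fetch the real opcode at the patched address.

CALL_TRACE = 0xFF   # hypothetical one-byte breakpoint opcode

def patch_code(co_code, breakpoint_addrs):
    """Return a patched copy of co_code; the original is left untouched."""
    patched = bytearray(co_code)
    for addr in breakpoint_addrs:
        patched[addr] = CALL_TRACE
    return bytes(patched)

def fetch_original_opcode(co_code, addr):
    # What the CALL_TRACE handler does after the trace callback returns:
    # look up the displaced opcode in the pristine original.
    return co_code[addr]

original = bytes([100, 1, 125, 0, 100, 2, 83])   # made-up opcode stream
patched = patch_code(original, [0, 4])           # breakpoints at two lines
```

The point of the model is the invariant Vladimir states above: the interpreter runs `patched`, while `original` stays intact for both opcode lookup and later inspection of the code object.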
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip@mojam.com (Skip Montanaro) Fri Aug 20 21:25:25 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 20 Aug 1999 15:25:25 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <1276961295-70552@hypernet.com> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> <1276961295-70552@hypernet.com> Message-ID: <14269.47443.192469.525132@dolphin.mojam.com> Gordon> It gets worse, I think. How about the inconsistencies in POSIX Gordon> support among *nixes? How about NT being a superset of Win9x? Gordon> How about NTFS having capabilities that FAT does not? I'd guess Gordon> there are inconsistencies between Mac flavors, too. To a certain degree I think the C module(s) that interface to the underlying OS's API can iron out differences. In other cases you may have to document minor (known) differences. In still other cases you may have to relegate particular functionality to the OS-dependent modules. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From gmcm@hypernet.com Fri Aug 20 23:38:14 1999 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 17:38:14 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <199908201509.LAA14726@eric.cnri.reston.va.us> References: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> Message-ID: <1276937670-1491544@hypernet.com> [me] > > > > I've often wished that the instance type object had an (optimized) > > __decref__ slot. With nothing but hand-waving to support it, I'll > > claim that would enable all these games. [Guido] > Without context, I don't know when this would be called. 
If you > want this called on all DECREFs (regardless of the refcount value), > realize that this is a huge slowdown because it would mean the > DECREF macro has to inspect the type object, which means several > indirections. This would slow down *every* DECREF operation, not > just those on instances with a __decref__ slot, because the DECREF > macro doesn't know the type of the object! This was more 2.0-ish speculation, and really thinking of classic C++ ref counting where decref would be a function call, not a macro. Still a slowdown, of course, but not quite so massive. The upside is opening up all kinds of tricks at the type object and user class levels, (such as weak refs and copy on write etc). Worth it? I'd think so, but I'm not a speed demon. - Gordon From tim_one@email.msn.com Sat Aug 21 09:09:17 1999 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 04:09:17 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <000201beebac$776d32e0$0c2d2399@tim> [Skip Montanaro] > ... > 3. If Dan Connolly's contention is correct, importing the os module > today is not all that portable. I can't really say one way or the > other, because I'm lucky enough to be able to confine my serious > programming to Unix. I'm sure there's someone out there that > can try the following on a few platforms: > > import os > dir(os) > > and compare the output. There's no need to, Skip. Just read the os module docs; where a function says, e.g., "Availability: Unix.", it doesn't show up on a Windows or Mac box. In that sense using (some) os functions is certainly unportable. But I have no sympathy for the phrasing of Dan's complaint: if he calls os.getegid(), *he* knows perfectly well that's a Unix-specific function, and expressing outrage over it not working on NT is disingenuous. 
OTOH, I don't think you're going to find anything in the OS module documented as available only on Windows or only on Macs, and some semi-portable functions (notoriously chmod) are documented in ways that make sense only to Unixheads. This certainly gives a strong impression of Unix-centricity to non-Unix weenies, and has got to baffle true newbies completely. So 'twould be nice to have a basic os module all of whose functions "run everywhere", whose interfaces aren't copies of cryptic old Unixisms, and whose docs are platform neutral. If Guido is right that the os functions tend to get more portable over time, fine, that module can grow over time too. In the meantime, life would be easier for everyone except Python's implementers. From Vladimir.Marangozov@inrialpes.fr Sat Aug 21 16:34:32 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 16:34:32 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908202122.WAA26956@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 20, 99 10:22:12 pm" Message-ID: <199908211534.QAA22392@pukapuka.inrialpes.fr> [me] > > Guido van Rossum wrote: > > > > > > I'm a little anxious about modifying the code, and was thinking myself > > of a way to specify a bitvector of addresses where to break. But that > > would still cause some overhead for code without breakpoints, so I > > guess you're right (and it's certainly a long-standing tradition in > > breakpoint-setting!) > > > > Hm. You're probably right, especially if someone wants to inspect > a code object from the debugger or something. But I believe that > we can manage to redirect the instruction pointer at the beginning > of eval_code2 to the *copy* of co_code, and modify the copy with > CALL_TRACE, preserving the original intact. > I wrote a very rough first implementation of this idea.
The files are at: http://sirac.inrialpes.fr/~marangoz/python/lineno/ Basically, what I did is: 1) what I said :-) 2) No more SET_LINENO 3) In tracing mode, a copy of the original code is put in an additional slot (co_tracecode) of the code object. Then it's overwritten with CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. The VM is routed to execute this code, and not the original one. 4) When tracing is off (i.e. sys_tracefunc is NULL) the VM falls back to normal execution of the original code. A couple of things that need finalization: a) how to deallocate the modified code string when tracing is off b) the value of CALL_TRACE (I almost randomly picked 76) c) I don't handle the cases where sys_tracefunc is enabled or disabled within the same code object. Tracing or not is determined before the main loop. d) update pdb, so that it does not allow setting breakpoints on lines with no code. To achieve this, I think that python versions of PyCode_Addr2Line & PyCode_Line2Addr have to be integrated into pdb as helper functions. e) correct bugs and design flaws f) something else?
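[Editor's note: the two line-table functions mentioned in d) can be modeled in Python. This sketch assumes co_lnotab is a byte string of (address-increment, line-increment) pairs relative to address 0 and co_firstlineno; it ignores the 255-increment overflow entries the real table can contain, and the lnotab value below is a hypothetical one matching the folded f() shown later (code only at lines 5 and 8).]

```python
# Python models of PyCode_Addr2Line and the proposed PyCode_Line2Addr.

def addr2line(lnotab, firstlineno, addr):
    """Source line that generated the instruction at 'addr'."""
    line, cur = firstlineno, 0
    for i in range(0, len(lnotab), 2):
        cur += lnotab[i]
        if cur > addr:
            break
        line += lnotab[i + 1]
    return line

def line2addr(lnotab, firstlineno, lineq):
    """Address of the first instruction at (or after) source line 'lineq'."""
    line, addr = firstlineno, 0
    if lineq <= line:
        return 0
    for i in range(0, len(lnotab), 2):
        addr += lnotab[i]
        line += lnotab[i + 1]
        if line >= lineq:
            return addr
    return addr

# Hypothetical lnotab for the folded f(): line 5 at addr 0, line 8 at addr 9.
lnotab, firstlineno = bytes([0, 4, 9, 3]), 1
```

Note that line2addr rounds forward: asking for a line that generated no code (a doc string, say) yields the address of the next line that did, which is exactly what pdb needs to refuse or relocate such breakpoints.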
And here's the sample session of my lousy function f with this 'proof of concept' code: >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 LOAD_CONST 1 (1) 3 STORE_FAST 0 (a) 6 LOAD_CONST 2 (None) 9 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 10 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer@appliedbiometrics.com Sat Aug 21 18:10:50 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 19:10:50 +0200 Subject: [Python-Dev] about line numbers References: <199908211534.QAA22392@pukapuka.inrialpes.fr> Message-ID: <37BEDD9A.DBA817B1@appliedbiometrics.com> Vladimir Marangozov wrote: ... > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > Basically, what I did is: > > 1) what I said :-) > 2) No more SET_LINENO > 3) In tracing mode, a copy of the original code is put in an additional > slot (co_tracecode) of the code object. Then it's overwritten with > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. I'd rather keep the original code object as it is, create a copy with inserted breakpoints and put that into the frame slot. Pointing back to the original from there. Then I'd redirect the code from the CALL_TRACE opcode completely to a user-defined function. Getting rid of the extra code object would be done by this function when tracing is off. It also vanishes automatically when the frame is released. 
> a) how to deallocate the modified code string when tracing is off

By making the copy a frame property which is temporary, I think. Or, if tracing should work for all frames, by pushing the original in the back of the modified. Both work.

ciao - chris

--
Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home

From Vladimir.Marangozov@inrialpes.fr Sat Aug 21 22:40:05 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Sat, 21 Aug 1999 22:40:05 +0100 (NFT)
Subject: [Python-Dev] about line numbers
In-Reply-To: <37BEDD9A.DBA817B1@appliedbiometrics.com> from "Christian Tismer" at "Aug 21, 99 07:10:50 pm"
Message-ID: <199908212140.WAA51054@pukapuka.inrialpes.fr>

Chris, could you please repeat that step by step in more detail? I'm not sure I understand your suggestions.

Christian Tismer wrote:
>
> Vladimir Marangozov wrote:
> ...
> > I wrote a very rough first implementation of this idea. The files are at:
> >
> > http://sirac.inrialpes.fr/~marangoz/python/lineno/
> >
> > Basically, what I did is:
> >
> > 1) what I said :-)
> > 2) No more SET_LINENO
> > 3) In tracing mode, a copy of the original code is put in an additional
> >    slot (co_tracecode) of the code object. Then it's overwritten with
> >    CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr.
>
> I'd rather keep the original code object as it is, create a copy
> with inserted breakpoints and put that into the frame slot.

You seem to be suggesting duplicating the entire code object, right? And referencing the modified duplicate from the current frame? I actually duplicate only the opcode string (that is, the co_code string object) and I don't see the point of duplicating the entire code object.
Keeping a reference from the current frame makes sense, but won't it deallocate the modified version on every frame release (then redo all the code duplication work for every frame)?

> Pointing back to the original from there.

I don't understand this. What points back where?

> Then I'd redirect the code from the CALL_TRACE opcode completely
> to a user-defined function.

What user-defined function? I don't understand that either... Except the sys_tracefunc, what other (user-defined) function do we have here? Is it a Python or a C function?

> Getting rid of the extra code object would be done by this function
> when tracing is off.

How exactly? This seems to be obvious for you, but obviously, not for me ;-)

> It also vanishes automatically when the frame is released.

The function or the extra code object?

> > a) how to deallocate the modified code string when tracing is off
>
> By making the copy a frame property which is temporary, I think.

I understood that the frame lifetime could be exploited "somehow"...

> Or, if tracing should work for all frames, by pushing the original
> in the back of the modified. Both work.

Tracing is done for all frames if sys_tracefunc is not NULL; that function usually ends up in the f_trace slot.

> ciao - chris

I'm confused. I didn't understand your idea.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From tismer@appliedbiometrics.com Sat Aug 21 22:23:10 1999
From: tismer@appliedbiometrics.com (Christian Tismer)
Date: Sat, 21 Aug 1999 23:23:10 +0200
Subject: [Python-Dev] about line numbers
References: <199908212140.WAA51054@pukapuka.inrialpes.fr>
Message-ID: <37BF18BE.B3D58836@appliedbiometrics.com>

Vladimir Marangozov wrote:
>
> Chris, could you please repeat that step by step in more detail?
> I'm not sure I understand your suggestions.

I think I was too quick. I thought of copying the whole code object, of course. ...
> > I'd rather keep the original code object as it is, create a copy
> > with inserted breakpoints and put that into the frame slot.
>
> You seem to be suggesting duplicating the entire code object, right?
> And referencing the modified duplicate from the current frame?

Yes.

> I actually duplicate only the opcode string (that is, the co_code string
> object) and I don't see the point of duplicating the entire code object.
>
> Keeping a reference from the current frame makes sense, but won't it
> deallocate the modified version on every frame release (then redo all the
> code duplication work for every frame)?

You get two options by that.

1) Permanently modify one code object to be traceable, pushing a copy of the original "behind" it by means of some co_back pointer. This keeps the patched one where the original was, and makes a global debugging version.

2) Create a copy for one frame, and put the original into a co_back pointer. This gives debugging just for this one frame.

...

> > Then I'd redirect the code from the CALL_TRACE opcode completely
> > to a user-defined function.
>
> What user-defined function? I don't understand that either...
> Except the sys_tracefunc, what other (user-defined) function do we have here?
> Is it a Python or a C function?

I would suggest a Python function, of course.

> > Getting rid of the extra code object would be done by this function
> > when tracing is off.
>
> How exactly? This seems to be obvious for you, but obviously, not for me ;-)

If the permanent tracing "1)" is used, just restore the code object's contents from the original in co_back, and drop co_back. In the "2)" version, just pull the co_back into the frame's code pointer and lose the reference to the copy. This occurs automatically on frame release.

> > It also vanishes automatically when the frame is released.
>
> The function or the extra code object?

The extra code object.

...

> I'm confused. I didn't understand your idea.
Forget it, it isn't more than another brain fart :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tim_one@email.msn.com Sun Aug 22 02:25:22 1999 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 21:25:22 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908131347.OAA30740@pukapuka.inrialpes.fr> Message-ID: <000001beec3d$348f0160$cb2d2399@tim> [going back a week here, to dict resizing ...] [Vladimir Marangozov] > ... > All in all, for performance reasons, dicts remain an exception > to the rule of releasing memory ASAP. Yes, except I don't think there is such a rule! The actual rule is a balancing act between the cost of keeping memory around "just in case", and the expense of getting rid of it. Resizing a dict is extraordinarily expensive because the entire table needs to be rearranged, but lists make this tradeoff too (when you del a list element or list slice, it still goes thru NRESIZE, which still keeps space for as many as 100 "extra" elements around). The various internal caches for int and frame objects (etc) also play this sort of game; e.g., if I happen to have a million ints sitting around at some time, Python effectively assumes I'll never want to reuse that int storage for anything other than ints again. 
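Tim's point about lists keeping spare slots around is still observable in today's CPython (a sketch; the exact growth pattern is an implementation detail, not a documented guarantee):

```python
import sys

lst, sizes = [], []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Over-allocation shows up as plateaus: getsizeof stays flat across
# several appends, then jumps when the list grows its spare capacity.
plateaus = sum(1 for a, b in zip(sizes, sizes[1:]) if a == b)
```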
python-rarely-releases-memory-asap-ly y'rs - tim

From Vladimir.Marangozov@inrialpes.fr Sun Aug 22 20:41:59 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Sun, 22 Aug 1999 20:41:59 +0100 (NFT)
Subject: [Python-Dev] Memory (was: about line numbers, which was shrinking dicts)
In-Reply-To: <000001beec3d$348f0160$cb2d2399@tim> from "Tim Peters" at "Aug 21, 99 09:25:22 pm"
Message-ID: <199908221941.UAA54480@pukapuka.inrialpes.fr>

Tim Peters wrote:
>
> [going back a week here, to dict resizing ...]

Yes, and the subject line does not correspond to the contents because at the moment I sent this message, I ran out of disk space and the mailer picked a random header after destroying half of the messages in this mailbox.

> [Vladimir Marangozov]
> > ...
> > All in all, for performance reasons, dicts remain an exception
> > to the rule of releasing memory ASAP.
>
> Yes, except I don't think there is such a rule! The actual rule is a
> balancing act between the cost of keeping memory around "just in case", and
> the expense of getting rid of it.

Good point.

> Resizing a dict is extraordinarily expensive because the entire table needs
> to be rearranged, but lists make this tradeoff too (when you del a list
> element or list slice, it still goes thru NRESIZE, which still keeps space
> for as many as 100 "extra" elements around).
>
> The various internal caches for int and frame objects (etc) also play this
> sort of game; e.g., if I happen to have a million ints sitting around at
> some time, Python effectively assumes I'll never want to reuse that int
> storage for anything other than ints again.
>
> python-rarely-releases-memory-asap-ly y'rs - tim

Yes, and I'm somewhat sensitive to this issue after spending 6 years in a team which deals a lot with memory management (mainly DSM). In other words, you say that Python tolerates *virtual* memory fragmentation (a funny term :-).
In the case of dicts and strings, we tolerate "internal fragmentation" (a contiguous chunk is allocated, then partially used). In the case of ints, floats or frames, we tolerate "external fragmentation". And as you said, Python tolerates this because of the speed/space tradeoff.

Hopefully, all we deal with at this level is virtual memory, so even if you have zillions of ints, it's the OS VMM that will help you more with its long-term scheduling than Python's wild guesses about a hypothetical usage of zillions of ints later.

I think that some OS concepts can give us hints on how to reduce our virtual fragmentation (which, as we all know, is not a very good thing). A few keywords: compaction, segmentation, paging, sharing. We can't do much about our internal fragmentation, except changing the algorithms of dicts & strings (which is not appealing anyway). But it would be nice to think about the external fragmentation of Python's caches. Or even try to reduce the internal fragmentation in combination with the internal caches...

BTW, this is the whole point of PyMalloc: in a virtual memory world, try to reduce the distance between the user view and the OS view on memory. PyMalloc addresses the fragmentation problem at a lower level of granularity than an OS (that is, *within* a page), because most Python objects are very small. However, it can't efficiently handle large chunks like the int/float caches. Basically what it does is: segmentation of the virtual space and sharing of the cached free space. I think that Python could improve on sharing its internal caches, without significant slowdowns...

The bottom line is that there's still plenty of room for exploring alternate memory management strategies that better fit Python's memory needs as a whole.
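The int cache under discussion survives in modern CPython as the small-integer cache: the integers -5 through 256 are preallocated once and shared. A tiny illustration (this is a CPython implementation detail, not a language guarantee):

```python
# int() on equal text hands back the very same preallocated object
# for values in the cached range -5..256.
a = int('256')
b = int('256')
small_cached = a is b   # True in CPython: both names refer to the cached 256
```

Larger values are allocated per call, which is exactly the "external fragmentation" being traded away for speed here.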
--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jack@oratrix.nl Sun Aug 22 22:25:56 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Sun, 22 Aug 1999 23:25:56 +0200
Subject: [Python-Dev] Converting C objects to Python objects and back
Message-ID: <19990822212601.2D4BE18BA0D@oratrix.oratrix.nl>

Here's another silly idea, not having to do with optimization. On the Mac, and as far as I know on Windows as well, there are quite a few OS API structures that have a Python object representation that is little more than the PyObject boilerplate plus a pointer to the C API object. (And, of course, lots of methods to operate on the object.) To convert these from Python to C I always use boilerplate code like

    WindowPtr *win;
    PyArg_ParseTuple(args, "O&", PyWin_Convert, &win);

where PyWin_Convert is the function that takes a PyObject * and a void **, does the typecheck and sets the pointer. A similar way is used to convert C pointers back to Python objects in Py_BuildValue.

What I was thinking is that it would be nice (if you are _very_ careful) if this functionality was available in struct. So, if I would somehow obtain (in a Python string) a C structure that contained, say, a WindowPtr and two ints, I would be able to say

    win, x, y = struct.unpack("Ohh", Win.WindowType)

and struct would be able, through the WindowType type object, to get at the PyWin_Convert and PyWin_New functions.

A nice side issue is that you can add an option to PyArg_ParseTuple so you can say

    PyArg_ParseTuple(args, "O+", Win_WinObject, &win)

and you don't have to remember the different names the various types use for their conversion routines.

Is this worth pursuing or is it just too dangerous? And, if it is worth pursuing, I have to stash away the two function pointers somewhere in the TypeObject; should I grab one of the tp_xxx fields for this or is there a better place?
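Jack's struct idea can be mocked up in pure Python: read each 'O' field as a pointer-sized integer (struct code 'P') and feed it through a converter, the role PyWin_New plays in C. Purely an illustrative sketch of the proposal: unpack_with_converters is invented here, and struct itself supports nothing like it.

```python
import struct

def unpack_with_converters(fmt, data, converters):
    # Treat each 'O' as a native pointer-sized integer ('P'), then pass
    # it to the next converter in line. Sketch only: assumes one value
    # per format character, no repeat counts.
    values = struct.unpack(fmt.replace('O', 'P'), data)
    conv = iter(converters)
    return tuple(next(conv)(v) if ch == 'O' else v
                 for ch, v in zip(fmt, values))

# A fake "window pointer" followed by two shorts, as in Jack's example;
# hex() stands in for a real PyWin_New-style wrapper.
data = struct.pack('Phh', 0xDEAD, 3, 4)
win, x, y = unpack_with_converters('Ohh', data, [hex])
```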
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From Fred L. Drake, Jr."
References: <14266.51743.904066.470431@dolphin.mojam.com> <000201beebac$776d32e0$0c2d2399@tim>
Message-ID: <14273.24719.865520.797568@weyr.cnri.reston.va.us>

Tim Peters writes:
> OTOH, I don't think you're going to find anything in the OS module
> documented as available only on Windows or only on Macs, and some

Tim,

Actually, the spawn*() functions are included in os and are documented as Windows-only, along with the related P_* constants. These are provided by the nt module.

> everywhere", whose interfaces aren't copies of cryptic old Unixisms, and
> whose docs are platform neutral.

I'm always glad to see documentation patches, or even pointers to specific problems. Being a Unix-weenie myself, making the documentation more readable to Windows-weenies can be difficult at times. But given useful pointers, I can usually pull it off, or at least drive someone who can to do so. ;-)

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From tim_one@email.msn.com Tue Aug 24 07:32:49 1999
From: tim_one@email.msn.com (Tim Peters)
Date: Tue, 24 Aug 1999 02:32:49 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: <14273.24719.865520.797568@weyr.cnri.reston.va.us>
Message-ID: <000701beedfa$7c5c8e40$902d2399@tim>

[Fred L. Drake, Jr.]
> Actually, the spawn*() functions are included in os and are
> documented as Windows-only, along with the related P_* constants.
> These are provided by the nt module.

I stand corrected, Fred -- so how do the Unix dweebs like this Windows crap cluttering "their" docs?

[Tim, pitching a portable sane interface to a portable sane subset of os functionality]
> I'm always glad to see documentation patches, or even pointers to
> specific problems.
> Being a Unix-weenie myself, making the documentation more readable
> to Windows-weenies can be difficult at times. But given useful
> pointers, I can usually pull it off, or at least drive someone who
> can to do so. ;-)

No, it's deeper than that. Some of the inherited Unix interfaces are flatly incomprehensible to anyone other than a Unix-head, but the functionality is supplied only in that form (docs may ease the pain, but the interfaces still suck); for example,

    mkdir(path[, mode])
        Create a directory named path with numeric mode mode. The default
        mode is 0777 (octal). On some systems, mode is ignored. Where it
        is used, the current umask value is first masked out.
        Availability: Macintosh, Unix, Windows.

If you have a sister or parent or 3-year-old child (they're all equivalent for this purpose), just imagine them reading that. If you can't, I'll have my sister call you. Raw numeric permission modes, octal mode notation, and the "umask" business are Unix-specific -- and even Unices supply symbolic ways to specify permissions.

chmod is likely the one I hear the most gripes about. Windows heads are looking to change "file attributes", the name "chmod" is gibberish to them, most of the Unix mode bits make no sense under Windows (& contra Guido's optimism, never will) even if you know the secret octal code, and Windows has several attributes (hidden bit, system bit, archive bit) chmod can't get at. The only portable functionality here is the write bit, but no non-Unix person could possibly guess either that chmod is the function they need, or what to type after someone tells them it's chmod.

So this is less a doc issue than a sign that more of os needs to become more like os.path (i.e., intelligently named functions with intelligently abstracted interfaces).
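The mode/umask interplay in that mkdir excerpt is concrete enough to demonstrate; a POSIX-flavored sketch, using the symbolic spellings the stat module supplies (assumes a Unix-like platform where mode is honored):

```python
import os
import stat
import tempfile

parent = tempfile.mkdtemp()
path = os.path.join(parent, 'sub')

old = os.umask(0o027)   # deny group-write and everything for "other"
os.mkdir(path, 0o777)   # ask for everything...
os.umask(old)           # restore the process umask

# ...and the current umask value is "first masked out": 0o777 & ~0o027
mode = stat.S_IMODE(os.stat(path).st_mode) & 0o777
```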
never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - tim From skip@mojam.com (Skip Montanaro) Tue Aug 24 18:21:53 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 24 Aug 1999 12:21:53 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000701beedfa$7c5c8e40$902d2399@tim> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> Message-ID: <14274.53860.210265.71990@dolphin.mojam.com> Tim> chmod is likely the one I hear the most gripes about. Windows Tim> heads are looking to change "file attributes", the name "chmod" is Tim> gibberish to them Well, we could confuse everyone and rename "chmod" to "chfat" (is that like file system liposuction?). Windows probably has an equivalent function whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Tim> most of the Unix mode bits make no sense under Windows (& contra Tim> Guido's optimism, never will) even if you know the secret octal Tim> code ... It beats a secret handshake. Imagine all the extra peripherals we'd have to make available for everyone's computer. ;-) Tim> So this is less a doc issue than that more of os needs to become Tim> more like os.path (i.e., intelligently named functions with Tim> intelligently abstracted interfaces). Hasn't Guido's position been that the interface modules like os, posix, etc are just a thin layer over the underlying API (Guido: note how I cleverly attributed this position to you but also placed the responsibility for correctness on your head!)? If that's the case, perhaps we should provide a slightly higher level module that abstracts the file system as objects, and adopts a more user-friendly approach to the secret octal codes. Those of us worried about job security could continue to use the lower level module and leave the higher level interface for former Visual Basic programmers. 
Tim> never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs -

maybe-the-"e"-key-stuck-on-his-TTY-ly y'rs...

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098 | Python: Programming the way Guido indented...

From Fred L. Drake, Jr."
References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> <14274.53860.210265.71990@dolphin.mojam.com>
Message-ID: <14274.58040.138331.413958@weyr.cnri.reston.va.us>

Skip Montanaro writes:
> whose name is 17 characters long which we'd all love to type, I'm sure. ;-)

Just 17? ;-)

> Tim> So this is less a doc issue than that more of os needs to become
> Tim> more like os.path (i.e., intelligently named functions with
> Tim> intelligently abstracted interfaces).

Sounds like some doc improvements could really help, at least in the short term.

> correctness on your head!)? If that's the case, perhaps we should provide a
> slightly higher level module that abstracts the file system as objects, and
> adopts a more user-friendly approach to the secret octal codes. Those of us

I'm all for an object interface to a logical filesystem; I had to write just such a thing in Java not long ago, and we have a similar construct in Python (not by me, though) that we use in our Knowbot work.

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From tim_one@email.msn.com Wed Aug 25 08:02:21 1999
From: tim_one@email.msn.com (Tim Peters)
Date: Wed, 25 Aug 1999 03:02:21 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com>
Message-ID: <000801beeec7$c6f06b20$fc2d153f@tim>

[Skip Montanaro]
> Well, we could confuse everyone and rename "chmod" to "chfat" ...

I don't want to rename anything, nor do I want to use MS-specific names. chmod is both the wrong spelling & the wrong functionality for all non-Unix systems.
os.path did a Good Thing by, e.g., introducing getmtime(), despite that everyone knows it's just os.stat()[8]. New isreadonly(path) and setreadonly(path) are more what I'm after; nothing beyond that is portable, & never will be. > Windows probably has an equivalent function whose name is 17 > characters long Indeed, SetFileAttributes is exactly 17 characters long (you moonlighting on NT, Skip?!). But while Windows geeks would like to use that, it's both the wrong spelling & the wrong functionality for all non-Windows systems. > ... > Hasn't Guido's position been that the interface modules like os, > posix, etc are just a thin layer over the underlying API (Guido: > note how I cleverly attributed this position to you but also placed > the responsibility for correctness on your head!)? If that's the > case, perhaps we should provide a slightly higher level module that > abstracts the file system as objects, and adopts a more user-friendly > approach to the secret octal codes. Like that, yes. > Those of us worried about job security could continue to use the > lower level module and leave the higher level interface for former > Visual Basic programmers. You're just *begging* Guido to make the Python2 os module take all of its names from the Win32 API . it's-no-lamer-to-be-ignorant-of-unix-names-than-it-is- to-be-ignorant-of-chinese-ly y'rs - tim From tim_one@email.msn.com Wed Aug 25 08:05:31 1999 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:05:31 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Message-ID: <000901beeec8$380d05c0$fc2d153f@tim> [Fred L. Drake, Jr.] > ... > I'm all for an object interface to a logical filesystem; having > had to write just such a thing in Java not long ago, and we have > a similar construct in Python (not by me, though), that we use in > our Knowbot work. 
Well, don't read anything unintended into this, but Guido *is* out of town, and you *do* have the power to check in code outside the doc subtree ... barry-will-help-he's-been-itching-to-revolt-too-ly y'rs - tim From bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Wed Aug 25 12:20:16 1999 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) (Barry A. Warsaw) Date: Wed, 25 Aug 1999 07:20:16 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14275.53616.585669.890621@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Well, don't read anything unintended into this, but Guido *is* TP> out of town, and you *do* have the power to check in code TP> outside the doc subtree ... TP> barry-will-help-he's-been-itching-to-revolt-too-ly y'rs I'll bring the pitchforks if you bring the torches! -Barry From skip@mojam.com (Skip Montanaro) Wed Aug 25 16:17:35 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 25 Aug 1999 10:17:35 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.2229.983969.228891@dolphin.mojam.com> > I'm all for an object interface to a logical filesystem; having had to > write just such a thing in Java not long ago, and we have a similar > construct in Python (not by me, though), that we use in our Knowbot > work. Fred, Since this is the dev group, how about showing us the Knowbot's logical filesystem API, and let's do some dev-ing... Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From Fred L. Drake, Jr." 
References: <14274.53860.210265.71990@dolphin.mojam.com> <000801beeec7$c6f06b20$fc2d153f@tim>
Message-ID: <14276.6236.605103.369339@weyr.cnri.reston.va.us>

Tim Peters writes:
> os.path did a Good Thing by, e.g., introducing getmtime(), despite that
> everyone knows it's just os.stat()[8]. New isreadonly(path) and
> setreadonly(path) are more what I'm after; nothing beyond that is portable,

Tim,

I think we can simply declare that isreadonly() checks that the file doesn't allow the user to read it, but setreadonly() sounds to me like it wouldn't be portable to Unix. There's more than one (reasonable) way to make a file unreadable to a user just by manipulating permission bits, and which is best will vary according to both the user and the file's existing permissions.

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From Fred L. Drake, Jr."
References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim>
Message-ID: <14276.6449.428851.402955@weyr.cnri.reston.va.us>

Tim Peters writes:
> Well, don't read anything unintended into this, but Guido *is* out
> of town, and you *do* have the power to check in code outside the
> doc subtree ...

Good thing I turned off the python-checkins list when I added the curly bracket patch I've been working on!

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From Fred L. Drake, Jr."
References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> <14276.2229.983969.228891@dolphin.mojam.com>
Message-ID: <14276.14854.366220.664463@weyr.cnri.reston.va.us>

Skip Montanaro writes:
> Since this is the dev group, how about showing us the Knowbot's logical
> filesystem API, and let's do some dev-ing...
Well, I took a look at it, and I must confess it's just not really different from the set of interfaces in the os module; the important point is that they are methods instead of functions (other than a few data items: sep, pardir, curdir). The path attribute provided the same interface as os.path. Its only user-visible state is the current-directory setting, which may or may not be that useful.

We left off chmod(), which would make Tim happy, but that was only because it wasn't meaningful in context. We'd have to add it (or something equivalent) for a general purpose filesystem object. So Tim's only happy if he can come up with a general interface that is actually portable (consider my earlier comments on setreadonly()). On the other hand, you don't need chmod() or anything like it for most situations where a filesystem object would be useful. An FTPFilesystem class would not be hard to write!

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From jack@oratrix.nl Wed Aug 25 22:43:16 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Wed, 25 Aug 1999 23:43:16 +0200
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: Message by "Fred L. Drake, Jr." , Wed, 25 Aug 1999 12:22:52 -0400 (EDT) , <14276.6236.605103.369339@weyr.cnri.reston.va.us>
Message-ID: <19990825214321.D50AD18BA0F@oratrix.oratrix.nl>

But in Python, with its nice high-level data structures, couldn't we design the Mother Of All File Attribute Calls, which would optionally map functionality from one platform to another? As an example consider the Mac resource fork size. If on Unix I did

    fattrs = os.getfileattributes(filename)
    rfsize = fattrs.get('resourceforksize')

it would raise an exception. If, however, I did

    rfsize = fattrs.get('resourceforksize', compat=1)

I would get a "close approximation", 0.
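Jack's hypothetical os.getfileattributes() can be mocked up in a few lines. Everything below -- the class, the attribute names, the compat flag -- is invented here purely to illustrate the proposal, not a real API:

```python
import os
import tempfile

class FileAttributes:
    # "Close approximations" handed out when compat=1 asks for an
    # attribute this platform does not have.
    _COMPAT_DEFAULTS = {'resourceforksize': 0}

    def __init__(self, filename):
        st = os.stat(filename)
        self._attrs = {'size': st.st_size, 'mtime': st.st_mtime}

    def get(self, name, compat=0):
        if name in self._attrs:
            return self._attrs[name]
        if compat:
            return self._COMPAT_DEFAULTS.get(name, 0)
        raise KeyError(name)   # no compat requested: fail loudly

# Off the Mac there is no resource fork, so only the compat form answers.
with tempfile.NamedTemporaryFile() as tf:
    fattrs = FileAttributes(tf.name)
    rfsize = fattrs.get('resourceforksize', compat=1)   # -> 0
```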
Note that you want some sort of a compat parameter, not a default value, as for some attributes (the various atime/mtime/ctimes, permission bits, etc) you'd get a default based on other file attributes that do exist on the current platform.

Hmm, the file-attribute-object idea has the added advantage that you can then use setfileattributes(filename, fattrs) to be sure that you've copied all relevant attributes, independent of the platform you're on.

Mapping permissions takes a bit more (design) work, with Unix having user/group/other only and Windows having full-fledged ACLs (or nothing at all, depending how you look at it :-), but should also be doable.

--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From Vladimir.Marangozov@inrialpes.fr Thu Aug 26 07:10:01 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Thu, 26 Aug 1999 07:10:01 +0100 (NFT)
Subject: [Python-Dev] about line numbers
In-Reply-To: <199908211534.QAA22392@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 21, 99 04:34:32 pm"
Message-ID: <199908260610.HAA20304@pukapuka.inrialpes.fr>

[me, dropping SET_LINENO]
> > I wrote a very rough first implementation of this idea. The files are at:
> > http://sirac.inrialpes.fr/~marangoz/python/lineno/
> > ...
> > A couple of things that need finalization:
> > ...

An updated version is available at the same location. I think that this one does The Right Thing (tm).

a) Everything is internal to the VM and totally hidden, as it should be.
b) No modifications of the code and frame objects (no additional slots).
c) The modified code string (used for tracing) is allocated dynamically
   when the 1st frame pointing to its original switches into trace mode,
   and is deallocated automatically when the last frame pointing to its
   original dies.
I feel better with this code so I can stop thinking about it and move on :-) (leaving it to your appreciation). What's next? File attributes? ;-)

It's not easy to weigh what kind of common interface would be easy to grasp, intuitive and unambiguous for the average user. I think that the people on this list (being core developers) are more or less biased here (I'd say more than less). Perhaps some input from the community (c.l.py) would help?

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From tim_one@email.msn.com Thu Aug 26 06:06:57 1999
From: tim_one@email.msn.com (Tim Peters)
Date: Thu, 26 Aug 1999 01:06:57 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: <14276.14854.366220.664463@weyr.cnri.reston.va.us>
Message-ID: <000301beef80$d26158c0$522d153f@tim>

[Fred L. Drake, Jr.]
> ...
> We left off chmod(), which would make Tim happy, but that was only
> because it wasn't meaningful in context.

I'd be appalled to see chmod go away; for many people it's comfortable and useful. I want *another* way, to do what little bit is portable in a way that doesn't require first mastering a badly designed interface from a dying OS.

> We'd have to add it (or something equivalent) for a general purpose
> filesystem object. So Tim's only happy if he can come up with a
> general interface that is actually portable (consider my earlier
> comments on setreadonly()).

I don't care about general here; making up a general new way to spell everything that everyone may want to do under every OS would create an interface even worse than chmod's. My sister doesn't want to create files that are read-only to the world but writable to her group -- she just wants to mark certain precious files as read-only to minimize the chance of accidental destruction. What she wants is easy to do under Windows or Unix, and I expect she's the norm rather than the exception.
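The little portable core Tim keeps circling -- mark a file read-only, test whether it is -- fits in a few lines of POSIX-flavored Python. isreadonly/setreadonly are his proposed names, not real os functions, and "read-only" here means no write bit for anyone, the least-common-denominator reading from the follow-up message:

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def isreadonly(path):
    # Read-only in the portable sense: nobody holds a write bit.
    return not (os.stat(path).st_mode & WRITE_BITS)

def setreadonly(path):
    os.chmod(path, os.stat(path).st_mode & ~WRITE_BITS)

fd, path = tempfile.mkstemp()   # fresh temp files are owner-writable (0600)
os.close(fd)
writable_before = not isreadonly(path)
setreadonly(path)
readonly_after = isreadonly(path)
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)   # make removable again
os.remove(path)
```

A Windows implementation of the same two names would flip the read-only file attribute instead; the point of the pair is that callers never see the octal codes.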
> On the other hand, you don't need chmod() or anything like it for > most situations where a filesystem object would be useful. An > FTPFilesystem class would not be hard to write! An OO filesystem object with a .makereadonly method suits me fine . From tim_one@email.msn.com Thu Aug 26 06:06:54 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:54 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <000201beef80$d072f640$522d153f@tim> [Fred L. Drake, Jr.] > I think we can simply declare that isreadonly() checks that the > file doesn't allow the user to read it, Had more in mind that the file doesn't allow the user to write it . > but setreadonly() sounds to me like it wouldn't be portable to Unix. > There's more than one (reasonable) way to make a file unreadable to > a user just by manipulating permission bits, and which is best will > vary according to both the user and the file's existing permissions. "Portable" implies least common denominator, and the plain meaning of read-only is that nobody (whether owner, group or world in Unix) has write permission. People wanting something beyond that are going beyond what's portable, and that's fine -- I'm not suggesting getting rid of chmod for Unix dweebs. But by the same token, Windows dweebs should get some other (as non-portable as chmod) way to fiddle the bits important on *their* OS (only one of which chmod can affect). Billions of newbies will delightedly stick to the portable interface with the name that makes sense. the-percentage-of-programmers-doing-systems-programming-shrinks-by- the-millisecond-ly y'rs - tim From mal@lemburg.com Sat Aug 28 15:37:50 1999 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 28 Aug 1999 16:37:50 +0200 Subject: [Python-Dev] Iterating over dictionaries and objects in general References: <990826114149.ZM59302@rayburn.hcs.tl> <199908261702.NAA01866@eric.cnri.reston.va.us> <37C57E01.2ADC02AE@digicool.com> <990826150216.ZM60002@rayburn.hcs.tl> <37C5BAF1.4D6C1031@lemburg.com> <37C5C320.CF11BC7C@digicool.com> <37C643B0.7ECA586@lemburg.com> <37C69FB3.9CB279C7@digicool.com> Message-ID: <37C7F43E.67EEAB98@lemburg.com> [Followup to a discussion on psa-members about iterating over dictionaries without creating intermediate lists] Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > > > Jim Fulton wrote: > > > > > > > The problem with the PyDict_Next() approach is that it will only > > > > work reliably from within a single C call. You can't return > > > > to Python between calls to PyDict_Next(), because those could > > > > modify the dictionary causing the next PyDict_Next() call to > > > > fail or core dump. > > > > > > I do this all the time without problem. Basically, you provide an > > > index and if the index is out of range, you simply get an end-of-data return. > > > The only downside of this approach is that you might get "incorrect" > > > results if the dictionary is modified between calls. This isn't > > > all that different from iterating over a list with an index. > > > > Hmm, that's true... but what if the dictionary gets resized > > in between iterations ? The item layout is then likely to > > change, so you could potentially get completely bogus results. > > I think I said that. :) Just wanted to verify my understanding ;-) > > Even iterating over items twice may then occur, I guess. > > Yup. > > Again, this is not so different from iterating over > a list using a range: > > l=range(10) > for i in range(len(l)): > l.insert(0,'Bruce') > print l[i] > > This always outputs 'Bruce'. :) Ok, so the "risk" is under user control. Fine with me...
> > Or perhaps via a special dictionary iterator, so that the following > > works: > > > > for item in dictrange(d): > > ... > > Yup. > > > The iterator could then also take some extra actions to insure > > that the dictionary hasn't been resized. > > I don't think it should do that. It should simply > stop when it has run out of items. I think I'll give such an iterator a spin. Would be a nice extension to mxTools. BTW, a generic type slot for iterating over types would probably be a nice feature too. The type slot could provide hooks of the form it_first, it_last, it_next, it_prev which all work integer index based, e.g. in pseudo code:

int i;
PyObject *item;

/* set up i and item to point to the first item */
if (obj.it_first(&i,&item) < 0)
    goto onError;

while (1) {
    PyObject_Print(item);
    /* move i and item to the next item; an IndexError is raised
       in case there are no more items */
    if (obj.it_next(&i,&item) < 0) {
        PyErr_Clear();
        break;
    }
}

These slots would cover all problem instances where iteration over non-sequences or non-uniform sequences (i.e. sequences like objects which don't provide convex index sets, e.g. 1,2,3,6,7,8,11,12) is required, e.g. dictionaries, multi-segment buffers. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 127 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gward@cnri.reston.va.us Mon Aug 30 20:02:22 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:02:22 -0400 Subject: [Python-Dev] Portable "spawn" module for core? Message-ID: <19990830150222.B428@cnri.reston.va.us> Hi all -- it recently occurred to me that the 'spawn' module I wrote for the Distutils (and which Perry Stoll extended to handle NT), could fit nicely in the core library. On Unix, it's just a front-end to fork-and-exec; on NT, it's a front-end to spawnv().
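[An editorial aside on the dictrange() idea above: a pure-Python model of its semantics, using the index-based __getitem__ protocol of the day — a for loop calls __getitem__ with 0, 1, 2, ... until IndexError. A real implementation would sit in C on top of PyDict_Next(); pure Python has to go through the keys, so the snapshot below is only a modelling shortcut, not the no-intermediate-list behavior the thread is after:]

```python
class dictrange:
    """Index-based iteration over a dictionary's items, modelling the
    proposed dictrange().  A real version would walk the dict's table
    directly via PyDict_Next() in C; this sketch only models the
    interface and the "just stop at end-of-data" semantics."""

    def __init__(self, d):
        self._d = d
        self._keys = list(d)    # modelling shortcut, see note above

    def __getitem__(self, i):
        # The pre-iterator protocol: raising IndexError ends the loop.
        if i >= len(self._keys):
            raise IndexError(i)
        k = self._keys[i]
        # If the dict was modified meanwhile, this may raise KeyError --
        # the "risk is under user control", as in the thread.
        return (k, self._d[k])
```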
In either case, it's just enough code (and just tricky enough code) that not everybody should have to duplicate it for their own uses. The basic idea is this: from spawn import spawn ... spawn (['cmd', 'arg1', 'arg2']) # or spawn (['cmd'] + args) you get the idea: it takes a *list* representing the command to spawn: no strings to parse, no shells to get in the way, no sneaky meta-characters ruining your day, draining your efficiency, or compromising your security. (Conversely, no pipelines, redirection, etc.) The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' depending on os.name. Additionally, it takes a couple of optional keyword arguments (all booleans): 'search_path', 'verbose', and 'dry_run', which do pretty much what you'd expect. The module as it's currently in the Distutils code is attached. Let me know what you think... Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip@mojam.com (Skip Montanaro) Mon Aug 30 20:11:50 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 30 Aug 1999 14:11:50 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <14282.54880.922571.792484@dolphin.mojam.com> Greg> it recently occured to me that the 'spawn' module I wrote for the Greg> Distutils (and which Perry Stoll extended to handle NT), could fit Greg> nicely in the core library. How's spawn.spawn semantically different from the Windows-dependent os.spawn? How are stdout/stdin/stderr connected to the child process - just like fork+exec or something slightly higher level like os.popen? 
If it's semantically like os.spawn and a little bit higher level abstraction than fork+exec, I'd vote for having the os module simply import it: from spawn import spawn and thus make that function more widely available... Greg> The module as it's currently in the Distutils code is attached. Not in the message I saw... Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From gward@cnri.reston.va.us Mon Aug 30 20:14:57 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:14:57 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us>; from Greg Ward on Mon, Aug 30, 1999 at 03:02:22PM -0400 References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <19990830151457.C428@cnri.reston.va.us> On 30 August 1999, To python-dev@python.org said: > The module as it's currently in the Distutils code is attached. Let me > know what you think... New definition of "attached": I'll just reply to my own message with the code I meant to attach. D'oh!

------------------------------------------------------------------------
"""distutils.spawn

Provides the 'spawn()' function, a front-end to various platform-specific
functions for launching another program in a sub-process."""

# created 1999/07/24, Greg Ward

__rcsid__ = "$Id: spawn.py,v 1.2 1999/08/29 18:20:56 gward Exp $"

import sys, os, string
from distutils.errors import *


def spawn (cmd, search_path=1, verbose=0, dry_run=0):
    """Run another program, specified as a command list 'cmd', in a new
    process.  'cmd' is just the argument list for the new process, ie.
    cmd[0] is the program to run and cmd[1:] are the rest of its arguments.
    There is no way to run a program with a name different from that of its
    executable.

    If 'search_path' is true (the default), the system's executable search
    path will be used to find the program; otherwise, cmd[0] must be the
    exact path to the executable.  If 'verbose' is true, a one-line summary
    of the command will be printed before it is run.  If 'dry_run' is true,
    the command will not actually be run.

    Raise DistutilsExecError if running the program fails in any way; just
    return on success."""

    if os.name == 'posix':
        _spawn_posix (cmd, search_path, verbose, dry_run)
    elif os.name in ( 'nt', 'windows' ):        # ???
        _spawn_nt (cmd, search_path, verbose, dry_run)
    else:
        raise DistutilsPlatformError, \
              "don't know how to spawn programs on platform '%s'" % os.name

# spawn ()


def _spawn_nt ( cmd, search_path=1, verbose=0, dry_run=0):
    import string
    executable = cmd[0]
    if search_path:
        paths = string.split( os.environ['PATH'], os.pathsep)
        base,ext = os.path.splitext(executable)
        if (ext != '.exe'):
            executable = executable + '.exe'
        if not os.path.isfile(executable):
            paths.reverse()         # go over the paths and keep the last one
            for p in paths:
                f = os.path.join( p, executable )
                if os.path.isfile ( f ):
                    # the file exists, we have a shot at spawn working
                    executable = f
    if verbose:
        print string.join ( [executable] + cmd[1:], ' ')
    if not dry_run:
        # spawn for NT requires a full path to the .exe
        rc = os.spawnv (os.P_WAIT, executable, cmd)
        if rc != 0:
            raise DistutilsExecError("command failed: %d" % rc)


def _spawn_posix (cmd, search_path=1, verbose=0, dry_run=0):
    if verbose:
        print string.join (cmd, ' ')
    if dry_run:
        return
    exec_fn = search_path and os.execvp or os.execv
    pid = os.fork ()

    if pid == 0:                        # in the child
        try:
            #print "cmd[0] =", cmd[0]
            #print "cmd =", cmd
            exec_fn (cmd[0], cmd)
        except OSError, e:
            sys.stderr.write ("unable to execute %s: %s\n" %
                              (cmd[0], e.strerror))
            os._exit (1)

        sys.stderr.write ("unable to execute %s for unknown reasons" % cmd[0])
        os._exit (1)

    else:                               # in the parent
        # Loop until the child either exits or is terminated by a signal
        # (ie. keep waiting if it's merely stopped)
        while 1:
            (pid, status) = os.waitpid (pid, 0)
            if os.WIFSIGNALED (status):
                raise DistutilsExecError, \
                      "command %s terminated by signal %d" % \
                      (cmd[0], os.WTERMSIG (status))
            elif os.WIFEXITED (status):
                exit_status = os.WEXITSTATUS (status)
                if exit_status == 0:
                    return              # hey, it succeeded!
                else:
                    raise DistutilsExecError, \
                          "command %s failed with exit status %d" % \
                          (cmd[0], exit_status)
            elif os.WIFSTOPPED (status):
                continue
            else:
                raise DistutilsExecError, \
                      "unknown error executing %s: termination status %d" % \
                      (cmd[0], status)

# _spawn_posix ()
------------------------------------------------------------------------

-- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From gward@cnri.reston.va.us Mon Aug 30 20:31:55 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:31:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <14282.54880.922571.792484@dolphin.mojam.com>; from Skip Montanaro on Mon, Aug 30, 1999 at 02:11:50PM -0500 References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> Message-ID: <19990830153155.D428@cnri.reston.va.us> On 30 August 1999, Skip Montanaro said: > > Greg> it recently occured to me that the 'spawn' module I wrote for the > Greg> Distutils (and which Perry Stoll extended to handle NT), could fit > Greg> nicely in the core library. > > How's spawn.spawn semantically different from the Windows-dependent > os.spawn? My understanding (purely from reading Perry's code!) is that the Windows spawnv() and spawnve() calls require the full path of the executable, and there is no spawnvp(). Hence, the bulk of Perry's '_spawn_nt()' function is code to search the system path if the 'search_path' flag is true.
In '_spawn_posix()', I just use either 'execv()' or 'execvp()' for this. The bulk of my code is the complicated dance required to wait for a fork'ed child process to finish. > How are stdout/stdin/stderr connected to the child process - just > like fork+exec or something slightly higher level like os.popen? Just like fork 'n exec -- '_spawn_posix()' is just a front end to fork and exec (either execv or execvp). In a previous life, I *did* implement a spawning module for a certain other popular scripting language that handles redirection and capturing (backticks in the shell and that other scripting language). It was a lot of fun, but pretty hairy. Took three attempts gradually developed over two years to get it right in the end. In fact, it does all the easy stuff that a Unix shell does in spawning commands, ie. search the path, fork 'n exec, and redirection and capturing. Doesn't handle the tricky stuff, ie. pipelines and job control. The documentation for this module is 22 pages long; the code is 600+ lines of somewhat tricky Perl (1300 lines if you leave in comments and blank lines). That's why the Distutils spawn module doesn't do anything with std{out,err,in}. > If it's semantically like os.spawn and a little bit higher level > abstraction than fork+exec, I'd vote for having the os module simply > import it: So os.spawnv and os.spawnve would be Windows-specific, but os.spawn portable? Could be confusing. And despite the recent extended discussion of the os module, I'm not sure if this fits the model. BTW, is there anything like this on the Mac? On what other OSs does it even make sense to talk about programs spawning other programs? (Surely those GUI user interfaces have to do *something*...) 
Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip@mojam.com (Skip Montanaro) Mon Aug 30 20:52:49 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 30 Aug 1999 14:52:49 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> <19990830153155.D428@cnri.reston.va.us> Message-ID: <14282.57574.918011.54595@dolphin.mojam.com> Greg> BTW, is there anything like this on the Mac? There will be, once Jack Jansen contributes _spawn_mac... ;-) Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From jack@oratrix.nl Mon Aug 30 22:25:04 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:25:04 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 15:31:55 -0400 , <19990830153155.D428@cnri.reston.va.us> Message-ID: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Recently, Greg Ward said: > BTW, is there anything like this on the Mac? On what other OSs does it > even make sense to talk about programs spawning other programs? (Surely > those GUI user interfaces have to do *something*...) Yes, but the interface is quite a bit more high-level, so it's pretty difficult to reconcile with the Unix and Windows "every argument is a string" paradigm. You start the process and pass along an AppleEvent (basically an RPC-call) that will be presented to the program upon startup. 
So on the mac there's a serious difference between (inventing the API interface here, cut down to make it understandable to non-macheads:-) spawn("netscape", ("Open", "file.html")) and spawn("netscape", ("OpenURL", "http://foo.com/file.html")) The mac interface is (of course:-) infinitely more powerful, allowing you to talk to running apps, addressing stuff in it as COM/OLE does, etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is impossible to represent in a meaningful way. Add to that the fact that there's no stdin/stdout/stderr and there's little common ground. The one area of common ground is "run program X on files Y and Z and wait (or don't wait) for completion", so that is something that could maybe have a special method that could be implemented on all three mentioned platforms (and probably everything else as well). And even then it'll be surprising to Mac users that they have to _exit_ their editor (if you specify wait), not something people commonly do. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido@CNRI.Reston.VA.US Mon Aug 30 22:29:55 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:29:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 23:25:04 +0200." <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Message-ID: <199908302129.RAA08442@eric.cnri.reston.va.us> > Recently, Greg Ward said: > > BTW, is there anything like this on the Mac? On what other OSs does it > > even make sense to talk about programs spawning other programs? (Surely > > those GUI user interfaces have to do *something*...)
> > Yes, but the interface is quite a bit more high-level, so it's pretty > difficult to reconcile with the Unix and Windows "every argument is a > string" paradigm. You start the process and pass along an AppleEvent > (basically an RPC-call) that will be presented to the program upon > startup. > > So on the mac there's a serious difference between (inventing the API > interface here, cut down to make it understandable to non-macheads:-) > spawn("netscape", ("Open", "file.html")) > and > spawn("netscape", ("OpenURL", "http://foo.com/file.html")) > > The mac interface is (of course:-) infinitely more powerful, allowing > you to talk to running apps, adressing stuff in it as COM/OLE does, > etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is > impossible to represent in a meaningful way. > > Add to that the fact that there's no stdin/stdout/stderr and there's > little common ground. The one area of common ground is "run program X > on files Y and Z and wait (or don't wait) for completion", so that is > something that could maybe have a special method that could be > implemented on all three mentioned platforms (and probably everything > else as well). And even then it'll be surprising to Mac users that > they have to _exit_ their editor (if you specify wait), not something > people commonly do. Indeed. I'm guessing that Greg wrote his code specifically to drive compilers, not so much to invoke an editor on a specific file. It so happens that the Windows compilers have command lines that look sufficiently like the Unix compilers that this might actually work. On the Mac, driving the compilers is best done using AppleEvents, so it's probably better not to try to abuse the spawn() interface for that... (Greg, is there a higher level where the compiler actions are described without referring to specific programs, but perhaps just to compiler actions and input and output files?)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@CNRI.Reston.VA.US Mon Aug 30 22:35:45 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:35:45 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 15:02:22 EDT." <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <199908302135.RAA08467@eric.cnri.reston.va.us> > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). In either case, > it's just enough code (and just tricky enough code) that not everybody > should have to duplicate it for their own uses. > > The basic idea is this: > > from spawn import spawn > ... > spawn (['cmd', 'arg1', 'arg2']) > # or > spawn (['cmd'] + args) > > you get the idea: it takes a *list* representing the command to spawn: > no strings to parse, no shells to get in the way, no sneaky > meta-characters ruining your day, draining your efficiency, or > compromising your security. (Conversely, no pipelines, redirection, > etc.) > > The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' > depending on os.name. Additionally, it takes a couple of optional > keyword arguments (all booleans): 'search_path', 'verbose', and > 'dry_run', which do pretty much what you'd expect. > > The module as it's currently in the Distutils code is attached. Let me > know what you think... I'm not sure that the verbose and dry_run options belong in the standard library. When both are given, this does something semi-useful; for Posix that's basically just printing the arguments, while for NT it prints the exact command that will be executed. Not sure if that's significant though. 
Perhaps it's better to extract the code that runs the path to find the right executable and make that into a separate routine. (Also, rather than reversing the path, I would break out of the loop at the first hit.) --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Mon Aug 30 22:38:36 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 17:38:36 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <199908302129.RAA08442@eric.cnri.reston.va.us>; from Guido van Rossum on Mon, Aug 30, 1999 at 05:29:55PM -0400 References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> <199908302129.RAA08442@eric.cnri.reston.va.us> Message-ID: <19990830173836.F428@cnri.reston.va.us> On 30 August 1999, Guido van Rossum said: > Indeed. I'm guessing that Greg wrote his code specifically to drive > compilers, not so much to invoke an editor on a specific file. It so > happens that the Windows compilers have command lines that look > sufficiently like the Unix compilers that this might actually work. Correct, but the spawn module I posted should work for any case where you want to run an external command synchronously without redirecting I/O. (And it could probably be extended to handle those cases, but a) I don't need them for Distutils [yet!], and b) I don't know how to do it portably.) > On the Mac, driving the compilers is best done using AppleEvents, so > it's probably better to to try to abuse the spawn() interface for > that... (Greg, is there a higher level where the compiler actions are > described without referring to specific programs, but perhaps just to > compiler actions and input and output files?) [off-topic alert... probably belongs on distutils-sig, but there you go] Yes, my CCompiler class is all about providing a (hopefully) compiler- and platform-neutral interface to a C/C++ compiler. 
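[Guido's suggestion above — extracting the path search into a separate routine that breaks out of the loop at the first hit, rather than reversing the list and keeping the last one — might look something like this. A sketch in present-day Python spelling; find_executable is a hypothetical name here, not something that existed in the module as posted:]

```python
import os

def find_executable(program, path=None):
    """Return the full path to 'program', searching 'path' (a string in
    os.pathsep-separated form, defaulting to $PATH), or None if it
    cannot be found."""
    if path is None:
        path = os.environ.get('PATH', '')
    if os.name == 'nt' and os.path.splitext(program)[1] != '.exe':
        program = program + '.exe'   # NT's spawnv wants the real file name
    if os.path.isfile(program):
        return program               # already an explicit path
    for p in path.split(os.pathsep):
        f = os.path.join(p, program)
        if os.path.isfile(f):
            return f                 # first hit wins, per Guido's comment
    return None
```

[With this factored out, _spawn_nt() shrinks to the spawnv() call, and the routine is reusable on its own.]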
Currently there're only two concrete subclasses of this: UnixCCompiler and MSVCCompiler, and they both obviously use spawn, because Unix C compilers and MSVC both provide that kind of interface. A hypothetical sibling class that provides an interface to some Mac C compiler might use a souped-up spawn that "knows about" Apple Events, or it might use some other interface to Apple Events. If Jack's simplified summary of what passing Apple Events to a command looks like is accurate, maybe spawn can be souped up to work on the Mac. Or we might need a dedicated module for running Mac programs. So does anybody have code to run external programs on the Mac using Apple Events? Would it be possible/reasonable to add that as '_spawn_mac()' to my spawn module? Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From jack@oratrix.nl Mon Aug 30 22:52:29 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:52:29 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 17:38:36 -0400 , <19990830173836.F428@cnri.reston.va.us> Message-ID: <19990830215234.ED4E718B9FB@oratrix.oratrix.nl> Hmm, if we're talking a "Python Make" or some such here the best way would probably be to use Tool Server. Tool Server is a thing that is based on Apple's old MPW programming environment, that is still supported by compiler vendors like MetroWerks. The nice thing of Tool Server for this type of work is that it _is_ command-line based, so you can probably send it things like spawn("cc", "-O", "test.c") But, although I know it is possible to do this with ToolServer, I haven't a clue on how to do it... 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim_one@email.msn.com Tue Aug 31 06:44:18 1999 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 31 Aug 1999 01:44:18 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> Message-ID: <000101bef373$de2974c0$932d153f@tim> [Greg Ward] > ... > In a previous life, I *did* implement a spawning module for > a certain other popular scripting language that handles > redirection and capturing (backticks in the shell and that other > scripting language). It was a lot of fun, but pretty hairy. Took > three attempts gradually developed over two years to get it right > in the end. In fact, it does all the easy stuff that a Unix shell > does in spawning commands, ie. search the path, fork 'n exec, and > redirection and capturing. Doesn't handle the tricky stuff, ie. > pipelines and job control. > > The documentation for this module is 22 pages long; the code > is 600+ lines of somewhat tricky Perl (1300 lines if you leave > in comments and blank lines). That's why the Distutils spawn > module doesn't do anything with std{out,err,in}. Note that win/tclWinPipe.c-- which contains the Windows-specific support for Tcl's "exec" cmd --is about 3,200 lines of C. It does handle pipelines and redirection, and even fakes pipes as needed with temp files when it can identify a pipeline component as belonging to the 16-bit subsystem. Even so, the Tcl help page for "exec" bristles with hilarious caveats under the Windows subsection; e.g., When redirecting from NUL:, some applications may hang, others will get an infinite stream of "0x01" bytes, and some will actually correctly get an immediate end-of-file; the behavior seems to depend upon something compiled into the application itself. 
When redirecting greater than 4K or so to NUL:, some applications will hang. The above problems do not happen with 32-bit applications. Still, people seem very happy with Tcl's exec, and I'm certain no language tries harder to provide a portable way to "do command lines". Two points to that: 1) If Python ever wants to do something similar, let's steal the Tcl code (& unlike stealing Perl's code, stealing Tcl's code actually looks possible -- it's very much better organized and written). 2) For all its heroic efforts to hide platform limitations,

int
Tcl_ExecObjCmd(dummy, interp, objc, objv)
    ClientData dummy;           /* Not used. */
    Tcl_Interp *interp;         /* Current interpreter. */
    int objc;                   /* Number of arguments. */
    Tcl_Obj *CONST objv[];      /* Argument objects. */
{
#ifdef MAC_TCL
    Tcl_AppendResult(interp, "exec not implemented under Mac OS",
            (char *)NULL);
    return TCL_ERROR;
#else
    ...

a-generalized-spawn-is-a-good-start-ly y'rs - tim From fredrik@pythonware.com Tue Aug 31 07:39:56 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 31 Aug 1999 08:39:56 +0200 Subject: [Python-Dev] Portable "spawn" module for core? References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <005101bef37b$b0415070$f29b12c2@secret.pythonware.com> Greg Ward wrote: > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). any reason this couldn't go into the os module instead? just add parts of it to os.py, and change the docs to say that spawn* are supported on Windows and Unix... (supporting the full set of spawn* primitives would of course be nice, btw. just like os.py provides all exec variants...)
From gstein at lyra.org Tue Aug 3 03:51:43 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 02 Aug 1999 18:51:43 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A64B2F.3386F0A9@lyra.org> Mark Hammond wrote: > ... > Therefore, I would like to propose these functions to be added to > abstract.c: > > int PyObject_GetBufferSize(); > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > const void *PyObject_GetReadOnlyBuffer(); > > Although equivalent functions exist for the buffer object, I can't see the > equivalent abstract implementations - ie, that work with any object > supporting the protocol. > > Im willing to provide a patch if there is agreement a) the general idea is > good, and b) my specific spelling of the idea is OK (less likely - > PyBuffer_* seems better, but loses any implication of being abstract?).
Marc-Andre proposed exactly the same thing back at the end of March (to me and Guido). The two of us hashed out some of the stuff and M.A. came up with a full patch for the stuff. Guido was relatively non-committal at that point one way or another, but said they seemed fine. It appears the stuff never made it into source control. If Marc-Andre can resurface the final proposal/patch, then we'd be set. Until then: use the bufferprocs :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Tue Aug 3 11:11:11 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 11:11:11 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A6B22F.7A14BA2C@lemburg.com> David Ascher wrote: > > An issue which has dogged the NumPy project is that there is (to my > knowledge) no way to pickle very large arrays without creating strings > which contain all of the data. This can be a problem given that NumPy > arrays tend to be very large -- often several megabytes, sometimes much > bigger. This slows things down, sometimes a lot, depending on the > platform. It seems that it should be possible to do something more > efficient. > > Two alternatives come to mind: > > -- define a new pickling protocol which passes a file-like object to the > instance and have the instance write itself to that file, being as > efficient or inefficient as it cares to. This protocol is used only > if the instance/type defines the appropriate slot. Alternatively, > enrich the semantics of the getstate interaction, so that an object > can return partial data and tell the pickling mechanism to come back > for more. > > -- make pickling of objects which support the buffer interface use that > interface's notion of segments and use that 'chunk' size to do > something more efficient if not necessarily most efficient. (oh, and > make NumPy arrays support the buffer interface =).
This is simple > for NumPy arrays since we want to pickle "everything", but may not be > what other buffer-supporting objects want. > > Thoughts? Alternatives? Hmm, types can register their own pickling/unpickling functions via copy_reg, so they can access the self.write method in pickle.py to implement the write to file interface. Don't know how this would be done for cPickle.c though. For instances the situation is different since there is no dispatching done on a per-class basis. I guess an optional argument could help here. Perhaps some lazy pickling wrapper would help fix this in general: an object which calls back into the to-be-pickled object to access the data rather than store the data in a huge string. Yet another idea would be using memory mapped files instead of strings as temporary storage (but this is probably hard to implement right and not as portable). Dunno... just some thoughts. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 3 09:50:33 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 09:50:33 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <37A69F49.3575AE85@lemburg.com> Greg Stein wrote: > > Mark Hammond wrote: > > ... > > Therefore, I would like to propose these functions to be added to > > abstract.c: > > > > int PyObject_GetBufferSize(); > > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > > const void *PyObject_GetReadOnlyBuffer(); > > > > Although equivalent functions exist for the buffer object, I can't see the > > equivalent abstract implementations - ie, that work with any object > > supporting the protocol. 
> > > > I'm willing to provide a patch if there is agreement a) the general idea is > > good, and b) my specific spelling of the idea is OK (less likely - > > PyBuffer_* seems better, but loses any implication of being abstract?). > > Marc-Andre proposed exactly the same thing back at the end of March (to > me and Guido). The two of us hashed out some of the stuff and M.A. came > up with a full patch for the stuff. Guido was relatively non-committal > at that point one way or another, but said they seemed fine. It appears > the stuff never made it into source control. > > If Marc-Andre can resurface the final proposal/patch, then we'd be set. Below is the code I currently use. I don't really remember if this is what Greg and I discussed a while back, but I'm sure he'll correct me ;-) Note that the buffer length is implicitly returned by these APIs.

/* Takes an arbitrary object which must support the character (single
   segment) buffer interface and returns a pointer to a read-only memory
   location usable as character based input for subsequent processing.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsCharBuffer(PyObject *obj,
                      const char **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    const char *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getcharbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a character buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getcharbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Same as PyObject_AsCharBuffer() except that this API expects a readable
   (single segment) buffer interface and returns a pointer to a read-only
   memory location which can contain arbitrary data.
   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsReadBuffer(PyObject *obj,
                      const void **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getreadbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a readable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getreadbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Takes an arbitrary object which must support the writeable (single
   segment) buffer interface and returns a pointer to a writeable memory
   location in buffer of size buffer_len.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsWriteBuffer(PyObject *obj,
                       void **buffer,
                       int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getwritebuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a writeable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getwritebuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 11:53:39 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 11:53:39 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by "M.-A.
Lemburg" , Tue, 03 Aug 1999 09:50:33 +0200 , <37A69F49.3575AE85@lemburg.com> Message-ID: <19990803095339.E02CE303120@snelboot.oratrix.nl> Why not pass the index to the As*Buffer routines as well and make getsegcount available too? Then you could code things like

for(i=0; i<segcount; i++) {
    if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
        return -1;
    write(fp, buf, count);
}

From gstein at lyra.org Tue Aug 3 12:25:11 1999 From: gstein at lyra.org (Greg Stein) Date: Tue, 03 Aug 1999 03:25:11 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6C387.7360D792@lyra.org> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like
> for(i=0; i<segcount; i++) {
>     if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>         return -1;
>     write(fp, buf, count);
> }
Simply because multiple segments hasn't been seen. All objects supporting the buffer interface have a single segment. IMO, it is best to drop the argument to make typical usage easier. For handling multiple segments, a caller can use the raw interface rather than the handy functions. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at digicool.com Tue Aug 3 12:58:54 1999 From: jim at digicool.com (Jim Fulton) Date: Tue, 03 Aug 1999 06:58:54 -0400 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A6CB6E.C990F561@digicool.com> Mark Hammond wrote: > > Hi all, > I'm trying to slowly wean myself over to the buffer interfaces. OK, I'll bite. Where is the buffer interface documented? I found references to it in various places (e.g. built-in buffer()) but didn't find the interface itself. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Tue Aug 3 13:06:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 03 Aug 1999 13:06:46 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6CD46.642A9C6D@lemburg.com> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like
> for(i=0; i<segcount; i++) {
>     if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>         return -1;
>     write(fp, buf, count);
> }
Well, just like Greg said, this is not much different than using the buffer interface directly. While the above would be a handy PyObject_WriteAsBuffer() kind of helper, I don't think that this is really used all that much. E.g. in mxODBC I use the APIs for accessing the raw char data in a buffer: the pointer is passed directly to the ODBC APIs without copying, which makes things quite fast. IMHO, this is the greatest advantage of the buffer interface. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at cnri.reston.va.us Tue Aug 3 15:07:44 1999 From: fdrake at cnri.reston.va.us (Fred L. Drake) Date: Tue, 3 Aug 1999 09:07:44 -0400 (EDT) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A64B2F.3386F0A9@lyra.org> References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <14246.59808.561395.761772@weyr.cnri.reston.va.us> Greg Stein writes: > Until then: use the bufferprocs :-) Greg, On the topic of the buffer interface: Have you written documentation for this that I can include in the API reference? Bugging you about this is on my to-do list. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Tue Aug 3 13:29:43 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 13:29:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A6CB6E.C990F561@digicool.com> Message-ID: <37A6D2A7.27F27554@lemburg.com> Jim Fulton wrote: > > Mark Hammond wrote: > > > > Hi all, > > Im trying to slowly wean myself over to the buffer interfaces. > > OK, I'll bite. Where is the buffer interface documented? I found references > to it in various places (e.g. built-in buffer()) but didn't find the interface > itself. I guess it's a read-the-source feature :-) Objects/bufferobject.c and Include/object.h provide a start. Objects/stringobject.c has a "sample" implementation. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 16:45:25 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 16:45:25 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by Greg Stein , Tue, 03 Aug 1999 03:25:11 -0700 , <37A6C387.7360D792@lyra.org> Message-ID: <19990803144526.6B796303120@snelboot.oratrix.nl> > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > available too? > > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. Hmm. And I went out of my way to include this stupid multi-buffer stuff because the NumPy folks said they couldn't live without it (and one of the reasons for the buffer stuff was to allow NumPy arrays, which may be discontiguous, to be written efficiently). Can someone confirm that the Numeric stuff indeed doesn't use this? 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From da at ski.org Tue Aug 3 18:19:19 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:19:19 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <37A6B22F.7A14BA2C@lemburg.com> Message-ID: On Tue, 3 Aug 1999, M.-A. Lemburg wrote: > Hmm, types can register their own pickling/unpickling functions > via copy_reg, so they can access the self.write method in pickle.py > to implement the write to file interface. Are you sure? My understanding of copy_reg is, as stated in the doc: pickle (type, function[, constructor]) Declares that function should be used as a ``reduction'' function for objects of type or class type. function should return either a string or a tuple. The optional constructor parameter, if provided, is a callable object which can be used to reconstruct the object when called with the tuple of arguments returned by function at pickling time. How does one access the 'self.write method in pickle.py'? > Perhaps some lazy pickling wrapper would help fix this in general: > an object which calls back into the to-be-pickled object to > access the data rather than store the data in a huge string. Right. That's an idea. > Yet another idea would be using memory mapped files instead > of strings as temporary storage (but this is probably hard to implement > right and not as portable). That's a very interesting idea! I'll try that -- it might just be the easiest way to do this. I think that portability isn't a huge concern -- the folks who are coming up with the speed issue are on platforms which have mmap support. Thanks for the suggestions. 
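[Editorial note: the copy_reg registration David quotes above is spelled copyreg in today's Python, but works as described: the registered reducer returns a constructor plus arguments, and never sees the pickler's output stream -- which is precisely David's complaint. A minimal sketch:]

```python
import copyreg
import pickle

class Vec:
    """Toy stand-in for a NumPy-style array type."""
    def __init__(self, data):
        self.data = list(data)

def reduce_vec(v):
    # A reduction returns (constructor, args) -- the state comes back as
    # an in-memory object; there is no hook here for streaming chunks.
    return Vec, (v.data,)

copyreg.pickle(Vec, reduce_vec)

v2 = pickle.loads(pickle.dumps(Vec([1, 2, 3])))
assert v2.data == [1, 2, 3]
```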
--david From da at ski.org Tue Aug 3 18:20:37 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:20:37 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A6C387.7360D792@lyra.org> Message-ID: On Tue, 3 Aug 1999, Greg Stein wrote: > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. IMO, it is best FYI, if/when NumPy objects support the buffer API, they will require multiple-segments. From da at ski.org Tue Aug 3 18:23:31 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:23:31 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <19990803144526.6B796303120@snelboot.oratrix.nl> Message-ID: On Tue, 3 Aug 1999, Jack Jansen wrote: > > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > > available too? > > > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. > > Hmm. And I went out of my way to include this stupid multi-buffer stuff > because the NumPy folks said they couldn't live without it (and one of the > reasons for the buffer stuff was to allow NumPy arrays, which may be > discontiguous, to be written efficiently). > > Can someone confirm that the Numeric stuff indeed doesn't use this? /usr/LLNLDistribution/Numerical/Include$ grep buffer *.h /usr/LLNLDistribution/Numerical/Include$ Yes. =) See the other thread on low-overhead pickling. But again, *if* multiarrays supported the buffer interface, they'd have to use the multi-segment feature (repeating myself). --david From mal at lemburg.com Tue Aug 3 21:17:16 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 21:17:16 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A7403C.3BC05D02@lemburg.com> David Ascher wrote: > > On Tue, 3 Aug 1999, M.-A. 
Lemburg wrote: > > > Hmm, types can register their own pickling/unpickling functions > > via copy_reg, so they can access the self.write method in pickle.py > > to implement the write to file interface. > > Are you sure? My understanding of copy_reg is, as stated in the doc: > > pickle (type, function[, constructor]) > Declares that function should be used as a ``reduction'' function for > objects of type or class type. function should return either a string > or a tuple. The optional constructor parameter, if provided, is a > callable object which can be used to reconstruct the object when > called with the tuple of arguments returned by function at pickling > time. > > How does one access the 'self.write method in pickle.py'? Ooops. Sorry, that doesn't work... well at least not using "normal" Python ;-) You could of course simply go up one stack frame and then grab the self object and then... well, you know... -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Tue Aug 3 22:47:04 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 3 Aug 1999 15:47:04 -0500 (CDT) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: References: Message-ID: <14247.21628.225029.392711@dolphin.mojam.com> David> An issue which has dogged the NumPy project is that there is (to David> my knowledge) no way to pickle very large arrays without creating David> strings which contain all of the data. This can be a problem David> given that NumPy arrays tend to be very large -- often several David> megabytes, sometimes much bigger. This slows things down, David> sometimes a lot, depending on the platform. It seems that it David> should be possible to do something more efficient. David, Using __getstate__/__setstate__, could you create a compressed representation using zlib or some other scheme? 
I don't know how well numeric data compresses in general, but that might help. Also, I trust you use cPickle when it's available, yes? Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-475-3758 From da at ski.org Tue Aug 3 22:58:23 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 13:58:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <14247.21628.225029.392711@dolphin.mojam.com> Message-ID: On Tue, 3 Aug 1999, Skip Montanaro wrote: > Using __getstate__/__setstate__, could you create a compressed > representation using zlib or some other scheme? I don't know how well > numeric data compresses in general, but that might help. Also, I trust you > use cPickle when it's available, yes? I *really* hate to admit it, but I've found the source of the most massive problem in the pickling process that I was using. I didn't use binary mode, which meant that the huge strings were written & read one-character-at-a-time. I think I'll put a big fat note in the NumPy doc to that effect. (note that luckily this just affected my usage, not all NumPy users). --da From gstein at lyra.org Wed Aug 4 21:15:27 1999 From: gstein at lyra.org (Greg Stein) Date: Wed, 04 Aug 1999 12:15:27 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex References: <199908041313.JAA26344@weyr.cnri.reston.va.us> Message-ID: <37A8914F.6F5B9971@lyra.org> Fred L. Drake wrote: > > Update of /projects/cvsroot/python/dist/src/Doc/api > In directory weyr:/home/fdrake/projects/python/Doc/api > > Modified Files: > api.tex > Log Message: > > Started documentation on buffer objects & types. Very preliminary. > > Greg Stein: Please help with this; it's your baby! > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins All righty. I'll send some doc on this stuff. 
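[Editorial note: both remedies from this exchange are easy to sketch in modern Python, where the class and names below are invented for illustration -- Skip's compressed __getstate__, and David's binary mode, which today is a protocol number rather than a file-mode flag:]

```python
import pickle
import zlib

class Blob:
    """Stand-in for a large numeric array, using Skip's compression idea."""
    def __init__(self, data):
        self.data = data

    def __getstate__(self):
        return zlib.compress(self.data)

    def __setstate__(self, state):
        self.data = zlib.decompress(state)

payload = bytes(1_000_000)  # a megabyte of highly compressible data
blob = pickle.loads(pickle.dumps(Blob(payload)))
assert blob.data == payload

# David's discovery: the old text protocol (0) is dramatically bulkier
# than the binary protocols for byte data.
text = pickle.dumps(payload, protocol=0)
binary = pickle.dumps(payload, protocol=2)
assert len(binary) < len(text)
```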
Somebody else did the initial buffer interface, but it seems that it has fallen to me now :-) Please give me a little while to get to this, though. I'm in and out of town for the next four weeks. I'm in the process of moving into a new house in Palo Alto, CA, and I'm travelling back and forth until Anni and I move for real in September. I should be able to get to this by the weekend, or possibly in a couple weeks. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Wed Aug 4 23:00:26 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 4 Aug 1999 17:00:26 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: <37A8914F.6F5B9971@lyra.org> References: <199908041313.JAA26344@weyr.cnri.reston.va.us> <37A8914F.6F5B9971@lyra.org> Message-ID: <14248.43498.664539.597656@weyr.cnri.reston.va.us> Greg Stein writes: > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I was not aware that you were not the origin of this work; feel free to pass it to the right person. > Please give me a little while to get to this, though. I'm in and out of > town for the next four weeks. I'm in the process of > moving into a new house in Palo Alto, CA, and I'm travelling back and > forth until Anni and I move for real in September. Cool! > I should be able to get to this by the weekend, or possibly in a couple > weeks. That's good enough for me. I expect it may be a couple of months or more before I try and get another release out with various fixes and additions. There's not a huge need to update the released doc set, other than a few embarassing editorial...er, "oversights" (!). -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jack at oratrix.nl Thu Aug 5 11:57:33 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 11:57:33 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: Message by Greg Stein , Wed, 04 Aug 1999 12:15:27 -0700 , <37A8914F.6F5B9971@lyra.org> Message-ID: <19990805095733.69D90303120@snelboot.oratrix.nl> > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I think I did, but I gladly bequeath it to you. (Hmm, that's the first time I typed "bequeath", I think). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Thu Aug 5 17:46:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 5 Aug 1999 17:46:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: Message-ID: <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. IMO, it is best > > FYI, if/when NumPy objects support the buffer API, they will require > multiple-segments. same goes for PIL. in the worst case, there's one segment per line. ... on the other hand, I think something is missing from the buffer design; I definitely don't like that people can write and marshal objects that happen to implement the buffer interface, only to find that Python didn't do what they expected...

>>> import unicode
>>> import marshal
>>> u = unicode.unicode
>>> s = u("foo")
>>> data = marshal.dumps(s)
>>> marshal.loads(data)
'f\000o\000o\000'
>>> type(marshal.loads(data))
<type 'string'>

as for PIL, I would also prefer if the exported buffer corresponded to what you get from im.tostring().
iirc, that cannot be done -- I cannot export via a temporary memory buffer, since there's no way to know when to get rid of it... From jack at oratrix.nl Thu Aug 5 22:59:46 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 22:59:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "Fredrik Lundh" , Thu, 5 Aug 1999 17:46:43 +0200 , <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> Message-ID: <19990805205952.531B9E267A@oratrix.oratrix.nl> Recently, "Fredrik Lundh" said: > on the other hand, I think something is missing from > the buffer design; I definitely don't like that people > can write and marshal objects that happen to > implement the buffer interface, only to find that > Python didn't do what they expected...
> >>> import unicode
> >>> import marshal
> >>> u = unicode.unicode
> >>> s = u("foo")
> >>> data = marshal.dumps(s)
> >>> marshal.loads(data)
> 'f\000o\000o\000'
> >>> type(marshal.loads(data))
> <type 'string'>
Hmm. Looking at the code there is a catchall at the end, with a comment explicitly saying "Write unknown buffer-style objects as a string". IMHO this is an incorrect design, but that's a bit philosophical (so I'll gladly defer to Our Great Philosopher if he has anything to say on the matter:-). Unless, of course, there are buffer-style non-string objects around that are better read back as strings than not read back at all. Hmm again, I think I'd like it better if marshal.dumps() would barf on attempts to write unrepresentable data. Currently unrepresentable objects are written as TYPE_UNKNOWN (unless they have bufferness (or should I call that "a buffer-aspect"? :-)), which means you think you are writing correctly marshalled data but you'll be in for an exception when you try to read it back...
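[Editorial note: Jack's preference eventually won -- modern marshal refuses unrepresentable objects at dump time instead of writing TYPE_UNKNOWN or silently stringifying buffer-style objects. A quick check against today's Python:]

```python
import marshal

# Core types round-trip fine.
data = marshal.dumps({"key": [1, 2.0, "three"]})
assert marshal.loads(data) == {"key": [1, 2.0, "three"]}

# An arbitrary instance is rejected up front -- no unknown type on disk,
# no surprise exception at load time.
class Opaque:
    pass

try:
    marshal.dumps(Opaque())
    rejected = False
except ValueError:
    rejected = True
assert rejected
```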
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Fri Aug 6 00:24:03 1999 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 5 Aug 1999 18:24:03 -0400 (EDT) Subject: [Python-Dev] mmapfile module Message-ID: <199908052224.SAA24159@amarok.cnri.reston.va.us> A while back the suggestion was made that the mmapfile module be added to the core distribution, and there was a guardedly positive reaction. Should I go ahead and do that? No one reported any problems when I asked for bug reports, but that was probably because no one tried it; putting it in the core would cause more people to try it. I suppose this leads to a more important question: at what point should we start checking 1.6-only things into the CVS tree? For example, once the current alphas of the re module are up to it (they're not yet), when should they be checked in? -- A.M. Kuchling http://starship.python.net/crew/amk/ Kids! Bringing about Armageddon can be dangerous. Do not attempt it in your home. -- Terry Pratchett & Neil Gaiman, _Good Omens_ From bwarsaw at cnri.reston.va.us Fri Aug 6 04:10:18 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 5 Aug 1999 22:10:18 -0400 (EDT) Subject: [Python-Dev] mmapfile module References: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <14250.17418.781127.684009@anthem.cnri.reston.va.us> >>>>> "AMK" == Andrew M Kuchling writes: AMK> I suppose this leads to a more important question: at what AMK> point should we start checking 1.6-only things into the CVS AMK> tree? For example, once the current alphas of the re module AMK> are up to it (they're not yet), when should they be checked AMK> in? Good question. 
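[Editorial note: the mmapfile module Andrew describes did land as the standard mmap module. A small sketch of the pattern MAL floated earlier in the thread -- a memory-mapped file as mutable temporary storage -- assuming a platform with mmap support:]

```python
import mmap
import tempfile

# Map a 4 KiB temporary file and treat the mapping as a mutable buffer;
# writes go through to the file without an intermediate string copy.
with tempfile.TemporaryFile() as f:
    f.write(b"\x00" * 4096)
    f.flush()
    with mmap.mmap(f.fileno(), 4096) as m:
        m[0:5] = b"hello"
        snapshot = bytes(m[0:5])

assert snapshot == b"hello"
```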
I've had a bunch of people ask about the string methods branch, which I'm assuming will be a 1.6 feature, and I'd like to get that checked in at some point too. I think what's holding this up is that Guido hasn't decided whether there will be a patch release to 1.5.2 or not. -Barry From tim_one at email.msn.com Fri Aug 6 04:26:06 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 5 Aug 1999 22:26:06 -0400 Subject: [Python-Dev] mmapfile module In-Reply-To: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <000201bedfb3$09a99000$98a22299@tim> [Andrew M. Kuchling] > ... > I suppose this leads to a more important question: at what point > should we start checking 1.6-only things into the CVS tree? For > example, once the current alphas of the re module are up to it > (they're not yet), when should they be checked in? I'd like to see a bugfix release of 1.5.2 put out first, then have at it. There are several bugfixes that ought to go out ASAP. Thread tstate races, the cpickle/cookie.py snafu, and playing nice with current Tcl/Tk pop to mind immediately. I'm skeptical that anyone other than Guido could decide what *needs* to go out, so it's a good thing he's got nothing to do . one-boy's-opinion-ly y'rs - tim From mhammond at skippinet.com.au Fri Aug 6 05:30:55 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 6 Aug 1999 13:30:55 +1000 Subject: [Python-Dev] mmapfile module In-Reply-To: <000201bedfb3$09a99000$98a22299@tim> Message-ID: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> [Tim laments] > mind immediately. I'm skeptical that anyone other than Guido > could decide > what *needs* to go out, so it's a good thing he's got nothing > to do . He has been very quiet recently - where are you hiding, Guido? > one-boy's-opinion-ly y'rs - tim Here is another. Let's take a different tack - what has been checked in since 1.5.2 that should _not_ go out - ie, is too controversial?
If nothing else, makes a good starting point, and may help Guido out: Below is a summary of the CVS diff I just did, categorized by my opinion. It turns out that most of the changes would appear candidates. While not actually "bug-fixes", many have better documentation, removal of unused imports etc, so would definitely not hurt to get out. Looks like some build issues have been fixed too. Apart from possibly Tim's recent "UnboundLocalError" (which is the only serious behaviour change) I can't see anything that should obviously be omitted. Hopefully this is of interest... [Disclaimer - lots of files here - it is quite possible I missed something...] Mark.

UNCONTROVERSIAL:
----------------
RCS file: /projects/cvsroot/python/dist/src/README,v
RCS file: /projects/cvsroot/python/dist/src/Lib/cgi.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/ftplib.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/poplib.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/re.py,v
RCS file: /projects/cvsroot/python/dist/src/Tools/audiopy/README,v
Doc changes.
RCS file: /projects/cvsroot/python/dist/src/Lib/SimpleHTTPServer.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/cmd.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/htmllib.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/netrc.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/pipes.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/pty.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/shlex.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/urlparse.py,v
Remove unused imports
RCS file: /projects/cvsroot/python/dist/src/Lib/pdb.py,v
Remove unused globals
RCS file: /projects/cvsroot/python/dist/src/Lib/popen2.py,v
Change to cleanup
RCS file: /projects/cvsroot/python/dist/src/Lib/profile.py,v
Remove unused imports and changes to comments.
RCS file: /projects/cvsroot/python/dist/src/Lib/pyclbr.py,v
Better doc, and support for module level functions.
RCS file: /projects/cvsroot/python/dist/src/Lib/repr.py,v self.maxlist changed to self.maxdict RCS file: /projects/cvsroot/python/dist/src/Lib/rfc822.py,v Doc changes, and better date handling. RCS file: /projects/cvsroot/python/dist/src/configure,v RCS file: /projects/cvsroot/python/dist/src/configure.in,v Looks like FreeBSD build flag changes. RCS file: /projects/cvsroot/python/dist/src/Demo/classes/bitvec.py,v RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v Whitespace fixes. RCS file: /projects/cvsroot/python/dist/src/Demo/scripts/makedir.py,v Check we have passed a non-empty string RCS file: /projects/cvsroot/python/dist/src/Include/patchlevel.h,v 1.5.2+ RCS file: /projects/cvsroot/python/dist/src/Lib/BaseHTTPServer.py,v Remove import rfc822 and more robust errors. RCS file: /projects/cvsroot/python/dist/src/Lib/CGIHTTPServer.py,v Support for HTTP_COOKIE RCS file: /projects/cvsroot/python/dist/src/Lib/fpformat.py,v NotANumber supports class exceptions. RCS file: /projects/cvsroot/python/dist/src/Lib/macpath.py,v Use constants from stat module RCS file: /projects/cvsroot/python/dist/src/Lib/macurl2path.py,v Minor changes to path parsing RCS file: /projects/cvsroot/python/dist/src/Lib/mimetypes.py,v Recognise '.js': 'application/x-javascript', RCS file: /projects/cvsroot/python/dist/src/Lib/sunau.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/wave.py,v Support for binary files. RCS file: /projects/cvsroot/python/dist/src/Lib/whichdb.py,v Reads file header to check for bsddb format. RCS file: /projects/cvsroot/python/dist/src/Lib/xmllib.py,v XML may be at the start of the string, instead of the whole string. RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/tkSimpleDialog.py,v Destroy method added. RCS file: /projects/cvsroot/python/dist/src/Modules/cPickle.c,v As in the log :-) RCS file: /projects/cvsroot/python/dist/src/Modules/cStringIO.c,v No longer a Py_FatalError on module init failure.
RCS file: /projects/cvsroot/python/dist/src/Modules/fpectlmodule.c,v Support for OSF in #ifdefs RCS file: /projects/cvsroot/python/dist/src/Modules/makesetup,v # to handle backslashes for sh's that don't automatically # continue a read when the last char is a backslash RCS file: /projects/cvsroot/python/dist/src/Modules/posixmodule.c,v Better error handling RCS file: /projects/cvsroot/python/dist/src/Modules/timemodule.c,v #ifdef changes for __GNU_LIBRARY__/_GLIBC_ RCS file: /projects/cvsroot/python/dist/src/Python/errors.c,v Better error messages on Win32 RCS file: /projects/cvsroot/python/dist/src/Python/getversion.c,v Bigger buffer and strings. RCS file: /projects/cvsroot/python/dist/src/Python/pystate.c,v Threading bug RCS file: /projects/cvsroot/python/dist/src/Objects/floatobject.c,v Tim Peters writes: 1. Fixes float divmod etc. RCS file: /projects/cvsroot/python/dist/src/Objects/listobject.c,v Doc changes, and when deallocating a list, DECREF the items from the end back to the start. RCS file: /projects/cvsroot/python/dist/src/Objects/stringobject.c,v Bug fix to do with the width of a format specifier RCS file: /projects/cvsroot/python/dist/src/Objects/tupleobject.c,v Appropriate overflow checks so that things like sys.maxint*(1,) can't dump core. RCS file: /projects/cvsroot/python/dist/src/Lib/tempfile.py,v don't cache attributes of type int RCS file: /projects/cvsroot/python/dist/src/Lib/urllib.py,v Number of revisions. RCS file: /projects/cvsroot/python/dist/src/Lib/aifc.py,v Chunk moved to new module. RCS file: /projects/cvsroot/python/dist/src/Lib/audiodev.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dbhash.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dis.py,v Changes in comments. RCS file: /projects/cvsroot/python/dist/src/Lib/cmpcache.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmp.py,v New "shallow" arg. RCS file: /projects/cvsroot/python/dist/src/Lib/dumbdbm.py,v Coerce f.tell() to int.
RCS file: /projects/cvsroot/python/dist/src/Modules/main.c,v Fix to tracebacks off by a line with -x RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/Tkinter.py,v Number of changes you can review! OTHERS: -------- RCS file: /projects/cvsroot/python/dist/src/Lib/asynchat.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/asyncore.py,v Latest versions from Sam??? RCS file: /projects/cvsroot/python/dist/src/Lib/smtplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/sched.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/SocketServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/calendar.py,v Sorry - out of time to detail RCS file: /projects/cvsroot/python/dist/src/Python/bltinmodule.c,v Unbound local, docstring, and better support for ExtensionClasses. Freeze: Few changes IDLE: Lotsa changes :-) A number of .h files have #ifdef changes for CE that I won't detail (but it would be great to get a few of these in - and I have more :-) Tools directory: Number of changes - outa time to detail From mal at lemburg.com Fri Aug 6 10:54:20 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 06 Aug 1999 10:54:20 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> Message-ID: <37AAA2BC.466750B5@lemburg.com> Jack Jansen wrote: > > Recently, "Fredrik Lundh" said: > > on the other hand, I think something is missing from > > the buffer design; I definitely don't like that people > > can write and marshal objects that happen to > > implement the buffer interface, only to find that > > Python didn't do what they expected... > > > > >>> import unicode > > >>> import marshal > > >>> u = unicode.unicode > > >>> s = u("foo") > > >>> data = marshal.dumps(s) > > >>> marshal.loads(data) > > 'f\000o\000o\000' > > >>> type(marshal.loads(data)) > > Why do Unicode objects implement the bf_getcharbuffer slot ?
I thought that unicode objects use a two-byte character representation. Note that implementing the char buffer interface will also give you strange results with other code that uses PyArg_ParseTuple(...,"s#",...), e.g. you could search through Unicode strings as if they were normal 1-byte/char strings (and most certainly not find what you're looking for, I guess). > Hmm again, I think I'd like it better if marshal.dumps() would barf on > attempts to write unrepresentable data. Currently unrepresentable > objects are written as TYPE_UNKNOWN (unless they have bufferness (or > should I call that "a buffer-aspect"? :-)), which means you think you > are writing correctly marshalled data but you'll be in for an > exception when you try to read it back... I'd prefer an exception on write too. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 147 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Aug 6 16:44:35 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Aug 1999 10:44:35 -0400 (EDT) Subject: [Python-Dev] mmapfile module In-Reply-To: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> References: <000201bedfb3$09a99000$98a22299@tim> <00a801bedfbc$1871a7e0$1101a8c0@bobcat> Message-ID: <14250.62675.807129.878242@weyr.cnri.reston.va.us> Mark Hammond writes: > Apart from possibly Tim's recent "UnboundLocalError" (which is the only > serious behaviour change) I can't see anything that should obviously be Since UnboundLocalError is a subclass of NameError (what you got before) normally, and they are the same string when -X is used, this only represents a new name in the __builtin__ module for legacy code. This should not be a problem; the only real difference is that, using class exceptions for built-in exceptions, you get more useful information in your tracebacks. -Fred -- Fred L. Drake, Jr. 
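Fred's point about the subclass relationship is easy to check directly; the sketch below uses modern Python syntax (not 1.5.2-era code, where string exceptions were still the -X fallback), but the relationship it demonstrates is the one he describes:

```python
def f():
    print(x)  # 'x' is assigned below, so it is local here -- and not yet bound
    x = 1

try:
    f()
except NameError as exc:  # UnboundLocalError is caught as a NameError
    caught = exc

assert isinstance(caught, UnboundLocalError)
assert issubclass(UnboundLocalError, NameError)
```

So legacy code that catches NameError keeps working unchanged; the new name only adds precision to tracebacks.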
Corporation for National Research Initiatives From fredrik at pythonware.com Sat Aug 7 12:51:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 7 Aug 1999 12:51:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> Message-ID: <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> > > > >>> import unicode > > > >>> import marshal > > > >>> u = unicode.unicode > > > >>> s = u("foo") > > > >>> data = marshal.dumps(s) > > > >>> marshal.loads(data) > > > 'f\000o\000o\000' > > > >>> type(marshal.loads(data)) > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > that unicode objects use a two-byte character representation. >>> import array >>> import marshal >>> a = array.array >>> s = a("f", [1, 2, 3]) >>> data = marshal.dumps(s) >>> marshal.loads(data) '\000\000\200?\000\000\000@\000\000@@' looks like the various implementors haven't really understood the intentions of whoever designed the buffer interface... From mal at lemburg.com Sat Aug 7 18:14:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 07 Aug 1999 18:14:56 +0200 Subject: [Python-Dev] Some more constants for the socket module Message-ID: <37AC5B80.56F740DD@lemburg.com> Following the recent discussion on c.l.p about socket options, I found that the socket module does not define all constants defined in the (Linux) socket header file. Below is a patch that adds a few more (note that the SOL_* constants should be used for the setsockopt() level, not the IPPROTO_* constants). 
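To illustrate that parenthetical about levels in present-day terms (a hedged sketch; SO_REUSEADDR is simply a convenient option that exists on every platform): options at the socket level are set with SOL_SOCKET as the level argument, while the IPPROTO_* values name protocol levels such as IP or TCP.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Socket-level option: the level argument is SOL_SOCKET, not an IPPROTO_* value.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
assert s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
s.close()
```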
--- socketmodule.c~ Sat Aug 7 17:56:05 1999 +++ socketmodule.c Sat Aug 7 18:10:07 1999 @@ -2005,14 +2005,48 @@ initsocket() PySocketSock_Type.tp_doc = sockettype_doc; Py_INCREF(&PySocketSock_Type); if (PyDict_SetItemString(d, "SocketType", (PyObject *)&PySocketSock_Type) != 0) return; + + /* Address families (we only support AF_INET and AF_UNIX) */ +#ifdef AF_UNSPEC + insint(moddict, "AF_UNSPEC", AF_UNSPEC); +#endif insint(d, "AF_INET", AF_INET); #ifdef AF_UNIX insint(d, "AF_UNIX", AF_UNIX); #endif /* AF_UNIX */ +#ifdef AF_AX25 + insint(moddict, "AF_AX25", AF_AX25); /* Amateur Radio AX.25 */ +#endif +#ifdef AF_IPX + insint(moddict, "AF_IPX", AF_IPX); /* Novell IPX */ +#endif +#ifdef AF_APPLETALK + insint(moddict, "AF_APPLETALK", AF_APPLETALK); /* Appletalk DDP */ +#endif +#ifdef AF_NETROM + insint(moddict, "AF_NETROM", AF_NETROM); /* Amateur radio NetROM */ +#endif +#ifdef AF_BRIDGE + insint(moddict, "AF_BRIDGE", AF_BRIDGE); /* Multiprotocol bridge */ +#endif +#ifdef AF_AAL5 + insint(moddict, "AF_AAL5", AF_AAL5); /* Reserved for Werner's ATM */ +#endif +#ifdef AF_X25 + insint(moddict, "AF_X25", AF_X25); /* Reserved for X.25 project */ +#endif +#ifdef AF_INET6 + insint(moddict, "AF_INET6", AF_INET6); /* IP version 6 */ +#endif +#ifdef AF_ROSE + insint(moddict, "AF_ROSE", AF_ROSE); /* Amateur Radio X.25 PLP */ +#endif + + /* Socket types */ insint(d, "SOCK_STREAM", SOCK_STREAM); insint(d, "SOCK_DGRAM", SOCK_DGRAM); #ifndef __BEOS__ /* We have incomplete socket support. */ insint(d, "SOCK_RAW", SOCK_RAW); @@ -2048,11 +2082,10 @@ initsocket() insint(d, "SO_OOBINLINE", SO_OOBINLINE); #endif #ifdef SO_REUSEPORT insint(d, "SO_REUSEPORT", SO_REUSEPORT); #endif - #ifdef SO_SNDBUF insint(d, "SO_SNDBUF", SO_SNDBUF); #endif #ifdef SO_RCVBUF insint(d, "SO_RCVBUF", SO_RCVBUF); @@ -2111,14 +2144,43 @@ initsocket() #ifdef MSG_ETAG insint(d, "MSG_ETAG", MSG_ETAG); #endif /* Protocol level and numbers, usable for [gs]etsockopt */ -/* Sigh -- some systems (e.g. 
Linux) use enums for these. */ #ifdef SOL_SOCKET insint(d, "SOL_SOCKET", SOL_SOCKET); #endif +#ifdef SOL_IP + insint(moddict, "SOL_IP", SOL_IP); +#else + insint(moddict, "SOL_IP", 0); +#endif +#ifdef SOL_IPX + insint(moddict, "SOL_IPX", SOL_IPX); +#endif +#ifdef SOL_AX25 + insint(moddict, "SOL_AX25", SOL_AX25); +#endif +#ifdef SOL_ATALK + insint(moddict, "SOL_ATALK", SOL_ATALK); +#endif +#ifdef SOL_NETROM + insint(moddict, "SOL_NETROM", SOL_NETROM); +#endif +#ifdef SOL_ROSE + insint(moddict, "SOL_ROSE", SOL_ROSE); +#endif +#ifdef SOL_TCP + insint(moddict, "SOL_TCP", SOL_TCP); +#else + insint(moddict, "SOL_TCP", 6); +#endif +#ifdef SOL_UDP + insint(moddict, "SOL_UDP", SOL_UDP); +#else + insint(moddict, "SOL_UDP", 17); +#endif #ifdef IPPROTO_IP insint(d, "IPPROTO_IP", IPPROTO_IP); #else insint(d, "IPPROTO_IP", 0); #endif @@ -2266,10 +2328,32 @@ initsocket() #ifdef IP_ADD_MEMBERSHIP insint(d, "IP_ADD_MEMBERSHIP", IP_ADD_MEMBERSHIP); #endif #ifdef IP_DROP_MEMBERSHIP insint(d, "IP_DROP_MEMBERSHIP", IP_DROP_MEMBERSHIP); +#endif +#ifdef IP_DEFAULT_MULTICAST_TTL + insint(moddict, "IP_DEFAULT_MULTICAST_TTL", IP_DEFAULT_MULTICAST_TTL); +#endif +#ifdef IP_DEFAULT_MULTICAST_LOOP + insint(moddict, "IP_DEFAULT_MULTICAST_LOOP", IP_DEFAULT_MULTICAST_LOOP); +#endif +#ifdef IP_MAX_MEMBERSHIPS + insint(moddict, "IP_MAX_MEMBERSHIPS", IP_MAX_MEMBERSHIPS); +#endif + + /* TCP options */ +#ifdef TCP_NODELAY + insint(moddict, "TCP_NODELAY", TCP_NODELAY); +#endif +#ifdef TCP_MAXSEG + insint(moddict, "TCP_MAXSEG", TCP_MAXSEG); +#endif + + /* IPX options */ +#ifdef IPX_TYPE + insint(moddict, "IPX_TYPE", IPX_TYPE); #endif /* Initialize gethostbyname lock */ #ifdef USE_GETHOSTBYNAME_LOCK gethostbyname_lock = PyThread_allocate_lock(); -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 146 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 7 22:15:08 1999 From: gstein at lyra.org 
(Greg Stein) Date: Sat, 07 Aug 1999 13:15:08 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> Message-ID: <37AC93CC.53982F3F@lyra.org> Fredrik Lundh wrote: > > > > > >>> import unicode > > > > >>> import marshal > > > > >>> u = unicode.unicode > > > > >>> s = u("foo") > > > > >>> data = marshal.dumps(s) > > > > >>> marshal.loads(data) > > > > 'f\000o\000o\000' > > > > >>> type(marshal.loads(data)) > > > > This was a "nicety" that was put in during a round of patches that I submitted to Guido. We both had questions about it but figured that it couldn't hurt since it at least let some things be marshalled out that couldn't be marshalled before. I would suggest backing out the marshalling of buffer-interface objects and adding a mechanism for arbitrary type objects to marshal themselves. Without the second part, arrays and Unicode objects aren't marshallable at all (seems bad). > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > that unicode objects use a two-byte character representation. Unicode objects should *not* implement the getcharbuffer slot. Only read, write, and segcount. > >>> import array > >>> import marshal > >>> a = array.array > >>> s = a("f", [1, 2, 3]) > >>> data = marshal.dumps(s) > >>> marshal.loads(data) > '\000\000\200?\000\000\000@\000\000@@' > > looks like the various implementors haven't > really understood the intentions of whoever > designed the buffer interface... Arrays can/should support both the getreadbuffer and getcharbuffer interface. The former: definitely. The latter: only if the contents are byte-sized. The loading back as a string is a different matter, as pointed out above.
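Greg's distinction is easier to see with the modern spellings of the array API (tobytes/frombytes here stand in for 1999's tostring/fromstring): the read-buffer view is just raw machine bytes, which only turn back into values when paired with the typecode -- exactly the information marshal was throwing away.

```python
import array

a = array.array("f", [1.0, 2.0, 3.0])
raw = a.tobytes()                 # the raw "read buffer" view of the array
assert len(raw) == 3 * a.itemsize

# The bytes alone are an opaque string, like marshal.loads() returned above;
# only together with the typecode can the floats be recovered.
b = array.array("f")
b.frombytes(raw)
assert b.tolist() == [1.0, 2.0, 3.0]
```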
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Sun Aug 8 22:20:52 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 08 Aug 1999 22:20:52 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by Greg Stein , Sat, 07 Aug 1999 13:15:08 -0700 , <37AC93CC.53982F3F@lyra.org> Message-ID: <19990808202057.DB803E267A@oratrix.oratrix.nl> Recently, Greg Stein said: > I would suggest backing out the marshalling of buffer-interface objects > and adding a mechanism for arbitrary type objects to marshal themselves. > Without the second part, arrays and Unicode objects aren't marshallable > at all (seems bad). This sounds like the right approach. It would require 2 slots in the tp_ structure and a little extra glue for the typecodes (currently marshal knows all the 1-letter typecodes for all object types it can handle), but types marshalling their own objects would require a centralized registry of object types. For the time being it would probably suffice to have the mapping of type<->letter be hardcoded in marshal.h, but eventually you probably want a more extensible scheme, where Joe R. Extension-Writer could add a marshaller to his objects and know it won't collide with someone else's. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 9 10:56:30 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 10:56:30 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990808202057.DB803E267A@oratrix.oratrix.nl> Message-ID: <37AE97BE.2CADF48E@lemburg.com> Jack Jansen wrote: > > Recently, Greg Stein said: > > I would suggest backing out the marshalling of buffer-interface objects > > and adding a mechanism for arbitrary type objects to marshal themselves.
> > Without the second part, arrays and Unicode objects aren't marshallable > > at all (seems bad). > > This sounds like the right approach. It would require 2 slots in the > tp_ structure and a little extra glue for the typecodes (currently > marshal knows all the 1-letter typecodes for all object types it can > handle), but types marshalling their own objects would require a > centralized registry of object types. For the time being it would > probably suffice to have the mapping of type<->letter be hardcoded in > marshal.h, but eventually you probably want a more extensible scheme, > where Joe R. Extension-Writer could add a marshaller to his objects > and know it won't collide with someone else's. This registry should ideally be reachable via C APIs. Then a module writer could call these APIs in the init function of his module and he'd be set. Since marshal won't be able to handle imports on the fly (like pickle et al.), these modules will have to be imported before unmarshalling. Aside: wouldn't it make sense to move from marshal to pickle and deprecate marshal altogether ? cPickle is quite fast and much more flexible than marshal, plus it already provides mechanisms for registering new types. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Mon Aug 9 15:49:44 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 09 Aug 1999 15:49:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "M.-A. Lemburg" , Mon, 09 Aug 1999 10:56:30 +0200 , <37AE97BE.2CADF48E@lemburg.com> Message-ID: <19990809134944.BB2FC303120@snelboot.oratrix.nl> > Aside: wouldn't it make sense to move from marshal to pickle and > deprecate marshal altogether ?
cPickle is quite fast and much more > flexible than marshal, plus it already provides mechanisms for > registering new types. This is probably the best idea so far. Just remove the buffer-workaround in marshal, keep it functioning for the things it is used for now (like pyc files) and refer people to (c)Pickle for new development. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 9 16:50:46 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 10:50:46 -0400 Subject: [Python-Dev] Some more constants for the socket module In-Reply-To: Your message of "Sat, 07 Aug 1999 18:14:56 +0200." <37AC5B80.56F740DD@lemburg.com> References: <37AC5B80.56F740DD@lemburg.com> Message-ID: <199908091450.KAA29179@eric.cnri.reston.va.us> Thanks for the socketmodule patch, Marc. This was on my mental TO-DO list for a long time! I've checked it in. (One note: I had a bit of trouble applying the patch; apparently your mailer expanded all tabs to spaces. Perhaps you could use attachments to mail diffs? Also, you seem to have renamed 'd' to 'moddict' but you didn't send the patch for that...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 9 18:26:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 12:26:28 -0400 Subject: [Python-Dev] preferred conference date? Message-ID: <199908091626.MAA29411@eric.cnri.reston.va.us> I need your input about the date of the next Python conference. Foretec is close to a deal for a Python conference in January 2000 at the Alexandria Old Town Hilton hotel. Given our requirement of a good location in the DC area, this is a very good deal (it's a brand new hotel).
The prices are high (they tell me that the whole conference will cost $900, with a room rate of $129) but it's a class A location (metro, tons of restaurants, close to National Airport, etc.) and we have found no cheaper DC hotel suitable for our purposes (even in drab suburban locations). I'm worried that I'll be flamed to hell for this by the PSA members, but I don't think we can get the price any lower without starting all over in a different location, probably causing several months of delay. If people won't come, Foretec (and I) will have learned a valuable lesson and we'll rethink the issue for the 2001 conference. Anyway, given that Foretec is likely to go with this hotel, we have a choice of two dates: January 16-19, or 23-26 (both starting on a Sunday with the tutorials). This is where I need your help: which date would you prefer? Please mail me personally. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Mon Aug 9 18:31:43 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 9 Aug 1999 11:31:43 -0500 (CDT) Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> References: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <14255.557.474160.824877@dolphin.mojam.com> Guido> The prices are high (they tell me that the whole conference will Guido> cost $900, with a room rate of $129) but it's a class A location No way I (or my company) can afford to plunk down $900 for me to attend... Skip From mal at lemburg.com Mon Aug 9 18:40:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 18:40:45 +0200 Subject: [Python-Dev] Some more constants for the socket module References: <37AC5B80.56F740DD@lemburg.com> <199908091450.KAA29179@eric.cnri.reston.va.us> Message-ID: <37AF048D.FC0A540@lemburg.com> Guido van Rossum wrote: > > Thanks for the socketmodule patch, Marc. This was on my mental TO-DO > list for a long time! I've checked it in. 
Cool, thanks. > (One note: I had a bit of trouble applying the patch; apparently your > mailer expanded all tabs to spaces. Perhaps you could use attachments > to mail diffs? Ok. > Also, you seem to have renamed 'd' to 'moddict' but > you didn't send the patch for that...) Oops, sorry... my "#define to insint" script uses 'd' as moddict, that's the reason why. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at CNRI.Reston.VA.US Mon Aug 9 19:30:36 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 13:30:36 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: Your message of "Mon, 09 Aug 1999 11:31:43 CDT." <14255.557.474160.824877@dolphin.mojam.com> References: <199908091626.MAA29411@eric.cnri.reston.va.us> <14255.557.474160.824877@dolphin.mojam.com> Message-ID: <199908091730.NAA29559@eric.cnri.reston.va.us> > Guido> The prices are high (they tell me that the whole conference will > Guido> cost $900, with a room rate of $129) but it's a class A location > > No way I (or my company) can afford to plunk down $900 for me to attend... Let me clarify this. The $900 is for the whole 4-day conference, including a day of tutorials and developers' day. I don't know what the exact price breakdown will be, but the tutorials will probably be $300. Last year the total price was $700, with $250 for tutorials. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 14:04:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 13:04:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts Message-ID: <199908101204.NAA29572@pukapuka.inrialpes.fr> Currently, dictionaries always grow until they are deallocated from memory. 
This happens in PyDict_SetItem according to the following code (before inserting the new item into the dict): /* if fill >= 2/3 size, double in size */ if (mp->ma_fill*3 >= mp->ma_size*2) { if (dictresize(mp, mp->ma_used*2) != 0) { if (mp->ma_fill+1 > mp->ma_size) return -1; } } The symmetric case is missing and this has intrigued me for a long time, but I've never had the courage to look deeply into this portion of code and try to propose a solution. Which is: reduce the size of the dict by half when the nb of used items <= 1/6 the size. This situation occurs far less frequently than dict growing, but anyways, it seems useful for the degenerate cases where a dict has a peak usage, then most of the items are deleted. This is usually the case for global dicts holding dynamic object collections, etc. A bonus effect of shrinking big dicts with deleted items is that the lookup speed may be improved, because of the cleaned entries and the reduced overall size (resulting in a better hit ratio). The (only) solution I could come up with for this problem is the appended patch. It is not immediately obvious, but in practice, it seems to work fine. (inserting a print statement after the condition, showing the dict size and current usage helps in monitoring what's going on). Any other ideas on how to deal with this? Thoughts, comments? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999 --- dictobject.c Tue Aug 10 12:21:15 1999 *************** *** 417,423 **** ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); return 0; } --- 417,430 ---- ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); ! /* For bigger dictionaries, if used <= 1/6 size, half the size */ ! if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) { !
if (dictresize(mp, mp->ma_used*2) != 0) { ! if (mp->ma_fill > mp->ma_size) ! return -1; ! } ! } return 0; } From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 15:20:36 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 14:20:36 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 10, 99 01:04:27 pm" Message-ID: <199908101320.OAA21986@pukapuka.inrialpes.fr> I wrote: > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? > To clarify a bit what the patch does "as is", here's a short description: The code is triggered in PyDict_DelItem only for sizes which are > MINSIZE*4, i.e. greater than 4*4 = 16. Therefore, resizing will occur for a min size of 32 items. one third: 32 / 3 = 10, two thirds: 32 * 2/3 = 21, one sixth: 32 / 6 = 5. So the shrinking will happen for a dict size of 32, of which 5 items are used (the sixth was just deleted). After the dictresize, the size will be 16, of which 5 items are used, i.e. one third. The threshold is fixed by the first condition of the patch. It could be made 64, instead of 32. This is subject to discussion... Obviously, this is most useful for bigger dicts, not for small ones. A threshold of 32 items seemed to me to be a reasonable compromise. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Tue Aug 10 14:35:33 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 10 Aug 1999 14:35:33 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> Message-ID: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > > > > >>> import unicode > > > > > >>> import marshal > > > > > >>> u = unicode.unicode > > > > > >>> s = u("foo") > > > > > >>> data = marshal.dumps(s) > > > > > >>> marshal.loads(data) > > > > > 'f\000o\000o\000' > > > > > >>> type(marshal.loads(data)) > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > that unicode objects use a two-byte character representation. > > Unicode objects should *not* implement the getcharbuffer slot. Only > read, write, and segcount. unicode objects do not implement the getcharbuffer slot. here's the relevant descriptor: static PyBufferProcs unicode_as_buffer = { (getreadbufferproc) unicode_buffer_getreadbuf, (getwritebufferproc) unicode_buffer_getwritebuf, (getsegcountproc) unicode_buffer_getsegcount }; the array module uses a similar descriptor. maybe the unicode class shouldn't implement the buffer interface at all? sure looks like the best way to avoid trivial mistakes (the current behaviour of fp.write(unicodeobj) is even more serious than the marshal glitch...) or maybe the buffer design needs an overhaul? From guido at CNRI.Reston.VA.US Tue Aug 10 16:12:23 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Tue, 10 Aug 1999 10:12:23 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Tue, 10 Aug 1999 14:35:33 +0200." 
<000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <199908101412.KAA02065@eric.cnri.reston.va.us> > Greg Stein wrote: > > > > > > >>> import unicode > > > > > > >>> import marshal > > > > > > >>> u = unicode.unicode > > > > > > >>> s = u("foo") > > > > > > >>> data = marshal.dumps(s) > > > > > > >>> marshal.loads(data) > > > > > > 'f\000o\000o\000' > > > > > > >>> type(marshal.loads(data)) > > > > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > > that unicode objects use a two-byte character representation. > > > > Unicode objects should *not* implement the getcharbuffer slot. Only > > read, write, and segcount. > > unicode objects do not implement the getcharbuffer slot. > here's the relevant descriptor: > > static PyBufferProcs unicode_as_buffer = { > (getreadbufferproc) unicode_buffer_getreadbuf, > (getwritebufferproc) unicode_buffer_getwritebuf, > (getsegcountproc) unicode_buffer_getsegcount > }; > > the array module uses a similar descriptor. > > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) > > or maybe the buffer design needs an overhaul? I think most places that should use the charbuffer interface actually use the readbuffer interface. This is what should be fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Aug 10 19:53:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:53:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <37B06734.4339D3BF@lemburg.com> Fredrik Lundh wrote: > > unicode objects do not implement the getcharbuffer slot. >... > or maybe the buffer design needs an overhaul? I think its usage does. The character slot should be used whenever character data is needed, not the read buffer slot. The latter one is for passing around raw binary data (without reinterpretation !), if I understood Greg correctly back when I gave those abstract APIs a try. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 10 19:39:29 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:39:29 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <37B063D1.29F3106A@lemburg.com> Vladimir Marangozov wrote: > > Currently, dictionaries always grow until they are deallocated from > memory. This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict):
>
>     /* if fill >= 2/3 size, double in size */
>     if (mp->ma_fill*3 >= mp->ma_size*2) {
>         if (dictresize(mp, mp->ma_used*2) != 0) {
>             if (mp->ma_fill+1 > mp->ma_size)
>                 return -1;
>         }
>     }
>
> The symmetric case is missing and this has intrigued me for a long time, > but I've never had the courage to look deeply into this portion of code > and try to propose a solution. Which is: reduce the size of the dict by > half when the nb of used items <= 1/6 the size.
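[Editor's note: the two thresholds being proposed can be sketched in present-day Python. This is a hypothetical helper, not CPython code -- the real logic lives in C inside PyDict_SetItem -- and the `next_size` name and the size floor of 8 are assumptions made for illustration:]

```python
# Sketch of the resize policy under discussion (hypothetical helper):
# grow when fill reaches 2/3 of the table size, and -- the proposed
# symmetric case -- halve when used falls to 1/6 of the size.
def next_size(size, used, fill):
    if fill * 3 >= size * 2:        # existing rule: table >= 2/3 full
        return used * 2             # the resize target is based on "used"
    if used * 6 <= size:            # proposed rule: <= 1/6 actually used
        return max(size // 2, 8)    # shrink by half, with a small floor
    return size                     # otherwise leave the table alone

assert next_size(8, 6, 6) == 12     # growing dict is resized to used*2
assert next_size(64, 4, 4) == 32    # sparse dict would be halved
assert next_size(64, 30, 40) == 64  # healthy dict is left alone
```

Whether the shrink branch is worth a check on every operation is exactly the question the thread goes on to debate.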
> > This situation occurs far less frequently than dict growing, but anyways, > it seems useful for the degenerate cases where a dict has a peak usage, > then most of the items are deleted. This is usually the case for global > dicts holding dynamic object collections, etc. > > A bonus effect of shrinking big dicts with deleted items is that > the lookup speed may be improved, because of the cleaned entries > and the reduced overall size (resulting in a better hit ratio). > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? I think that integrating this into the C code is not really that effective since the situation will not occur that often and then it is often better to let the programmer decide rather than integrate an automatic downsize. You can call dict.update({}) to force an internal resize (the empty dictionary can be made global since it is not manipulated in any way and thus does not cause creation overhead). Perhaps a new method .resize(approx_size) would make this even clearer. This would also have the benefit of allowing a programmer to force allocation of the wanted size, e.g.

    d = {}
    d.resize(10000)
    # Insert 10000 items in a batch insert

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 21:58:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 20:58:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B063D1.29F3106A@lemburg.com> from "M.-A.
Lemburg" at "Aug 10, 99 07:39:29 pm" Message-ID: <199908101958.UAA22028@pukapuka.inrialpes.fr> M.-A. Lemburg wrote: > > [me] > > Any other ideas on how to deal with this? Thoughts, comments? > > I think that integrating this into the C code is not really that > effective since the situation will not occur that often and then > it is often better to let the programmer decide rather than integrate > an automatic downsize. Agreed that the situation is rare. But if it occurs, it's Python's responsibility to manage its data structures (and system resources) efficiently. As a programmer, I really don't want to be bothered with internals -- I trust the interpreter for that. Moreover, how could I decide that at some point, some dict needs to be resized in my fairly big app, say IDLE? > > You can call dict.update({}) to force an internal > resize (the empty dictionary can be made global since it is not > manipulated in any way and thus does not cause creation overhead). I know that I can force the resize in other ways, but this is not the point. I'm usually against the idea of changing the programming logic because of my advanced knowledge of the internals. > > Perhaps a new method .resize(approx_size) would make this even > clearer. This would also have the benefit of allowing a programmer > to force allocation of the wanted size, e.g.
> >
> > d = {}
> > d.resize(10000)
> > # Insert 10000 items in a batch insert
This is interesting, but the two ideas are not mutually exclusive. Python has to downsize dicts automatically (just the same way it doubles the size automatically). Offering more through an API is a plus for hackers. ;-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Tue Aug 10 22:19:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 10 Aug 1999 22:19:46 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101958.UAA22028@pukapuka.inrialpes.fr> Message-ID: <37B08962.6DFB3F0@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > [me] > > > Any other ideas on how to deal with this? Thoughts, comments? > > > > I think that integrating this into the C code is not really that > > effective since the situation will not occur that often and then > > it is often better to let the programmer decide rather than integrate > > an automatic downsize. > > Agreed that the situation is rare. But if it occurs, it's Python's > responsibility to manage its data structures (and system resources) > efficiently. As a programmer, I really don't want to be bothered with > internals -- I trust the interpreter for that. Moreover, how could > I decide that at some point, some dict needs to be resized in my > fairly big app, say IDLE? You usually don't ;-) because "normal" dicts only grow (well, more or less). The downsizing thing will only become a problem if you use dictionaries in certain algorithms and there you handle the problem manually. My stack implementation uses the same trick, BTW. Memory is cheap and with an extra resize method (which the mxStack implementation has), problems can be dealt with explicitly for everyone to see in the code. > > You can call dict.update({}) to force an internal > > resize (the empty dictionary can be made global since it is not > > manipulated in any way and thus does not cause creation overhead). > > I know that I can force the resize in other ways, but this is not > the point. I'm usually against the idea of changing the programming > logic because of my advanced knowledge of the internals. True, that's why I mentioned... > > > > Perhaps a new method .resize(approx_size) would make this even > > clearer. This would also have the benefit of allowing a programmer > > to force allocation of the wanted size, e.g.
> >
> > d = {}
> > d.resize(10000)
> > # Insert 10000 items in a batch insert
>
> This is interesting, but the two ideas are not mutually exclusive. > Python has to downsize dicts automatically (just the same way it doubles > the size automatically). Offering more through an API is a plus for > hackers. ;-) It's not really for hackers: the point is that it makes the technique visible and understandable (as opposed to the hack above). The same could be useful for lists too (the hack there is l = [None] * size, which I find rather difficult to understand at first sight...). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Wed Aug 11 00:39:30 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 11 Aug 1999 08:39:30 +1000 Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B08962.6DFB3F0@lemburg.com> Message-ID: <010901bee381$36ee5d30$1101a8c0@bobcat> Looking over the messages from Marc and Vladimir, I'm going to add my 2c worth. IMO, Marc's position is untenable iff it can be demonstrated that the "average" program is likely to see "sparse" dictionaries, and such dictionaries have an adverse effect on either speed or memory. The analogy is quite simple - you don't need to manually resize lists or dicts before inserting (to allocate more storage - an internal implementation issue) so neither should you need to manually resize when deleting (to reclaim that storage - still internal implementation). Suggesting that the allocation of resources should be automatic, but the recycling of them not be automatic flies in the face of everything else - eg, you don't need to delete each object - when it is no longer referenced, its memory is reclaimed automatically.
Marc's position is only reasonable if the specific case we are talking about is very very rare, and unlikely to be hit by anyone with normal, real-world requirements or programs. In this case, exposing the implementation detail is reasonable. So, the question comes down to: "What is the benefit of Vladimir's patch?" Maybe we need some metrics on some dictionaries. For example, maybe a doctored Python that kept stats for each dictionary and logged this info. The output of this should be able to tell you what savings you could possibly expect. If you find that the average program really would not benefit at all (say only a few K from a small number of dicts) then the horse was probably dead well before we started flogging it. If however you can demonstrate serious benefits could be achieved, then interest may pick up and I too would lobby for automatic downsizing. Mark. From tim_one at email.msn.com Wed Aug 11 07:30:20 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 01:30:20 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <000001bee3ba$9b226f60$8d2d2399@tim> [Vladimir] > Currently, dictionaries always grow until they are deallocated from > memory. It's more accurate to say they never shrink <0.9 wink>. Even that has exceptions, though, starting with: > This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict):
>
>     /* if fill >= 2/3 size, double in size */
>     if (mp->ma_fill*3 >= mp->ma_size*2) {
>         if (dictresize(mp, mp->ma_used*2) != 0) {
>             if (mp->ma_fill+1 > mp->ma_size)
>                 return -1;
>         }
>     }
This code can shrink the dict too. The load factor computation is based on "fill", but the resize is based on "used". If you grow a huge dict, then delete all the entries one by one, "used" falls to 0 but "fill" stays at its high-water mark.
At least 1/3rd of the entries are NULL, so "fill" continues to climb as keys are added again: when the load factor computation triggers again, "used" may be as small as 1, and dictresize can shrink the dict dramatically. The only clear a priori return I see in your patch is that I might save memory if I delete gobs of stuff from a dict and then neither get rid of it nor add keys to it again. But my programs generally grow dicts forever, grow then delete them entirely, or cycle through fat and lean times (in which case the code above already shrinks them from time to time). So I don't expect that your patch would buy me anything I want, but would cost me more on every delete. > ... > Any other ideas on how to deal with this? Thoughts, comments? Just that slowing the expected case to prevent theoretical bad cases is usually a net loss -- I think the onus is on you to demonstrate that this change is an exception to that rule. I do recall one real-life complaint about it on c.l.py a couple years ago: the poster had a huge dict, eventually deleted most of the items, and then kept it around purely for lookups. They were happy enough to copy the dict into a fresh one a key+value pair at a time; today they could just do

    d = d.copy()

or even

    d.update({})

to shrink the dict. It would certainly be good to document these tricks! if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim From tim_one at email.msn.com Wed Aug 11 08:45:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 02:45:49 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <000201bee3c5$25b47b00$8d2d2399@tim> [Guido] > ... > The prices are high (they tell me that the whole conference will cost > $900, with a room rate of $129) Is room rental in addition to, or included in, that $900? > ...
> I'm worried that I'll be flamed to hell for this by the PSA members, So have JulieK announce it . > ... > Anyway, given that Foretec is likely to go with this hotel, we have a > choice of two dates: January 16-19, or 23-26 (both starting on a > Sunday with the tutorials). This is where I need your help: which > date would you prefer? 23-26 for me; 16-19 may not be doable. or-everyone-can-switch-to-windows-and-we'll-do-the-conference-via-netmeeting-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Wed Aug 11 16:33:17 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 11 Aug 1999 15:33:17 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000001bee3ba$9b226f60$8d2d2399@tim> from "Tim Peters" at "Aug 11, 99 01:30:20 am" Message-ID: <199908111433.PAA31842@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir] > > Currently, dictionaries always grow until they are deallocated from > > memory. > > It's more accurate to say they never shrink <0.9 wink>. Even that has > exceptions, though, starting with: > > > This happens in PyDict_SetItem according to the following > > code (before inserting the new item into the dict):
> >
> >     /* if fill >= 2/3 size, double in size */
> >     if (mp->ma_fill*3 >= mp->ma_size*2) {
> >         if (dictresize(mp, mp->ma_used*2) != 0) {
> >             if (mp->ma_fill+1 > mp->ma_size)
> >                 return -1;
> >         }
> >     }
>
> This code can shrink the dict too. The load factor computation is based on > "fill", but the resize is based on "used". If you grow a huge dict, then > delete all the entries one by one, "used" falls to 0 but "fill" stays at its > high-water mark. At least 1/3rd of the entries are NULL, so "fill" > continues to climb as keys are added again: when the load factor > computation triggers again, "used" may be as small as 1, and dictresize can > shrink the dict dramatically. Thanks for clarifying this! > [snip] > > > ... > > Any other ideas on how to deal with this? Thoughts, comments?
> > Just that slowing the expected case to prevent theoretical bad cases is > usually a net loss -- I think the onus is on you to demonstrate that this > change is an exception to that rule. I won't, because this case is rare in practice, classifying it already as an exception. A real exception. I'll have to think a bit more about all this. Adding 1/3 new entries to trigger the next resize sounds suboptimal (if it happens at all). > I do recall one real-life complaint > about it on c.l.py a couple years ago: the poster had a huge dict, > eventually deleted most of the items, and then kept it around purely for > lookups. They were happy enough to copy the dict into a fresh one a > key+value pair at a time; today they could just do
>
> d = d.copy()
>
> or even
>
> d.update({})
>
> to shrink the dict. > > It would certainly be good to document these tricks! I think that officializing these tricks in the documentation is a bad idea. > > if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, but don't forget that you've been around, teaching people various tricks. And 1999 is special -- we just had a solar eclipse today, the next being scheduled for 2081. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Wed Aug 11 16:07:44 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 11 Aug 1999 16:07:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. ok. btw, how about adding support for buffer access to data that have strange internal formats (like certain PIL image memories) or aren't directly accessible (like "virtual" and "abstract" image buffers in PIL 1.1). something like:

    int initbuffer(PyObject* obj, void** context);
    int exitbuffer(PyObject* obj, void* context);

and corresponding context arguments to the rest of the functions... From guido at CNRI.Reston.VA.US Wed Aug 11 16:42:10 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 11 Aug 1999 10:42:10 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Wed, 11 Aug 1999 16:07:44 +0200." <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <199908111442.KAA04423@eric.cnri.reston.va.us> > btw, how about adding support for buffer access > to data that have strange internal formats (like certain PIL image memories) or aren't directly accessible > (like "virtual" and "abstract" image buffers in PIL 1.1).
> something like:
>
>     int initbuffer(PyObject* obj, void** context);
>     int exitbuffer(PyObject* obj, void* context);
>
> and corresponding context arguments to the > rest of the functions... Can you explain this idea more? Without more understanding of PIL I have no idea what you're talking about... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Thu Aug 12 07:15:39 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 01:15:39 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908111433.PAA31842@pukapuka.inrialpes.fr> Message-ID: <000301bee481$b78ae5c0$4e2d2399@tim> [Tim] >> ...slowing the expected case to prevent theoretical bad cases is >> usually a net loss -- I think the onus is on you to demonstrate >> that this change is an exception to that rule. [Vladimir Marangozov] > I won't, because this case is rare in practice, classifying it already > as an exception. A real exception. I'll have to think a bit more about > all this. Adding 1/3 new entries to trigger the next resize sounds > suboptimal (if it happens at all). "Suboptimal" with respect to which specific cost model? Exhibiting a specific bad case isn't compelling, and especially not when it's considered to be "a real exception". Adding new expense to every delete is an obvious new burden -- where's the payback, and is the expected net effect amortized across all dict usage a win or loss? Offhand it sounds like a small loss to me, although I haven't worked up a formal cost model either . > ... > I think that officializing these tricks in the documentation is a > bad idea. It's rarely a good idea to keep truths secret, although implementation-du-jour tricks don't belong in the current doc set. Probably in a HowTo.
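[Editor's note: the effect behind these tricks is still observable in a modern CPython -- a hedged illustration, since today's dict internals differ and d.update({}) no longer forces a resize there; rebuilding via copy() is what sheds the oversized table:]

```python
import sys

# Present-day sketch of the "rebuild to shrink" trick: grow a dict,
# delete every key, and compare the hollowed-out dict's footprint with
# that of a freshly built copy.
d = dict.fromkeys(range(100000))
for k in list(d):
    del d[k]            # d is now empty, but its table is not freed

hollow = sys.getsizeof(d)        # still sized for its peak population
fresh = sys.getsizeof(d.copy())  # a copy gets a minimal new table
assert len(d) == 0
assert fresh < hollow
```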
>> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, I actually hate that kind of argument -- it's one of *Guido's* favorites, and in his current silent state I'm simply channeling him . > but don't forget that you've been around, teaching people various > tricks. As I said, this particular trick has come up only once in real life in my experience; it's never come up in my own code; it's an anti-FAQ. People are 100x more likely to whine about theoretical quadratic-time list growth nobody has ever encountered (although it looks like they may finally get it under an out-of-the-box BDW collector!). > And 1999 is special -- we just had a solar eclipse today, the next being > scheduled for 2081. Ya, like any of us will survive Y2K to see it . 1999-is-special-cuz-it's-the-end-of-civilization-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Thu Aug 12 20:22:06 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 12 Aug 1999 19:22:06 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908121822.TAA40444@pukapuka.inrialpes.fr> Just curious: Is python with vs. without "-O" equivalent today regarding line numbers? Are SET_LINENO opcodes a plus in some situations or not? Next, I see quite often several SET_LINENO in a row in the beginning of code objects due to doc strings, etc. Since I don't think that folding them into one SET_LINENO would be an optimisation (it would rather be avoiding the redundancy), is it possible and/or reasonable to do something in this direction? A trivial example:

>>> def f():
...     "This is a comment about f"
...     a = 1
...
>>> import dis
>>> dis.dis(f)
          0 SET_LINENO          1
          3 SET_LINENO          2
          6 SET_LINENO          3
          9 LOAD_CONST          1 (1)
         12 STORE_FAST          0 (a)
         15 LOAD_CONST          2 (None)
         18 RETURN_VALUE
>>>

Can the above become something like this instead:

          0 SET_LINENO          3
          3 LOAD_CONST          1 (1)
          6 STORE_FAST          0 (a)
          9 LOAD_CONST          2 (None)
         12 RETURN_VALUE

-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Fri Aug 13 00:02:06 1999 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 00:02:06 +0200 Subject: [Python-Dev] about line numbers In-Reply-To: Message by Vladimir Marangozov , Thu, 12 Aug 1999 19:22:06 +0100 (NFT) , <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <19990812220211.B3CED993@oratrix.oratrix.nl> The only possible problem I can see with folding linenumbers is if someone sets a breakpoint on such a line. And I think it'll be difficult to explain the missing line numbers to pdb, so there isn't an easy workaround (at least, it takes more than my 30 seconds of brainpower to come up with one:-). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:10:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:10:26 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000301bee481$b78ae5c0$4e2d2399@tim> from "Tim Peters" at "Aug 12, 99 01:15:39 am" Message-ID: <199908122310.AAA29618@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Tim] > >> ...slowing the expected case to prevent theoretical bad cases is > >> usually a net loss -- I think the onus is on you to demonstrate > >> that this change is an exception to that rule.
> > [Vladimir Marangozov] > > I won't, because this case is rare in practice, classifying it already > > as an exception. A real exception. I'll have to think a bit more about > > all this. Adding 1/3 new entries to trigger the next resize sounds > > suboptimal (if it happens at all). > > "Suboptimal" with respect to which specific cost model? Exhibiting a > specific bad case isn't compelling, and especially not when it's considered > to be "a real exception". Adding new expense to every delete is an obvious > new burden -- where's the payback, and is the expected net effect amortized > across all dict usage a win or loss? Offhand it sounds like a small loss to > me, although I haven't worked up a formal cost model either . C'mon Tim, don't try to impress me with cost models. I'm already impressed :-) Anyways, I've looked at some traces. As expected, the conclusion is that this case is extremely rare wrt the average dict usage. There are 3 reasons: (1) dicts are usually deleted entirely, (2) del d[key] is rare in practice, and (3) often d[key] = None is used instead of (2). There is, however, a small percentage of dicts which are used below 1/3 of their size. I must say, below 1/3 of their peak size, because downsizing is also rare. To trigger a downsize, 1/3 new entries of the peak size must be inserted. Besides these observations, after looking at the code one more time, I can't really understand why the resize logic is based on the "fill" watermark and not on "used". fill = used + dummy, but since lookdict returns the first free slot (null or dummy), I don't really see what's the point of using a fill watermark... Perhaps you can enlighten me on this. Using only the "used" metric seems fine to me. I even deactivated "fill" and replaced it with "used" to see what happens -- no visible changes, except a tiny speedup I'm willing to neglect.
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:21:48 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:21:48 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <19990812220211.B3CED993@oratrix.oratrix.nl> from "Jack Jansen" at "Aug 13, 99 00:02:06 am" Message-ID: <199908122321.AAA29572@pukapuka.inrialpes.fr> Jack Jansen wrote: > > > The only possible problem I can see with folding linenumbers is if > someone sets a breakpoint on such a line. And I think it'll be > difficult to explain the missing line numbers to pdb, so there isn't > an easy workaround (at least, it takes more than my 30 seconds of > brainpoewr to come up with one:-). > Eek! We can set a breakpoint on a doc string? :-) There's no code in there. It should be treated as a comment by pdb. I can't set a breakpoint on a comment line even in C ;-) There must be something deeper about it... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Fri Aug 13 02:07:32 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 20:07:32 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <000101bee51f$d7601de0$fb2d2399@tim> [Vladimir Marangozov] > Is python with vs. without "-O" equivalent today regarding > line numbers? > > Are SET_LINENO opcodes a plus in some situations or not? In theory it should make no difference, except that the trace mechanism makes a callback on each SET_LINENO, and that's how the debugger implements line-number breakpoints. Under -O, there are no SET_LINENOs, so debugger line-number breakpoints don't work under -O. 
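[Editor's note: the callback machinery Tim describes can be sketched in present-day terms -- SET_LINENO itself is long gone, but the per-line trace event it used to drive is still how pdb implements line breakpoints. The `tracer` name below is an assumption for illustration:]

```python
import sys

# The debugger's line-number breakpoints ride on per-line trace events
# (fired by SET_LINENO in 1.5.2; by the 'line' trace event today).
lines_seen = []

def tracer(frame, event, arg):
    if event == "line":
        lines_seen.append(frame.f_lineno)
    return tracer   # keep tracing inside the called function

def f():
    a = 1
    b = 2
    return a + b

sys.settrace(tracer)
f()
sys.settrace(None)
assert len(lines_seen) == 3   # one 'line' event per executed line of f()
```

Every executed line costs a full Python-level callback while tracing is on, which is exactly the expense Tim objects to.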
I think there's also a sporadic buglet, which I've never bothered to track down: sometimes a line number reported in a traceback under -O (&, IIRC, it's always the topmost line number) comes out as a senseless negative value. > Next, I see quite often several SET_LINENO in a row in the beginning > of code objects due to doc strings, etc. Since I don't think that > folding them into one SET_LINENO would be an optimisation (it would > rather be avoiding the redundancy), is it possible and/or reasonable > to do something in this direction? All opcodes consume time, although a wasted trip or two around the eval loop at the start of a function isn't worth much effort to avoid. Still, it's a legitimate opportunity for provable speedup, even if unmeasurable speedup . Would be more valuable to rethink the debugger's breakpoint approach so that SET_LINENO is never needed (line-triggered callbacks are expensive because called so frequently, turning each dynamic SET_LINENO into a full-blown Python call; if I used the debugger often enough to care , I'd think about munging in a new opcode to make breakpoint sites explicit). immutability-is-made-to-be-violated-ly y'rs - tim From tim_one at email.msn.com Fri Aug 13 06:53:38 1999 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 13 Aug 1999 00:53:38 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908122307.AAA06018@pukapuka.inrialpes.fr> Message-ID: <000101bee547$cffaa020$992d2399@tim> [Vladimir Marangozov, *almost* seems ready to give up on a counterproductive dict pessimization ] > ... > There is, however, a small percentage of dicts which are used > below 1/3 of their size. I must say, below 1/3 of their peak size, > because downsizing is also rare. To trigger a downsize, 1/3 new > entries of the peak size must be inserted. Not so, although "on average" 1/6 may be correct. Look at an extreme: Say a dict has size 333 (it can't, but it makes the math obvious ...). Say it contains 221 items.
Now someone deletes them all, one at a time. used==0 and fill==221 at this point. They insert one new key that happens to hit one of the 333-221 = 112 remaining NULL keys. Then used==1 and fill==222. They insert a 2nd key, and before the dict is searched the new fill of 222 triggers the 2/3rds load-factor resizing -- which asks for a new size of 1*2 == 2. For the minority of dicts that go up and down in size wildly many times, the current behavior is fine. > Besides these observations, after looking at the code one more > time, I can't really understand why the resize logic is based on > the "fill" watermark and not on "used". fill = used + dummy, but > since lookdict returns the first free slot (null or dummy), I don't > really see what's the point of using a fill watermark... Let's just consider an unsuccessful search. Then it does return "the first" free slot, but not necessarily at the time it *sees* the first free slot. So long as it sees a dummy, it has to keep searching; the search doesn't end until it finds a NULL. So consider this, assuming the resize triggered only on "used":

    import random, sys

    d = {}
    for i in xrange(50000):
        d[random.randrange(1000000)] = 1
    for k in d.keys():
        del d[k]
    # now there are 50000 dummy dict keys, and some number of NULLs
    # loop invariant: used == 0
    for i in xrange(sys.maxint):
        j = random.randrange(10000000)
        d[j] = 1
        del d[j]
        assert not d.has_key(i)

However many NULL slots remained, the last loop eventually transforms them *all* into dummies. The dummies act exactly like "real keys" with respect to expected time for an unsuccessful search, which is why it's thoroughly appropriate to include dummies in the load factor computation. The loop will run slower and slower as the percentage of dummies approaches 100%, and each failing has_key approaches O(N) time.
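[Editor's note: the dummy effect described above can be reproduced with a toy open-addressing table. This is a sketch -- it uses simple linear probing rather than CPython's actual probe sequence, and the class and method names are invented for illustration:]

```python
# Toy open-addressing table. Deleting a key leaves a "dummy": used
# drops but fill does not, and an unsuccessful search must probe past
# every dummy until it finds a NULL slot.
NULL, DUMMY = object(), object()

class ToyDict:
    def __init__(self, size=8):
        self.slots = [NULL] * size
        self.used = 0               # live keys
        self.fill = 0               # live keys + dummies

    def insert(self, key):
        i = hash(key) % len(self.slots)
        reusable = None
        while self.slots[i] is not NULL and self.slots[i] != key:
            if self.slots[i] is DUMMY and reusable is None:
                reusable = i        # first free (dummy) slot seen
            i = (i + 1) % len(self.slots)
        if self.slots[i] == key:
            return                  # already present
        if reusable is not None:
            self.slots[reusable] = key   # reusing a dummy: fill unchanged
        else:
            self.slots[i] = key          # claiming a NULL: fill grows
            self.fill += 1
        self.used += 1

    def delete(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] != key:
            i = (i + 1) % len(self.slots)
        self.slots[i] = DUMMY       # used drops, fill stays put
        self.used -= 1

    def probes_for_miss(self, key):
        # Only a NULL ends an unsuccessful search; dummies don't.
        i, n = hash(key) % len(self.slots), 1
        while self.slots[i] is not NULL:
            i, n = (i + 1) % len(self.slots), n + 1
        return n

t = ToyDict()
for k in range(6):                  # churn: insert then delete each key
    t.insert(k)
    t.delete(k)
assert (t.used, t.fill) == (0, 6)   # empty, yet 6 of 8 slots are dummies
assert t.probes_for_miss(0) == 7    # a miss crawls past every dummy
assert t.probes_for_miss(6) == 1    # slot 6 is still NULL
```

With the table empty the whole time ("used" never exceeds 1), a failed lookup still costs 7 probes -- which is why the load factor has to count dummies.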
In most hash table implementations that's the worst that can happen (and it's a disaster), but under Python's implementation it's worse: Python never checks to see whether the probe sequence "wraps around", so the first search after the last NULL is changed to a dummy never ends. Counting the dummies in the load-factor computation prevents all that: no matter how much inserts and deletes are intermixed, the "effective load factor" stays under 2/3rds so gives excellent expected-case behavior; and it also protects against an all-dummy dict, making the lack of an expensive inner-loop "wrapped around?" check safe. > Perhaps you can enlighten me on this. Using only the "used" metrics > seems fine to me. I even deactivated "fill" and replaced it with "used" > to see what happens -- no visible changes, except a tiny speedup I'm > willing to neglect. You need a mix of deletes and inserts for the dummies to make a difference; dicts that always grow don't have dummies, so they're not likely to have any dummy-related problems either . Try this (untested):

    import time
    from random import randrange

    N = 1000
    thatmany = [None] * N
    d = {}
    while 1:
        start = time.clock()
        for i in thatmany:
            d[randrange(10000000)] = 1
        for i in d.keys():
            del d[i]
        finish = time.clock()
        print round(finish - start, 3)

Succeeding iterations of the outer loop should grow dramatically slower, and finally get into an infinite loop, despite that "used" never exceeds N. Short course rewording: for purposes of predicting expected search time, a dummy is the same as a live key, because finding a dummy doesn't end a search -- it has to press on until either finding the key it was looking for, or finding a NULL. And with a mix of insertions and deletions, and if the hash function is doing a good job, then over time all the slots in the table will become either live or dummy, even if "used" stays within a very small range. So, that's why .
dictobject-may-be-the-subtlest-object-there-is-ly y'rs - tim From gstein at lyra.org Fri Aug 13 11:13:55 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 02:13:55 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: On Tue, 10 Aug 1999, Fredrik Lundh wrote: >... > unicode objects do not implement the getcharbuffer slot. This is Goodness. All righty. >... > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way It is needed for fp.write(unicodeobj) ... It is also very handy for C functions to deal with Unicode strings. > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) What's wrong with fp.write(unicodeobj)? It should write the unicode value to the file. Are you suggesting that it will need to be done differently? Icky. > or maybe the buffer design needs an overhaul? Not that I know of. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Aug 13 12:36:13 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 03:36:13 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: On Tue, 10 Aug 1999, Guido van Rossum wrote: >... > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. I believe that I properly changed all of these within the core distribution. Per your requested design, third-party extensions must switch from "s#" to "t#" to move to the charbuffer interface, as needed. 
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 15:47:05 1999
From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov)
Date: Fri, 13 Aug 1999 14:47:05 +0100 (NFT)
Subject: [Python-Dev] about line numbers
In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm"
Message-ID: <199908131347.OAA30740@pukapuka.inrialpes.fr>

Tim Peters wrote:
> [Vladimir Marangozov, *almost* seems ready to give up on a
> counter-productive dict pessimization <wink>]

Of course I will! Now everything is perfectly clear. Thanks.

> ...
> So, that's why <wink>.

Now, *this* one explanation of yours should go into a HowTo/BecauseOf for developers. I timed your scripts and a couple of mine which attest (again) to the validity of the current implementation. My patch is out of bounds. It even disturbs from time to time the existing harmony in the results ;-) because of early resizing.

All in all, for performance reasons, dicts remain an exception to the rule of releasing memory ASAP. They have been designed to tolerate caching because of their dynamics, which is the main reason for the rare case addressed by my patch.

--
Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
> Per your requested design, third-party extensions must
> switch from "s#" to "t#" to move to the charbuffer interface, as needed.

Shouldn't this be the other way around ? After all, extensions using "s#" do expect character data and not arbitrary binary encodings of information. IMHO, the latter should be special cased, not the former. E.g. it doesn't make sense to use the re module to scan over 2-byte Unicode with single character based search patterns.

Aside: Is the buffer interface reachable in any way from within Python ? Why isn't the interface exposed via __XXX__ methods on normal Python instances (could be implemented by returning a buffer object) ?

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 140 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fdrake at acm.org Fri Aug 13 17:32:40 1999
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 13 Aug 1999 11:32:40 -0400 (EDT)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
In-Reply-To: <37B45577.7772CAA1@lemburg.com>
References: <37B45577.7772CAA1@lemburg.com>
Message-ID: <14260.15000.398399.840716@weyr.cnri.reston.va.us>

M.-A. Lemburg writes:
> Aside: Is the buffer interface reachable in any way from within
> Python ? Why isn't the interface exposed via __XXX__ methods
> on normal Python instances (could be implemented by returning a
> buffer object) ?

Would it even make sense? I thought a large part of the intent was performance, avoiding memory copies. Perhaps there should be an .__as_buffer__() which returned an object that supports the C buffer interface. I'm not sure how useful it would be; perhaps for classes that represent image data? They could return a buffer object created from a string/array/NumPy array.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 17:59:12 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 17:59:12 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies. looks like there's some confusion here over what the buffer interface is all about. time for a new GvR essay, perhaps? From fdrake at acm.org Fri Aug 13 18:22:09 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Aug 1999 12:22:09 -0400 (EDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> Message-ID: <14260.17969.497916.382752@weyr.cnri.reston.va.us> Fredrik Lundh writes: > looks like there's some confusion here over > what the buffer interface is all about. time > for a new GvR essay, perhaps? If he'll write something about it, I'll be glad to adapt it to the extending & embedding manual. It seems important that it be included in the standard documentation since it will be important for extension writers to understand when they should implement it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 18:34:46 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 18:34:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us>
Message-ID: <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com>

Guido van Rossum wrote:
> > btw, how about adding support for buffer access
> > to data that have strange internal formats (like certain
> > PIL image memories) or isn't directly accessible
> > (like "virtual" and "abstract" image buffers in PIL 1.1).
> > something like:
> >
> >     int initbuffer(PyObject* obj, void** context);
> >     int exitbuffer(PyObject* obj, void* context);
> >
> > and corresponding context arguments to the
> > rest of the functions...
>
> Can you explain this idea more? Without more understanding of PIL I
> have no idea what you're talking about...

in code:

    void* context;

    // this can be done at any time
    segments = pb->getsegcount(obj, NULL, context);

    if (!pb->bf_initbuffer(obj, &context))
        ... failed to initialise buffer api ...

    ... allocate segment size buffer ...
    pb->getsegcount(obj, &bytes, context);
    ... calculate total buffer size and allocate buffer ...

    for (i = offset = 0; i < segments; i++) {
        n = pb->getreadbuffer(obj, i, &p, context);
        if (n < 0)
            ... failed to fetch a given segment ...
        memcpy(buf + offset, p, n); // or write to file, or whatever
        offset = offset + n;
    }

    pb->bf_exitbuffer(obj, context);

in other words, this would give the target object a chance to keep some local context (like a temporary buffer) during a sequence of buffer operations... for PIL, this would make it possible to

1) store required metadata (size, mode, palette) along with the actual buffer contents.
2) possibly pack formats that use extra internal storage for performance reasons -- RGB pixels are stored as 32-bit integers, for example.

3) access virtual image memories (that can only be accessed via a buffer-like interface in themselves -- given an image object, you acquire an access handle, and use a getdata method to access the actual data. without initbuffer, there's no way to do two buffer accesses in parallel. without exitbuffer, there's no way to release the access handle. without the context variable, there's nowhere to keep the access handle between calls.)

4) access abstract image memories (like virtual memories, but they reside outside PIL, like on a remote server, or inside another image processing library, or on a hardware device).

5) convert to external formats on the fly: fp.write(im.buffer("JPEG"))

and probably a lot more. as far as I can tell, nothing of this can be done using the current design...

... besides, what about buffers and threads? if you return a pointer from getreadbuf, wouldn't it be good to know exactly when Python doesn't need that pointer any more? explicit initbuffer/exitbuffer calls around each sequence of buffer operations would make that a lot safer...

From mal at lemburg.com Fri Aug 13 21:16:44 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 13 Aug 1999 21:16:44 +0200
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us>
Message-ID: <37B46F1C.1A513F33@lemburg.com>

Fred L. Drake, Jr. wrote:
> M.-A. Lemburg writes:
> > Aside: Is the buffer interface reachable in any way from within
> > Python ? Why isn't the interface exposed via __XXX__ methods
> > on normal Python instances (could be implemented by returning a
> > buffer object) ?
>
> Would it even make sense? I though a large part of the intent was
> to for performance, avoiding memory copies.
> Perhaps there should be
> an .__as_buffer__() which returned an object that supports the C
> buffer interface. I'm not sure how useful it would be; perhaps for
> classes that represent image data? They could return a buffer object
> created from a string/array/NumPy array.

That's what I had in mind.

    def __getreadbuffer__(self):
        return buffer(self.data)

    def __getcharbuffer__(self):
        return buffer(self.string_data)

    def __getwritebuffer__(self):
        return buffer(self.mmaped_file)

Note that buffer() does not copy the data, it only adds a reference to the object being used.

Hmm, how about adding a writeable binary object to the core ? This would be useful for the __getwritebuffer__() API because currently, I think, only mmap'ed files are useable as write buffers -- no other in-memory type. Perhaps buffer objects could be used for this purpose too, e.g. by having them allocate the needed memory chunk in case you pass None as object.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 140 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack at oratrix.nl Fri Aug 13 23:48:12 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 13 Aug 1999 23:48:12 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
Message-ID: <19990813214817.5393C1C4742@oratrix.oratrix.nl>

This week again I was bitten by the fact that Python doesn't have any form of weak references, and while I was toying with some ideas I came up with the following quick-and-dirty scheme that I thought I'd bounce off this list. I might even volunteer to implement it, if people agree it is worth it:-)

We add a new builtin function (or a module with that function) weak(). This returns a weak reference to the object passed as a parameter. A weak object has one method: strong(), which returns the corresponding real object or raises an exception if the object doesn't exist anymore.
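[Editorial aside: Jack's proposed weak()/strong() pair maps directly onto the weakref module that Python eventually grew. A retro-fitted sketch -- weakref and ReferenceError did not exist in 1.5.2, so this is an anachronism for illustration only:]

```python
import weakref

class _Weak:
    """Sketch of the proposed weak-object type, built on the much
    later weakref module (illustrative, not Jack's implementation)."""
    def __init__(self, obj):
        self._ref = weakref.ref(obj)   # does not keep obj alive

    def strong(self):
        obj = self._ref()
        if obj is None:
            raise ReferenceError("referent no longer exists")
        return obj

def weak(obj):
    return _Weak(obj)

class Node:
    pass

n = Node()
w = weak(n)
assert w.strong() is n    # object still alive: strong() returns it
del n                     # last strong reference gone (CPython frees it here)
```

After the `del`, strong() raises, which is exactly the semantics proposed above.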
For convenience we could add a method exists() that returns true if the real object still exists.

Now comes the bit that I'm unsure about: to implement this I need to add a pointer to every object. This pointer is either NULL or points to the corresponding weak object (so for every object there is either no weak reference object or exactly one). But, for the price of 4 bytes extra in every object we get the nicety that there is little cpu-overhead: refcounting macros work identically to the way they do now, the only thing to take care of is that during object deallocation we have to zero the weak pointer.

(actually: we could make do with a single bit in every object, with the bit meaning "this object has an associated weak object". We could then use a global dictionary indexed by object address to find the weak object)

From mal at lemburg.com Sat Aug 14 01:15:39 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 14 Aug 1999 01:15:39 +0200
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
References:
Message-ID: <37B4A71B.2073875F@lemburg.com>

Greg Stein wrote:
> On Tue, 10 Aug 1999, Fredrik Lundh wrote:
> > maybe the unicode class shouldn't implement the
> > buffer interface at all? sure looks like the best way
>
> It is needed for fp.write(unicodeobj) ...
>
> It is also very handy for C functions to deal with Unicode strings.

Wouldn't a special C API be (even) more convenient ?

> > to avoid trivial mistakes (the current behaviour of
> > fp.write(unicodeobj) is even more serious than the
> > marshal glitch...)
>
> What's wrong with fp.write(unicodeobj)? It should write the unicode value
> to the file. Are you suggesting that it will need to be done differently?
> Icky.

Would this also write some kind of Unicode encoding header ? [Sorry, this is my Unicode ignorance shining through... I only remember lots of talk about these things on the string-sig.]

Since fp.write() uses "s#" this would use the getreadbuffer slot in 1.5.2...
I think what it *should* do is use the getcharbuffer slot instead (see my other post), since dumping the raw unicode data would lose too much information. Again, such things should be handled by extra methods, e.g. fp.rawwrite().

Hmm, I guess the philosophy behind the interface is not really clear. Binary data is fetched via getreadbuffer and then interpreted as character data... I always thought that the getcharbuffer should be used for such an interpretation.

Or maybe, we should dump the getcharbuffer slot again and use the getreadbuffer information just as we would a void* pointer in C: with no explicit or implicit type information.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 140 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From gstein at lyra.org Sat Aug 14 10:53:04 1999
From: gstein at lyra.org (Greg Stein)
Date: Sat, 14 Aug 1999 01:53:04 -0700
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
References: <37B4A71B.2073875F@lemburg.com>
Message-ID: <37B52E70.2D957546@lyra.org>

M.-A. Lemburg wrote:
> Greg Stein wrote:
> > On Tue, 10 Aug 1999, Fredrik Lundh wrote:
> > > maybe the unicode class shouldn't implement the
> > > buffer interface at all? sure looks like the best way
> >
> > It is needed for fp.write(unicodeobj) ...
> >
> > It is also very handy for C functions to deal with Unicode strings.
>
> Wouldn't a special C API be (even) more convenient ?

Why? Accessing the Unicode values as a series of bytes matches exactly to the semantics of the buffer interface. Why throw in Yet Another Function? Your abstract.c functions make it quite simple.

> > > to avoid trivial mistakes (the current behaviour of
> > > fp.write(unicodeobj) is even more serious than the
> > > marshal glitch...)
> >
> > What's wrong with fp.write(unicodeobj)? It should write the unicode value
> > to the file.
> > Are you suggesting that it will need to be done differently?
> > Icky.
>
> Would this also write some kind of Unicode encoding header ?
> [Sorry, this is my Unicode ignorance shining through... I only
> remember lots of talk about these things on the string-sig.]

Absolutely not. Placing the Byte Order Mark (BOM) into an output stream is an application-level task. It should never be done by any subsystem.

There are no other "encoding headers" that would go into the output stream. The output would simply be UTF-16 (2-byte values in host byte order).

> Since fp.write() uses "s#" this would use the getreadbuffer
> slot in 1.5.2... I think what it *should* do is use the
> getcharbuffer slot instead (see my other post), since dumping
> the raw unicode data would loose too much information. Again,

I very much disagree. To me, fp.write() is not about writing characters to a stream. I think it makes much more sense as "writing bytes to a stream" and the buffer interface fits that perfectly.

There is no loss of data. You could argue that the byte order is lost, but I think that is incorrect. The application defines the semantics: the file might be defined as using host-order, or the application may be writing a BOM at the head of the file.

> such things should be handled by extra methods, e.g. fp.rawwrite().

I believe this would be a needless complication of the interface.

> Hmm, I guess the philosophy behind the interface is not
> really clear.

I didn't design or implement it initially, but (as you may have guessed) I am a proponent of its existence.

> Binary data is fetched via getreadbuffer and then
> interpreted as character data... I always thought that the
> getcharbuffer should be used for such an interpretation.

The former is bad behavior. That is why getcharbuffer was added (by me, for 1.5.2). It was a preventative measure for the introduction of Unicode strings. Using getreadbuffer for characters would break badly given a Unicode string.
Therefore, "clients" that want (8-bit) characters from an object supporting the buffer interface should use getcharbuffer. The Unicode object doesn't implement it, implying that it cannot provide 8-bit characters. You can get the raw bytes thru getreadbuffer.

> Or maybe, we should dump the getcharbufer slot again and
> use the getreadbuffer information just as we would a
> void* pointer in C: with no explicit or implicit type information.

Nope. That path is fraught with failure :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From mal at lemburg.com Sat Aug 14 12:21:51 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 14 Aug 1999 12:21:51 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <19990813214817.5393C1C4742@oratrix.oratrix.nl>
Message-ID: <37B5433F.61CE6F76@lemburg.com>

Jack Jansen wrote:
> This week again I was bitten by the fact that Python doesn't have any
> form of weak references, and while I was toying with some ideas I came
> up with the following quick-and-dirty scheme that I thought I'd bounce
> off this list. I might even volunteer to implement it, if people agree
> it is worth it:-)

Have you checked the weak reference dictionary implementation by Dieter Maurer ? It's at: http://www.handshake.de/~dieter/weakdict.html

While I like the idea of having weak references in the core, I think 4 extra bytes for *every* object is just a little too much. The flag bit idea (with the added global dictionary of weak referenced objects) looks promising though.

BTW, how would this be done in JPython ? I guess it doesn't make much sense there because cycles are no problem for the Java VM GC.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 139 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Sat Aug 14 14:30:45 1999
From: mal at lemburg.com (M.-A.
Lemburg) Date: Sat, 14 Aug 1999 14:30:45 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> Message-ID: <37B56175.23ABB350@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > > > > Greg Stein wrote: > > > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > > maybe the unicode class shouldn't implement the > > > > buffer interface at all? sure looks like the best way > > > > > > It is needed for fp.write(unicodeobj) ... > > > > > > It is also very handy for C functions to deal with Unicode strings. > > > > Wouldn't a special C API be (even) more convenient ? > > Why? Accessing the Unicode values as a series of bytes matches exactly > to the semantics of the buffer interface. Why throw in Yet Another > Function? I meant PyUnicode_* style APIs for dealing with all the aspects of Unicode objects -- much like the PyString_* APIs available. > Your abstract.c functions make it quite simple. BTW, do we need an extra set of those with buffer index or not ? Those would really be one-liners for the sake of hiding the type slots from applications. > > > > to avoid trivial mistakes (the current behaviour of > > > > fp.write(unicodeobj) is even more serious than the > > > > marshal glitch...) > > > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > > to the file. Are you suggesting that it will need to be done differently? > > > Icky. > > > > Would this also write some kind of Unicode encoding header ? > > [Sorry, this is my Unicode ignorance shining through... I only > > remember lots of talk about these things on the string-sig.] > > Absolutely not. Placing the Byte Order Mark (BOM) into an output stream > is an application-level task. It should never by done by any subsystem. > > There are no other "encoding headers" that would go into the output > stream. The output would simply be UTF-16 (2-byte values in host byte > order). Ok. 
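[Editorial aside: Greg's point -- the BOM is the application's business, and the raw output is just 2-byte code units in some byte order -- can be illustrated with the codecs machinery of much later Pythons (an anachronism relative to this thread; 1.5.2 had none of this):]

```python
import codecs

text = "Az"

# raw UTF-16 code units, no header: the byte order is whatever the
# application (or the platform) says it is
le = text.encode("utf-16-le")      # little-endian
be = text.encode("utf-16-be")      # big-endian
assert le == b"A\x00z\x00" and be == b"\x00A\x00z"

# a subsystem never prepends a BOM; an application that wants a
# self-describing file writes one itself, at the head of the stream
framed = codecs.BOM_UTF16_LE + le
assert framed.startswith(b"\xff\xfe")
```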
> > Since fp.write() uses "s#" this would use the getreadbuffer > > slot in 1.5.2... I think what it *should* do is use the > > getcharbuffer slot instead (see my other post), since dumping > > the raw unicode data would loose too much information. Again, > > I very much disagree. To me, fp.write() is not about writing characters > to a stream. I think it makes much more sense as "writing bytes to a > stream" and the buffer interface fits that perfectly. This is perfectly ok, but shouldn't the behaviour of fp.write() mimic that of previous Python versions ? How does JPython write the data ? Inlined different subject: I think the internal semantics of "s#" using the getreadbuffer slot and "t#" the getcharbuffer slot should be switched; see my other post. In previous Python versions "s#" had the semantics of string data with possibly embedded NULL bytes. Now it suddenly has the meaning of binary data and you can't simply change extensions to use the new "t#" because people are still using them with older Python versions. > There is no loss of data. You could argue that the byte order is lost, > but I think that is incorrect. The application defines the semantics: > the file might be defined as using host-order, or the application may be > writing a BOM at the head of the file. The problem here is that many application were not written to handle these kind of objects. Previously they could only handle strings, now they can suddenly handle any object having the buffer interface and then fail when the data gets read back in. > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > I believe this would be a needless complication of the interface. It would clarify things and make the interface 100% backward compatible again. > > Hmm, I guess the philosophy behind the interface is not > > really clear. > > I didn't design or implement it initially, but (as you may have guessed) > I am a proponent of its existence. 
> > > Binary data is fetched via getreadbuffer and then > > interpreted as character data... I always thought that the > > getcharbuffer should be used for such an interpretation. > > The former is bad behavior. That is why getcharbuffer was added (by me, > for 1.5.2). It was a preventative measure for the introduction of > Unicode strings. Using getreadbuffer for characters would break badly > given a Unicode string. Therefore, "clients" that want (8-bit) > characters from an object supporting the buffer interface should use > getcharbuffer. The Unicode object doesn't implement it, implying that it > cannot provide 8-bit characters. You can get the raw bytes thru > getreadbuffer. I agree 100%, but did you add the "t#" instead of having "s#" use the getcharbuffer interface ? E.g. my mxTextTools package uses "s#" on many APIs. Now someone could stick in a Unicode object and get pretty strange results without any notice about mxTextTools and Unicode being incompatible. You could argue that I change to "t#", but that doesn't work since many people out there still use Python versions <1.5.2 and those didn't have "t#", so mxTextTools would then fail completely for them. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 14 13:34:17 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 04:34:17 -0700 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <37B55439.683272D2@lyra.org> M.-A. Lemburg wrote: >... > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. Sure, these could be added as necessary. 
For raw access to the bytes, I would refer people to the abstract buffer functions, tho. > > Your abstract.c functions make it quite simple. > > BTW, do we need an extra set of those with buffer index or not ? > Those would really be one-liners for the sake of hiding the > type slots from applications. It sounds like NumPy and PIL would need it, which makes the landscape quite a bit different from the last time we discussed this (when we didn't imagine anybody needing those). >... > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > slot in 1.5.2... I think what it *should* do is use the > > > getcharbuffer slot instead (see my other post), since dumping > > > the raw unicode data would loose too much information. Again, > > > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? fp.write() had no semantics for writing Unicode objects since they didn't exist. Therefore, we are not breaking or changing any behavior. > Inlined different subject: > I think the internal semantics of "s#" using the getreadbuffer slot > and "t#" the getcharbuffer slot should be switched; see my other post. 1) Too late 2) The use of "t#" ("text") for the getcharbuffer slot was decided by the Benevolent Dictator. 3) see (2) > In previous Python versions "s#" had the semantics of string data > with possibly embedded NULL bytes. Now it suddenly has the meaning > of binary data and you can't simply change extensions to use the > new "t#" because people are still using them with older Python > versions. Guido and I had a pretty long discussion on what the best approach here was. I think we even pulled in Tim as a final arbiter, as I recall. 
I believe "s#" remained getreadbuffer simply because it *also* meant "give me the bytes of that object". If it changed to getcharbuffer, then you could see exceptions in code that didn't raise exceptions beforehand. (more below)

> > There is no loss of data. You could argue that the byte order is lost,
> > but I think that is incorrect. The application defines the semantics:
> > the file might be defined as using host-order, or the application may be
> > writing a BOM at the head of the file.
>
> The problem here is that many application were not written
> to handle these kind of objects. Previously they could only
> handle strings, now they can suddenly handle any object
> having the buffer interface and then fail when the data
> gets read back in.

An application is a complete unit. How are you suddenly going to manifest Unicode objects within that application? The only way is if the developer goes in and changes things; let them deal with the issues and fallout of their change. The other is external changes such as an upgrade to the interpreter or a module. Again, (IMO) if you're perturbing a system, then you are responsible for also correcting any problems you introduce.

In any case, Guido's position was that things can easily switch over to the "t#" interface to prevent the class of error where you pass a Unicode string to a function that expects a standard string.

> > > such things should be handled by extra methods, e.g. fp.rawwrite().
> >
> > I believe this would be a needless complication of the interface.
>
> It would clarify things and make the interface 100% backward
> compatible again.

No. "s#" used to pull bytes from any buffer-capable object. Your suggestion for "s#" to use the getcharbuffer could introduce exceptions into currently-working code. (this was probably Guido's prime motivation for the current meaning of "t#"...
I can dig up the mail thread if people need an authoritative commentary on the decision that was made) > > > Hmm, I guess the philosophy behind the interface is not > > > really clear. > > > > I didn't design or implement it initially, but (as you may have guessed) > > I am a proponent of its existence. > > > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > for 1.5.2). It was a preventative measure for the introduction of > > Unicode strings. Using getreadbuffer for characters would break badly > > given a Unicode string. Therefore, "clients" that want (8-bit) > > characters from an object supporting the buffer interface should use > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > cannot provide 8-bit characters. You can get the raw bytes thru > > getreadbuffer. > > I agree 100%, but did you add the "t#" instead of having > "s#" use the getcharbuffer interface ? Yes. For reasons detailed above. > E.g. my mxTextTools > package uses "s#" on many APIs. Now someone could stick > in a Unicode object and get pretty strange results without > any notice about mxTextTools and Unicode being incompatible. They could also stick in an array of integers. That supports the buffer interface, meaning the "s#" in your code would extract the bytes from it. In other words, people can already stick bogus stuff into your code. This seems to be a moot argument. > You could argue that I change to "t#", but that doesn't > work since many people out there still use Python versions > <1.5.2 and those didn't have "t#", so mxTextTools would then > fail completely for them. If support for the older versions is needed, then use an #ifdef to set up the appropriate macro in some header. Use that throughout your code. 
In any case: yes -- I would argue that you should absolutely be using "t#". Cheers, -g -- Greg Stein, http://www.lyra.org/ From fredrik at pythonware.com Sat Aug 14 15:19:07 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 15:19:07 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <003101bee657$972d1550$f29b12c2@secret.pythonware.com> M.-A. Lemburg wrote: > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. it's already there, of course. see unicode.h in the unicode distribution (Mark is hopefully adding this to 1.6 at this very moment...) > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? the crucial point is how an average user expects things to work. the current design is quite asymmetric -- you can easily *write* things that implement the buffer interface to a stream, but how the heck do you get them back? (as illustrated by the marshal buglet...) From fredrik at pythonware.com Sat Aug 14 17:21:48 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 17:21:48 +0200 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > E.g. my mxTextTools > > package uses "s#" on many APIs.
Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Except that people may expect unicode strings to work just like any other kind of string, while arrays are surely a different thing. I'm beginning to suspect that the current buffer design is partially broken; it tries to work around at least two problems at once: a) the current use of "string" objects for two purposes: as strings of 8-bit characters, and as buffers containing arbitrary binary data. b) performance issues when reading/writing certain kinds of data to/from streams. and fails to fully address either of them. From mal at lemburg.com Sat Aug 14 18:30:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:30:21 +0200 Subject: [Python-Dev] Re: buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <37B5999D.201EA88C@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > >... > > I meant PyUnicode_* style APIs for dealing with all the aspects > > of Unicode objects -- much like the PyString_* APIs available. > > Sure, these could be added as necessary. For raw access to the bytes, I > would refer people to the abstract buffer functions, tho. I guess that's up to them... PyUnicode_AS_WCHAR() could also be exposed I guess (are C's wchar strings useable as Unicode basis ?). > > > Your abstract.c functions make it quite simple. > > > > BTW, do we need an extra set of those with buffer index or not ? > > Those would really be one-liners for the sake of hiding the > > type slots from applications. 
> > It sounds like NumPy and PIL would need it, which makes the landscape > quite a bit different from the last time we discussed this (when we > didn't imagine anybody needing those). Ok, then I'll add them and post the new set next week. > >... > > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > > slot in 1.5.2... I think what it *should* do is use the > > > > getcharbuffer slot instead (see my other post), since dumping > > > > the raw unicode data would lose too much information. Again, > > > > > > I very much disagree. To me, fp.write() is not about writing characters > > > to a stream. I think it makes much more sense as "writing bytes to a > > > stream" and the buffer interface fits that perfectly. > > > > This is perfectly ok, but shouldn't the behaviour of fp.write() > > mimic that of previous Python versions ? How does JPython > > write the data ? > > fp.write() had no semantics for writing Unicode objects since they > didn't exist. Therefore, we are not breaking or changing any behavior. The problem is hidden in polymorphic functions and tools: previously they could not handle anything but strings, now they also work on arbitrary buffers without raising exceptions. That's what I'm concerned about. > > Inlined different subject: > > I think the internal semantics of "s#" using the getreadbuffer slot > > and "t#" the getcharbuffer slot should be switched; see my other post. > > 1) Too late > 2) The use of "t#" ("text") for the getcharbuffer slot was decided by > the Benevolent Dictator. > 3) see (2)

1) It's not too late: most people aren't even aware of the buffer interface (except maybe the small crowd on this list).

2) A mistake in a patchlevel release of Python can easily be undone in the next minor release. No big deal.

3) To remain compatible with 1.5.2 in future revisions, a new explicit marker, e.g. "r#" for raw data, could be added to hold the code for getreadbuffer. "s#" and "z#" should then switch to using getcharbuffer.
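Marc-Andre's raw-data-versus-character-data split can be made concrete with a later-Python sketch: the two views of a Unicode string that getreadbuffer and getcharbuffer would expose correspond roughly to two different encodes. The analogy is mine; the 1.5.2 slots themselves are C-level:

```python
s = "Grüß"

raw = s.encode("utf-16")     # ~ getreadbuffer: the object's raw bytes
chars = s.encode("latin-1")  # ~ getcharbuffer: 8-bit character data

# The two byte strings differ, which is exactly the information a
# reader needs later in order to interpret the file correctly.
assert raw != chars
assert raw.decode("utf-16") == s
assert chars.decode("latin-1") == s
```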
> > In previous Python versions "s#" had the semantics of string data > > with possibly embedded NULL bytes. Now it suddenly has the meaning > > of binary data and you can't simply change extensions to use the > > new "t#" because people are still using them with older Python > > versions. > > Guido and I had a pretty long discussion on what the best approach here > was. I think we even pulled in Tim as a final arbiter, as I recall. What was the final argument then ? (I guess the discussion was held *before* the addition of getcharbuffer, right ?) > I believe "s#" remained getreadbuffer simply because it *also* meant > "give me the bytes of that object". If it changed to getcharbuffer, then > you could see exceptions in code that didn't raise exceptions > beforehand. > > (more below) "s#" historically always meant "give me char* data with length". It did not mean: "give me a pointer to the data area and its length". That interpretation is new in 1.5.2. Even integers and lists could provide buffer access with the new interpretation... (sounds evil ;-) > > > There is no loss of data. You could argue that the byte order is lost, > > > but I think that is incorrect. The application defines the semantics: > > > the file might be defined as using host-order, or the application may be > > > writing a BOM at the head of the file. > > > > The problem here is that many applications were not written > > to handle these kinds of objects. Previously they could only > > handle strings, now they can suddenly handle any object > > having the buffer interface and then fail when the data > > gets read back in. > > An application is a complete unit. How are you suddenly going to > manifest Unicode objects within that application? The only way is if the > developer goes in and changes things; let them deal with the issues and > fallout of their change. The other is external changes such as an > upgrade to the interpreter or a module.
Again, (IMO) if you're > perturbing a system, then you are responsible for also correcting any > problems you introduce. Well, ok, if you're talking about standalone apps. I was referring to applications which interact with other applications, e.g. via files or sockets. You could pass a Unicode obj to a socket and have it transfer the data to the other end without getting an exception on the sending part of the connection. The receiver would read the data as a string and most probably fail. The whole application sitting in between and dealing with the protocol and connection management wouldn't even notice that you've just tried to extend its capabilities. > In any case, Guido's position was that things can easily switch over to > the "t#" interface to prevent the class of error where you pass a > Unicode string to a function that expects a standard string. Strange, why should code that relies on 8-bit character data be changed because a new unsupported object type pops up ? Code supporting the new type will have to be rewritten anyway, but why break existing extensions in unpredicted ways ? > > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > > > I believe this would be a needless complication of the interface. > > > > It would clarify things and make the interface 100% backward > > compatible again. > > No. "s#" used to pull bytes from any buffer-capable object. Your > suggestion for "s#" to use the getcharbuffer could introduce exceptions > into currently-working code. The buffer objects were introduced in 1.5.1, AFAIR. Changing the semantics back to the original ones would only break extensions relying on the behaviour you describe -- the distribution can easily be adapted to use some other marker, such as "r#". > (this was probably Guido's prime motivation for the current meaning of > "t#"...
I can dig up the mail thread if people need an authoritative > commentary on the decision that was made) > > > > > Hmm, I guess the philosophy behind the interface is not > > > > really clear. > > > > > > I didn't design or implement it initially, but (as you may have guessed) > > > I am a proponent of its existence. > > > > > > > Binary data is fetched via getreadbuffer and then > > > > interpreted as character data... I always thought that the > > > > getcharbuffer should be used for such an interpretation. > > > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > > for 1.5.2). It was a preventative measure for the introduction of > > > Unicode strings. Using getreadbuffer for characters would break badly > > > given a Unicode string. Therefore, "clients" that want (8-bit) > > > characters from an object supporting the buffer interface should use > > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > > cannot provide 8-bit characters. You can get the raw bytes thru > > > getreadbuffer. > > > > I agree 100%, but did you add the "t#" instead of having > > "s#" use the getcharbuffer interface ? > > Yes. For reasons detailed above. > > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Right now they can with 1.5.1 and 1.5.2 which is unfortunate. I'd rather have the parsing function raise an exception. > This seems to be a moot argument. Not really when you have to support extensions across three different patchlevels of Python. 
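Greg's array-of-integers point is easy to make concrete: the byte-level view a raw "s#"-style consumer gets bears no resemblance to the values the array holds. A small sketch in later-Python spelling (tobytes() is the modern name for what was tostring() at the time):

```python
from array import array

a = array("i", [65, 66, 67])

# What a raw-bytes consumer would see: the machine representation of
# the integers, in platform byte order -- not the characters A, B, C.
data = a.tobytes()

assert len(data) == 3 * a.itemsize   # itemsize bytes per integer
assert data != b"ABC"                # nothing like a text string
```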
> > You could argue that I change to "t#", but that doesn't > > work since many people out there still use Python versions > > <1.5.2 and those didn't have "t#", so mxTextTools would then > > fail completely for them. > > If support for the older versions is needed, then use an #ifdef to set > up the appropriate macro in some header. Use that throughout your code. > > In any case: yes -- I would argue that you should absolutely be using > "t#". I can easily change my code, no big deal, but what about the dozens of other extensions I don't want to bother diving into ? I'd rather see an exception than complete garbage written to a file or a socket. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Aug 14 18:53:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:53:45 +0200 Subject: [Python-Dev] buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Message-ID: <37B59F19.45C1D23B@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > > E.g. my mxTextTools > > > package uses "s#" on many APIs. Now someone could stick > > > in a Unicode object and get pretty strange results without > > > any notice about mxTextTools and Unicode being incompatible. > > > > They could also stick in an array of integers. That supports the buffer > > interface, meaning the "s#" in your code would extract the bytes from > > it. In other words, people can already stick bogus stuff into your code. > > Except that people may expect unicode strings > to work just like any other kind of string, while > arrays are surely a different thing.
> > I'm beginning to suspect that the current buffer > design is partially broken; it tries to work around > at least two problems at once: > > a) the current use of "string" objects for two purposes: > as strings of 8-bit characters, and as buffers containing > arbitrary binary data. > > b) performance issues when reading/writing certain kinds > of data to/from streams. > > and fails to fully address either of them. True, a higher level interface for those two objectives would certainly address them much better than what we are trying to do at bit level. Buffers should probably only be treated as pointers to abstract memory areas and nothing more. BTW, what about my suggestion to extend buffers to also allocate memory (in case you pass None as object) ? Or should array be used for that purpose (its an undocumented feature of arrays) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sun Aug 15 04:59:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 19:59:25 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Message-ID: <37B62D0D.6EC24240@lyra.org> Fredrik Lundh wrote: >... > besides, what about buffers and threads? if you > return a pointer from getreadbuf, wouldn't it be > good to know exactly when Python doesn't need > that pointer any more? 
explicit initbuffer/exitbuffer > calls around each sequence of buffer operations > would make that a lot safer... This is a pretty obvious one, I think: it lasts only as long as the object. PyString_AS_STRING is similar. Nothing new or funny here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 05:09:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 20:09:19 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> Message-ID: <37B62F5E.30C62070@lyra.org> M.-A. Lemburg wrote: > > Fred L. Drake, Jr. wrote: > > > > M.-A. Lemburg writes: > > > Aside: Is the buffer interface reachable in any way from within > > > Python ? Why isn't the interface exposed via __XXX__ methods > > > on normal Python instances (could be implemented by returning a > > > buffer object) ? > > > > Would it even make sense? I thought a large part of the intent was > > for performance, avoiding memory copies. Perhaps there should be > > an .__as_buffer__() which returned an object that supports the C > > buffer interface. I'm not sure how useful it would be; perhaps for > > classes that represent image data? They could return a buffer object > > created from a string/array/NumPy array. There is no way to do this. The buffer interface only returns pointers to memory. There would be no place to return an intermediary object, nor a way to retain the reference for it. For example, your class instance quickly sets up a PyBufferObject with the relevant data and returns that. The underlying C code must now hold that reference *and* return a pointer to the calling code. Impossible. Fredrik's open/close concept for buffer accesses would make this possible, as long as clients are aware that any returned pointer is valid only until the buffer_close call.
The context argument he proposes would hold the object reference. Having class instances respond to the buffer interface is interesting, but until more code attempts to *use* the interface, I'm not quite sure of the utility... >... > Hmm, how about adding a writeable binary object to the core ? > This would be useful for the __getwritebbuffer__() API because > currently, I think, only mmap'ed files are useable as write > buffers -- no other in-memory type. Perhaps buffer objects > could be used for this purpose too, e.g. by having them > allocate the needed memory chunk in case you pass None as > object. Yes, this would be very good. I would recommend that you pass an integer, however, rather than None. You need to tell it the size of the buffer to allocate. Since buffer(5) has no meaning at the moment, altering the semantics to include this form would not be a problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 08:10:59 1999 From: da at ski.org (David Ascher) Date: Sat, 14 Aug 1999 23:10:59 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B62F5E.30C62070@lyra.org> Message-ID: On Sat, 14 Aug 1999, Greg Stein wrote: > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, here's an example from my work today. Maybe someone can suggest an alternative that I haven't seen. I'm using buffer objects to pass pointers to structs back and forth between Python and Windows (Win32's GUI scheme involves sending messages to functions with, oftentimes, addresses of structs as arguments, and expect the called function to modify the struct directly -- similarly, I must call Win32 functions w/ pointers to memory that Windows will modify, and be able to read the modified memory). 
With 'raw' buffer object manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to Python), this works fine [*]. So far, no instances. I also have a class which allows the user to describe the buffer memory layout in a natural way given the C struct, and manipulate the buffer layout w/ getattr/setattr. For example:

class Win32MenuItemStruct(AutoStruct):
    #
    # for each slot, specify type (maps to a struct.pack specifier),
    # name (for setattr/getattr behavior) and optional defaults.
    #
    table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
             (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
             (UINT, 'fType', MFT_STRING),
             (UINT, 'fState', MFS_ENABLED),
             (UINT, 'wID', None),
             (HANDLE, 'hSubMenu', 0),
             (HANDLE, 'hbmpChecked', 0),
             (HANDLE, 'hbmpUnchecked', 0),
             (DWORD, 'dwItemData', 0),
             (LPSTR, 'name', None),
             (UINT, 'cch', 0)]

AutoStruct has machinery which allows setting of buffer slices by slot name, conversion of numeric types, etc. This is working well. The only hitch is that to send the buffer to the SWIG'ed function call, I have three options, none ideal:

1) define a __str__ method which makes a string of the buffer and pass that to the function which expects an "s#" argument. This sends a copy of the data, not the address. As a result, this works well for structs which I create from scratch as long as I don't need to see any changes that Windows might have performed on the memory.

2) send the instance but make up my own 'get-the-instance-as-buffer' API -- complicates extension module code.

3) send the buffer attribute of the instance instead of the instance -- complicates Python code, and the C code isn't trivial because there is no 'buffer' typecode for PyArg_ParseTuple().

If I could define a

    def __aswritebuffer__

and if there was a PyArg_ParseTuple() typecode associated with read/write buffers (I nominate 'w'!), I believe things would be simpler -- I could then send the instance, specify in the PyArg_ParseTuple that I want a pointer to memory, and I'd be golden.
What did I miss? --david [*] I feel naughty modifying random bits of memory from Python, but Bill Gates made me do it! From mal at lemburg.com Sun Aug 15 10:47:00 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:47:00 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> <37B62F5E.30C62070@lyra.org> Message-ID: <37B67E84.6BBC8136@lemburg.com> Greg Stein wrote: > > [me suggesting new __XXX__ methods on Python instances to provide > the buffer slots to Python programmers] > > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, there already is lots of code supporting the interface, e.g. fp.write(), socket.write() etc. Basically all streaming interfaces I guess. So these APIs could be used to "write" the object directly into a file. > >... > > Hmm, how about adding a writeable binary object to the core ? > > This would be useful for the __getwritebbuffer__() API because > > currently, I think, only mmap'ed files are useable as write > > buffers -- no other in-memory type. Perhaps buffer objects > > could be used for this purpose too, e.g. by having them > > allocate the needed memory chunk in case you pass None as > > object. > > Yes, this would be very good. I would recommend that you pass an > integer, however, rather than None. You need to tell it the size of the > buffer to allocate. Since buffer(5) has no meaning at the moment, > altering the semantics to include this form would not be a problem. I was thinking of using the existing buffer(object,offset,size) constructor... that's why I took None as object. offset would then always be 0 and size gives the size of the memory chunk to allocate. 
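The allocate-memory-through-a-buffer idea Marc-Andre sketches here is essentially what later Pythons provide as bytearray plus memoryview. A minimal sketch of the behaviour being asked for, in modern spelling rather than the 1.5.x buffer() constructor:

```python
# A writable in-memory buffer of a given size, plus a zero-copy view
# onto it -- the role buffer(None, 0, size) was meant to play.
buf = bytearray(8)       # allocates 8 writable, zero-filled bytes
view = memoryview(buf)   # zero-copy window onto the same memory

view[0] = 0xFF           # writing through the view...
assert buf[0] == 0xFF    # ...mutates the underlying storage
assert len(view) == 8
```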
Of course, buffer(size) would look nicer, but it seems a rather peculiar interface definition to say: ok, if you pass a real Python integer, we'll take that as size. Who knows, maybe at some point in the future, you want to "write" integers via the buffer interface too... then you'd probably also want to write None... so how about a new builtin writebuffer(size) ? Also, I think it would make sense to extend buffers to have methods and attributes:

.writeable - attribute that tells whether the buffer is writeable
.chardata - true iff the getcharbuffer slot is available
.asstring() - return the buffer as Python string object

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Aug 15 10:59:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:59:21 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B68169.73E03C84@lemburg.com> David Ascher wrote: > > On Sat, 14 Aug 1999, Greg Stein wrote: > > > Having class instances respond to the buffer interface is interesting, > > but until more code attempts to *use* the interface, I'm not quite sure > > of the utility... > > Well, here's an example from my work today. Maybe someone can suggest an > alternative that I haven't seen. > > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances.
So that's why you were suggesting that struct.pack returns a buffer rather than a string ;-) Actually, I think you could use arrays to do the trick right now, because they are writeable (unlike strings). Until creating writeable buffer objects becomes possible that is... > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. For example: > > class Win32MenuItemStruct(AutoStruct): > # > # for each slot, specify type (maps to a struct.pack specifier), > # name (for setattr/getattr behavior) and optional defaults. > # > table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct), > (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID), > (UINT, 'fType', MFT_STRING), > (UINT, 'fState', MFS_ENABLED), > (UINT, 'wID', None), > (HANDLE, 'hSubMenu', 0), > (HANDLE, 'hbmpChecked', 0), > (HANDLE, 'hbmpUnchecked', 0), > (DWORD, 'dwItemData', 0), > (LPSTR, 'name', None), > (UINT, 'cch', 0)] > > AutoStruct has machinery which allows setting of buffer slices by slot > name, conversion of numeric types, etc. This is working well. > > The only hitch is that to send the buffer to the SWIG'ed function call, I > have three options, none ideal: > > 1) define a __str__ method which makes a string of the buffer and pass > that to the function which expects an "s#" argument. This send > a copy of the data, not the address. As a result, this works > well for structs which I create from scratch as long as I don't need > to see any changes that Windows might have performed on the memory. > > 2) send the instance but make up my own 'get-the-instance-as-buffer' > API -- complicates extension module code. > > 3) send the buffer attribute of the instance instead of the instance -- > complicates Python code, and the C code isn't trivial because there > is no 'buffer' typecode for PyArg_ParseTuple(). 
> > If I could define an > > def __aswritebuffer__ > > and if there was a PyArg_ParseTuple() typecode associated with read/write > buffers (I nominate 'w'!), I believe things would be simpler -- I could > then send the instance, specify in the PyArgParse_Tuple that I want a > pointer to memory, and I'd be golden. > > What did I miss? Just a naming thingie: __getwritebuffer__ et al. would map to the C interfaces more directly. The new typecode "w#" for writeable buffer style objects is a good idea (it should only work on single segment buffers). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Sun Aug 15 12:32:59 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 15 Aug 1999 12:32:59 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> Message-ID: <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> > Fredrik Lundh wrote: > >... > > besides, what about buffers and threads? if you > > return a pointer from getreadbuf, wouldn't it be > > good to know exactly when Python doesn't need > > that pointer any more? explicit initbuffer/exitbuffer > > calls around each sequence of buffer operations > > would make that a lot safer... > > This is a pretty obvious one, I think: it lasts only as long as the > object. PyString_AS_STRING is similar. Nothing new or funny here. 
well, I think the buffer behaviour is both new and pretty funny:

from array import array

a = array("f", [0]*8192)

b = buffer(a)

for i in range(1000):
    a.append(1234)

print b

in other words, the buffer interface should be redesigned, or removed. (though I'm sure AOL would find some interesting use for this ;-) "Confusing? Yes, but this is a lot better than allowing arbitrary pointers!" -- GvR on assignment operators, November 91 From da at ski.org Sun Aug 15 18:54:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B68169.73E03C84@lemburg.com> Message-ID: On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > Actually, I think you could use arrays to do the trick right now, > because they are writeable (unlike strings). Until creating > writeable buffer objects becomes possible that is... No, because I can't make an array around existing memory which Win32 allocates before I get to it. > Just a naming thingie: __getwritebuffer__ et al. would map to the > C interfaces more directly. Whatever. > The new typecode "w#" for writeable buffer style objects is a good idea > (it should only work on single segment buffers). Indeed. --david From gstein at lyra.org Sun Aug 15 22:27:57 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:27:57 -0700 Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: Message-ID: <37B722CD.383A2A9E@lyra.org> David Ascher wrote: > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > ... > > The new typecode "w#" for writeable buffer style objects is a good idea > > (it should only work on single segment buffers). > > Indeed. I just borrowed Guido's time machine. That typecode is already in 1.5.2.
:-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 22:35:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:35:25 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B7248D.31E5D2BF@lyra.org> Fredrik Lundh wrote: >... > well, I think the buffer behaviour is both > new and pretty funny: I think the buffer interface was introduced in 1.5 (by Jack?). I added the 8-bit character buffer slot and buffer objects in 1.5.2. > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. I don't understand what you believe is weird here. Also, are you saying the buffer *interface* is weird, or the buffer *object* ? thx, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 22:49:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 13:49:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) In-Reply-To: <37B722CD.383A2A9E@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. 
That typecode is already in 1.5.2. Ha. Cool. --da From gstein at lyra.org Sun Aug 15 22:53:51 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:53:51 -0700 Subject: [Python-Dev] instances as buffers References: Message-ID: <37B728DF.2CA2A20A@lyra.org> David Ascher wrote: >... > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances. How do you manage the lifetimes of the memory and objects? PyBuffer_FromReadWriteMemory() creates a buffer object that points to memory. You need to ensure that the memory exists as long as the buffer does. Would it make more sense to use PyBuffer_New(size)? Note: PyBuffer_FromMemory() (read-only) was built primarily for the case where you have static constants in an extension module (strings, code objects, etc) and want to expose them to Python without copying them into the heap. Currently, stuff like this must be copied into a dynamic string object to be exposed to Python. The PyBuffer_FromReadWriteMemory() is there for symmetry, but it can be very dangerous to use because of the lifetime problem. PyBuffer_New() allocates its own memory, so the lifetimes are managed properly. PyBuffer_From*Object maintains a reference to the target object so that the target object can be kept around at least as long as the buffer. > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. 
> For example:

This is a very cool class. Mark and I had discussed doing something just
like this (a while back) for some of the COM stuff. Basically, we'd want
to generate these structures from type libraries.

>...
> The only hitch is that to send the buffer to the SWIG'ed function call, I
> have three options, none ideal:
> 
> 1) define a __str__ method which makes a string of the buffer and pass
>    that to the function which expects an "s#" argument.  This sends
>    a copy of the data, not the address.  As a result, this works
>    well for structs which I create from scratch as long as I don't need
>    to see any changes that Windows might have performed on the memory.

Note that "s#" can be used directly against the buffer object. You could
pass it directly rather than via __str__.

> 2) send the instance but make up my own 'get-the-instance-as-buffer'
>    API -- complicates extension module code.
> 
> 3) send the buffer attribute of the instance instead of the instance --
>    complicates Python code, and the C code isn't trivial because there
>    is no 'buffer' typecode for PyArg_ParseTuple().
> 
> If I could define an
> 
>    def __aswritebuffer__
> 
> and if there was a PyArg_ParseTuple() typecode associated with read/write
> buffers (I nominate 'w'!), I believe things would be simpler -- I could
> then send the instance, specify in the PyArg_ParseTuple() call that I
> want a pointer to memory, and I'd be golden.
> 
> What did I miss?

You can do #3 today since there is a buffer typecode present ("w" or
"w#"). It will complicate Python code a bit since you need to pass the
buffer, but it is the simplest of the three options.

Allowing instances to return buffers does seem to make sense, although it
exposes a lot of underlying machinery at the Python level. It might be
nicer to find a better semantic for this than just exposing the buffer
interface slots.
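In hindsight, the layout-describing class David sketches here is very close to what the later ctypes module standardized as ctypes.Structure: declare the C struct layout once, then read and write fields by attribute, with the raw bytes available to hand to a Win32 call. A rough sketch of the same idea on modern Python (the RECT layout is just an illustrative example, not David's actual class):

```python
import ctypes

class RECT(ctypes.Structure):
    # field layout mirrors the C struct, declared once
    _fields_ = [("left",   ctypes.c_long),
                ("top",    ctypes.c_long),
                ("right",  ctypes.c_long),
                ("bottom", ctypes.c_long)]

r = RECT(0, 0, 640, 480)
r.right = 800                  # setattr writes into the struct's memory
raw = bytes(r)                 # the raw buffer a Win32 call would see
print(len(raw) == ctypes.sizeof(RECT))   # True
print(r.right)                           # 800
```

The struct instance itself supports the buffer protocol, so it can be passed wherever writable memory is expected, which is essentially the `w#` behavior being discussed in this thread.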
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From da at ski.org  Sun Aug 15 23:07:35 1999
From: da at ski.org (David Ascher)
Date: Sun, 15 Aug 1999 14:07:35 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Re: instances as buffers
In-Reply-To: <37B728DF.2CA2A20A@lyra.org>
Message-ID: 

On Sun, 15 Aug 1999, Greg Stein wrote:

> How do you manage the lifetimes of the memory and objects?
> PyBuffer_FromReadWriteMemory() creates a buffer object that points to
> memory. You need to ensure that the memory exists as long as the buffer
> does.

For those cases where I use PyBuffer_FromReadWriteMemory, I have no
control over the memory involved. Windows allocates the memory, lets me
use it for a little while, and it cleans it up whenever it feels like it.
It hasn't been a problem yet, but I agree that it's possibly a problem.
I'd call it a problem w/ the win32 API, though.

> Would it make more sense to use PyBuffer_New(size)?

Again, I can't because I am given a pointer and am expected to modify
e.g. bytes 10-12 starting from that memory location.

> This is a very cool class. Mark and I had discussed doing something just
> like this (a while back) for some of the COM stuff. Basically, we'd want
> to generate these structures from type libraries.

I know zilch about type libraries. This is for CE work, although nothing
about this class is CE-specific. Do type libraries give the same kind of
info?

> You can do #3 today since there is a buffer typecode present ("w" or
> "w#"). It will complicate Python code a bit since you need to pass the
> buffer, but it is the simplest of the three options.

Ok. Time to patch SWIG again!
--david From Vladimir.Marangozov at inrialpes.fr Mon Aug 16 01:35:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 16 Aug 1999 00:35:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908152335.AAA55842@pukapuka.inrialpes.fr> Tim Peters wrote: > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; if I used the debugger often enough to care , I'd think > about munging in a new opcode to make breakpoint sites explicit). > > immutability-is-made-to-be-violated-ly y'rs - tim > Could you elaborate a bit more on this? Do you mean setting breakpoints on a per opcode basis (for example by exchanging the original opcode with a new BREAKPOINT opcode in the code object) and use the lineno tab for breakpoints based on the source listing? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Mon Aug 16 04:31:16 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 15 Aug 1999 22:31:16 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908152335.AAA55842@pukapuka.inrialpes.fr> Message-ID: <000101bee78f$6aa217e0$f22d2399@tim> [Vladimir Marangozov] > Could you elaborate a bit more on this? No time for this now -- sorry. > Do you mean setting breakpoints on a per opcode basis (for example > by exchanging the original opcode with a new BREAKPOINT opcode in > the code object) and use the lineno tab for breakpoints based on > the source listing? Something like that. The classic way to implement positional breakpoints is to perturb the code; the classic problem is how to get back the effect of the code that was overwritten. 
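The "classic way" Tim describes can be sketched abstractly: remember the opcode at the breakpoint site, overwrite it with a trap opcode, and restore the original when the breakpoint is cleared (or emulate it when hit). A toy version over a bytearray of made-up opcodes, not real CPython bytecode:

```python
BREAKPOINT = 0xFF              # made-up trap opcode for the sketch

def set_breakpoint(code, offset, saved):
    saved[offset] = code[offset]       # remember the displaced opcode
    code[offset] = BREAKPOINT          # perturb the code in place

def clear_breakpoint(code, offset, saved):
    code[offset] = saved.pop(offset)   # put the original opcode back

code = bytearray([0x10, 0x20, 0x30, 0x40])   # fake instruction stream
saved = {}
set_breakpoint(code, 2, saved)
print(code[2] == BREAKPOINT)                       # True
clear_breakpoint(code, 2, saved)
print(code == bytearray([0x10, 0x20, 0x30, 0x40])) # True
```

The hard part Tim alludes to is the second half: a real debugger, on hitting the trap, must single-step the saved opcode (or restore-execute-reinstall it) so the program behaves as if the code had never been touched.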
From gstein at lyra.org Mon Aug 16 06:42:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 21:42:19 -0700 Subject: [Python-Dev] Re: why References: Message-ID: <37B796AB.34F6F93@lyra.org> David Ascher wrote: > > Why does buffer(array('c', 'test')) return a read-only buffer? Simply because the buffer() builtin always creates a read-only object, rather than selecting read/write when possible. Shouldn't be hard to alter the semantics of buffer() to do so. Maybe do this at the same time as updating it to create read/write buffers out of the blue. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Mon Aug 16 08:42:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 16 Aug 1999 02:42:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <000b01bee7b2$7c62d780$f22d2399@tim> [Jack Jansen] > ... A long time ago, Dianne Hackborn actually implemented a scheme like this, under the name VREF (for "virtual reference", or some such). IIRC, differences from your scheme were mainly that: 1) There was an elaborate proxy mechanism to avoid having to explicitly strengthen the weak. 2) Each object contained a pointer to a linked list of associated weak refs. This predates DejaNews, so may be a pain to find. > ... > We add a new builtin function (or a module with that function) > weak(). This returns a weak reference to the object passed as a > parameter. A weak object has one method: strong(), which returns the > corresponding real object or raises an exception if the object doesn't > exist anymore. 
This interface appears nearly isomorphic to MIT Scheme's "hash" and "unhash" functions, except that their hash returns an (unbounded) int and guarantees that hash(o1) != hash(o2) for any distinct objects o1 and o2 (this is a stronger guarantee than Python's "id", which may return the same int for objects with disjoint lifetimes; the other reason object address isn't appropriate for them is that objects can be moved by garbage collection, but hash is an object invariant). Of course unhash(hash(o)) is o, unless o has been gc'ed; then unhash raises an exception. By most accounts (I haven't used it seriously myself), it's a usable interface. > ... > to implement this I need to add a pointer to every object. That's unattractive, of course. > ... > (actually: we could make do with a single bit in every object, with > the bit meaning "this object has an associated weak object". We could > then use a global dictionary indexed by object address to find the > weak object) Is a single bit actually smaller than a pointer? For example, on most machines these days #define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; is two 4-byte fields packed solid already, and structure padding prevents adding anything less than a 4-byte increment in reality. I guess on Alpha there's a 4-byte hole here, but I don't want weak pointers enough to switch machines . OTOH, sooner or later Guido is going to want a mark bit too, so the other way to view this is that 32 new flag bits are as cheap as one . There's one other thing I like about this: it can get rid of the dicey > Strong() checks that self->object->weak == self and returns > self->object (INCREFfed) if it is. check. If object has gone away, you're worried that self->object may (on some systems) point to a newly-invalid address. But worse than that, its memory may get reused, and then self->object may point into the *middle* of some other object where the bit pattern at the "weak" offset just happens to equal self. 
Let's try a sketch in pseudo-Python, where __xxx are secret functions that
do the obvious things (and glossing over thread safety since these are
presumably really implemented in C):

    # invariant:  __is_weak_bit_set(obj) == id2weak.has_key(id(obj))
    # So "the weak bit" is simply an optimization, sparing most objects
    # from a dict lookup when they die.
    # The invariant is delicate in the presence of threads.

    id2weak = {}

    class _Weak:
        def __init__(self, obj):
            self.id = id(obj)   # obj's refcount not bumped
            __set_weak_bit(obj)
            id2weak[self.id] = self
            # note that "the system" (see below) sets self.id
            # to None if obj dies

        def strong(self):
            if self.id is None:
                raise DeadManWalkingError(self.id)
            return __id2obj(self.id)    # will bump obj's refcount

        def __del__(self):
            # this is purely an optimization:  if self gets nuked,
            # exempt its referent from greater expense when *it*
            # dies
            if self.id is not None:
                __clear_weak_bit(__id2obj(self.id))
                del id2weak[self.id]

    def weak(obj):
        return id2weak.get(id(obj), None) or _Weak(obj)

and then whenever an object of any kind is deleted the system does:

    if __is_weak_bit_set(obj):
        objid = id(obj)
        id2weak[objid].id = None
        del id2weak[objid]

In my current over-tired state, I think that's safe (modulo threads),
portable and reasonably fast; I do think the extra bit costs 4 bytes,
though.

> ...
> The weak object isn't transparent, because you have to call strong()
> before you can do anything with it, but this is an advantage (says he,
> aspiring to a career in politics or sales:-): with a transparent weak
> object the object could disappear at unexpected moments and with this
> scheme it can't, because when you have the object itself in hand you
> have a refcount too.

Explicit is better than implicit for me.

[M.-A. Lemburg]
> Have you checked the weak reference dictionary implementation
> by Dieter Maurer ? It's at:
> 
>     http://www.handshake.de/~dieter/weakdict.html

A project where I work is using it; it blows up a lot .
While some form of weak dict is what most people want in the end, I'm not sure Dieter's decision to support weak dicts with only weak values (not weak keys) is sufficient. For example, the aforementioned project wants to associate various computed long strings with certain hashable objects, and for some reason or other (ain't my project ...) these objects can't be changed. So they can't store the strings in the objects. So they'd like to map the objects to the strings via assorted dicts. But using the object as a dict key keeps it (and, via the dicts, also its associated strings) artificially alive; they really want a weakdict with weak *keys*. I'm not sure I know of a clear & fast way to implement a weakdict building only on the weak() function. Jack? Using weak objects as values (or keys) with an ordinary dict can prevent their referents from being kept artificially alive, but that doesn't get the dict itself cleaned up by magic. Perhaps "the system" should notify a weak object when its referent goes away; that would at least give the WO a chance to purge itself from structures it knows it's in ... > ... > BTW, how would this be done in JPython ? I guess it doesn't > make much sense there because cycles are no problem for the > Java VM GC. Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of "hard", "soft", "weak", and "phantom" references. See java.lang.ref for details. I stopped paying attention to Java, so it's up to you to tell us what you learn about it . 
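For the record, both halves of this thread's wish list — Jack's weak()/strong() pair and Tim's "weakdict with weak *keys*" — are recognizable in the weakref module that later shipped with Python 2.1. A sketch of each on the modern module (the Weak/Thing class names are illustrative, not part of any API):

```python
import weakref

class Weak:
    """Jack's weak()/strong() interface, rebuilt on weakref.ref."""
    def __init__(self, obj):
        self._ref = weakref.ref(obj)       # does not bump obj's refcount
    def strong(self):
        obj = self._ref()                  # None once the referent is gone
        if obj is None:
            raise ReferenceError("referent no longer exists")
        return obj

class Thing:          # stand-in for any weak-referenceable object
    pass

t = Thing()
w = Weak(t)
assert w.strong() is t        # referent alive: strong() hands it back
del t                         # CPython refcounting collects it immediately
try:
    w.strong()
    gone = False
except ReferenceError:
    gone = True
print(gone)                   # True

# Tim's weak-keyed dict: the entry vanishes with the key object, so
# mapping objects to computed strings keeps nothing artificially alive.
cache = weakref.WeakKeyDictionary()
o = Thing()
cache[o] = "expensive computed string"
print(len(cache))             # 1
del o                         # last strong reference gone
print(len(cache))             # 0 -- the entry went with its key
```

Note the immediate collection relies on CPython's reference counting; under a different collector the entries disappear only after the next GC pass.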
From fredrik at pythonware.com Mon Aug 16 09:06:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 16 Aug 1999 09:06:43 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> Message-ID: <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > from array import array > > > > a = array("f", [0]*8192) > > > > b = buffer(a) > > > > for i in range(1000): > > a.append(1234) > > > > print b > > > > in other words, the buffer interface should > > be redesigned, or removed. > > I don't understand what you believe is weird here. did you run that code? it may work, it may bomb, or it may generate bogus output. all depending on your memory allocator, the phase of the moon, etc. just like back in the C/C++ days... imo, that's not good enough for a core feature. 
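Fredrik's complaint is well founded: the buffer object caches a pointer that the array's realloc() silently invalidates. For what it's worth, the memoryview type that eventually replaced buffer() closes exactly this hole — while a view is exported, resizing the underlying object raises BufferError rather than leaving a dangling pointer. A minimal sketch on a modern Python, using bytearray in place of the array (bytearray reliably exhibits the export check):

```python
# The hazard under discussion: a cached pointer into a resizable object
# goes stale when the object realloc()s.  memoryview pins the exporter
# instead of caching a raw pointer:
ba = bytearray(b"\x00" * 8192)
view = memoryview(ba)

try:
    ba.extend(b"\x00" * 8192)      # would force a reallocation
    print("resized while exported")
except BufferError:
    print("BufferError: export blocks the resize")

view.release()                     # drop the export...
ba.extend(b"\x00" * 8192)          # ...and now resizing is fine
print(len(ba))                     # 16384
```

In other words, the eventual redesign made Fredrik's program a loud, immediate error instead of undefined behavior.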
From gstein at lyra.org Mon Aug 16 09:15:54 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 00:15:54 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> Message-ID: <37B7BAAA.1E6EE4CA@lyra.org> Fredrik Lundh wrote: > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > from array import array > > > > > > a = array("f", [0]*8192) > > > > > > b = buffer(a) > > > > > > for i in range(1000): > > > a.append(1234) > > > > > > print b > > > > > > in other words, the buffer interface should > > > be redesigned, or removed. > > > > I don't understand what you believe is weird here. > > did you run that code? Yup. It printed nothing. > it may work, it may bomb, or it may generate bogus > output. all depending on your memory allocator, the > phase of the moon, etc. just like back in the C/C++ > days... It probably appeared as an empty string because the construction of the array filled it with zeroes (at least the first byte). Regardless, I'd be surprised if it crashed the interpreter. The print command is supposed to do a str() on the object, which creates a PyStringObject from the buffer contents. Shouldn't be a crash there. > imo, that's not good enough for a core feature. If it crashed, then sure. 
But I'd say that indicates a bug rather than a design problem. Do you have a stack trace from a crash? Ah. I just worked through, in my head, what is happening here. The buffer object caches the pointer returned by the array object. The append on the array does a realloc() somewhere, thereby invalidating the pointer inside the buffer object. Icky. Gotta think on this one... As an initial thought, it would seem that the buffer would have to re-query the pointer for each operation. There are performance implications there, of course, but that would certainly fix the problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Mon Aug 16 11:42:42 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:42:42 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by David Ascher , Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) , Message-ID: <19990816094243.3CE83303120@snelboot.oratrix.nl> > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > > Actually, I think you could use arrays to do the trick right now, > > because they are writeable (unlike strings). Until creating > > writeable buffer objects becomes possible that is... > > No, because I can't make an array around existing memory which Win32 > allocates before I get to it. Would adding a buffer interface to cobject solve your problem? Cobject is described as being used for passing C objects between Python modules, but I've always thought of it as passing C objects from one C routine to another C routine through Python, which doesn't necessarily understand what the object is all about. That latter description seems to fit your bill quite nicely. 
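Jack's CObject suggestion points at the real need here: wrapping memory that some other party (Win32, in David's case) allocated. As a historical footnote, the capability that eventually covered this use case is ctypes' from_address machinery. A sketch, with a ctypes-owned buffer standing in for the foreign memory (in real code the address would come from the C side, with the same lifetime caveat Greg raised — the wrapper must not outlive the memory's true owner):

```python
import ctypes

# Stand-in for memory some other party handed us a pointer to.
backing = ctypes.create_string_buffer(b"0123456789abcdef")
addr = ctypes.addressof(backing)

# Wrap bytes 10-12 of the existing memory without copying -- David's
# "modify bytes 10-12 starting from that memory location" case.
window = (ctypes.c_char * 3).from_address(addr + 10)
window[0:3] = b"XYZ"               # writes straight into the memory
print(backing.value)               # b'0123456789XYZdef'
```

Nothing is copied: the window is a typed view onto the original allocation, which is exactly what an "array around existing memory" would have provided.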
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From jack at oratrix.nl  Mon Aug 16 11:49:41 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 16 Aug 1999 11:49:41 +0200
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: Message by Greg Stein , Sun, 15 Aug 1999 13:35:25 -0700 , <37B7248D.31E5D2BF@lyra.org>
Message-ID: <19990816094941.83BE2303120@snelboot.oratrix.nl>

> >...
> > well, I think the buffer behaviour is both
> > new and pretty funny:
> 
> I think the buffer interface was introduced in 1.5 (by Jack?). I added
> the 8-bit character buffer slot and buffer objects in 1.5.2.

Ah, now I understand why I didn't understand some of the previous
conversation: I had never come across the buffer *objects* (as opposed
to the buffer *interface*) until Fredrik's example.

I've just looked at it, and I'm not sure I understand the full intentions
of the buffer object. Buffer objects can either behave as the
"buffer-aspect" of the object behind them (without the rest of their
functionality) or as array objects, and if they start out life as the
first they can evolve into the second, is that right?

Is there a rationale behind this design, or is it just something that
happened?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From gstein at lyra.org  Mon Aug 16 11:56:31 1999
From: gstein at lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 02:56:31 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990816094941.83BE2303120@snelboot.oratrix.nl>
Message-ID: <37B7E04F.3843004@lyra.org>

Jack Jansen wrote:
>...
> I've just look at it, and I'm not sure I understand the full intentions of the > buffer object. Buffer objects can either behave as the "buffer-aspect" of the > object behind them (without the rest of their functionality) or as array > objects, and if they start out life as the first they can evolve into the > second, is that right? > > Is there a rationale behind this design, or is it just something that > happened? The object doesn't change. You create it as a reference to an existing object's buffer (as exported via the buffer interface), or you create it as a reference to some arbitrary memory. The buffer object provides (optionally read/write) string-like behavior to any object that supports buffer behavior. It can also be used to make lightweight slices of another object. For example: >>> a = "abcdefghi" >>> b = buffer(a, 3, 3) >>> print b def >>> In the above example, there is only one copy of "def" (the portion inside of the string object referenced by ). The string-like behavior can be quite nice for memory-mapped files. Andrew's mmapfile module's file objects export the buffer interface. This means that you can open a file, wrap a buffer around it, and perform quick and easy random-access on the thing. You could even select slices of the file and pass them around as if they were strings, without loading anything into the process heap. (I want to try mmap'ing a .pyc and create code objects that have buffer-based bytecode streams; it will be interesting to see if this significantly reduces memory consumption (in terms of the heap size; the mmap'd .pyc can be shared across processes)). 
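Greg's lightweight-slice and mmap points translate directly to the memoryview type that succeeded buffer(): a slice of a view shares storage with the underlying object instead of copying it. A sketch of both, with a temporary file standing in for the mmap'd .pyc:

```python
import mmap
import tempfile

# Zero-copy slicing, as in Greg's buffer(a, 3, 3) example: only one
# copy of "def" exists; the view merely points into a's storage.
a = b"abcdefghi"
view = memoryview(a)[3:6]
print(bytes(view))                 # b'def'

# Random access over a memory-mapped file, without pulling the file's
# contents into the process heap:
with tempfile.TemporaryFile() as f:
    f.write(b"abcdefghi")
    f.flush()
    m = mmap.mmap(f.fileno(), 0)   # map the whole file
    chunk = memoryview(m)[3:6]     # a string-like window on the pages
    print(bytes(chunk))            # b'def'
    chunk.release()                # drop the export before closing
    m.close()
```

The release() before close() matters: an mmap object, like the resizable objects discussed above, refuses to go away while a view is still exported.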
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From jim at digicool.com  Mon Aug 16 14:30:41 1999
From: jim at digicool.com (Jim Fulton)
Date: Mon, 16 Aug 1999 08:30:41 -0400
Subject: [Python-Dev] buffer interface considered harmful
References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com>
Message-ID: <37B80471.F0F467C9@digicool.com>

Fredrik Lundh wrote:
> 
> > Fredrik Lundh wrote:
> > >...
> > > besides, what about buffers and threads? if you
> > > return a pointer from getreadbuf, wouldn't it be
> > > good to know exactly when Python doesn't need
> > > that pointer any more? explicit initbuffer/exitbuffer
> > > calls around each sequence of buffer operations
> > > would make that a lot safer...
> >
> > This is a pretty obvious one, I think: it lasts only as long as the
> > object. PyString_AS_STRING is similar. Nothing new or funny here.
> 
> well, I think the buffer behaviour is both
> new and pretty funny:
> 
>     from array import array
> 
>     a = array("f", [0]*8192)
> 
>     b = buffer(a)
> 
>     for i in range(1000):
>         a.append(1234)
> 
>     print b
> 
> in other words, the buffer interface should
> be redesigned, or removed.

A while ago I asked for some documentation on the Buffer interface. I
basically got silence. At this point, I don't have a good idea what
buffers are for and I don't see a lot of evidence that there *is* a
design. I assume that there was a design, but I can't see it. This whole
discussion makes me very queasy.

I'm probably just out of it, since I don't have time to read the Python
list anymore.
Presumably the buffer interface was proposed and discussed there at some
distant point in the past.

(I can't pay as much attention to this discussion as I suspect I should,
due to time constraints and due to a basic lack of understanding of the
rationale for the buffer interface. Just now I caught a sniff of
something I find kinda repulsive. I think I hear you all talking about
beasties that hold a reference to some object's internal storage and that
have write operations so you can write directly to the object's storage,
bypassing the object interfaces. I probably just imagined it.)

Jim

--
Jim Fulton           mailto:jim at digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com    http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will result
in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.

From gstein at lyra.org  Mon Aug 16 14:41:23 1999
From: gstein at lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 05:41:23 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com>
Message-ID: <37B806F3.2C5EDC44@lyra.org>

Jim Fulton wrote:
>...
> A while ago I asked for some documentation on the Buffer
> interface. I basically got silence.
At this point, I I think the silence was caused by the simple fact that the documentation does not (yet) exist. That's all... nothing nefarious. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Aug 16 14:05:35 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:05:35 +0200 Subject: [Python-Dev] Re: w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: <37B722CD.383A2A9E@lyra.org> Message-ID: <37B7FE8F.30C35284@lemburg.com> Greg Stein wrote: > > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. That typecode is already in 1.5.2. > > :-) Ah, cool :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:29:31 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:29:31 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000b01bee7b2$7c62d780$f22d2399@tim> Message-ID: <37B8042B.21DE6053@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Have you checked the weak reference dictionary implementation > > by Dieter Maurer ? It's at: > > > > http://www.handshake.de/~dieter/weakdict.html > > A project where I work is using it; it blows up a lot . > > While some form of weak dict is what most people want in the end, I'm not > sure Dieter's decision to support weak dicts with only weak values (not weak > keys) is sufficient. For example, the aforementioned project wants to > associate various computed long strings with certain hashable objects, and > for some reason or other (ain't my project ...) these objects can't be > changed. So they can't store the strings in the objects. 
So they'd like to > map the objects to the strings via assorted dicts. But using the object as > a dict key keeps it (and, via the dicts, also its associated strings) > artificially alive; they really want a weakdict with weak *keys*. > > I'm not sure I know of a clear & fast way to implement a weakdict building > only on the weak() function. Jack? > > Using weak objects as values (or keys) with an ordinary dict can prevent > their referents from being kept artificially alive, but that doesn't get the > dict itself cleaned up by magic. Perhaps "the system" should notify a weak > object when its referent goes away; that would at least give the WO a chance > to purge itself from structures it knows it's in ... Perhaps one could fiddle something out of the Proxy objects in mxProxy (you know where...). These support a special __cleanup__ protocol that I use a lot to work around circular garbage: the __cleanup__ method of the referenced object is called prior to destroying the proxy; even if the reference count on the object has not yet gone down to 0. This makes direct circles possible without problems: the parent can reference a child through the proxy and the child can reference the parent directly. As soon as the parent is cleaned up, the reference to the proxy is deleted which then automagically makes the back reference in the child disappear, allowing the parent to be deallocated after cleanup without leaving a circular reference around. > > ... > > BTW, how would this be done in JPython ? I guess it doesn't > > make much sense there because cycles are no problem for the > > Java VM GC. > > Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of > "hard", "soft", "weak", and "phantom" references. See java.lang.ref for > details. I stopped paying attention to Java, so it's up to you to tell us > what you learn about it . Thanks for the reference... 
but I guess this will remain a weak one for some time since the latter is currently a limited resource :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:41:51 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:41:51 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> <37B7BAAA.1E6EE4CA@lyra.org> Message-ID: <37B8070F.763C3FF8@lemburg.com> Greg Stein wrote: > > Fredrik Lundh wrote: > > > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > > > from array import array > > > > > > > > a = array("f", [0]*8192) > > > > > > > > b = buffer(a) > > > > > > > > for i in range(1000): > > > > a.append(1234) > > > > > > > > print b > > > > > > > > in other words, the buffer interface should > > > > be redesigned, or removed. > > > > > > I don't understand what you believe is weird here. > > > > did you run that code? > > Yup. It printed nothing. > > > it may work, it may bomb, or it may generate bogus > > output. all depending on your memory allocator, the > > phase of the moon, etc. just like back in the C/C++ > > days... 
> > It probably appeared as an empty string because the construction of the > array filled it with zeroes (at least the first byte). > > Regardless, I'd be surprised if it crashed the interpreter. The print > command is supposed to do a str() on the object, which creates a > PyStringObject from the buffer contents. Shouldn't be a crash there. > > > imo, that's not good enough for a core feature. > > If it crashed, then sure. But I'd say that indicates a bug rather than a > design problem. Do you have a stack trace from a crash? > > Ah. I just worked through, in my head, what is happening here. The > buffer object caches the pointer returned by the array object. The > append on the array does a realloc() somewhere, thereby invalidating the > pointer inside the buffer object. > > Icky. Gotta think on this one... As an initial thought, it would seem > that the buffer would have to re-query the pointer for each operation. > There are performance implications there, of course, but that would > certainly fix the problem. I guess that's the way to go. I wouldn't want to think about those details when using buffer objects and a function call is still better than a copy... it would do the init/exit wrapping implicitly: init at the time the getreadbuffer call is made and exit next time a thread switch is done - provided that the functions using the memory pointer also keep a reference to the buffer object alive (but that should be natural as this is always done when dealing with references in a safe way). 
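[Editorial aside: the stale-pointer hazard Greg diagnoses above is exactly what later Pythons close off. A memoryview (the modern descendant of buffer()) does not re-query the pointer; instead it "pins" the exporting object, which then refuses to resize while a view is alive. A minimal sketch along the lines of Fredrik's array example, using today's API:]

```python
from array import array

a = array("f", [0.0] * 8192)
m = memoryview(a)        # today's counterpart of b = buffer(a)

try:
    a.append(1234.0)     # this resize would realloc() and dangle the pointer...
except BufferError:
    pass                 # ...so the array refuses to resize while exported

m.release()              # drop the view
a.append(1234.0)         # resizing is allowed again
```

The design trades Greg's "re-query on every operation" idea for an outright ban on resizing while a buffer is exported.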
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at digicool.com Mon Aug 16 15:26:40 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 09:26:40 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> <37B806F3.2C5EDC44@lyra.org> Message-ID: <37B81190.165C373E@digicool.com> Greg Stein wrote: > > Jim Fulton wrote: > >... > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > I think the silence was caused by the simple fact that the documentation > does not (yet) exist. That's all... nothing nefarious. I didn't mean to suggest anything nefarious. I do think that a change that affects something as basic as the standard object type layout and that generates this much discussion really should be documented before it becomes part of the core. I'd especially like to see some kind of document that includes information like: - A problem statement that describes the problem the change is solving, - How does the solution solve the problem, - When and how should people writing new types support the new interfaces? We're not talking about a new library module here. There's been a change to the core object interface. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jack at oratrix.nl Mon Aug 16 15:45:31 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 15:45:31 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Jim Fulton , Mon, 16 Aug 1999 08:30:41 -0400 , <37B80471.F0F467C9@digicool.com> Message-ID: <19990816134531.C30B5303120@snelboot.oratrix.nl> > A while ago I asked for some documentation on the Buffer > interface. I basically got silence. At this point, I > don't have a good idea what buffers are for and I don't see a lot > of evidence that there *is* a design. I assume that there was > a design, but I can't see it. This whole discussion makes me > very queasy. Okay, as I'm apparently not the only one who is queasy let's start from scratch. First, there is the old buffer _interface_. This is a C interface that allows extension (and builtin) modules and functions a unified way to access objects if they want to write the object to file and similar things. It is also what the PyArg_ParseTuple "s#" returns. This is, in C, the getreadbuffer/getwritebuffer interface. Second, there's the extension to the buffer interface as of 1.5.2. This is again only available in C, and it allows C programmers to get an object _as an ASCII string_. This is meant for things like regexp modules, to access any "textual" object as an ASCII string. This is the getcharbuffer interface, and bound to the "t#" specifier in PyArg_ParseTuple. Third, there is the buffer _object_, also new in 1.5.2.
This sort-of exports the functionality of the buffer interface to Python, but it does a bit more as well, because the buffer objects have a sort of copy-on-write semantics that means they may or may not be "attached" to a python object through the buffer interface. I think that the C interface and the object should be treated completely separately. I definitely want the C interface, but I personally don't use the Python buffer objects, so I don't really care all that much about those. Also, I think that the buffer objects might become easier to understand if we don't think of it as "the buffer interface exported to python", but as "Python buffer objects, that may share memory with other Python objects as an optimization". -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jim at digicool.com Mon Aug 16 18:03:54 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 12:03:54 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990816134531.C30B5303120@snelboot.oratrix.nl> Message-ID: <37B8366A.82B305C7@digicool.com> Jack Jansen wrote: > > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > don't have a good idea what buffers are for and I don't see alot > > of evidence that there *is* a design. I assume that there was > > a design, but I can't see it. This whole discussion makes me > > very queasy. > > Okay, as I'm apparently not the only one who is queasy let's start from > scratch. Yee ha! > First, there is the old buffer _interface_. This is a C interface that allows > extension (and builtin) modules and functions a unified way to access objects > if they want to write the object to file and similar things. Is this serialization? 
What does this achieve that, say, the pickling protocols don't achieve? What other problems does it solve? > It is also what > the PyArg_ParseTuple "s#" returns. This is, in C, the > getreadbuffer/getwritebuffer interface. Huh? "s#" doesn't return a string? Or are you saying that you can pass a non-string object to a C function that uses "s#" and have it bufferized and then stringized? In either case, this is not consistent with the documentation (interface) of PyArg_ParseTuple. > Second, there's the extension to the buffer interface as of 1.5.2. This is > again only available in C, and it allows C programmers to get an object _as an > ASCII string_. This is meant for things like regexp modules, to access any > "textual" object as an ASCII string. This is the getcharbuffer interface, and > bound to the "t#" specifier in PyArg_ParseTuple. Hm. So this is making a little more sense. So, there is a notion that there are "textual" objects that want to provide a method for getting their "text". How does this text differ from what you get from __str__ or __repr__? > Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports > the functionality of the buffer interface to Python, How so? Maybe I'm at sea because I still don't get what the C buffer interface is for. > but it does a bit more as > well, because the buffer objects have a sort of copy-on-write semantics that > means they may or may not be "attached" to a python object through the buffer > interface. What is this thing used for? Where does the slot in tp_as_buffer come into all of this? Why does this need to be a slot in the first place? Are these "textual" objects really common? Is the presence of this slot a flag for "textualness"? It would help a lot, at least for me, if there was a clearer description of what motivates these things. What problems are they trying to solve? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org From da at ski.org Mon Aug 16 18:45:47 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8366A.82B305C7@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: > > Second, there's the extension to the buffer interface as of 1.5.2. This is > > again only available in C, and it allows C programmers to get an object _as an > > ASCII string_. This is meant for things like regexp modules, to access any > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > bound to the "t#" specifier in PyArg_ParseTuple. > > Hm. So this is making a little more sense. So, there is a notion that > there are "textual" objects that want to provide a method for getting > their "text". How does this text differ from what you get from __str__ > or __repr__? I'll let others give a well thought out rationale. Here are some examples of use which I think worthwhile: * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile objects fit this aspect of the buffer interface allows you to do regexp searches on it w/o ever building a twelve gigabyte PyString. * Consider a non-contiguous NumPy array. If the array type supported the multi-segment buffer interface, extension module writers could manipulate the data within this array w/o having to worry about the non-contiguous nature of the data. They'd still have to worry about the multi-byte nature of the data, but it's still a win.
In other words, I think that the buffer interface could be useful even w/ non-textual data. * If NumPy was modified to have arrays with data stored in buffer objects as opposed to the current "char *", and if PIL was modified to have images stored in buffer objects as opposed to whatever it uses, one could have arrays and images which shared data. I think all of these provide examples of motivations which are appealing to at least some Python users. I make no claim that they motivate the specific interface. In all the cases I can think of, one or both of two features are the key asset: - access to subset of huge data regions w/o creation of huge temporary variables. - sharing of data space. Yes, it's a power tool, and as such should come with safety goggles. But then again, the same is true for ExtensionClasses =). leaving-out-the-regexp-on-NumPy-arrays-example, --david PS: I take back the implicit suggestion that buffer() return read-write buffers when possible. From jim at digicool.com Mon Aug 16 19:06:19 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:06:19 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B8450B.C5D308E4@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > > > Second, there's the extension to the buffer interface as of 1.5.2. This is > > > again only available in C, and it allows C programmers to get an object _as an > > > ASCII string_. This is meant for things like regexp modules, to access any > > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > > bound to the "t#" specifier in PyArg_ParseTuple. > > > > Hm. So this is making a little more sense. So, there is a notion that > > there are "textual" objects that want to provide a method for getting > > their "text". How does this text differ from what you get from __str__ > > or __repr__? > > I'll let others give a well thought out rationale. I eagerly await this.
:) > Here are some examples > of use which I think worthwhile: > > * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile > objects fit this aspect of the buffer interface allows you to do regexp > searches on it w/o ever building a twelve gigabyte PyString. This seems reasonable, if a bit exotic. :) > * Consider a non-contiguous NumPy array. If the array type supported the > multi-segment buffer interface, extension module writers could > manipulate the data within this array w/o having to worry about the > non-contiguous nature of the data. They'd still have to worry about > the multi-byte nature of the data, but it's still a win. In other > words, I think that the buffer interface could be useful even w/ > non-textual data. Why is this a good thing? Why should extension module writers worry about the non-contiguous nature of the data now? Does the NumPy C API somehow expose this now? Will multi-segment buffers make it go away somehow? > * If NumPy was modified to have arrays with data stored in buffer objects > as opposed to the current "char *", and if PIL was modified to have > images stored in buffer objects as opposed to whatever it uses, one > could have arrays and images which shared data. Uh, and this would be a good thing? Maybe PIL should just be modified to use NumPy arrays. > I think all of these provide examples of motivations which are appealing > to at least some Python users. Perhaps, although Guido knows how they'd find out about them. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org
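[Editorial aside: David's mmap bullet is demonstrable with pieces that exist in today's standard library, since the re module will search any buffer-like object directly. The file below is a tiny stand-in for the "twelve gigabyte" case, and its contents and the GATTACA pattern are invented for illustration:]

```python
import mmap
import os
import re
import tempfile

# Small stand-in for the huge file; contents are made up.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"junk" * 2500 + b"GATTACA" + b"junk" * 2500)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    match = re.search(b"GATTACA", mm)   # regexp runs over the mapping itself;
    offset = match.start()              # no file-sized string is ever built
    mm.close()

os.remove(path)
print(offset)   # byte offset of the hit within the file
```

Only the pages the regexp engine actually touches get faulted in, which is the whole point for the bioinformatics-sized inputs mentioned below.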
From da at ski.org Mon Aug 16 19:18:46 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 10:18:46 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8450B.C5D308E4@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: >> [regexps on gigabyte files] > > This seems reasonable, if a bit exotic. :) In the bioinformatics world, I think it's everyday stuff. > Why is this a good thing? Why should extension module writes worry > abot the non-contiguous nature of the data now? Does the NumPy C API > somehow expose this now? Will multi-segment buffers make it go away > somehow? A NumPy extension module writer needs to create and modify NumPy arrays. These arrays may be non-contiguous (if e.g. they are the result of slicing). The NumPy C API exposes the non-contiguous nature, but it's hard enough to deal with it that I suspect most extension writers require contiguous arrays, which means unnecessary copies. Multi-segment buffers won't make the API go away necessarily (backwards compatibility and all that), but it could make it unnecessary for many extension writers. > > * If NumPy was modified to have arrays with data stored in buffer objects > > as opposed to the current "char *", and if PIL was modified to have > > images stored in buffer objects as opposed to whatever it uses, one > > could have arrays and images which shared data. > > Uh, and this would be a good thing? Maybe PIL should just be modified > to use NumPy arrays. Why? PIL was designed for image processing, and made design decisions appropriate to that domain. NumPy was designed for multidimensional numeric array processing, and made design decisions appropriate to that domain. The intersection of interests exists (e.g. in the medical imaging world), and I know people who spend a lot of their CPU time moving data between images and arrays with "stupid" tostring/fromstring operations. 
Given the size of the images, it's a prodigious waste of time, and kills the use of Python in many a project. > Perhaps, although Guido knows how they'd find out about them. ;) Uh? These issues have been discussed in the NumPy/PIL world for a while, with no solution in sight. Recently, I and others saw mentions of buffers in the source, and they seemed like a reasonable approach, which could be done w/o a rewrite of either PIL or NumPy. Don't get me wrong -- I'm all for better documentation of the buffer stuff, design guidelines, warnings and protocols. I stated as much on June 15: http://www.python.org/pipermail/python-dev/1999-June/000338.html --david From jim at digicool.com Mon Aug 16 19:38:22 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:38:22 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B84C8E.46885C8E@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > >> [regexps on gigabyte files] > > > > This seems reasonable, if a bit exotic. :) > > In the bioinformatics world, I think it's everyday stuff. Right, in some (exotic ;) domains it's not exotic at all. > > Why is this a good thing? Why should extension module writes worry > > abot the non-contiguous nature of the data now? Does the NumPy C API > > somehow expose this now? Will multi-segment buffers make it go away > > somehow? > > A NumPy extension module writer needs to create and modify NumPy arrays. > These arrays may be non-contiguous (if e.g. they are the result of > slicing). The NumPy C API exposes the non-contiguous nature, but it's > hard enough to deal with it that I suspect most extension writers require > contiguous arrays, which means unnecessary copies. Hm. This sounds like an API problem to me. > Multi-segment buffers won't make the API go away necessarily (backwards > compatibility and all that), but it could make it unnecessary for many > extension writers. 
Multi-segment buffers don't make the multi-segmented nature of the memory go away. Do they really simplify the API that much? They seem to strip away an awful lot of information hiding. > > > * If NumPy was modified to have arrays with data stored in buffer objects > > > as opposed to the current "char *", and if PIL was modified to have > > > images stored in buffer objects as opposed to whatever it uses, one > > > could have arrays and images which shared data. > > > > Uh, and this would be a good thing? Maybe PIL should just be modified > > to use NumPy arrays. > > Why? PIL was designed for image processing, and made design decisions > appropriate to that domain. NumPy was designed for multidimensional > numeric array processing, and made design decisions appropriate to that > domain. The intersection of interests exists (e.g. in the medical imaging > world), and I know people who spend a lot of their CPU time moving data > between images and arrays with "stupid" tostring/fromstring operations. > Given the size of the images, it's a prodigious waste of time, and kills > the use of Python in many a project. It seems to me that NumPy is sufficiently broad enough to encompass image processing. My main concern is having two systems rely on some low-level "shared memory" mechanism to achieve efficient communication. > > Perhaps, although Guido knows how they'd find out about them. ;) > > Uh? These issues have been discussed in the NumPy/PIL world for a while, > with no solution in sight. Recently, I and others saw mentions of buffers > in the source, and they seemed like a reasonable approach, which could be > done w/o a rewrite of either PIL or NumPy. My point was that people would be lucky to find out about buffers or about how to use them as things stand. > Don't get me wrong -- I'm all for better documentation of the buffer > stuff, design guidelines, warnings and protocols.
> I stated as much on June 15: > http://www.python.org/pipermail/python-dev/1999-June/000338.html Yes, that was quite a jihad you launched. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org From da at ski.org Mon Aug 16 20:25:54 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 11:25:54 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B84C8E.46885C8E@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: [ Aside: > It seems to me that NumPy is sufficiently broad enough to encompass > image processing. Well, I'll just say that you could have been right, but w/ the current NumPy, I don't blame /F for having developed his own data structures. NumPy is messy, and some of its design decisions are wrong for image things (memory handling, casting rules, etc.). It's all water under the bridge at this point. ] Back to the main topic: You say: > [Multi-segment buffers] seem to strip away an awful lot of information > hiding. My impression of the buffer notion was that it is intended to *provide* information hiding, by giving a simple API to byte arrays which could be stored in various ways. I do agree that whether those bytes should be shared or not is a decision which should be weighed carefully. > My main concern is having two systems rely on some low-level "shared > memory" mechanism to achieve efficient communication. I don't particularly care about the specific buffer interface (the low-level nature of which is what I think you object to).
I do care about having a well-defined mechanism for sharing memory between objects, and I think there is value in defining such an interface generically. Maybe the notion of segmented arrays of bytes is too low-level, and instead we should think of the data spaces as segmented arrays of chunks, where a chunk can be one or more bytes? Or do you object to any 'generic' interface? Just for fun, here's the list of things which either currently do or have been talked about possibly in the future supporting some sort of buffer interface, and my guesses as to (chunk size, segmented status, and writeability):

- strings (1 byte, single-segment, r/o)
- unicode strings (2 bytes, single-segment, r/o)
- struct.pack() things (1 byte, single-segment, r/o)
- arrays (1-4? bytes, single-segment, r/w)
- NumPy arrays (1-8 bytes, multi-segment, r/w)
- PIL images (1-? bytes, multi-segment, r/w)
- CObjects (1-byte, single-segment, r/?)
- mmapfiles (1-byte, multi-segment?, r/w)
- non-python-owned memory (1-byte, single-segment, r/w)

--david

From jack at oratrix.nl Mon Aug 16 21:36:40 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 21:36:40 +0200 Subject: [Python-Dev] Buffer interface and multiple threads Message-ID: <19990816193645.9E5B5CF320@oratrix.oratrix.nl> Hmm, something that just struck me: the buffer _interface_ (i.e. the C routines, not the buffer object stuff) is potentially thread-unsafe. In the "old world", where "s#" only worked on string objects, you could be sure that the C pointer returned remained valid as long as you had a reference to the python string object in hand, as strings are immutable. In the "new world", where "s#" also works on, say, array objects, this doesn't hold anymore. So, potentially, while one thread is in a write() system call writing the contents of the array to a file another thread could come in and change the data. Is this a problem?
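[Editorial aside: Jack's race is easy to make concrete in Python itself. In the sketch below the thread and the Event choreography are invented purely to force the interleaving deterministically; a live view of a mutable array stands in for the C pointer "s#" hands out, and the bytes behind it change while the "writer" is notionally blocked, which is exactly what a write() in progress could observe:]

```python
import threading
from array import array

data = array("b", [0] * 8)
view = memoryview(data)     # stands in for the pointer "s#" hands out
acquired = threading.Event()
mutated = threading.Event()

def other_thread():
    acquired.wait()         # run once the "writer" holds its pointer
    data[0] = 42            # mutate the object in place
    mutated.set()

t = threading.Thread(target=other_thread)
t.start()

before = view[0]            # what the writer believes it will write
acquired.set()
mutated.wait()              # deterministic stand-in for a thread switch
after = view[0]             # what actually sits behind the pointer now
t.join()
print(before, after)        # -> 0 42: the data changed under the live view
```

Note that pinning the exporter only prevents resizing (so the pointer stays valid); it does not prevent in-place mutation, so Jack's question is about data consistency rather than memory safety.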
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 16 22:22:12 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 22:22:12 +0200 Subject: [Python-Dev] New htmlentitydefs.py file Message-ID: <37B872F4.1C3F5D39@lemburg.com> Attached you find a new HTML entity definitions file taken and parsed from: http://www.w3.org/TR/1998/REC-html40-19980424/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLspecial.ent The latter two contain Unicode charcodes which obviously cannot (yet) be mapped to Unicode strings... perhaps Fredrik wants to include a spiced up version in with his Unicode type. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- """ Entity definitions for HTML4.0. 
Taken and parsed from: http://www.w3.org/TR/1998/REC-html40/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40/HTMLspecial.ent """ entitydefs = { 'AElig': chr(198), # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1 'Aacute': chr(193), # latin capital letter A with acute, U+00C1 ISOlat1 'Acirc': chr(194), # latin capital letter A with circumflex, U+00C2 ISOlat1 'Agrave': chr(192), # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1 'Alpha': 'Α', # greek capital letter alpha, U+0391 'Aring': chr(197), # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1 'Atilde': chr(195), # latin capital letter A with tilde, U+00C3 ISOlat1 'Auml': chr(196), # latin capital letter A with diaeresis, U+00C4 ISOlat1 'Beta': 'Β', # greek capital letter beta, U+0392 'Ccedil': chr(199), # latin capital letter C with cedilla, U+00C7 ISOlat1 'Chi': 'Χ', # greek capital letter chi, U+03A7 'Dagger': '‡', # double dagger, U+2021 ISOpub 'Delta': 'Δ', # greek capital letter delta, U+0394 ISOgrk3 'ETH': chr(208), # latin capital letter ETH, U+00D0 ISOlat1 'Eacute': chr(201), # latin capital letter E with acute, U+00C9 ISOlat1 'Ecirc': chr(202), # latin capital letter E with circumflex, U+00CA ISOlat1 'Egrave': chr(200), # latin capital letter E with grave, U+00C8 ISOlat1 'Epsilon': 'Ε', # greek capital letter epsilon, U+0395 'Eta': 'Η', # greek capital letter eta, U+0397 'Euml': chr(203), # latin capital letter E with diaeresis, U+00CB ISOlat1 'Gamma': 'Γ', # greek capital letter gamma, U+0393 ISOgrk3 'Iacute': chr(205), # latin capital letter I with acute, U+00CD ISOlat1 'Icirc': chr(206), # latin capital letter I with circumflex, U+00CE ISOlat1 'Igrave': chr(204), # latin capital letter I with grave, U+00CC ISOlat1 'Iota': 'Ι', # greek capital letter iota, U+0399 'Iuml': chr(207), # latin capital letter I with diaeresis, U+00CF ISOlat1 'Kappa': 'Κ', # greek 
capital letter kappa, U+039A 'Lambda': 'Λ', # greek capital letter lambda, U+039B ISOgrk3 'Mu': 'Μ', # greek capital letter mu, U+039C 'Ntilde': chr(209), # latin capital letter N with tilde, U+00D1 ISOlat1 'Nu': 'Ν', # greek capital letter nu, U+039D 'Oacute': chr(211), # latin capital letter O with acute, U+00D3 ISOlat1 'Ocirc': chr(212), # latin capital letter O with circumflex, U+00D4 ISOlat1 'Ograve': chr(210), # latin capital letter O with grave, U+00D2 ISOlat1 'Omega': 'Ω', # greek capital letter omega, U+03A9 ISOgrk3 'Omicron': 'Ο', # greek capital letter omicron, U+039F 'Oslash': chr(216), # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1 'Otilde': chr(213), # latin capital letter O with tilde, U+00D5 ISOlat1 'Ouml': chr(214), # latin capital letter O with diaeresis, U+00D6 ISOlat1 'Phi': 'Φ', # greek capital letter phi, U+03A6 ISOgrk3 'Pi': 'Π', # greek capital letter pi, U+03A0 ISOgrk3 'Prime': '″', # double prime = seconds = inches, U+2033 ISOtech 'Psi': 'Ψ', # greek capital letter psi, U+03A8 ISOgrk3 'Rho': 'Ρ', # greek capital letter rho, U+03A1 'Sigma': 'Σ', # greek capital letter sigma, U+03A3 ISOgrk3 'THORN': chr(222), # latin capital letter THORN, U+00DE ISOlat1 'Tau': 'Τ', # greek capital letter tau, U+03A4 'Theta': 'Θ', # greek capital letter theta, U+0398 ISOgrk3 'Uacute': chr(218), # latin capital letter U with acute, U+00DA ISOlat1 'Ucirc': chr(219), # latin capital letter U with circumflex, U+00DB ISOlat1 'Ugrave': chr(217), # latin capital letter U with grave, U+00D9 ISOlat1 'Upsilon': 'Υ', # greek capital letter upsilon, U+03A5 ISOgrk3 'Uuml': chr(220), # latin capital letter U with diaeresis, U+00DC ISOlat1 'Xi': 'Ξ', # greek capital letter xi, U+039E ISOgrk3 'Yacute': chr(221), # latin capital letter Y with acute, U+00DD ISOlat1 'Zeta': 'Ζ', # greek capital letter zeta, U+0396 'aacute': chr(225), # latin small letter a with acute, U+00E1 ISOlat1 'acirc': chr(226), # latin small letter a with circumflex, 
U+00E2 ISOlat1 'acute': chr(180), # acute accent = spacing acute, U+00B4 ISOdia 'aelig': chr(230), # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 'agrave': chr(224), # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 'alefsym': 'ℵ', # alef symbol = first transfinite cardinal, U+2135 NEW 'alpha': 'α', # greek small letter alpha, U+03B1 ISOgrk3 'and': '∧', # logical and = wedge, U+2227 ISOtech 'ang': '∠', # angle, U+2220 ISOamso 'aring': chr(229), # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 'asymp': '≈', # almost equal to = asymptotic to, U+2248 ISOamsr 'atilde': chr(227), # latin small letter a with tilde, U+00E3 ISOlat1 'auml': chr(228), # latin small letter a with diaeresis, U+00E4 ISOlat1 'bdquo': '„', # double low-9 quotation mark, U+201E NEW 'beta': 'β', # greek small letter beta, U+03B2 ISOgrk3 'brvbar': chr(166), # broken bar = broken vertical bar, U+00A6 ISOnum 'bull': '•', # bullet = black small circle, U+2022 ISOpub 'cap': '∩', # intersection = cap, U+2229 ISOtech 'ccedil': chr(231), # latin small letter c with cedilla, U+00E7 ISOlat1 'cedil': chr(184), # cedilla = spacing cedilla, U+00B8 ISOdia 'cent': chr(162), # cent sign, U+00A2 ISOnum 'chi': 'χ', # greek small letter chi, U+03C7 ISOgrk3 'clubs': '♣', # black club suit = shamrock, U+2663 ISOpub 'cong': '≅', # approximately equal to, U+2245 ISOtech 'copy': chr(169), # copyright sign, U+00A9 ISOnum 'crarr': '↵', # downwards arrow with corner leftwards = carriage return, U+21B5 NEW 'cup': '∪', # union = cup, U+222A ISOtech 'curren': chr(164), # currency sign, U+00A4 ISOnum 'dArr': '⇓', # downwards double arrow, U+21D3 ISOamsa 'dagger': '†', # dagger, U+2020 ISOpub 'darr': '↓', # downwards arrow, U+2193 ISOnum 'deg': chr(176), # degree sign, U+00B0 ISOnum 'delta': 'δ', # greek small letter delta, U+03B4 ISOgrk3 'diams': '♦', # black diamond suit, U+2666 ISOpub 'divide': chr(247), # division sign, U+00F7 ISOnum 'eacute': 
chr(233), # latin small letter e with acute, U+00E9 ISOlat1
'ecirc': chr(234), # latin small letter e with circumflex, U+00EA ISOlat1
'egrave': chr(232), # latin small letter e with grave, U+00E8 ISOlat1
'empty': '∅', # empty set = null set = diameter, U+2205 ISOamso
'emsp': ' ', # em space, U+2003 ISOpub
'ensp': ' ', # en space, U+2002 ISOpub
'epsilon': 'ε', # greek small letter epsilon, U+03B5 ISOgrk3
'equiv': '≡', # identical to, U+2261 ISOtech
'eta': 'η', # greek small letter eta, U+03B7 ISOgrk3
'eth': chr(240), # latin small letter eth, U+00F0 ISOlat1
'euml': chr(235), # latin small letter e with diaeresis, U+00EB ISOlat1
'exist': '∃', # there exists, U+2203 ISOtech
'fnof': 'ƒ', # latin small f with hook = function = florin, U+0192 ISOtech
'forall': '∀', # for all, U+2200 ISOtech
'frac12': chr(189), # vulgar fraction one half = fraction one half, U+00BD ISOnum
'frac14': chr(188), # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum
'frac34': chr(190), # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum
'frasl': '⁄', # fraction slash, U+2044 NEW
'gamma': 'γ', # greek small letter gamma, U+03B3 ISOgrk3
'ge': '≥', # greater-than or equal to, U+2265 ISOtech
'hArr': '⇔', # left right double arrow, U+21D4 ISOamsa
'harr': '↔', # left right arrow, U+2194 ISOamsa
'hearts': '♥', # black heart suit = valentine, U+2665 ISOpub
'hellip': '…', # horizontal ellipsis = three dot leader, U+2026 ISOpub
'iacute': chr(237), # latin small letter i with acute, U+00ED ISOlat1
'icirc': chr(238), # latin small letter i with circumflex, U+00EE ISOlat1
'iexcl': chr(161), # inverted exclamation mark, U+00A1 ISOnum
'igrave': chr(236), # latin small letter i with grave, U+00EC ISOlat1
'image': 'ℑ', # blackletter capital I = imaginary part, U+2111 ISOamso
'infin': '∞', # infinity, U+221E ISOtech
'int': '∫', # integral, U+222B ISOtech
'iota': 'ι', # greek small letter iota, U+03B9 ISOgrk3
'iquest': chr(191), # inverted question mark = turned question mark, U+00BF ISOnum
'isin': '∈', # element of, U+2208 ISOtech
'iuml': chr(239), # latin small letter i with diaeresis, U+00EF ISOlat1
'kappa': 'κ', # greek small letter kappa, U+03BA ISOgrk3
'lArr': '⇐', # leftwards double arrow, U+21D0 ISOtech
'lambda': 'λ', # greek small letter lambda, U+03BB ISOgrk3
'lang': '〈', # left-pointing angle bracket = bra, U+2329 ISOtech
'laquo': chr(171), # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum
'larr': '←', # leftwards arrow, U+2190 ISOnum
'lceil': '⌈', # left ceiling = apl upstile, U+2308 ISOamsc
'ldquo': '“', # left double quotation mark, U+201C ISOnum
'le': '≤', # less-than or equal to, U+2264 ISOtech
'lfloor': '⌊', # left floor = apl downstile, U+230A ISOamsc
'lowast': '∗', # asterisk operator, U+2217 ISOtech
'loz': '◊', # lozenge, U+25CA ISOpub
'lrm': '‎', # left-to-right mark, U+200E NEW RFC 2070
'lsaquo': '‹', # single left-pointing angle quotation mark, U+2039 ISO proposed
'lsquo': '‘', # left single quotation mark, U+2018 ISOnum
'macr': chr(175), # macron = spacing macron = overline = APL overbar, U+00AF ISOdia
'mdash': '—', # em dash, U+2014 ISOpub
'micro': chr(181), # micro sign, U+00B5 ISOnum
'middot': chr(183), # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum
'minus': '−', # minus sign, U+2212 ISOtech
'mu': 'μ', # greek small letter mu, U+03BC ISOgrk3
'nabla': '∇', # nabla = backward difference, U+2207 ISOtech
'nbsp': chr(160), # no-break space = non-breaking space, U+00A0 ISOnum
'ndash': '–', # en dash, U+2013 ISOpub
'ne': '≠', # not equal to, U+2260 ISOtech
'ni': '∋', # contains as member, U+220B ISOtech
'not': chr(172), # not sign, U+00AC ISOnum
'notin': '∉', # not an element of, U+2209 ISOtech
'nsub': '⊄', # not a subset of, U+2284 ISOamsn
'ntilde': chr(241), # latin small letter n with tilde, U+00F1 ISOlat1
'nu': 'ν', # greek small letter nu, U+03BD ISOgrk3
'oacute': chr(243), # latin small letter o with acute, U+00F3 ISOlat1
'ocirc': chr(244), # latin small letter o with circumflex, U+00F4 ISOlat1
'ograve': chr(242), # latin small letter o with grave, U+00F2 ISOlat1
'oline': '‾', # overline = spacing overscore, U+203E NEW
'omega': 'ω', # greek small letter omega, U+03C9 ISOgrk3
'omicron': 'ο', # greek small letter omicron, U+03BF NEW
'oplus': '⊕', # circled plus = direct sum, U+2295 ISOamsb
'or': '∨', # logical or = vee, U+2228 ISOtech
'ordf': chr(170), # feminine ordinal indicator, U+00AA ISOnum
'ordm': chr(186), # masculine ordinal indicator, U+00BA ISOnum
'oslash': chr(248), # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1
'otilde': chr(245), # latin small letter o with tilde, U+00F5 ISOlat1
'otimes': '⊗', # circled times = vector product, U+2297 ISOamsb
'ouml': chr(246), # latin small letter o with diaeresis, U+00F6 ISOlat1
'para': chr(182), # pilcrow sign = paragraph sign, U+00B6 ISOnum
'part': '∂', # partial differential, U+2202 ISOtech
'permil': '‰', # per mille sign, U+2030 ISOtech
'perp': '⊥', # up tack = orthogonal to = perpendicular, U+22A5 ISOtech
'phi': 'φ', # greek small letter phi, U+03C6 ISOgrk3
'pi': 'π', # greek small letter pi, U+03C0 ISOgrk3
'piv': 'ϖ', # greek pi symbol, U+03D6 ISOgrk3
'plusmn': chr(177), # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum
'pound': chr(163), # pound sign, U+00A3 ISOnum
'prime': '′', # prime = minutes = feet, U+2032 ISOtech
'prod': '∏', # n-ary product = product sign, U+220F ISOamsb
'prop': '∝', # proportional to, U+221D ISOtech
'psi': 'ψ', # greek small letter psi, U+03C8 ISOgrk3
'rArr': '⇒', # rightwards double arrow, U+21D2 ISOtech
'radic': '√', # square root = radical sign, U+221A ISOtech
'rang': '〉', # right-pointing angle bracket = ket, U+232A ISOtech
'raquo': chr(187), # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum
'rarr': '→', # rightwards arrow, U+2192 ISOnum
'rceil': '⌉', # right ceiling, U+2309 ISOamsc
'rdquo': '”', # right double quotation mark, U+201D ISOnum
'real': 'ℜ', # blackletter capital R = real part symbol, U+211C ISOamso
'reg': chr(174), # registered sign = registered trade mark sign, U+00AE ISOnum
'rfloor': '⌋', # right floor, U+230B ISOamsc
'rho': 'ρ', # greek small letter rho, U+03C1 ISOgrk3
'rlm': '‏', # right-to-left mark, U+200F NEW RFC 2070
'rsaquo': '›', # single right-pointing angle quotation mark, U+203A ISO proposed
'rsquo': '’', # right single quotation mark, U+2019 ISOnum
'sbquo': '‚', # single low-9 quotation mark, U+201A NEW
'sdot': '⋅', # dot operator, U+22C5 ISOamsb
'sect': chr(167), # section sign, U+00A7 ISOnum
'shy': chr(173), # soft hyphen = discretionary hyphen, U+00AD ISOnum
'sigma': 'σ', # greek small letter sigma, U+03C3 ISOgrk3
'sigmaf': 'ς', # greek small letter final sigma, U+03C2 ISOgrk3
'sim': '∼', # tilde operator = varies with = similar to, U+223C ISOtech
'spades': '♠', # black spade suit, U+2660 ISOpub
'sub': '⊂', # subset of, U+2282 ISOtech
'sube': '⊆', # subset of or equal to, U+2286 ISOtech
'sum': '∑', # n-ary sumation, U+2211 ISOamsb
'sup': '⊃', # superset of, U+2283 ISOtech
'sup1': chr(185), # superscript one = superscript digit one, U+00B9 ISOnum
'sup2': chr(178), # superscript two = superscript digit two = squared, U+00B2 ISOnum
'sup3': chr(179), # superscript three = superscript digit three = cubed, U+00B3 ISOnum
'supe': '⊇', # superset of or equal to, U+2287 ISOtech
'szlig': chr(223), # latin small letter sharp s = ess-zed, U+00DF ISOlat1
'tau': 'τ', # greek small letter tau, U+03C4 ISOgrk3
'there4': '∴', # therefore, U+2234 ISOtech
'theta': 'θ', # greek small letter theta, U+03B8 ISOgrk3
'thetasym': 'ϑ', # greek small letter theta symbol, U+03D1 NEW
'thinsp': ' ', # thin space, U+2009 ISOpub
'thorn': chr(254), # latin small letter thorn with, U+00FE ISOlat1
'times': chr(215), # multiplication sign, U+00D7 ISOnum
'trade': '™', # trade mark sign, U+2122 ISOnum
'uArr': '⇑', # upwards double arrow, U+21D1 ISOamsa
'uacute': chr(250), # latin small letter u with acute, U+00FA ISOlat1
'uarr': '↑', # upwards arrow, U+2191 ISOnum
'ucirc': chr(251), # latin small letter u with circumflex, U+00FB ISOlat1
'ugrave': chr(249), # latin small letter u with grave, U+00F9 ISOlat1
'uml': chr(168), # diaeresis = spacing diaeresis, U+00A8 ISOdia
'upsih': 'ϒ', # greek upsilon with hook symbol, U+03D2 NEW
'upsilon': 'υ', # greek small letter upsilon, U+03C5 ISOgrk3
'uuml': chr(252), # latin small letter u with diaeresis, U+00FC ISOlat1
'weierp': '℘', # script capital P = power set = Weierstrass p, U+2118 ISOamso
'xi': 'ξ', # greek small letter xi, U+03BE ISOgrk3
'yacute': chr(253), # latin small letter y with acute, U+00FD ISOlat1
'yen': chr(165), # yen sign = yuan sign, U+00A5 ISOnum
'yuml': chr(255), # latin small letter y with diaeresis, U+00FF ISOlat1
'zeta': 'ζ', # greek small letter zeta, U+03B6 ISOgrk3
'zwj': '‍', # zero width joiner, U+200D NEW RFC 2070
'zwnj': '‌', # zero width non-joiner, U+200C NEW RFC 2070
}

From tim_one at email.msn.com  Tue Aug 17 09:30:17 1999
From: tim_one at email.msn.com (Tim Peters)
Date: Tue, 17 Aug 1999 03:30:17 -0400
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37B8042B.21DE6053@lemburg.com>
Message-ID: <000001bee882$5b7d8da0$112d2399@tim>

[about weakdicts and the possibility of building them on weak
 references; the obvious way doesn't clean up the dict itself by
 magic; maybe a weak object should be notified when its referent
 goes away
]

[M.-A. Lemburg]
> Perhaps one could fiddle something out of the Proxy objects
> in mxProxy (you know where...). These support a special __cleanup__
> protocol that I use a lot to work around circular garbage:
> the __cleanup__ method of the referenced object is called prior
> to destroying the proxy; even if the reference count on the
> object has not yet gone down to 0.
>
> This makes direct circles possible without problems: the parent
> can reference a child through the proxy and the child can reference the
> parent directly.
What you just wrote is:

    parent --> proxy --> child -->+
      ^                           v
      +<--------------------------+

Looks like a plain old cycle to me!

> As soon as the parent is cleaned up, the reference to
> the proxy is deleted which then automagically makes the
> back reference in the child disappear, allowing the parent
> to be deallocated after cleanup without leaving a circular
> reference around.

M-A, this is making less sense by the paragraph : skipping the middle,
this says "as soon as the parent is cleaned up ... allowing the parent
to be deallocated after cleanup".  If we presume that the parent gets
cleaned up explicitly (since the reference from the child is keeping it
alive, it's not going to get cleaned up by magic, right?), then the
parent could just as well call the __cleanup__ methods of the things it
references directly without bothering with a proxy.  For that matter,
if it's the straightforward

    parent <-> child

kind of cycle, the parent's cleanup method can just do

    self.__dict__.clear()

and the cycle is broken without writing a __cleanup__ method anywhere
(that's what I usually do, and in this kind of cycle that clears the
last reference to the child, which then goes away, which in turn
automagically clears its back reference to the parent).

So, offhand, I don't see that the proxy protocol could help here.  In a
sense, what's really needed is the opposite: notifying the *proxy* when
the *real* object goes away (which makes no sense in the context of
what your proxy objects were designed to do).

[about Java and its four reference strengths]

Found a good introductory writeup at (sorry, my mailer will break this
URL, so I'll break it myself at a sensible place):

    http://developer.java.sun.com/developer/
    technicalArticles//ALT/RefObj/index.html

They have a class for each of the three "not strong" flavors of
references.  For all three you pass the referenced object to the
constructor, and all three accept (optional in two of the flavors) a
second ReferenceQueue argument.
In the latter case, when the referenced object goes away the
weak/soft/phantom-ref proxy object is placed on the queue.  Which, in
turn, is a thread-safe queue with various put, get, and timeout-limited
polling functions.  So you have to write code to look at the queue from
time to time, to find the proxies whose referents have gone away.

The three flavors may (or may not ...) have these motivations:

soft: an object reachable at strongest by soft references can go away
at any time, but the garbage collector strives to keep it intact until
it can't find any other way to get enough memory

weak: an object reachable at strongest by weak references can go away
at any time, and the collector makes no attempt to delay its death

phantom: an object reachable at strongest by phantom references can get
*finalized* at any time, but won't get *deallocated* before its phantom
proxy does something or other (goes away?  wasn't clear).  This is the
flavor that requires passing a queue argument to the constructor.
Seems to be a major hack to worm around Java's notorious problems with
order of finalization -- along the lines that you give phantom
referents trivial finalizers, and put the real cleanup logic in the
phantom proxy.  This lets your program take responsibility for running
the real cleanup code in the order-- and in the thread! --where it
makes sense.

Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
under-the-cover weak keys (unlike Dieter's flavor with weak values),
and where the key+value pairs vanish by magic when the key object goes
away.  The details and the implementation of these guys weren't clear
to me, but then I didn't download the code, just scanned the online
docs.

Ah, a correction to my last post:

class _Weak:
    ...
    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

Root of all evil: this method is useless, since the id2weak dict keeps
each _Weak object alive until its referent goes away (at which time
self.id gets set to None, so _Weak.__del__ doesn't do anything).  Even
if it did do something, it's no cheaper to do it here than in the
system cleanup code ("greater expense" was wrong).

weakly y'rs  - tim

PS: Ooh!  Ooh!  Fellow at work today was whining about weakdicts, and
called them "limp dicts".  I'm not entirely sure it was an innocent
Freudian slut, but it's a funny pun even if it wasn't (for you
foreigners, it sounds like American slang for "flaccid one-eyed trouser
snake" ...).

From fredrik at pythonware.com  Tue Aug 17 09:23:03 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 09:23:03 +0200
Subject: [Python-Dev] buffer interface considered harmful
References:
Message-ID: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>

David Ascher wrote:
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

as an aside, PIL 1.1 (*) introduces "virtual image memories" which are,
as I mentioned in an earlier post, accessed via an API rather than via
direct pointers.  it'll also include an adapter allowing you to use
NumPy objects as image memories.
unfortunately, the buffer interface is not good enough to use on top of
the virtual image memory interface...

*) 1.1 is our current development thread, which will be released to
plus customers in a number of weeks...

From mal at lemburg.com  Tue Aug 17 10:50:01 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 17 Aug 1999 10:50:01 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim>
Message-ID: <37B92239.4076841E@lemburg.com>

Tim Peters wrote:
>
> [about weakdicts and the possibility of building them on weak
> references; the obvious way doesn't clean up the dict itself by
> magic; maybe a weak object should be notified when its referent
> goes away
> ]
>
> [M.-A. Lemburg]
> > Perhaps one could fiddle something out of the Proxy objects
> > in mxProxy (you know where...). These support a special __cleanup__
> > protocol that I use a lot to work around circular garbage:
> > the __cleanup__ method of the referenced object is called prior
> > to destroying the proxy; even if the reference count on the
> > object has not yet gone down to 0.
> >
> > This makes direct circles possible without problems: the parent
> > can reference a child through the proxy and the child can reference the
> > parent directly.
>
> What you just wrote is:
>
>     parent --> proxy --> child -->+
>       ^                           v
>       +<--------------------------+
>
> Looks like a plain old cycle to me!

Sure :-) That was the intention. I'm using this to implement
acquisition without turning to ExtensionClasses. [Nice picture, BTW]

> > As soon as the parent is cleaned up, the reference to
> > the proxy is deleted which then automagically makes the
> > back reference in the child disappear, allowing the parent
> > to be deallocated after cleanup without leaving a circular
> > reference around.
>
> M-A, this is making less sense by the paragraph : skipping the
> middle, this says "as soon as the parent is cleaned up ... allowing the
> parent to be deallocated after cleanup". If we presume that the parent gets
> cleaned up explicitly (since the reference from the child is keeping it
> alive, it's not going to get cleaned up by magic, right?), then the parent
> could just as well call the __cleanup__ methods of the things it references
> directly without bothering with a proxy. For that matter, if it's the
> straightforward
>
>     parent <-> child
>
> kind of cycle, the parent's cleanup method can just do
>
>     self.__dict__.clear()
>
> and the cycle is broken without writing a __cleanup__ method anywhere
> (that's what I usually do, and in this kind of cycle that clears the last
> reference to the child, which then goes away, which in turn automagically
> clears its back reference to the parent).
>
> So, offhand, I don't see that the proxy protocol could help here. In a
> sense, what's really needed is the opposite: notifying the *proxy* when the
> *real* object goes away (which makes no sense in the context of what your
> proxy objects were designed to do).

All true :-). The nice thing about the proxy is that it takes care of
the process automagically. And yes, the parent is used via a proxy too.
So the picture looks like this:

    --> proxy --> parent --> proxy --> child -->+
                    ^                           v
                    +<--------------------------+

Since the proxy isn't noticed by the referencing objects (well, at
least if they don't fiddle with internals), the picture for the objects
looks like this:

    --> parent --> child -->+
          ^                 v
          +<----------------+

You could of course do the same via explicit invocation of the
__cleanup__ method, but the object references involved could be hidden
in some other structure, so they might be hard to find.

And there's another feature about Proxies (as defined in mxProxy): they
allow you to control access in a much more strict way than Python does.
You can actually hide attributes and methods you don't want exposed in
a way that doesn't even let you access them via some dict or
pass-me-the-frame-object trick. This is very useful when you program
multi-user application host servers where you don't want users to
access internal structures of the server.

> [about Java and its four reference strengths]
>
> Found a good introductory writeup at (sorry, my mailer will break this URL,
> so I'll break it myself at a sensible place):
>
> http://developer.java.sun.com/developer/
> technicalArticles//ALT/RefObj/index.html

Thanks for the reference... and for the summary ;-)

> They have a class for each of the three "not strong" flavors of references.
> For all three you pass the referenced object to the constructor, and all
> three accept (optional in two of the flavors) a second ReferenceQueue
> argument. In the latter case, when the referenced object goes away the
> weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn,
> is a thread-safe queue with various put, get, and timeout-limited polling
> functions. So you have to write code to look at the queue from time to
> time, to find the proxies whose referents have gone away.
>
> The three flavors may (or may not ...) have these motivations:
>
> soft: an object reachable at strongest by soft references can go away at
> any time, but the garbage collector strives to keep it intact until it can't
> find any other way to get enough memory

So there is a possibility of reviving these objects, right ?

I've just recently added a hackish function to my mxTools which allows
me to regain access to objects via their address (no, not thread safe,
not even necessarily correct).

sys.makeref(id)
    Provided that id is a valid address of a Python object (id(object)
    returns this address), this function returns a new reference to it.
    Only objects that are "alive" can be referenced this way, ones with
    zero reference count cause an exception to be raised.
    You can use this function to reaccess objects lost during garbage
    collection.

    USE WITH CARE: this is an expert-only function since it can cause
    instant core dumps and many other strange things -- even ruin your
    system if you don't know what you're doing !

    SECURITY WARNING: This function can provide you with access to
    objects that are otherwise not visible, e.g. in restricted mode,
    and thus be a potential security hole.

I use it for tracking objects via an id-keyed dictionary and hooks in
the create/del mechanisms of Python instances. It helps finding those
memory eating cycles.

> weak: an object reachable at strongest by weak references can go away at
> any time, and the collector makes no attempt to delay its death
>
> phantom: an object reachable at strongest by phantom references can get
> *finalized* at any time, but won't get *deallocated* before its phantom
> proxy does something or other (goes away? wasn't clear). This is the flavor
> that requires passing a queue argument to the constructor. Seems to be a
> major hack to worm around Java's notorious problems with order of
> finalization -- along the lines that you give phantom referents trivial
> finalizers, and put the real cleanup logic in the phantom proxy. This lets
> your program take responsibility for running the real cleanup code in the
> order-- and in the thread! --where it makes sense.

Wouldn't these flavors be possible using the following setup ? Note
that it's quite similar to your _Weak class except that I use a proxy
without the need to first get a strong reference for the object and
that it doesn't use a weak bit.

    --> proxy --> object
                    ^
                    |
         all_managed_objects

all_managed_objects is a dictionary indexed by address (its id) and
keeps a strong reference to the objects. The proxy does not keep a
strong reference to the object, but only the address as integer, and
checks the ref-count on the object in the all_managed_objects
dictionary prior to every dereferencing action.

In case this refcount falls down to 1 (only the all_managed_objects
dict references it), the proxy takes appropriate action, e.g. raises an
exception and deletes the reference in all_managed_objects to mimic a
weak reference. The same check is done prior to garbage collection of
the proxy.

Add to this some queues, pepper and salt and place it in an oven at
220° for 20 minutes... plus take a look every 10 seconds or so...

The downside is obvious: the zombified object will not get inspected
(and then GCed) until the next time a weak reference to it is used.

> Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
> under-the-cover weak keys (unlike Dieter's flavor with weak values), and
> where the key+value pairs vanish by magic when the key object goes away.
> The details and the implementation of these guys weren't clear to me, but
> then I didn't download the code, just scanned the online docs.

Would the above help in creating such beasts ?

> Ah, a correction to my last post:
>
> class _Weak:
>     ...
>     def __del__(self):
>         # this is purely an optimization: if self gets nuked,
>         # exempt its referent from greater expense when *it*
>         # dies
>         if self.id is not None:
>             __clear_weak_bit(__id2obj(self.id))
>             del id2weak[self.id]
>
> Root of all evil: this method is useless, since the id2weak dict keeps each
> _Weak object alive until its referent goes away (at which time self.id gets
> set to None, so _Weak.__del__ doesn't do anything). Even if it did do
> something, it's no cheaper to do it here than in the system cleanup code
> ("greater expense" was wrong).
>
> weakly y'rs - tim
>
> PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and
> called them "limp dicts". I'm not entirely sure it was an innocent Freudian
> slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds
> like American slang for "flaccid one-eyed trouser snake" ...).
:-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 136 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mhammond at skippinet.com.au  Tue Aug 17 18:05:40 1999
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 18 Aug 1999 02:05:40 +1000
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>
Message-ID: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>

Fredrik,

Care to elaborate? Statements like "buffer interface needs a redesign"
or "the buffer interface is not good enough to use on top of the
virtual image memory interface" really only give me the impression you
have a bee in your bonnet over these buffer interfaces.

If you could actually stretch these statements out to provide even
_some_ background, problem statement or potential solution it would
help. All I know is "Fredrik doesn't like it for some unexplained
reason". You found an issue with array reallocation - great - but
that's a bug rather than a design flaw. Can you tell us why it's not
good enough, and give an off-the-cuff design that would solve it? Or
are you suggesting it is unsolvable? I really don't have a clue what
your issue is.

Jim (for example) has made his position and reasoning clear. You have
only made your position clear, but your reasoning is still a mystery.

Mark.

>
> unfortunately, the buffer interface is not good enough to use
> on top of the virtual image memory interface...

From fredrik at pythonware.com  Tue Aug 17 18:48:31 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 18:48:31 +0200
Subject: [Python-Dev] buffer interface considered harmful
References: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>
Message-ID: <005201bee8d0$9b4737d0$f29b12c2@secret.pythonware.com>

> Care to elaborate? Statements like "buffer interface needs a redesign" or
> "the buffer interface is not good enough to use on top of the virtual image
> memory interface" really only give me the impression you have a bee in your
> bonnet over these buffer interfaces.

re "good enough":
http://www.python.org/pipermail/python-dev/1999-August/000650.html

re "needs a redesign":
http://www.python.org/pipermail/python-dev/1999-August/000659.html

and to some extent:
http://www.python.org/pipermail/python-dev/1999-August/000658.html

> Jim (for example) has made his position and reasoning clear.

among other things, Jim said: "At this point, I don't have a good idea
what buffers are for and I don't see a lot of evidence that there *is*
a design. I assume that there was a design, but I can't see it". which
pretty much echoes my concerns in:

http://www.python.org/pipermail/python-dev/1999-August/000612.html
http://www.python.org/pipermail/python-dev/1999-August/000648.html

> You found an issue with array reallocation - great - but that's
> a bug rather than a design flaw.

for me, that bug (and the marshal glitch) indicates that the design
isn't as crystal-clear as it needs to be, for such a fundamental
feature. otherwise, Greg would never have made that mistake, and Guido
would have spotted it when he added the "buffer" built-in...

so what are you folks waiting for? could someone who thinks he
understands exactly what this thing is spend an hour on writing that
design document, so me and Jim can put this entire thing behind us?

PS. btw, was it luck or careful analysis behind the decision to make
buffer() always return read-only buffers, also for objects
implementing the read/write protocol?

From da at ski.org  Wed Aug 18 00:41:14 1999
From: da at ski.org (David Ascher)
Date: Tue, 17 Aug 1999 15:41:14 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
In-Reply-To: <19990816094243.3CE83303120@snelboot.oratrix.nl>
Message-ID:

On Mon, 16 Aug 1999, Jack Jansen wrote:

> Would adding a buffer interface to cobject solve your problem? Cobject is
> described as being used for passing C objects between Python modules, but I've
> always thought of it as passing C objects from one C routine to another C
> routine through Python, which doesn't necessarily understand what the object
> is all about.
>
> That latter description seems to fit your bill quite nicely.

It's an interesting idea, but it wouldn't do as it is, as I'd need the
ability to create a CObject given a memory location and a size. Also, I
am not expected to free() the memory, which would happen when the
CObject got GC'ed.

(BTW: I am *not* arguing that PyBuffer_FromReadWriteMemory() should be
exposed by default. I'm happy with exposing it in my little extension
module for my exotic needs.)

--david

From mal at lemburg.com  Wed Aug 18 11:02:02 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 11:02:02 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim> <37B92239.4076841E@lemburg.com>
Message-ID: <37BA768A.50DF5574@lemburg.com>

[about weakdicts and the possibility of building them on weak
 references; the obvious way doesn't clean up the dict itself by
 magic; maybe a weak object should be notified when its referent
 goes away
]

Here is a new version of my Proxy package which includes a
self-managing weak reference mechanism without the need to add extra
bits or bytes to all Python objects:

    http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip

The docs and an explanation of how the thingie works are included in
the archive's Doc subdir.
Basically it builds upon the idea I posted earlier on in this thread --
with a few extra kicks to get it right in the end ;-)

Usage is pretty simple:

from Proxy import WeakProxy
object = []
wr = WeakProxy(object)
wr.append(8)
del object

>>> wr[0]
Traceback (innermost last):
  File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected

I have checked the ref counts pretty thoroughly, but before going
public I would like the Python-Dev crowd to run some tests as well:
after all, the point is for the weak references to be weak and that's
sometimes a bit hard to check.

Hope you have as much fun with it as I had writing it ;-)

Ah yes, for the raw details have a look at the code. The code uses a
list of back references to the weak Proxies and notifies them when the
object goes away... would it be useful to add a hook to the Proxies so
that they can apply some other action as well ?

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Vladimir.Marangozov at inrialpes.fr  Wed Aug 18 13:42:08 1999
From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov)
Date: Wed, 18 Aug 1999 12:42:08 +0100 (NFT)
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37BA768A.50DF5574@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 11:02:02 am"
Message-ID: <199908181142.MAA22596@pukapuka.inrialpes.fr>

M.-A. Lemburg wrote:
>
> Usage is pretty simple:
>
> from Proxy import WeakProxy
> object = []
> wr = WeakProxy(object)
> wr.append(8)
> del object
>
> >>> wr[0]
> Traceback (innermost last):
>   File "", line 1, in ?
> mxProxy.LostReferenceError: object already garbage collected
>
> I have checked the ref counts pretty thoroughly, but before
> going public I would like the Python-Dev crowd to run some
> tests as well: after all, the point is for the weak references
> to be weak and that's sometimes a bit hard to check.

It's even harder to implement them without side effects. I used the
same hack for the __heirs__ class attribute some time ago. But I knew
that a parent class cannot be garbage collected before all of its
descendants. That allowed me to keep weak refs in the parent class,
and preserve the existing strong refs in the subclasses. On every
dealloc of a subclass, the corresponding weak ref in the parent class'
__heirs__ is removed.

In your case, the lifetime of the objects cannot be predicted, so
implementing weak refs by messing with refcounts or checking mem
pointers is a dead end. I don't know whether this is the case with
mxProxy as I just browsed the code quickly, but here's a scenario
where your scheme (or implementation) is not working:

>>> from Proxy import WeakProxy
>>> o = []
>>> p = WeakProxy(o)
>>> d = WeakProxy(o)
>>> p
>>> d
>>> print p
[]
>>> print d
[]
>>> del o
>>> p
>>> d
>>> print p
Illegal instruction (core dumped)

--
Vladimir MARANGOZOV          | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jack at oratrix.nl  Wed Aug 18 13:02:13 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 18 Aug 1999 13:02:13 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com>
Message-ID: <19990818110213.A558F303120@snelboot.oratrix.nl>

The one thing I'm not thrilled by in mxProxy is that a call to
CheckWeakReferences() is needed before an object is cleaned up.
I guess this boils down to the same problem I had with my weak reference scheme: you somehow want the Python core to tell the proxy stuff that the object can be cleaned up (although the details are different: in my scheme this would be triggered by refcount==0 and in mxProxy by refcount==1). And because objects are created and destroyed in Python at a tremendous rate you don't want to do this call for every object, only if you have a hint that the object has a weak reference (or a proxy). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Wed Aug 18 13:46:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 13:46:45 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <19990818110213.A558F303120@snelboot.oratrix.nl> Message-ID: <37BA9D25.95E46EA@lemburg.com> Jack Jansen wrote: > > The one thing I'm not thrilled by in mxProxy is that a call to > CheckWeakReferences() is needed before an object is cleaned up. I guess this > boils down to the same problem I had with my weak reference scheme: you > somehow want the Python core to tell the proxy stuff that the object can be > cleaned up (although the details are different: in my scheme this would be > triggered by refcount==0 and in mxProxy by refcount==1). And because objects > are created and destroyed in Python at a tremendous rate you don't want to do > this call for every object, only if you have a hint that the object has a weak > reference (or a proxy). Well, the check is done prior to every action using a proxy to the object and also when a proxy to it is deallocated. The additional checkweakrefs() API is only included to enable explicit checking of the whole weak refs dictionary, e.g. every 10 seconds or so (just like you would with a mark&sweep GC).
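[Editor's note: the access-time checking Marc-Andre describes -- every operation through a dead proxy raises an error -- is essentially the semantics the standard weakref module (added later, in Python 2.1) ended up providing. A minimal sketch of that behaviour using weakref rather than mxProxy; note that unlike mxProxy, the standard module cannot weakly reference a plain list, so a user-defined class stands in for the email's `object = []` example:]

```python
import weakref

class Node:
    """Plain container; standard weak references need a type with a weakref slot."""
    def __init__(self, value):
        self.value = value

obj = Node(8)
p = weakref.proxy(obj)      # creates no strong reference to obj
assert p.value == 8         # transparent while the referent is alive

del obj                     # last strong reference gone -> referent collected
try:
    p.value                 # any access through a dead proxy now fails
    raised = False
except ReferenceError:      # weakref's analogue of mxProxy.LostReferenceError
    raised = True
assert raised
```

[The check happens on every use of the proxy, so no explicit sweep call is needed -- the same trade-off discussed above.]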
But yes, GC of the phantom object is delayed a bit depending on how you set up the proxies. Still, I think most usages won't have this problem, since the proxies themselves are usually temporary objects. It may sometimes even make sense to have the phantom object around as long as possible, e.g. to implement the soft references Tim quoted from the Java paper. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Aug 18 13:33:18 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 13:33:18 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181142.MAA22596@pukapuka.inrialpes.fr> Message-ID: <37BA99FE.45D582AD@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > I have checked the ref counts pretty thoroughly, but before > > going public I would like the Python-Dev crowd to run some > > tests as well: after all, the point is for the weak references > > to be weak and that's sometimes a bit hard to check. > > It's even harder to implement them without side effects. I used > the same hack for the __heirs__ class attribute some time ago. > But I knew that a parent class cannot be garbage collected before > all of its descendants. That allowed me to keep weak refs in > the parent class, and preserve the existing strong refs in the > subclasses. On every dealloc of a subclass, the corresponding > weak ref in the parent class' __heirs__ is removed. > > In your case, the lifetime of the objects cannot be predicted, > so implementing weak refs by messing with refcounts or checking > mem pointers is a dead end. 
> I don't know whether this is the > case with mxProxy as I just browsed the code quickly, but here's > a scenario where your scheme (or implementation) is not working: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = WeakProxy(o) > >>> d = WeakProxy(o) > >>> p > > >>> d > > >>> print p > [] > >>> print d > [] > >>> del o > >>> p > > >>> d > > >>> print p > Illegal instruction (core dumped) Could you tell me where the core dump originates ? Also, it would help to compile the package with the -DMAL_DEBUG switch turned on (edit Setup) and then run the same things using 'python -d'. The package will then print a pretty complete list of things it is doing to mxProxy.log, which would help track down errors like these. BTW, I get: >>> print p Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> [Don't know why the print statement prints an empty line, though.] Thanks for trying it, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 15:12:14 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 14:12:14 +0100 (NFT) Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37BA99FE.45D582AD@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 01:33:18 pm" Message-ID: <199908181312.OAA20542@pukapuka.inrialpes.fr> [about mxProxy, WeakProxy] M.-A. Lemburg wrote: > > Could you tell me where the core dump originates ? Also, it would > help to compile the package with the -DMAL_DEBUG switch turned > on (edit Setup) and then run the same things using 'python -d'. > The package will then print a pretty complete list of things it > is doing to mxProxy.log, which would help track down errors like > these. 
> > BTW, I get: > >>> print p > > Traceback (innermost last): > File "", line 1, in ? > mxProxy.LostReferenceError: object already garbage collected > >>> > > [Don't know why the print statement prints an empty line, though.] > The previous example now *seems* to work fine in a freshly launched interpreter, so it's not a good example, but this shorter one definitely doesn't: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = q = WeakProxy(o) >>> del o >>> print p or q Illegal instruction (core dumped) Or even shorter: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = WeakProxy(o) >>> del o >>> print p Illegal instruction (core dumped) It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). I can mail you a complete trace in private, if you still need it. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Wed Aug 18 14:50:08 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 14:50:08 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> Message-ID: <37BAAC00.27A34FF7@lemburg.com> Vladimir Marangozov wrote: > > [about mxProxy, WeakProxy] > > M.-A. Lemburg wrote: > > > > Could you tell me where the core dump originates ? Also, it would > > help to compile the package with the -DMAL_DEBUG switch turned > > on (edit Setup) and then run the same things using 'python -d'. > > The package will then print a pretty complete list of things it > > is doing to mxProxy.log, which would help track down errors like > > these. > > > > BTW, I get: > > >>> print p > > > > Traceback (innermost last): > > File "", line 1, in ? > > mxProxy.LostReferenceError: object already garbage collected > > >>> > > > > [Don't know why the print statement prints an empty line, though.] 
> > > > The previous example now *seems* to work fine in a freshly launched > interpreter, so it's not a good example, but this shorter one > definitely doesn't: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = q = WeakProxy(o) > >>> p = q = WeakProxy(o) > >>> del o > >>> print p or q > Illegal instruction (core dumped) > > It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). > I can mail you a complete trace in private, if you still need it. That would be nice (please also include the log-file), because I get: >>> print p or q Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> Thank you, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Wed Aug 18 16:47:23 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 09:47:23 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart Message-ID: <199908181447.JAA05151@dolphin.mojam.com> I posted a note to the main list yesterday in response to Dan Connolly's complaint that the os module isn't very portable. I saw no followups (it's amazing how fast a thread can die out :-), but I think it's a reasonable idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback from people more interested in long-term Python developments. The basic premise is that for each platform on which Python runs there are portable and nonportable interfaces to the underlying operating system. The term POSIX has some portability connotations, so let's assume that the posix module exposes the portable subset of the OS interface. To keep things simple, let's also assume there are only three supported general OS platforms: unix, nt and mac.
The proposal then is that importing the platform's module by name will import both the portable and non-portable interface elements. Importing the posix module will import just that portion of the interface that is truly portable across all platforms. To add new functionality to the posix interface it would have to be added to all three platforms. The posix module will be able to ferret out the platform it is running on and import the correct OS-independent posix implementation: import sys _plat = sys.platform del sys if _plat == "mac": from posixmac import * elif _plat == "nt": from posixnt import * else: from posixunix import * # some unix variant The platform-dependent module would simply import everything it could, e.g.: from posixunix import * from nonposixunix import * The os module would vanish or be deprecated with its current behavior intact. The documentation would be modified so that the posix module documents the portable interface and the OS-dependent module's documentation documents the rest and just refers users to the posix module docs for the portable stuff. In theory, this could be done for 1.6, however as I've proposed it, the semantics of importing the posix module would change. Dan Connolly probably isn't going to have a problem with that, though I suppose Guido might... If this idea is good enough for 1.6, perhaps we leave os and posix module semantics alone and add a module named "portable", "portableos" or "portableposix" or something equally arcane. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From guido at CNRI.Reston.VA.US Wed Aug 18 16:54:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 10:54:28 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 09:47:23 CDT." 
<199908181447.JAA05151@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <199908181454.KAA07692@eric.cnri.reston.va.us> > I posted a note to the main list yesterday in response to Dan Connolly's > complaint that the os module isn't very portable. I saw no followups (it's > amazing how fast a thread can die out :-), but I think it's a reasonable > idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback > from people more interesting in long-term Python developments. > > The basic premise is that for each platform on which Python runs there are > portable and nonportable interfaces to the underlying operating system. The > term POSIX has some portability connotations, so let's assume that the posix > module exposes the portable subset of the OS interface. To keep things > simple, let's also assume there are only three supported general OS > platforms: unix, nt and mac. The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. To > add new functionality to the posix interface it would have to be added to > all three platforms. The posix module will be able to ferret out the > platform it is running on and import the correct OS-independent posix > implementation: > > import sys > _plat = sys.platform > del sys > > if _plat == "mac": from posixmac import * > elif _plat == "nt": from posixnt import * > else: from posixunix import * # some unix variant > > The platform-dependent module would simply import everything it could, e.g.: > > from posixunix import * > from nonposixunix import * > > The os module would vanish or be deprecated with its current behavior > intact. 
The documentation would be modified so that the posix module > documents the portable interface and the OS-dependent module's documentation > documents the rest and just refers users to the posix module docs for the > portable stuff. > > In theory, this could be done for 1.6, however as I've proposed it, the > semantics of importing the posix module would change. Dan Connolly probably > isn't going to have a problem with that, though I suppose Guido might... If > this idea is good enough for 1.6, perhaps we leave os and posix module > semantics alone and add a module named "portable", "portableos" or > "portableposix" or something equally arcane. And the advantage of this would be...? Basically, it seems you're just renaming the functionality of os to posix. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Wed Aug 18 17:10:41 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:10:41 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <199908181454.KAA07692@eric.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> Message-ID: <14266.51743.904066.470431@dolphin.mojam.com> Guido> And the advantage of this would be...? Guido> Basically, it seems you're just renaming the functionality of os Guido> to posix. I see a few advantages. 1. We will get the meaning of the noun "posix" more or less right. Programmers coming from other languages are used to thinking of programming to a POSIX API or the "POSIX subset of the OS API". Witness all the "#ifdef _POSIX" in the header files on my Linux box. In Python, the exact opposite is true. Importing the posix module is documented to be the non-portable way to interface to Unix platforms. 2. You would make it clear on all platforms when you expect to be programming in a non-portable fashion, by importing the platform-specific os (unix, nt, mac).
"import unix" would mean I expect this code to only run on Unix machines. You could argue that you are declaring your non-portability by importing the posix module today, but to the casual user or to a new Python programmer with a C or C++ background, that won't be obvious. 3. If Dan Connolly's contention is correct, importing the os module today is not all that portable. I can't really say one way or the other, because I'm lucky enough to be able to confine my serious programming to Unix. I'm sure there's someone out there that can try the following on a few platforms: import os dir(os) and compare the output. Skip From jack at oratrix.nl Wed Aug 18 17:33:20 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 17:33:20 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <19990818153320.D61F6303120@snelboot.oratrix.nl> > The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. There's one slight problem with this: when you use functionality that is partially portable, i.e. a call that is available on Windows and Unix but not on the Mac. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Wed Aug 18 17:39:30 1999 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Wed, 18 Aug 1999 11:39:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <14266.54194.715887.808096@amarok.cnri.reston.va.us> Skip Montanaro writes: > 2. You would make it clear on all platforms when you expect to be > programming in a non-portable fashion, by importing the > platform-specific os (unix, nt, mac). "import unix" would mean I To my mind, POSIX == Unix; other platforms may have bits of POSIX-ish functionality, but most POSIX functions will only be found on Unix systems. One of my projects for 1.6 is to go through the O'Reilly POSIX book and add all the missing calls to the posix modules. Practically none of those functions would exist on Windows or Mac. Perhaps it's really a documentation fix: the os module should document only those features common to all of the big 3 platforms (Unix, Windows, Mac), and have pointers to a section for each of the platform-specific modules, listing the platform-specific functions. -- A.M. Kuchling http://starship.python.net/crew/amk/ Setting loose on the battlefield weapons that are able to learn may be one of the biggest mistakes mankind has ever made. It could also be one of the last. 
-- Richard Forsyth, "Machine Learning for Expert Systems" From skip at mojam.com Wed Aug 18 17:52:20 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:52:20 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54194.715887.808096@amarok.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> Message-ID: <14266.54907.143970.101594@dolphin.mojam.com> Andrew> Perhaps it's really a documentation fix: the os module should Andrew> document only those features common to all of the big 3 Andrew> platforms (Unix, Windows, Mac), and have pointers to a section Andrew> for each of the platform-specific modules, listing the Andrew> platform-specific functions. Perhaps. Should that read ... the os module should *expose* only those features common to all of the big 3 platforms ... ? Skip From skip at mojam.com Wed Aug 18 17:54:11 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:54:11 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> Message-ID: <14266.54991.27912.12075@dolphin.mojam.com> >>>>> "Jack" == Jack Jansen writes: >> The proposal then is that importing the >> platform's module by name will import both the portable and non-portable >> interface elements. Importing the posix module will import just that >> portion of the interface that is truly portable across all platforms. Jack> There's one slight problem with this: when you use functionality that is Jack> partially portable, i.e. a call that is available on Windows and Unix but not Jack> on the Mac. Agreed. I'm not sure what to do there. 
Is the intersection of the common OS calls on Unix, Windows and Mac so small as to be useless (or are there some really gotta have functions not in the intersection because they are missing only on the Mac)? Skip From guido at CNRI.Reston.VA.US Wed Aug 18 18:16:27 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 12:16:27 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 10:52:20 CDT." <14266.54907.143970.101594@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> <14266.54907.143970.101594@dolphin.mojam.com> Message-ID: <199908181616.MAA07901@eric.cnri.reston.va.us> > ... the os module should *expose* only those features common to all of > the big 3 platforms ... Why? My experience has been that functionality that was thought to be Unix specific has gradually become available on other platforms, which makes it hard to decide in which module a function should be placed. The proper test for portability of a program is not whether it imports certain module names, but whether it uses certain functions from those modules (and whether it uses them in a portable fashion). As platforms evolve, a program that was previously thought to be non-portable might become more portable. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 19:33:44 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 18:33:44 +0100 (NFT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54991.27912.12075@dolphin.mojam.com> from "Skip Montanaro" at "Aug 18, 99 10:54:11 am" Message-ID: <199908181733.SAA08434@pukapuka.inrialpes.fr> Everybody's right in this debate. 
I have to type a lot to express objectively my opinion, but better filter my reasoning and just say the conclusion. Having in mind: - what POSIX is - what an OS is - that an OS may or may not comply w/ the POSIX standard, and if it doesn't, it may do so in a couple of years (Windows 3K and PyOS come to mind ;-) - that the os module claims portability amongst the different OSes, mainly regarding their filesystem & process management services, hence it's exposing only a *subset* of the os specific services - the current state of Python It would be nice: - to leave the os module as a common denominator - to have a "unix" module (which could further incorporate the different brands of unix) - to have the posix module capture the fraction of posix functionality, exported from a particular OS specific module, and add the appropriate POSIX propaganda in the docs - to manage to do this, or argue what's wrong with the above -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Thu Aug 19 12:02:26 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 19 Aug 1999 12:02:26 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> <37BAAC00.27A34FF7@lemburg.com> Message-ID: <37BBD632.3F66419C@lemburg.com> [about weak references and a sample implementation in mxProxy] With the help of Vladimir, I have solved the problem and uploaded a modified version of the prerelease: http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip The archive now also contains a precompiled Win32 PYD file for those on WinXX platforms. Please give it a try and tell me what you think. 
Cheers, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 134 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Thu Aug 19 16:06:01 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 19 Aug 1999 16:06:01 +0200 Subject: [Python-Dev] Optimization idea Message-ID: <19990819140602.433BC303120@snelboot.oratrix.nl> I just had yet another idea for optimizing Python that looks so plausible that I guess someone else must have looked into it already (and, hence, probably rejected it:-): We add to the type structure a "type identifier" number, a small integer for the common types (int=1, float=2, string=3, etc) and 0 for everything else. When eval_code2 sees, for instance, a MULTIPLY operation it does something like the following: case BINARY_MULTIPLY: w = POP(); v = POP(); code = (BINARY_MULTIPLY << 8) | ((v->ob_type->tp_typeid) << 4) | (w->ob_type->tp_typeid); x = (binopfuncs[code])(v, w); .... etc ... The idea is that all the 256 BINARY_MULTIPLY entries would be filled with PyNumber_Multiply, except for a few common cases. The int*int field could point straight to int_mul(), etc. Assuming the common cases are really more common than the uncommon cases the fact that they jump straight out to the implementation function instead of mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset the added overhead of shifts, ors and indexing. Any thoughts?
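[Editor's note: the mechanics of Jack's table can be modelled in a few lines of Python. Every name below (the type-id values, binopfuncs, the opcode number) is invented for illustration, loosely following the C fragment above -- this is a sketch of the dispatch scheme, not CPython code:]

```python
# Dispatch a binary operation through a table indexed by the opcode and
# the two operands' small type ids, falling back to a generic routine.
TYPE_IDS = {int: 1, float: 2, str: 3}   # 0 means "everything else"

def type_id(obj):
    return TYPE_IDS.get(type(obj), 0)

def generic_multiply(v, w):
    # Stands in for PyNumber_Multiply: handles any operand mix.
    return v * w

def int_mul(v, w):
    # Fast path: assumes both operands are already plain ints.
    return v * w

BINARY_MULTIPLY = 20    # illustrative opcode number

# Fill all 16 * 16 = 256 type-pair slots with the generic function...
binopfuncs = {(BINARY_MULTIPLY, l, r): generic_multiply
              for l in range(16) for r in range(16)}
# ...then overwrite the common slots with specialized implementations.
binopfuncs[(BINARY_MULTIPLY, 1, 1)] = int_mul

def eval_binary_multiply(v, w):
    # What the eval loop would do on seeing a MULTIPLY opcode.
    return binopfuncs[(BINARY_MULTIPLY, type_id(v), type_id(w))](v, w)

assert eval_binary_multiply(6, 7) == 42      # hits the int*int fast path
assert eval_binary_multiply(1.5, 4) == 6.0   # falls back to the generic slot
```

[The win Jack is betting on is that the table lookup replaces the coercion machinery for the common slots; the uncommon slots pay only the lookup on top of what they already did.]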
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Thu Aug 19 16:05:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:05:28 -0400 Subject: [Python-Dev] Localization expert needed Message-ID: <199908191405.KAA10401@eric.cnri.reston.va.us> My contact at HP is asking for expert advice on localization and multi-byte characters. I have little to share except pointing to Martin von Loewis and Pythonware. Does anyone on this list have a suggestion besides those? Don't hesitate to recommend yourself -- there's money in it! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 18 Aug 1999 23:15:55 -0700 From: JOE_ELLSWORTH To: guido at CNRI.Reston.VA.US Subject: Localization efforts and state in Python. Hi Guido. Can you give me some references to The best references currently available for using Python in CGI applications when multi-byte localization is known to be needed? Who is the expert in this in the Python area? Can you recomend that they work with us in this area? Thanks, Joe E. ------- End of Forwarded Message From guido at CNRI.Reston.VA.US Thu Aug 19 16:15:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:15:28 -0400 Subject: [Python-Dev] Optimization idea In-Reply-To: Your message of "Thu, 19 Aug 1999 16:06:01 +0200." 
<19990819140602.433BC303120@snelboot.oratrix.nl> References: <19990819140602.433BC303120@snelboot.oratrix.nl> Message-ID: <199908191415.KAA10432@eric.cnri.reston.va.us> > I just had yet another idea for optimizing Python that looks so > plausible that I guess someone else must have looked into it already > (and, hence, probably rejected it:-): > > We add to the type structure a "type identifier" number, a small integer for > the common types (int=1, float=2, string=3, etc) and 0 for everything else. > > When eval_code2 sees, for instance, a MULTIPLY operation it does something > like the following: > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > code = (BINARY_MULTIPLY << 8) | > ((v->ob_type->tp_typeid) << 4) | > ((w->ob_type->tp_typeid); > x = (binopfuncs[code])(v, w); > .... etc ... > > The idea is that all the 256 BINARY_MULTIPLY entries would be filled with > PyNumber_Multiply, except for a few common cases. The int*int field could > point straight to int_mul(), etc. > > Assuming the common cases are really more common than the uncommon cases the > fact that they jump straight out to the implementation function in stead of > mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset > the added overhead of shifts, ors and indexing. You're assuming that arithmetic operations are a major time sink. I doubt that; much of my code contains hardly any arithmetic these days. Of course, if you *do* have a piece of code that does a lot of basic arithmetic, it might pay off -- but even then I would guess that the majority of opcodes are things like list accessors and variable. But we needn't speculate. It's easy enough to measure the speedup: you can use tp_xxx5 in the type structure and plug a typecode into it for the int and float types. (Note that you would need a separate table of binopfuncs per operator.) 
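[Editor's note: Guido's "measure, don't speculate" point is easy to act on from pure Python today -- the dis module can show how rare arithmetic opcodes are next to load/store traffic in typical code. A sketch using dis.get_instructions, an API that did not exist in 1999 (Python 3.4+); the sample function is hypothetical:]

```python
import dis
from collections import Counter

def typical(items):
    # A loop that is mostly variable/attribute traffic, not arithmetic.
    out = []
    for item in items:
        out.append(item)
    return out

# Count how often each opcode name appears in the compiled function.
counts = Counter(ins.opname for ins in dis.get_instructions(typical))

loads = sum(n for name, n in counts.items() if name.startswith("LOAD"))
assert loads > 0                                          # loads dominate
assert not any("MULTIPLY" in name for name in counts)     # no arithmetic at all
```

[Static opcode counts are only a proxy for dynamic frequency, but they already support Guido's guess that loads and calls, not multiplies, are where typical code spends its opcodes.]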
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Aug 19 21:09:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 19 Aug 1999 20:09:26 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908191909.UAA20618@pukapuka.inrialpes.fr> [Tim, in an earlier msg] > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; Ok. In the meantime I think that folding the redundant SET_LINENO doesn't hurt. I ended up with a patchlet that seems to have no side effects, that updates the lnotab as it should and that even makes pdb a bit more clever, IMHO. Consider an extreme case for the function f (listed below). Currently, we get the following: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 SET_LINENO 4 12 SET_LINENO 5 15 LOAD_CONST 1 (1) 18 STORE_FAST 0 (a) 21 SET_LINENO 6 24 SET_LINENO 7 27 SET_LINENO 8 30 LOAD_CONST 2 (None) 33 RETURN_VALUE >>> pdb.runcall(f) > test.py(1)f() -> def f(): (Pdb) list 1, 20 1 -> def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(2)f() -> """Comment about f""" (Pdb) step > test.py(3)f() -> """Another one""" (Pdb) step > test.py(4)f() -> """A third one""" (Pdb) step > test.py(5)f() -> a = 1 (Pdb) step > test.py(6)f() -> """Forth""" (Pdb) step > test.py(7)f() -> "and pdb can set a breakpoint on this one (simple quotes)" (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple 
quotes...""" (Pdb) >>> ------------------------------------------- With folded SET_LINENO, we have this: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 5 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 SET_LINENO 8 12 LOAD_CONST 2 (None) 15 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 20 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) break 7 Breakpoint 1 at test.py:7 (Pdb) break 8 *** Blank or comment (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> ------------------------------------------- i.e., pdb stops at (points to) the first real instruction and doesn't step through the doc strings. Or is there something I'm missing here? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** compile.c-orig Thu Aug 19 19:27:13 1999 --- compile.c Thu Aug 19 19:00:31 1999 *************** *** 615,620 **** --- 615,623 ---- int arg; { if (op == SET_LINENO) { + if (!Py_OptimizeFlag && c->c_last_addr == c->c_nexti - 3) + /* Hack for folding several SET_LINENO in a row. */ + c->c_nexti -= 3; com_set_lineno(c, arg); if (Py_OptimizeFlag) return; From guido at CNRI.Reston.VA.US Thu Aug 19 23:10:33 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 17:10:33 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Thu, 19 Aug 1999 20:09:26 BST."
<199908191909.UAA20618@pukapuka.inrialpes.fr> References: <199908191909.UAA20618@pukapuka.inrialpes.fr> Message-ID: <199908192110.RAA12755@eric.cnri.reston.va.us> Earlier, you argued that this is "not an optimization," but rather avoiding redundancy. I should have responded right then that I disagree, or at least I'm lukewarm about your patch. Either you're not using -O, and then you don't care much about this; or you care, and then you should be using -O. Rather than encrusting the code with more and more ad-hoc micro optimizations, I'd prefer to have someone look into Tim's suggestion of supporting more efficient breakpoints... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 14:45:46 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 13:45:46 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908192110.RAA12755@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 19, 99 05:10:33 pm" Message-ID: <199908201245.NAA27098@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I haven't argued so much; I asked whether this would be reasonable. Probably I should have said that I don't see the purpose of emitting SET_LINENO instructions for those nodes for which the compiler generates no code, mainly because (as I learned subsequently) SET_LINENO serves no purpose other than debugging. As I hadn't paid much attention to this aspect of the code, I thought that they might still be used for tracebacks. But I couldn't have said that because I didn't know it. > I should have responded right then that I disagree, ... Although I agree this is a minor issue, I'm interested in your argument here, if it's something other than the dialectic: "we're more interested in long term improvements", which is also my opinion. > ... or at least I'm lukewarm about your patch.
No surprise here :-) But I haven't found another way of not generating SET_LINENO for doc strings other than backpatching. > Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. Neither of those. I don't really care, frankly. I was just intrigued by the consecutive SET_LINENO in my disassemblies, so I started to think and ask questions about it. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... This is *the* real issue with the real potential solution. I'm willing to have a look at this (although I don't know pdb/bdb in its finest details). All suggestions and thoughts are welcome. We would probably leave the SET_LINENO opcode as is and (eventually) introduce a new opcode (instead of transforming/renaming it) for compatibility reasons, methinks. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990818110213.A558F303120@snelboot.oratrix.nl> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <1276961301-70195@hypernet.com> In reply to no one in particular: I've often wished that the instance type object had an (optimized) __decref__ slot. With nothing but hand-waving to support it, I'll claim that would enable all these games. 
- Gordon From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <1276961295-70552@hypernet.com> Jack Jansen wrote: > There's one slight problem with this: when you use functionality > that is partially portable, i.e. a call that is available on Windows > and Unix but not on the Mac. It gets worse, I think. How about the inconsistencies in POSIX support among *nixes? How about NT being a superset of Win9x? How about NTFS having capabilities that FAT does not? I'd guess there are inconsistencies between Mac flavors, too. The Java approach (if you can't do it everywhere, you can't do it) sucks. In some cases you could probably have the missing functionality (in os) fail silently, but in other cases that would be a disaster. "Least-worst"-is-not-necessarily-"good"-ly y'rs - Gordon From tismer at appliedbiometrics.com Fri Aug 20 17:05:47 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Fri, 20 Aug 1999 17:05:47 +0200 Subject: [Python-Dev] about line numbers References: <199908191909.UAA20618@pukapuka.inrialpes.fr> <199908192110.RAA12755@eric.cnri.reston.va.us> Message-ID: <37BD6ECB.9DD17460@appliedbiometrics.com> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I should have responded right then that I > disagree, or at least I'm lukewarm about your patch. Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. 
> > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... I didn't think of this before, but I just realized that I have something like that already in Stackless Python. It is possible to set a breakpoint at every opcode, for every frame. Adding an extra opcode for breakpoints is a good thing as well. The former are good for tracing, conditional breakpoints and such, and cost a little more time since there is always one extra function call. The latter would be a quick, less versatile thing. Inserting extra breakpoint opcodes into running code turns out to be easy to implement, if the running frame gets a local extra copy of its code object, with the breakpoints replacing the original opcodes. The breakpoint handler would then simply look into the original code object. Inserting breakpoints on the source level gives us breakpoints per procedure. Doing it in a running frame gives "instance" level debugging of code. Checking a monitor function on every opcode is slightly more expensive but most general. We can have it all; what do you think? I'm going to finish and publish the stackless/continuous package and submit a paper by end of September. Should I include this debugging feature? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido at CNRI.Reston.VA.US Fri Aug 20 17:09:32 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:09:32 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT."
<1276961301-70195@hypernet.com> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> <1276961301-70195@hypernet.com> Message-ID: <199908201509.LAA14726@eric.cnri.reston.va.us> > In reply to no one in particular: > > I've often wished that the instance type object had an (optimized) > __decref__ slot. With nothing but hand-waving to support it, I'll > claim that would enable all these games. Without context, I don't know when this would be called. If you want this called on all DECREFs (regardless of the refcount value), realize that this is a huge slowdown because it would mean the DECREF macro has to inspect the type object, which means several indirections. This would slow down *every* DECREF operation, not just those on instances with a __decref__ slot, because the DECREF macro doesn't know the type of the object! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Fri Aug 20 17:13:16 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:13:16 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961295-70552@hypernet.com> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> <1276961295-70552@hypernet.com> Message-ID: <199908201513.LAA14741@eric.cnri.reston.va.us> From: "Gordon McMillan" > Jack Jansen wrote: > > > There's one slight problem with this: when you use functionality > > that is partially portable, i.e. a call that is available on Windows > > and Unix but not on the Mac. > > It gets worse, I think. How about the inconsistencies in POSIX > support among *nixes? How about NT being a superset of Win9x? How > about NTFS having capabilities that FAT does not? I'd guess there are > inconsistencies between Mac flavors, too. 
> > The Java approach (if you can't do it everywhere, you can't do it) > sucks. In some cases you could probably have the missing > functionality (in os) fail silently, but in other cases that would > be a disaster. The Python policy has always been "if it's available, there's a standard name and API for it; if it's not available, the function is not defined or will raise an exception; you can use hasattr(os, ...) or catch exceptions to cope if you can live without it." There are a few cases where unavailable calls are emulated, a few where they are made no-ops, and a few where they are made to raise an exception unconditionally, but in most cases the function will simply not exist, so it's easy to test. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 22:54:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 21:54:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BD6ECB.9DD17460@appliedbiometrics.com> from "Christian Tismer" at "Aug 20, 99 05:05:47 pm" Message-ID: <199908202054.VAA26970@pukapuka.inrialpes.fr> I'll try to sketch here the scheme I'm thinking of for the callback/breakpoint issue (without SET_LINENO), although some technical details are still missing. I'm assuming the following, in this order: 1) No radical changes in the current behavior, i.e. preserve the current architecture / strategy as much as possible. 2) We don't have breakpoints per opcode, but per source line. For that matter, we have sys.settrace (and for now, we don't aim to have sys.settracei that would be called on every opcode, although we might want this in the future) 3) SET_LINENO disappear. Actually, SET_LINENO are conditional breakpoints, used for callbacks from C to Python. So the basic problem is to generate these callbacks.
If any of the above is not an appropriate assumption and we want a radical change in the strategy of setting breakpoints/ generating callbacks, then this post is invalid. The solution I'm thinking of: a) Currently, we have a function PyCode_Addr2Line which computes the source line from the opcode's address. I hereby assume that we can write the reverse function PyCode_Line2Addr that returns the address from a given source line number. I don't have the implementation, but it should be doable. Furthermore, we can compute, having the co_lnotab table and co_firstlineno, the source line range for a code object. As a consequence, even with the dumbest of all algorithms, by looping through this source line range, we can enumerate with PyCode_Line2Addr the sequence of addresses for the source lines of this code object. b) As Chris pointed out, in case sys.settrace is defined, we can allocate and keep a copy of the original code string per frame. We can further dynamically overwrite the original code string with a new (internal, one byte) CALL_TRACE opcode at the addresses we have enumerated in a). The CALL_TRACE opcodes will trigger the callbacks from C to Python, just as the current SET_LINENO does. c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger the callback and if it returns successfully, we'll fetch the original opcode for the current location from the copy of the original co_code. Then we directly jump to the arg fetch code (or in case we fetch the entire original opcode in CALL_TRACE - we jump to the dispatch code). Hmm. I think that's all. At the heart of this scheme is the PyCode_Line2Addr function, which is the only blob in my head, for now. Christian Tismer wrote: > > I didn't think of this before, but I just realized that > I have something like that already in Stackless Python. > It is possible to set a breakpoint at every opcode, for every > frame. Adding an extra opcode for breakpoints is a good thing > as well.
The former are good for tracing, conditional breakpoints > and such, and cost a little more time since there is always one extra > function call. The latter would be a quick, less versatile thing. I don't think I understand clearly the difference you're talking about, and why the one thing is better than the other, probably because I'm a bit far from stackless python. > I'm going to finish and publish the stackless/continuous package > and submit a paper by end of September. Should I include this debugging > feature? Write the paper first, you have more than enough material to talk about already ;-). Then if you have time to implement some debugging support, you could always add another section, but it won't be a central point of your paper. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido at CNRI.Reston.VA.US Fri Aug 20 21:59:24 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 15:59:24 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Fri, 20 Aug 1999 21:54:10 BST." <199908202054.VAA26970@pukapuka.inrialpes.fr> References: <199908202054.VAA26970@pukapuka.inrialpes.fr> Message-ID: <199908201959.PAA16105@eric.cnri.reston.va.us> > I'll try to sketch here the scheme I'm thinking of for the > callback/breakpoint issue (without SET_LINENO), although some > technical details are still missing. > > I'm assuming the following, in this order: > > 1) No radical changes in the current behavior, i.e. preserve the > current architecture / strategy as much as possible. > > 2) We don't have breakpoints per opcode, but per source line. For that > matter, we have sys.settrace (and for now, we don't aim to have > sys.settracei that would be called on every opcode, although we might > want this in the future) > > 3) SET_LINENO disappear. Actually, SET_LINENO are conditional breakpoints, > used for callbacks from C to Python.
So the basic problem is to generate > these callbacks. They used to be the only mechanism by which the traceback code knew the current line number (long before the debugger hooks existed), but with the lnotab, that's no longer necessary. > If any of the above is not an appropriate assumption and we want a radical > change in the strategy of setting breakpoints/ generating callbacks, then > this post is invalid. Sounds reasonable. > The solution I'm thinking of: > > a) Currently, we have a function PyCode_Addr2Line which computes the source > line from the opcode's address. I hereby assume that we can write the > reverse function PyCode_Line2Addr that returns the address from a given > source line number. I don't have the implementation, but it should be > doable. Furthermore, we can compute, having the co_lnotab table and > co_firstlineno, the source line range for a code object. > > As a consequence, even with the dumbiest of all algorithms, by looping > trough this source line range, we can enumerate with PyCode_Line2Addr > the sequence of addresses for the source lines of this code object. > > b) As Chris pointed out, in case sys.settrace is defined, we can allocate > and keep a copy of the original code string per frame. We can further > dynamically overwrite the original code string with a new (internal, > one byte) CALL_TRACE opcode at the addresses we have enumerated in a). > > The CALL_TRACE opcodes will trigger the callbacks from C to Python, > just as the current SET_LINENO does. > > c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger > the callback and if it returns successfully, we'll fetch the original > opcode for the current location from the copy of the original co_code. > Then we directly jump to the arg fetch code (or in case we fetch the > entire original opcode in CALL_TRACE - we jump to the dispatch code). Tricky, but doable. > Hmm. I think that's all. 
> > At the heart of this scheme is the PyCode_Line2Addr function, which is > the only blob in my head, for now. I'm pretty sure that this would be straightforward. I'm a little anxious about modifying the code, and was thinking myself of a way to specify a bitvector of addresses where to break. But that would still cause some overhead for code without breakpoints, so I guess you're right (and it's certainly a long-standing tradition in breakpoint-setting!) --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 23:22:12 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 22:22:12 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908201959.PAA16105@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 20, 99 03:59:24 pm" Message-ID: <199908202122.WAA26956@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > > I'm a little anxious about modifying the code, and was thinking myself > of a way to specify a bitvector of addresses where to break. But that > would still cause some overhead for code without breakpoints, so I > guess you're right (and it's certainly a long-standing tradition in > breakpoint-setting!) > Hm. You're probably right, especially if someone wants to inspect a code object from the debugger or something. But I believe that we can manage to redirect the instruction pointer in the beginning of eval_code2 to the *copy* of co_code, and modify the copy with CALL_TRACE, preserving the original intact.
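[Editor's note: the Addr2Line/Line2Addr pair at the heart of the scheme above maps directly onto what modern CPython exposes through the dis module. This is a hedged sketch under today's Python 3, not the 1999 C implementation; the offset-to-line mapping now lives in the code object's line table rather than in SET_LINENO opcodes, and dis.findlinestarts enumerates it.]

```python
import dis

def addr_to_line(code):
    """Addr2Line: map each line-start bytecode offset to its source line."""
    return dict(dis.findlinestarts(code))

def line_to_addr(code):
    """Line2Addr: map each source line to the first bytecode offset for it."""
    return {line: offset for offset, line in dis.findlinestarts(code)}

def g():
    x = 1
    y = x + 1
    return y

addrs = addr_to_line(g.__code__)   # {offset: lineno, ...}, offsets ascending
lines = line_to_addr(g.__code__)   # {lineno: offset, ...}
```

Enumerating `addrs` in order gives exactly the sequence of addresses that step a) of the scheme needs.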
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Fri Aug 20 22:25:25 1999 From: skip at mojam.com (Skip Montanaro) Date: Fri, 20 Aug 1999 15:25:25 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <1276961295-70552@hypernet.com> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> <1276961295-70552@hypernet.com> Message-ID: <14269.47443.192469.525132@dolphin.mojam.com> Gordon> It gets worse, I think. How about the inconsistencies in POSIX Gordon> support among *nixes? How about NT being a superset of Win9x? Gordon> How about NTFS having capabilities that FAT does not? I'd guess Gordon> there are inconsistencies between Mac flavors, too. To a certain degree I think the C module(s) that interface to the underlying OS's API can iron out differences. In other cases you may have to document minor (known) differences. In still other cases you may have to relegate particular functionality to the OS-dependent modules. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From gmcm at hypernet.com Sat Aug 21 00:38:14 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 17:38:14 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <199908201509.LAA14726@eric.cnri.reston.va.us> References: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> Message-ID: <1276937670-1491544@hypernet.com> [me] > > > > I've often wished that the instance type object had an (optimized) > > __decref__ slot. With nothing but hand-waving to support it, I'll > > claim that would enable all these games. [Guido] > Without context, I don't know when this would be called. 
If you > want this called on all DECREFs (regardless of the refcount value), > realize that this is a huge slowdown because it would mean the > DECREF macro has to inspect the type object, which means several > indirections. This would slow down *every* DECREF operation, not > just those on instances with a __decref__ slot, because the DECREF > macro doesn't know the type of the object! This was more 2.0-ish speculation, and really thinking of classic C++ ref counting where decref would be a function call, not a macro. Still a slowdown, of course, but not quite so massive. The upside is opening up all kinds of tricks at the type object and user class levels, (such as weak refs and copy on write etc). Worth it? I'd think so, but I'm not a speed demon. - Gordon From tim_one at email.msn.com Sat Aug 21 10:09:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 04:09:17 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <000201beebac$776d32e0$0c2d2399@tim> [Skip Montanaro] > ... > 3. If Dan Connolly's contention is correct, importing the os module > today is not all that portable. I can't really say one way or the > other, because I'm lucky enough to be able to confine my serious > programming to Unix. I'm sure there's someone out there that > can try the following on a few platforms: > > import os > dir(os) > > and compare the output. There's no need to, Skip. Just read the os module docs; where a function says, e.g., "Availability: Unix.", it doesn't show up on a Windows or Mac box. In that sense using (some) os functions is certainly unportable. But I have no sympathy for the phrasing of Dan's complaint: if he calls os.getegid(), *he* knows perfectly well that's a Unix-specific function, and expressing outrage over it not working on NT is disingenuous. 
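[Editor's note: the "Availability: Unix." functions Tim mentions are exactly the case Guido's policy ("use hasattr(os, ...) or catch exceptions") is meant to handle. A small sketch, using os.getegid as the Unix-only example; both probe styles are shown.]

```python
import os

# hasattr-style probe: os.getegid exists only on Unix.
if hasattr(os, "getegid"):
    egid = os.getegid()
else:
    egid = None  # the platform doesn't provide it; fall back gracefully

# Exception-style probe: os.getlogin may be missing, or may fail at
# runtime (e.g. no controlling terminal), so catch both cases.
try:
    user = os.getlogin()
except (AttributeError, OSError):
    user = None
```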
OTOH, I don't think you're going to find anything in the OS module documented as available only on Windows or only on Macs, and some semi-portable functions (notoriously chmod) are documented in ways that make sense only to Unixheads. This certainly gives a strong impression of Unix-centricity to non-Unix weenies, and has got to baffle true newbies completely. So 'twould be nice to have a basic os module all of whose functions "run everywhere", whose interfaces aren't copies of cryptic old Unixisms, and whose docs are platform neutral. If Guido is right that the os functions tend to get more portable over time, fine, that module can grow over time too. In the meantime, life would be easier for everyone except Python's implementers. From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 17:34:32 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 16:34:32 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908202122.WAA26956@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 20, 99 10:22:12 pm" Message-ID: <199908211534.QAA22392@pukapuka.inrialpes.fr> [me] > > Guido van Rossum wrote: > > > > > > I'm a little anxious about modifying the code, and was thinking myself > > of a way to specify a bitvector of addresses where to break. But that > > would still cause some overhead for code without breakpoints, so I > > guess you're right (and it's certainly a long-standing tradition in > > breakpoint-setting!) > > > > Hm. You're probably right, especially if someone wants to inspect > a code object from the debugger or something. But I believe that > we can manage to redirect the instruction pointer in the beginning > of eval_code2 to the *copy* of co_code, and modify the copy with > CALL_TRACE, preserving the original intact. > I wrote a very rough first implementation of this idea.
The files are at: http://sirac.inrialpes.fr/~marangoz/python/lineno/ Basically, what I did is: 1) what I said :-) 2) No more SET_LINENO 3) In tracing mode, a copy of the original code is put in an additional slot (co_tracecode) of the code object. Then it's overwritten with CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. The VM is routed to execute this code, and not the original one. 4) When tracing is off (i.e. sys_tracefunc is NULL) the VM fallbacks to normal execution of the original code. A couple of things that need finalization: a) how to deallocate the modified code string when tracing is off b) the value of CALL_TRACE (I almost randomly picked 76) c) I don't handle the cases where sys_tracefunc is enabled or disabled within the same code object. Tracing or not is determined before the main loop. d) update pdb, so that it does not allow setting breakpoints on lines with no code. To achieve this, I think that python versions of PyCode_Addr2Line & PyCode_Line2Addr have to be integrated into pdb as helper functions. e) correct bugs and design flaws f) something else? 
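[Editor's note: steps 2)-4) above can be sketched at the Python level with today's dis module. This is an illustration only, under modern CPython (which has no SET_LINENO and stores line information differently); CALL_TRACE is the post's hypothetical one-byte opcode, represented here by an arbitrary marker byte. It shows the key invariant: the original co_code is never touched, only a per-frame copy is patched and later restored.]

```python
import dis

CALL_TRACE = 0xFF  # hypothetical breakpoint opcode from the post; just a marker

def h():
    a = 1
    b = a + 1
    return b

code = h.__code__
original = code.co_code          # stays untouched, as in the scheme
patched = bytearray(original)    # working copy, one per traced frame
saved = {}                       # offset -> original opcode byte

# Overwrite the opcode byte at each line-start address with CALL_TRACE,
# remembering the original so the callback can fetch and execute it.
for offset, _line in dis.findlinestarts(code):
    saved[offset] = patched[offset]
    patched[offset] = CALL_TRACE

# "Tracing off": restore the copy from the saved opcodes.
for offset, byte in saved.items():
    patched[offset] = byte
```

The real work in the post happens in C (eval_code2 executing the patched copy); this only demonstrates the bookkeeping.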
And here's the sample session of my lousy function f with this 'proof of concept' code: >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 LOAD_CONST 1 (1) 3 STORE_FAST 0 (a) 6 LOAD_CONST 2 (None) 9 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 10 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 19:10:50 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 19:10:50 +0200 Subject: [Python-Dev] about line numbers References: <199908211534.QAA22392@pukapuka.inrialpes.fr> Message-ID: <37BEDD9A.DBA817B1@appliedbiometrics.com> Vladimir Marangozov wrote: ... > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > Basically, what I did is: > > 1) what I said :-) > 2) No more SET_LINENO > 3) In tracing mode, a copy of the original code is put in an additional > slot (co_tracecode) of the code object. Then it's overwritten with > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. I'd rather keep the original code object as it is, create a copy with inserted breakpoints and put that into the frame slot. Pointing back to the original from there. Then I'd redirect the code from the CALL_TRACE opcode completely to a user-defined function. Getting rid of the extra code object would be done by this function when tracing is off. It also vanishes automatically when the frame is released. 
> a) how to deallocate the modified code string when tracing is off By making the copy a frame property which is temporary, I think. Or, if tracing should work for all frames, by pushing the original in the back of the modified. Both work. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 23:40:05 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 22:40:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BEDD9A.DBA817B1@appliedbiometrics.com> from "Christian Tismer" at "Aug 21, 99 07:10:50 pm" Message-ID: <199908212140.WAA51054@pukapuka.inrialpes.fr> Chris, could you please repeat that step by step in more detail? I'm not sure I understand your suggestions. Christian Tismer wrote: > > Vladimir Marangozov wrote: > ... > > I wrote a very rough first implementation of this idea. The files are at: > > > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > > > Basically, what I did is: > > > > 1) what I said :-) > > 2) No more SET_LINENO > > 3) In tracing mode, a copy of the original code is put in an additional > > slot (co_tracecode) of the code object. Then it's overwritten with > > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. > > I'd rather keep the original code object as it is, create a copy > with inserted breakpoints and put that into the frame slot. You seem to suggest to duplicate the entire code object, right? And reference the modified duplicate from the current frame? I actually duplicate only the opcode string (that is, the co_code string object) and I don't see the point of duplicating the entire code object.
Keeping a reference from the current frame makes sense, but won't it deallocate the modified version on every frame release (then redo all the code duplication work for every frame) ? > Pointing back to the original from there. I don't understand this. What points back where? > > Then I'd redirect the code from the CALL_TRACE opcode completely > to a user-defined function. What user-defined function? I don't understand that either... Except the sys_tracefunc, what other (user-defined) function do we have here? Is it a Python or a C function? > Getting rid of the extra code object would be done by this function > when tracing is off. How exactly? This seems to be obvious for you, but obviously, not for me ;-) > It also vanishes automatically when the frame is released. The function or the extra code object? > > > a) how to deallocate the modified code string when tracing is off > > By making the copy a frame property which is temporary, I think. I understood that the frame lifetime could be exploited "somehow"... > Or, if tracing should work for all frames, by pushing the original > in the back of the modified. Both works. Tracing is done for all frames, if sys_tracefunc is not NULL, which is a function that usually ends up in the f_trace slot. > > ciao - chris I'm confused. I didn't understand your idea. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 23:23:10 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 23:23:10 +0200 Subject: [Python-Dev] about line numbers References: <199908212140.WAA51054@pukapuka.inrialpes.fr> Message-ID: <37BF18BE.B3D58836@appliedbiometrics.com> Vladimir Marangozov wrote: > > Chris, could you please repeat that step by step in more detail? > I'm not sure I understand your suggestions. I think I was too quick. I thought of copying the whole code object, of course. ... 
> > I'd rather keep the original code object as it is, create a copy > > with inserted breakpoints and put that into the frame slot. > > You seem to suggest to duplicate the entire code object, right? > And reference the modified duplicate from the current frame? Yes. > I actually duplicate only the opcode string (that is, the co_code string > object) and I don't see the point of duplicating the entire code object. > > Keeping a reference from the current frame makes sense, but won't it > deallocate the modified version on every frame release (then redo all the > code duplication work for every frame) ? You get two options by that. 1) permanently modifying one code object to be traceable means pushing a copy of the original "behind" by means of some co_back pointer. This keeps the patched one where the original was, and makes a global debugging version. 2) Creating a copy for one frame, and putting the original into a co_back pointer. This gives debugging just for this one frame. ... > > Then I'd redirect the code from the CALL_TRACE opcode completely > > to a user-defined function. > > What user-defined function? I don't understand that either... > Except the sys_tracefunc, what other (user-defined) function do we have here? > Is it a Python or a C function? I would suggest a Python function, of course. > > Getting rid of the extra code object would be done by this function > > when tracing is off. > > How exactly? This seems to be obvious for you, but obviously, not for me ;-) If the permanent tracing "1)" is used, just restore the code object's contents from the original in co_back, and drop co_back. In the "2)" version, just pull the co_back into the frame's code pointer and lose the reference to the copy. Occurs automatically on frame release. > > It also vanishes automatically when the frame is released. > > The function or the extra code object? The extra code object. ... > I'm confused. I didn't understand your idea.
Forget it, it isn't more than another brain fart :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tim_one at email.msn.com Sun Aug 22 03:25:22 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 21:25:22 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908131347.OAA30740@pukapuka.inrialpes.fr> Message-ID: <000001beec3d$348f0160$cb2d2399@tim> [going back a week here, to dict resizing ...] [Vladimir Marangozov] > ... > All in all, for performance reasons, dicts remain an exception > to the rule of releasing memory ASAP. Yes, except I don't think there is such a rule! The actual rule is a balancing act between the cost of keeping memory around "just in case", and the expense of getting rid of it. Resizing a dict is extraordinarily expensive because the entire table needs to be rearranged, but lists make this tradeoff too (when you del a list element or list slice, it still goes thru NRESIZE, which still keeps space for as many as 100 "extra" elements around). The various internal caches for int and frame objects (etc) also play this sort of game; e.g., if I happen to have a million ints sitting around at some time, Python effectively assumes I'll never want to reuse that int storage for anything other than ints again. 
python-rarely-releases-memory-asap-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Sun Aug 22 21:41:59 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 22 Aug 1999 20:41:59 +0100 (NFT) Subject: [Python-Dev] Memory (was: about line numbers, which was shrinking dicts) In-Reply-To: <000001beec3d$348f0160$cb2d2399@tim> from "Tim Peters" at "Aug 21, 99 09:25:22 pm" Message-ID: <199908221941.UAA54480@pukapuka.inrialpes.fr> Tim Peters wrote: > > [going back a week here, to dict resizing ...] Yes, and the subject line does not correspond to the contents because at the moment I sent this message, I ran out of disk space and the mailer picked a random header after destroying half of the messages in this mailbox. > > [Vladimir Marangozov] > > ... > > All in all, for performance reasons, dicts remain an exception > > to the rule of releasing memory ASAP. > > Yes, except I don't think there is such a rule! The actual rule is a > balancing act between the cost of keeping memory around "just in case", and > the expense of getting rid of it. Good point. > > Resizing a dict is extraordinarily expensive because the entire table needs > to be rearranged, but lists make this tradeoff too (when you del a list > element or list slice, it still goes thru NRESIZE, which still keeps space > for as many as 100 "extra" elements around). > > The various internal caches for int and frame objects (etc) also play this > sort of game; e.g., if I happen to have a million ints sitting around at > some time, Python effectively assumes I'll never want to reuse that int > storage for anything other than ints again. > > python-rarely-releases-memory-asap-ly y'rs - tim Yes, and I'm somewhat sensitive to this issue after spending 6 years in a team which deals a lot with memory management (mainly DSM). In other words, you say that Python tolerates *virtual* memory fragmentation (a funny term :-).
In the case of dicts and strings, we tolerate "internal fragmentation" (a contiguous chunk is allocated, then partially used). In the case of ints, floats or frames, we tolerate "external fragmentation". And as you said, Python tolerates this because of the speed/space tradeoff. Hopefully, all we deal with at this level is virtual memory, so even if you have zillions of ints, it's the OS VMM that will help you more with its long-term scheduling than Python's wild guesses about a hypothetical usage of zillions of ints later. I think that some OS concepts can give us hints on how to reduce our virtual fragmentation (which, as we all know, is not a very good thing). A few keywords: compaction, segmentation, paging, sharing. We can't do much about our internal fragmentation, except changing the algorithms of dicts & strings (which is not appealing anyways). But it would be nice to think about the external fragmentation of Python's caches. Or even try to reduce the internal fragmentation in combination with the internal caches... BTW, this is the whole point of PyMalloc: in a virtual memory world, try to reduce the distance between the user view and the OS view on memory. PyMalloc addresses the fragmentation problem at a lower level of granularity than an OS (that is, *within* a page), because most of Python's objects are very small. However, it can't efficiently handle large chunks like the int/float caches. Basically what it does is: segmentation of the virtual space and sharing of the cached free space. I think that Python could improve on sharing its internal caches, without significant slowdowns... The bottom line is that there's still plenty of room for exploring alternate mem mgt strategies that better fit Python's memory needs as a whole.
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Sun Aug 22 23:25:56 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 22 Aug 1999 23:25:56 +0200 Subject: [Python-Dev] Converting C objects to Python objects and back Message-ID: <19990822212601.2D4BE18BA0D@oratrix.oratrix.nl> Here's another silly idea, not having to do with optimization. On the Mac, and as far as I know on Windows as well, there are quite a few OS API structures that have a Python Object representation that is little more than the PyObject boilerplate plus a pointer to the C API object. (And, of course, lots of methods to operate on the object). To convert these from Python to C I always use boilerplate code like WindowPtr *win; PyArg_ParseTuple(args, "O&", PyWin_Convert, &win); where PyWin_Convert is the function that takes a PyObject * and a void **, does the typecheck and sets the pointer. A similar way is used to convert C pointers back to Python objects in Py_BuildValue. What I was thinking is that it would be nice (if you are _very_ careful) if this functionality was available in struct. So, if I would somehow obtain (in a Python string) a C structure that contained, say, a WindowPtr and two ints, I would be able to say win, x, y = struct.unpack("Ohh", Win.WindowType) and struct would be able, through the WindowType type object, to get at the PyWin_Convert and PyWin_New functions. A nice side issue is that you can add an option to PyArg_ParseTuple so you can say PyArg_ParseTuple(args, "O+", Win_WinObject, &win) and you don't have to remember the different names the various types use for their conversion routines. Is this worth pursuing or is it just too dangerous? And, if it is worth pursuing, I have to stash away the two function pointers somewhere in the TypeObject, should I grab one of the tp_xxx fields for this or is there a better place?
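[For contrast, plain struct can already move a pointer-sized field around, but only as a raw integer — the native "P" (void *) format code carries no type information, which is exactly the gap the PyWin_Convert / PyWin_New hooks would fill. A sketch with dummy values, 0 standing in for the WindowPtr:]

```python
import struct

# A C struct holding a pointer and two shorts, as in Jack's example.
# Pack dummy values: 0 for the pointer, (3, 4) for the two ints.
packed = struct.pack("Phh", 0, 3, 4)

# "P" unpacks to a plain integer -- just the address.  There is no way
# to get a typed window object back without the proposed conversion
# hooks on the type object.
ptr, x, y = struct.unpack("Phh", packed)
```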
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fdrake at acm.org Mon Aug 23 16:54:07 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 23 Aug 1999 10:54:07 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000201beebac$776d32e0$0c2d2399@tim> References: <14266.51743.904066.470431@dolphin.mojam.com> <000201beebac$776d32e0$0c2d2399@tim> Message-ID: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Tim Peters writes: > OTOH, I don't think you're going to find anything in the OS module > documented as available only on Windows or only on Macs, and some Tim, Actually, the spawn*() functions are included in os and are documented as Windows-only, along with the related P_* constants. These are provided by the nt module. > everywhere", whose interfaces aren't copies of cryptic old Unixisms, and > whose docs are platform neutral. I'm always glad to see documentation patches, or even pointers to specific problems. Being a Unix-weenie myself, making the documentation more readable to Windows-weenies can be difficult at times. But given useful pointers, I can usually pull it off, or at least drive someone who can do so. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Tue Aug 24 08:32:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 24 Aug 1999 02:32:49 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Message-ID: <000701beedfa$7c5c8e40$902d2399@tim>
I stand corrected, Fred -- so how do the Unix dweebs like this Windows crap cluttering "their" docs ? [Tim, pitching a portable sane interface to a portable sane subset of os functionality] > I'm alwasy glad to see documentation patches, or even pointers to > specific problems. Being a Unix-weenie myself, making the > documentation more readable to Windows-weenies can be difficult at > times. But given useful pointers, I can usually pull it off, or at > least drive someone who canto do so. ;-) No, it's deeper than that. Some of the inherited Unix interfaces are flatly incomprehensible to anyone other than a Unix-head, but the functionality is supplied only in that form (docs may ease the pain, but the interfaces still suck); for example, mkdir (path[, mode]) Create a directory named path with numeric mode mode. The default mode is 0777 (octal). On some systems, mode is ignored. Where it is used, the current umask value is first masked out. Availability: Macintosh, Unix, Windows. If you have a sister or parent or 3-year-old child (they're all equivalent for this purpose ), just imagine them reading that. If you can't, I'll have my sister call you . Raw numeric permission modes, octal mode notation, and the "umask" business are Unix-specific -- and even Unices supply symbolic ways to specify permissions. chmod is likely the one I hear the most gripes about. Windows heads are looking to change "file attributes", the name "chmod" is gibberish to them, most of the Unix mode bits make no sense under Windows (& contra Guido's optimism, never will) even if you know the secret octal code, and Windows has several attributes (hidden bit, system bit, archive bit) chmod can't get at. The only portable functionality here is the write bit, but no non-Unix person could possibly guess either that chmod is the function they need, or what to type after someone tells them it's chmod. 
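[The mode/umask interaction that doc entry describes can be seen directly — POSIX only; on systems where mode is ignored this sketch does not apply:]

```python
import os
import stat
import tempfile

old_umask = os.umask(0o022)      # disallow group/other write bits
base = tempfile.mkdtemp()
target = os.path.join(base, 'sub')
os.mkdir(target, 0o777)          # request 0777; the umask masks out 022
mode = stat.S_IMODE(os.stat(target).st_mode)
os.umask(old_umask)              # restore the previous umask
os.rmdir(target)
os.rmdir(base)
```

With the 022 umask, mode comes out as 0755 — precisely the kind of incantation the paragraph above argues a non-Unix user should never have to learn.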
So this is less a doc issue than that more of os needs to become more like os.path (i.e., intelligently named functions with intelligently abstracted interfaces). never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - tim From skip at mojam.com Tue Aug 24 19:21:53 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 24 Aug 1999 12:21:53 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000701beedfa$7c5c8e40$902d2399@tim> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> Message-ID: <14274.53860.210265.71990@dolphin.mojam.com> Tim> chmod is likely the one I hear the most gripes about. Windows Tim> heads are looking to change "file attributes", the name "chmod" is Tim> gibberish to them Well, we could confuse everyone and rename "chmod" to "chfat" (is that like file system liposuction?). Windows probably has an equivalent function whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Tim> most of the Unix mode bits make no sense under Windows (& contra Tim> Guido's optimism, never will) even if you know the secret octal Tim> code ... It beats a secret handshake. Imagine all the extra peripherals we'd have to make available for everyone's computer. ;-) Tim> So this is less a doc issue than that more of os needs to become Tim> more like os.path (i.e., intelligently named functions with Tim> intelligently abstracted interfaces). Hasn't Guido's position been that the interface modules like os, posix, etc are just a thin layer over the underlying API (Guido: note how I cleverly attributed this position to you but also placed the responsibility for correctness on your head!)? If that's the case, perhaps we should provide a slightly higher level module that abstracts the file system as objects, and adopts a more user-friendly approach to the secret octal codes. 
Those of us worried about job security could continue to use the lower level module and leave the higher level interface for former Visual Basic programmers. Tim> never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - maybe-the-"e"-key-stuck-on-his-TTY-ly y'rs... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Tue Aug 24 20:21:44 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 24 Aug 1999 14:21:44 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Skip Montanaro writes: > whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Just 17? ;-) > Tim> So this is less a doc issue than that more of os needs to become > Tim> more like os.path (i.e., intelligently named functions with > Tim> intelligently abstracted interfaces). Sounds like some doc improvements can really help improve things, at least in the short term. > correctness on your head!)? If that's the case, perhaps we should provide a > slightly higher level module that abstracts the file system as objects, and > adopts a more user-friendly approach to the secret octal codes. Those of us I'm all for an object interface to a logical filesystem; having had to write just such a thing in Java not long ago, and we have a similar construct in Python (not by me, though), that we use in our Knowbot work. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tim_one at email.msn.com Wed Aug 25 09:02:21 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:02:21 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <000801beeec7$c6f06b20$fc2d153f@tim> [Skip Montanaro] > Well, we could confuse everyone and rename "chmod" to "chfat" ... I don't want to rename anything, nor do I want to use MS-specific names. chmod is both the wrong spelling & the wrong functionality for all non-Unix systems. os.path did a Good Thing by, e.g., introducing getmtime(), despite that everyone knows it's just os.stat()[8]. New isreadonly(path) and setreadonly(path) are more what I'm after; nothing beyond that is portable, & never will be. > Windows probably has an equivalent function whose name is 17 > characters long Indeed, SetFileAttributes is exactly 17 characters long (you moonlighting on NT, Skip?!). But while Windows geeks would like to use that, it's both the wrong spelling & the wrong functionality for all non-Windows systems. > ... > Hasn't Guido's position been that the interface modules like os, > posix, etc are just a thin layer over the underlying API (Guido: > note how I cleverly attributed this position to you but also placed > the responsibility for correctness on your head!)? If that's the > case, perhaps we should provide a slightly higher level module that > abstracts the file system as objects, and adopts a more user-friendly > approach to the secret octal codes. Like that, yes. > Those of us worried about job security could continue to use the > lower level module and leave the higher level interface for former > Visual Basic programmers. You're just *begging* Guido to make the Python2 os module take all of its names from the Win32 API . 
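[A sketch of what that isreadonly(path)/setreadonly(path) pair could look like on POSIX — the names come from Tim's proposal above and are hypothetical, not an existing os API; it touches only the write bits, the one portable piece:]

```python
import os
import stat

def setreadonly(path):
    # Clear every write bit (owner, group, other) -- the plain meaning
    # of "read-only", and the only portably meaningful chmod operation.
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def isreadonly(path):
    mode = os.stat(path).st_mode
    return not (mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A Windows implementation would flip the read-only file attribute instead; the caller never sees an octal mode either way.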
it's-no-lamer-to-be-ignorant-of-unix-names-than-it-is- to-be-ignorant-of-chinese-ly y'rs - tim From tim_one at email.msn.com Wed Aug 25 09:05:31 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:05:31 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Message-ID: <000901beeec8$380d05c0$fc2d153f@tim> [Fred L. Drake, Jr.] > ... > I'm all for an object interface to a logical filesystem; having > had to write just such a thing in Java not long ago, and we have > a similar construct in Python (not by me, though), that we use in > our Knowbot work. Well, don't read anything unintended into this, but Guido *is* out of town, and you *do* have the power to check in code outside the doc subtree ... barry-will-help-he's-been-itching-to-revolt-too-ly y'rs - tim From bwarsaw at cnri.reston.va.us Wed Aug 25 13:20:16 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 25 Aug 1999 07:20:16 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14275.53616.585669.890621@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Well, don't read anything unintended into this, but Guido *is* TP> out of town, and you *do* have the power to check in code TP> outside the doc subtree ... TP> barry-will-help-he's-been-itching-to-revolt-too-ly y'rs I'll bring the pitchforks if you bring the torches! 
-Barry From skip at mojam.com Wed Aug 25 17:17:35 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 25 Aug 1999 10:17:35 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.2229.983969.228891@dolphin.mojam.com> > I'm all for an object interface to a logical filesystem; having had to > write just such a thing in Java not long ago, and we have a similar > construct in Python (not by me, though), that we use in our Knowbot > work. Fred, Since this is the dev group, how about showing us the Knowbot's logical filesystem API, and let's do some dev-ing... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Wed Aug 25 18:22:52 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:22:52 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000801beeec7$c6f06b20$fc2d153f@tim> References: <14274.53860.210265.71990@dolphin.mojam.com> <000801beeec7$c6f06b20$fc2d153f@tim> Message-ID: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Tim Peters writes: > os.path did a Good Thing by, e.g., introducing getmtime(), despite that > everyone knows it's just os.stat()[8]. New isreadonly(path) and > setreadonly(path) are more what I'm after; nothing beyond that is portable, Tim, I think we can simply declare that isreadonly() checks that the file doesn't allow the user to read it, but setreadonly() sounds to me like it wouldn't be portable to Unix. There's more than one (reasonable) way to make a file unreadable to a user just by manipulating permission bits, and which is best will vary according to both the user and the file's existing permissions. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 18:26:25 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:26:25 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.6449.428851.402955@weyr.cnri.reston.va.us> Tim Peters writes: > Well, don't read anything unintended into this, but Guido *is* out > of town, and you *do* have the power to check in code outside the > doc subtree ... Good thing I turned off the python-checkins list when I added the curly bracket patch I've been working on! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 20:46:30 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 14:46:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.2229.983969.228891@dolphin.mojam.com> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> <14276.2229.983969.228891@dolphin.mojam.com> Message-ID: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Skip Montanaro writes: > Since this is the dev group, how about showing us the Knowbot's logical > filesystem API, and let's do some dev-ing... Well, I took a look at it, and I must confess it's just not really different from the set of interfaces in the os module; the important point is that they are methods instead of functions (other than a few data items: sep, pardir, curdir). The path attribute provided the same interface as os.path. Its only user-visible state is the current-directory setting, which may or may not be that useful. We left off chmod(), which would make Tim happy, but that was only because it wasn't meaningful in context.
We'd have to add it (or something equivalent) for a general purpose filesystem object. So Tim's only happy if he can come up with a general interface that is actually portable (consider my earlier comments on setreadonly()). On the other hand, you don't need chmod() or anything like it for most situations where a filesystem object would be useful. An FTPFilesystem class would not be hard to write! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jack at oratrix.nl Wed Aug 25 23:43:16 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 25 Aug 1999 23:43:16 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by "Fred L. Drake, Jr." , Wed, 25 Aug 1999 12:22:52 -0400 (EDT) , <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <19990825214321.D50AD18BA0F@oratrix.oratrix.nl> But in Python, with its nice high-level datastructures, couldn't we design the Mother Of All File Attribute Calls, which would optionally map functionality from one platform to another? As an example consider the Mac resource fork size. If on unix I did fattrs = os.getfileattributes(filename) rfsize = fattrs.get('resourceforksize') it would raise an exception. If, however, I did rfsize = fattrs.get('resourceforksize', compat=1) I would get a "close approximation", 0. Note that you want some sort of a compat parameter, not a default value, as for some attributes (the various atime/mtime/ctimes, permission bits, etc) you'd get a default based on other file attributes that do exist on the current platform. Hmm, the file-attribute-object idea has the added advantage that you can then use setfileattributes(filename, fattrs) to be sure that you've copied all relevant attributes, independent of the platform you're on. 
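[The compat-fallback lookup Jack describes can be sketched like this — getfileattributes(), the FileAttributes class, and the table of approximations are all hypothetical names following his example:]

```python
class FileAttributes:
    # Hypothetical sketch of the file-attribute object proposed above.
    # Attributes the platform lacks are served from a table of "close
    # approximations", but only when the caller passes compat=1.
    _compat_defaults = {'resourceforksize': 0}

    def __init__(self, attrs):
        self._attrs = attrs      # what the platform really supports

    def get(self, name, compat=0):
        if name in self._attrs:
            return self._attrs[name]
        if compat and name in self._compat_defaults:
            return self._compat_defaults[name]
        raise KeyError(name)
```

On Unix, fattrs.get('resourceforksize') would then raise, while fattrs.get('resourceforksize', compat=1) returns the approximation 0, as in the example above.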
Mapping permissions takes a bit more (design-) work, with unix having user/group/other only and Windows having full-fledged ACLs (or nothing at all, depending how you look at it:-), but should also be doable. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Thu Aug 26 08:10:01 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 26 Aug 1999 07:10:01 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908211534.QAA22392@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 21, 99 04:34:32 pm" Message-ID: <199908260610.HAA20304@pukapuka.inrialpes.fr> [me, dropping SET_LINENO] > > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > ... > > A couple of things that need finalization: > > ... An updated version is available at the same location. I think that this one does The Right Thing (tm). a) Everything is internal to the VM and totally hidden, as it should be. b) No modifications of the code and frame objects (no additional slots) c) The modified code string (used for tracing) is allocated dynamically when the 1st frame pointing to its original switches in trace mode, and is deallocated automatically when the last frame pointing to its original dies. I feel better with this code so I can stop thinking about it and move on :-) (leaving it to your appreciation). What's next? File attributes? ;-) It's not easy to weigh what kind of common interface would be easy to grasp, intuitive and unambiguous for the average user. I think that the people on this list (being core developers) are more or less biased here (I'd say more than less). Perhaps some input from the community (c.l.py) would help?
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Thu Aug 26 07:06:57 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:57 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Message-ID: <000301beef80$d26158c0$522d153f@tim> [Fred L. Drake, Jr.] > ... > We left off chmod(), which would make Tim happy, but that was only > because it wasn't meaningful in context. I'd be appalled to see chmod go away; for many people it's comfortable and useful. I want *another* way, to do what little bit is portable in a way that doesn't require first mastering a badly designed interface from a dying OS . > We'd have to add it (or something equivalent) for a general purpose > filesystem object. So Tim's only happy if he can come up with a > general interface that is actually portable (consider my earlier > comments on setreadonly()). I don't care about general here; making up a general new way to spell everything that everyone may want to do under every OS would create an interface even worse than chmod's. My sister doesn't want to create files that are read-only to the world but writable to her group -- she just wants to mark certain precious files as read-only to minimize the chance of accidental destruction. What she wants is easy to do under Windows or Unix, and I expect she's the norm rather than the exception. > On the other hand, you don't need chmod() or anything like it for > most situations where a filesystem object would be useful. An > FTPFilesystem class would not be hard to write! An OO filesystem object with a .makereadonly method suits me fine . 
From tim_one at email.msn.com Thu Aug 26 07:06:54 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:54 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <000201beef80$d072f640$522d153f@tim> [Fred L. Drake, Jr.] > I think we can simply declare that isreadonly() checks that the > file doesn't allow the user to read it, Had more in mind that the file doesn't allow the user to write it . > but setreadonly() sounds to me like it wouldn't be portable to Unix. > There's more than one (reasonable) way to make a file unreadable to > a user just by manipulating permission bits, and which is best will > vary according to both the user and the file's existing permissions. "Portable" implies least common denominator, and the plain meaning of read-only is that nobody (whether owner, group or world in Unix) has write permission. People wanting something beyond that are going beyond what's portable, and that's fine -- I'm not suggesting getting rid of chmod for Unix dweebs. But by the same token, Windows dweebs should get some other (as non-portable as chmod) way to fiddle the bits important on *their* OS (only one of which chmod can affect). Billions of newbies will delightedly stick to the portable interface with the name that makes sense. the-percentage-of-programmers-doing-systems-programming-shrinks-by- the-millisecond-ly y'rs - tim From mal at lemburg.com Sat Aug 28 16:37:50 1999 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 28 Aug 1999 16:37:50 +0200 Subject: [Python-Dev] Iterating over dictionaries and objects in general References: <990826114149.ZM59302@rayburn.hcs.tl> <199908261702.NAA01866@eric.cnri.reston.va.us> <37C57E01.2ADC02AE@digicool.com> <990826150216.ZM60002@rayburn.hcs.tl> <37C5BAF1.4D6C1031@lemburg.com> <37C5C320.CF11BC7C@digicool.com> <37C643B0.7ECA586@lemburg.com> <37C69FB3.9CB279C7@digicool.com> Message-ID: <37C7F43E.67EEAB98@lemburg.com> [Followup to a discussion on psa-members about iterating over dictionaries without creating intermediate lists] Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > > > Jim Fulton wrote: > > > > > > > The problem with the PyDict_Next() approach is that it will only > > > > work reliably from within a single C call. You can't return > > > > to Python between calls to PyDict_Next(), because those could > > > > modify the dictionary causing the next PyDict_Next() call to > > > > fail or core dump. > > > > > > I do this all the time without problem. Basically, you provide an > > > index and if the index is out of range, you simply get an end-of-data return. > > > The only downside of this approach is that you might get "incorrect" > > > results if the dictionary is modified between calls. This isn't > > > all that different from iterating over a list with an index. > > > > Hmm, that's true... but what if the dictionary gets resized > > in between iterations ? The item layout is then likely to > > change, so you could potentially get completely bogus results. > > I think I said that. :) Just wanted to verify my understanding ;-) > > Even iterating over items twice may then occur, I guess. > > Yup. > > Again, this is not so different from iterating over > a list using a range: > > l=range(10) > for i in range(len(l)): > l.insert(0,'Bruce') > print l[i] > > This always outputs 'Bruce'. :) Ok, so the "risk" is under user control. Fine with me...
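[The kind of iterator under discussion — walking a dictionary's items one at a time instead of first materializing the item list — can be sketched in pure Python; the class name is hypothetical, and mutating the dict mid-iteration remains the caller's risk, as with the list/range example above:]

```python
class dictrange:
    # Yield (key, value) pairs without building an intermediate list
    # of the dictionary's items.
    def __init__(self, d):
        self._d = d

    def __iter__(self):
        for key in self._d:
            yield key, self._d[key]
```

(In current CPython, changing the dict's size during such an iteration raises RuntimeError rather than yielding bogus items.)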
> > Or perhaps via a special dictionary iterator, so that the following > > works: > > > > for item in dictrange(d): > > ... > > Yup. > > > The iterator could then also take some extra actions to insure > > that the dictionary hasn't been resized. > > I don't think it should do that. It should simply > stop when it has run out of items. I think I'll give such an iterator a spin. Would be a nice extension to mxTools. BTW, a generic type slot for iterating over types would probably be a nice feature too. The type slot could provide hooks of the form it_first, it_last, it_next, it_prev which all work integer index based, e.g. in pseudo code: int i; PyObject *item; /* set up i and item to point to the first item */ if (obj.it_first(&i,&item) < 0) goto onError; while (1) { PyObject_Print(item); /* move i and item to the next item; an IndexError is raised in case there are no more items */ if (obj.it_next(&i,&item) < 0) { PyErr_Clear(); break; } } These slots would cover all problem instances where iteration over non-sequences or non-uniform sequences (i.e. sequence-like objects which don't provide convex index sets, e.g. 1,2,3,6,7,8,11,12) is required, e.g. dictionaries, multi-segment buffers -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 127 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gward at cnri.reston.va.us Mon Aug 30 21:02:22 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:02:22 -0400 Subject: [Python-Dev] Portable "spawn" module for core? Message-ID: <19990830150222.B428@cnri.reston.va.us> Hi all -- it recently occurred to me that the 'spawn' module I wrote for the Distutils (and which Perry Stoll extended to handle NT), could fit nicely in the core library. On Unix, it's just a front-end to fork-and-exec; on NT, it's a front-end to spawnv().
In either case, it's just enough code (and just tricky enough code) that not everybody should have to duplicate it for their own uses. The basic idea is this:

from spawn import spawn
...
spawn (['cmd', 'arg1', 'arg2'])
# or
spawn (['cmd'] + args)

you get the idea: it takes a *list* representing the command to spawn: no strings to parse, no shells to get in the way, no sneaky meta-characters ruining your day, draining your efficiency, or compromising your security. (Conversely, no pipelines, redirection, etc.) The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' depending on os.name. Additionally, it takes a couple of optional keyword arguments (all booleans): 'search_path', 'verbose', and 'dry_run', which do pretty much what you'd expect. The module as it's currently in the Distutils code is attached. Let me know what you think... Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:11:50 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:11:50 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <14282.54880.922571.792484@dolphin.mojam.com> Greg> it recently occured to me that the 'spawn' module I wrote for the Greg> Distutils (and which Perry Stoll extended to handle NT), could fit Greg> nicely in the core library. How's spawn.spawn semantically different from the Windows-dependent os.spawn? How are stdout/stdin/stderr connected to the child process - just like fork+exec or something slightly higher level like os.popen?
If it's semantically like os.spawn and a little bit higher level abstraction than fork+exec, I'd vote for having the os module simply import it:

from spawn import spawn

and thus make that function more widely available... Greg> The module as it's currently in the Distutils code is attached. Not in the message I saw... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From gward at cnri.reston.va.us Mon Aug 30 21:14:57 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:14:57 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us>; from Greg Ward on Mon, Aug 30, 1999 at 03:02:22PM -0400 References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <19990830151457.C428@cnri.reston.va.us> On 30 August 1999, To python-dev at python.org said: > The module as it's currently in the Distutils code is attached. Let me > know what you think... New definition of "attached": I'll just reply to my own message with the code I meant to attach. D'oh!

------------------------------------------------------------------------

"""distutils.spawn

Provides the 'spawn()' function, a front-end to various platform-specific
functions for launching another program in a sub-process."""

# created 1999/07/24, Greg Ward

__rcsid__ = "$Id: spawn.py,v 1.2 1999/08/29 18:20:56 gward Exp $"

import sys, os, string
from distutils.errors import *

def spawn (cmd, search_path=1, verbose=0, dry_run=0):
    """Run another program, specified as a command list 'cmd', in a new
       process.  'cmd' is just the argument list for the new process, ie.
       cmd[0] is the program to run and cmd[1:] are the rest of its
       arguments.  There is no way to run a program with a name different
       from that of its executable.

       If 'search_path' is true (the default), the system's executable
       search path will be used to find the program; otherwise, cmd[0]
       must be the exact path to the executable.  If 'verbose' is true, a
       one-line summary of the command will be printed before it is run.
       If 'dry_run' is true, the command will not actually be run.

       Raise DistutilsExecError if running the program fails in any way;
       just return on success."""

    if os.name == 'posix':
        _spawn_posix (cmd, search_path, verbose, dry_run)
    elif os.name in ( 'nt', 'windows' ):   # ???
        _spawn_nt (cmd, search_path, verbose, dry_run)
    else:
        raise DistutilsPlatformError, \
              "don't know how to spawn programs on platform '%s'" % os.name

# spawn ()

def _spawn_nt ( cmd, search_path=1, verbose=0, dry_run=0):
    import string
    executable = cmd[0]
    if search_path:
        paths = string.split( os.environ['PATH'], os.pathsep)
        base,ext = os.path.splitext(executable)
        if (ext != '.exe'):
            executable = executable + '.exe'
        if not os.path.isfile(executable):
            paths.reverse()    # go over the paths and keep the last one
            for p in paths:
                f = os.path.join( p, executable )
                if os.path.isfile ( f ):
                    # the file exists, we have a shot at spawn working
                    executable = f
    if verbose:
        print string.join ( [executable] + cmd[1:], ' ')
    if not dry_run:
        # spawn for NT requires a full path to the .exe
        rc = os.spawnv (os.P_WAIT, executable, cmd)
        if rc != 0:
            raise DistutilsExecError("command failed: %d" % rc)

def _spawn_posix (cmd, search_path=1, verbose=0, dry_run=0):
    if verbose:
        print string.join (cmd, ' ')
    if dry_run:
        return
    exec_fn = search_path and os.execvp or os.execv
    pid = os.fork ()

    if pid == 0:                        # in the child
        try:
            #print "cmd[0] =", cmd[0]
            #print "cmd =", cmd
            exec_fn (cmd[0], cmd)
        except OSError, e:
            sys.stderr.write ("unable to execute %s: %s\n" %
                              (cmd[0], e.strerror))
            os._exit (1)

        sys.stderr.write ("unable to execute %s for unknown reasons" % cmd[0])
        os._exit (1)

    else:                               # in the parent
        # Loop until the child either exits or is terminated by a signal
        # (ie. keep waiting if it's merely stopped)
        while 1:
            (pid, status) = os.waitpid (pid, 0)
            if os.WIFSIGNALED (status):
                raise DistutilsExecError, \
                      "command %s terminated by signal %d" % \
                      (cmd[0], os.WTERMSIG (status))
            elif os.WIFEXITED (status):
                exit_status = os.WEXITSTATUS (status)
                if exit_status == 0:
                    return              # hey, it succeeded!
                else:
                    raise DistutilsExecError, \
                          "command %s failed with exit status %d" % \
                          (cmd[0], exit_status)
            elif os.WIFSTOPPED (status):
                continue
            else:
                raise DistutilsExecError, \
                      "unknown error executing %s: termination status %d" % \
                      (cmd[0], status)

# _spawn_posix ()

------------------------------------------------------------------------ -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From gward at cnri.reston.va.us Mon Aug 30 21:31:55 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:31:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <14282.54880.922571.792484@dolphin.mojam.com>; from Skip Montanaro on Mon, Aug 30, 1999 at 02:11:50PM -0500 References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> Message-ID: <19990830153155.D428@cnri.reston.va.us> On 30 August 1999, Skip Montanaro said: > > Greg> it recently occured to me that the 'spawn' module I wrote for the > Greg> Distutils (and which Perry Stoll extended to handle NT), could fit > Greg> nicely in the core library. > > How's spawn.spawn semantically different from the Windows-dependent > os.spawn? My understanding (purely from reading Perry's code!) is that the Windows spawnv() and spawnve() calls require the full path of the executable, and there is no spawnvp(). Hence, the bulk of Perry's '_spawn_nt()' function is code to search the system path if the 'search_path' flag is true.
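[Editor's note: the path-search logic Greg describes can be factored into a small stand-alone helper. This is only a sketch of that logic -- the name `find_executable` and its signature are made up here, and it stops at the first hit rather than scanning the whole path:]

```python
import os

def find_executable(executable, path=None):
    """Search the directories in 'path' (default: os.environ['PATH'])
    for 'executable'; return its full path, or None if not found.
    Hypothetical helper illustrating the search described above."""
    if path is None:
        path = os.environ.get('PATH', os.defpath)
    if os.name == 'nt' and os.path.splitext(executable)[1] != '.exe':
        executable = executable + '.exe'   # mirror the NT .exe handling
    if os.path.isfile(executable):
        return executable                  # already an explicit path
    for p in path.split(os.pathsep):
        f = os.path.join(p, executable)
        if os.path.isfile(f):
            return f                       # first hit wins
    return None
```

Returning at the first hit makes the reversed-path trick in `_spawn_nt()` unnecessary.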
In '_spawn_posix()', I just use either 'execv()' or 'execvp()' for this. The bulk of my code is the complicated dance required to wait for a fork'ed child process to finish. > How are stdout/stdin/stderr connected to the child process - just > like fork+exec or something slightly higher level like os.popen? Just like fork 'n exec -- '_spawn_posix()' is just a front end to fork and exec (either execv or execvp). In a previous life, I *did* implement a spawning module for a certain other popular scripting language that handles redirection and capturing (backticks in the shell and that other scripting language). It was a lot of fun, but pretty hairy. Took three attempts gradually developed over two years to get it right in the end. In fact, it does all the easy stuff that a Unix shell does in spawning commands, ie. search the path, fork 'n exec, and redirection and capturing. Doesn't handle the tricky stuff, ie. pipelines and job control. The documentation for this module is 22 pages long; the code is 600+ lines of somewhat tricky Perl (1300 lines if you leave in comments and blank lines). That's why the Distutils spawn module doesn't do anything with std{out,err,in}. > If it's semantically like os.spawn and a little bit higher level > abstraction than fork+exec, I'd vote for having the os module simply > import it: So os.spawnv and os.spawnve would be Windows-specific, but os.spawn portable? Could be confusing. And despite the recent extended discussion of the os module, I'm not sure if this fits the model. BTW, is there anything like this on the Mac? On what other OSs does it even make sense to talk about programs spawning other programs? (Surely those GUI user interfaces have to do *something*...) 
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:52:49 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:52:49 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> <19990830153155.D428@cnri.reston.va.us> Message-ID: <14282.57574.918011.54595@dolphin.mojam.com> Greg> BTW, is there anything like this on the Mac? There will be, once Jack Jansen contributes _spawn_mac... ;-) Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From jack at oratrix.nl Mon Aug 30 23:25:04 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:25:04 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 15:31:55 -0400 , <19990830153155.D428@cnri.reston.va.us> Message-ID: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Recently, Greg Ward said: > BTW, is there anything like this on the Mac? On what other OSs does it > even make sense to talk about programs spawning other programs? (Surely > those GUI user interfaces have to do *something*...) Yes, but the interface is quite a bit more high-level, so it's pretty difficult to reconcile with the Unix and Windows "every argument is a string" paradigm. You start the process and pass along an AppleEvent (basically an RPC-call) that will be presented to the program upon startup. 
So on the mac there's a serious difference between (inventing the API interface here, cut down to make it understandable to non-macheads:-) spawn("netscape", ("Open", "file.html")) and spawn("netscape", ("OpenURL", "http://foo.com/file.html")) The mac interface is (of course:-) infinitely more powerful, allowing you to talk to running apps, addressing stuff in it as COM/OLE does, etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is impossible to represent in a meaningful way. Add to that the fact that there's no stdin/stdout/stderr and there's little common ground. The one area of common ground is "run program X on files Y and Z and wait (or don't wait) for completion", so that is something that could maybe have a special method that could be implemented on all three mentioned platforms (and probably everything else as well). And even then it'll be surprising to Mac users that they have to _exit_ their editor (if you specify wait), not something people commonly do. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 30 23:29:55 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:29:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 23:25:04 +0200." <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Message-ID: <199908302129.RAA08442@eric.cnri.reston.va.us> > Recently, Greg Ward said: > > BTW, is there anything like this on the Mac? On what other OSs does it > > even make sense to talk about programs spawning other programs? (Surely > > those GUI user interfaces have to do *something*...)
> > Yes, but the interface is quite a bit more high-level, so it's pretty > difficult to reconcile with the Unix and Windows "every argument is a > string" paradigm. You start the process and pass along an AppleEvent > (basically an RPC-call) that will be presented to the program upon > startup. > > So on the mac there's a serious difference between (inventing the API > interface here, cut down to make it understandable to non-macheads:-) > spawn("netscape", ("Open", "file.html")) > and > spawn("netscape", ("OpenURL", "http://foo.com/file.html")) > > The mac interface is (of course:-) infinitely more powerful, allowing > you to talk to running apps, adressing stuff in it as COM/OLE does, > etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is > impossible to represent in a meaningful way. > > Add to that the fact that there's no stdin/stdout/stderr and there's > little common ground. The one area of common ground is "run program X > on files Y and Z and wait (or don't wait) for completion", so that is > something that could maybe have a special method that could be > implemented on all three mentioned platforms (and probably everything > else as well). And even then it'll be surprising to Mac users that > they have to _exit_ their editor (if you specify wait), not something > people commonly do. Indeed. I'm guessing that Greg wrote his code specifically to drive compilers, not so much to invoke an editor on a specific file. It so happens that the Windows compilers have command lines that look sufficiently like the Unix compilers that this might actually work. On the Mac, driving the compilers is best done using AppleEvents, so it's probably better not to try to abuse the spawn() interface for that... (Greg, is there a higher level where the compiler actions are described without referring to specific programs, but perhaps just to compiler actions and input and output files?)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 30 23:35:45 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:35:45 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 15:02:22 EDT." <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <199908302135.RAA08467@eric.cnri.reston.va.us> > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). In either case, > it's just enough code (and just tricky enough code) that not everybody > should have to duplicate it for their own uses. > > The basic idea is this: > > from spawn import spawn > ... > spawn (['cmd', 'arg1', 'arg2']) > # or > spawn (['cmd'] + args) > > you get the idea: it takes a *list* representing the command to spawn: > no strings to parse, no shells to get in the way, no sneaky > meta-characters ruining your day, draining your efficiency, or > compromising your security. (Conversely, no pipelines, redirection, > etc.) > > The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' > depending on os.name. Additionally, it takes a couple of optional > keyword arguments (all booleans): 'search_path', 'verbose', and > 'dry_run', which do pretty much what you'd expect. > > The module as it's currently in the Distutils code is attached. Let me > know what you think... I'm not sure that the verbose and dry_run options belong in the standard library. When both are given, this does something semi-useful; for Posix that's basically just printing the arguments, while for NT it prints the exact command that will be executed. Not sure if that's significant though. 
Perhaps it's better to extract the code that runs the path to find the right executable and make that into a separate routine. (Also, rather than reversing the path, I would break out of the loop at the first hit.) --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Mon Aug 30 23:38:36 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 17:38:36 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <199908302129.RAA08442@eric.cnri.reston.va.us>; from Guido van Rossum on Mon, Aug 30, 1999 at 05:29:55PM -0400 References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> <199908302129.RAA08442@eric.cnri.reston.va.us> Message-ID: <19990830173836.F428@cnri.reston.va.us> On 30 August 1999, Guido van Rossum said: > Indeed. I'm guessing that Greg wrote his code specifically to drive > compilers, not so much to invoke an editor on a specific file. It so > happens that the Windows compilers have command lines that look > sufficiently like the Unix compilers that this might actually work. Correct, but the spawn module I posted should work for any case where you want to run an external command synchronously without redirecting I/O. (And it could probably be extended to handle those cases, but a) I don't need them for Distutils [yet!], and b) I don't know how to do it portably.) > On the Mac, driving the compilers is best done using AppleEvents, so > it's probably better not to try to abuse the spawn() interface for > that... (Greg, is there a higher level where the compiler actions are > described without referring to specific programs, but perhaps just to > compiler actions and input and output files?) [off-topic alert... probably belongs on distutils-sig, but there you go] Yes, my CCompiler class is all about providing a (hopefully) compiler- and platform-neutral interface to a C/C++ compiler.
Currently there're only two concrete subclasses of this: UnixCCompiler and MSVCCompiler, and they both obviously use spawn, because Unix C compilers and MSVC both provide that kind of interface. A hypothetical sibling class that provides an interface to some Mac C compiler might use a souped-up spawn that "knows about" Apple Events, or it might use some other interface to Apple Events. If Jack's simplified summary of what passing Apple Events to a command looks like is accurate, maybe spawn can be souped up to work on the Mac. Or we might need a dedicated module for running Mac programs. So does anybody have code to run external programs on the Mac using Apple Events? Would it be possible/reasonable to add that as '_spawn_mac()' to my spawn module? Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From jack at oratrix.nl Mon Aug 30 23:52:29 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:52:29 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 17:38:36 -0400 , <19990830173836.F428@cnri.reston.va.us> Message-ID: <19990830215234.ED4E718B9FB@oratrix.oratrix.nl> Hmm, if we're talking about a "Python Make" or some such here the best way would probably be to use ToolServer. ToolServer is based on Apple's old MPW programming environment and is still supported by compiler vendors like MetroWerks. The nice thing about ToolServer for this type of work is that it _is_ command-line based, so you can probably send it things like spawn("cc", "-O", "test.c") But, although I know it is possible to do this with ToolServer, I haven't a clue on how to do it...
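[Editor's note: the arrangement Greg describes -- an abstract front end whose platform subclasses decide *how* the underlying tool is run (spawn a command line, send an AppleEvent, drive ToolServer) -- can be sketched in a few lines. The class and method names below are illustrative only, not the real Distutils API, and the spawn function is injected so the sketch is self-contained:]

```python
class CCompiler:
    """Abstract, compiler- and platform-neutral front end (sketch)."""
    def compile(self, source):
        raise NotImplementedError

class UnixCCompiler(CCompiler):
    """Drives a command-line compiler via a spawn-style callable."""
    def __init__(self, spawn):
        self.spawn = spawn              # e.g. the spawn() from this thread
    def compile(self, source):
        self.spawn(['cc', '-c', source])

# A Mac subclass would instead build an AppleEvent or a ToolServer
# command here; only the compile() entry point would stay the same.

calls = []
UnixCCompiler(calls.append).compile('test.c')
assert calls == [['cc', '-c', 'test.c']]
```

The point of the indirection is that callers depend only on `compile()`, never on how the platform actually launches the tool.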
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim_one at email.msn.com Tue Aug 31 07:44:18 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 31 Aug 1999 01:44:18 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> Message-ID: <000101bef373$de2974c0$932d153f@tim> [Greg Ward] > ... > In a previous life, I *did* implement a spawning module for > a certain other popular scripting language that handles > redirection and capturing (backticks in the shell and that other > scripting language). It was a lot of fun, but pretty hairy. Took > three attempts gradually developed over two years to get it right > in the end. In fact, it does all the easy stuff that a Unix shell > does in spawning commands, ie. search the path, fork 'n exec, and > redirection and capturing. Doesn't handle the tricky stuff, ie. > pipelines and job control. > > The documentation for this module is 22 pages long; the code > is 600+ lines of somewhat tricky Perl (1300 lines if you leave > in comments and blank lines). That's why the Distutils spawn > module doesn't do anything with std{out,err,in}. Note that win/tclWinPipe.c-- which contains the Windows-specific support for Tcl's "exec" cmd --is about 3,200 lines of C. It does handle pipelines and redirection, and even fakes pipes as needed with temp files when it can identify a pipeline component as belonging to the 16-bit subsystem. Even so, the Tcl help page for "exec" bristles with hilarious caveats under the Windows subsection; e.g., When redirecting from NUL:, some applications may hang, others will get an infinite stream of "0x01" bytes, and some will actually correctly get an immediate end-of-file; the behavior seems to depend upon something compiled into the application itself. 
When redirecting greater than 4K or so to NUL:, some applications will hang. The above problems do not happen with 32-bit applications. Still, people seem very happy with Tcl's exec, and I'm certain no language tries harder to provide a portable way to "do command lines". Two points to that: 1) If Python ever wants to do something similar, let's steal the Tcl code (& unlike stealing Perl's code, stealing Tcl's code actually looks possible -- it's very much better organized and written). 2) For all its heroic efforts to hide platform limitations,

int Tcl_ExecObjCmd(dummy, interp, objc, objv)
    ClientData dummy;           /* Not used. */
    Tcl_Interp *interp;         /* Current interpreter. */
    int objc;                   /* Number of arguments. */
    Tcl_Obj *CONST objv[];      /* Argument objects. */
{
#ifdef MAC_TCL
    Tcl_AppendResult(interp, "exec not implemented under Mac OS",
            (char *)NULL);
    return TCL_ERROR;
#else
    ...

a-generalized-spawn-is-a-good-start-ly y'rs - tim From fredrik at pythonware.com Tue Aug 31 08:39:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 31 Aug 1999 08:39:56 +0200 Subject: [Python-Dev] Portable "spawn" module for core? References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <005101bef37b$b0415070$f29b12c2@secret.pythonware.com> Greg Ward wrote: > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). any reason this couldn't go into the os module instead? just add parts of it to os.py, and change the docs to say that spawn* are supported on Windows and Unix... (supporting the full set of spawn* primitives would of course be nice, btw. just like os.py provides all exec variants...)
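[Editor's note: Fredrik's suggestion amounts to giving os.py a portable spawnv(); on Unix one can be emulated with the same fork/exec/waitpid dance used in '_spawn_posix()' above. A minimal, Unix-only sketch follows -- the function name and the negated-signal return convention are made up here, and only P_WAIT-style behaviour is shown:]

```python
import os

def spawnv_wait(file, args):
    """Run 'file' with argv 'args', wait for it, and return its exit
    status (or the negated signal number if it was killed).
    Hypothetical Unix-only sketch of a spawnv-style primitive."""
    pid = os.fork()
    if pid == 0:                  # child: replace ourselves with the program
        try:
            os.execv(file, args)
        finally:
            os._exit(127)         # only reached if execv() failed
    while True:                   # parent: wait, ignoring stop/continue
        pid, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        if os.WIFSIGNALED(status):
            return -os.WTERMSIG(status)
```

This is essentially '_spawn_posix()' with the path search stripped out and the status decoded into a return value instead of an exception.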
From gstein at lyra.org Tue Aug 3 03:51:43 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 02 Aug 1999 18:51:43 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A64B2F.3386F0A9@lyra.org> Mark Hammond wrote: > ... > Therefore, I would like to propose these functions to be added to > abstract.c: > > int PyObject_GetBufferSize(); > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > const void *PyObject_GetReadOnlyBuffer(); > > Although equivalent functions exist for the buffer object, I can't see the > equivalent abstract implementations - ie, that work with any object > supporting the protocol. > > Im willing to provide a patch if there is agreement a) the general idea is > good, and b) my specific spelling of the idea is OK (less likely - > PyBuffer_* seems better, but loses any implication of being abstract?).
Marc-Andre proposed exactly the same thing back at the end of March (to me and Guido). The two of us hashed out some of the stuff and M.A. came up with a full patch for the stuff. Guido was relatively non-committal at the point one way or another, but said they seemed fine. It appears the stuff never made it into source control. If Marc-Andre can resurface the final proposal/patch, then we'd be set. Until then: use the bufferprocs :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Tue Aug 3 11:11:11 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 11:11:11 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A6B22F.7A14BA2C@lemburg.com> David Ascher wrote: > > An issue which has dogged the NumPy project is that there is (to my > knowledge) no way to pickle very large arrays without creating strings > which contain all of the data. This can be a problem given that NumPy > arrays tend to be very large -- often several megabytes, sometimes much > bigger. This slows things down, sometimes a lot, depending on the > platform. It seems that it should be possible to do something more > efficient. > > Two alternatives come to mind: > > -- define a new pickling protocol which passes a file-like object to the > instance and have the instance write itself to that file, being as > efficient or inefficient as it cares to. This protocol is used only > if the instance/type defines the appropriate slot. Alternatively, > enrich the semantics of the getstate interaction, so that an object > can return partial data and tell the pickling mechanism to come back > for more. > > -- make pickling of objects which support the buffer interface use that > inteface's notion of segments and use that 'chunk' size to do > something more efficient if not necessarily most efficient. (oh, and > make NumPy arrays support the buffer interface =). 
This is simple > for NumPy arrays since we want to pickle "everything", but may not be > what other buffer-supporting objects want. > > Thoughts? Alternatives? Hmm, types can register their own pickling/unpickling functions via copy_reg, so they can access the self.write method in pickle.py to implement the write to file interface. Don't know how this would be done for cPickle.c though. For instances the situation is different since there is no dispatching done on a per-class basis. I guess an optional argument could help here. Perhaps some lazy pickling wrapper would help fix this in general: an object which calls back into the to-be-pickled object to access the data rather than store the data in a huge string. Yet another idea would be using memory mapped files instead of strings as temporary storage (but this is probably hard to implement right and not as portable). Dunno... just some thoughts. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 3 09:50:33 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 09:50:33 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <37A69F49.3575AE85@lemburg.com> Greg Stein wrote: > > Mark Hammond wrote: > > ... > > Therefore, I would like to propose these functions to be added to > > abstract.c: > > > > int PyObject_GetBufferSize(); > > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > > const void *PyObject_GetReadOnlyBuffer(); > > > > Although equivalent functions exist for the buffer object, I can't see the > > equivalent abstract implementations - ie, that work with any object > > supporting the protocol. 
> > > > Im willing to provide a patch if there is agreement a) the general idea is > > good, and b) my specific spelling of the idea is OK (less likely - > > PyBuffer_* seems better, but loses any implication of being abstract?). > > Marc-Andre proposed exactly the same thing back at the end of March (to > me and Guido). The two of us hashed out some of the stuff and M.A. came > up with a full patch for the stuff. Guido was relatively non-committal > at the point one way or another, but said they seemed fine. It appears > the stuff never made it into source control. > > If Marc-Andre can resurface the final proposal/patch, then we'd be set. Below is the code I currently use. I don't really remember if this is what Greg and I discussed a while back, but I'm sure he'll correct me ;-) Note that you the buffer length is implicitly returned by these APIs. /* Takes an arbitrary object which must support the character (single segment) buffer interface and returns a pointer to a read-only memory location useable as character based input for subsequent processing. buffer and buffer_len are only set in case no error occurrs. Otherwise, -1 is returned and an exception set. */ static int PyObject_AsCharBuffer(PyObject *obj, const char **buffer, int *buffer_len) { PyBufferProcs *pb = obj->ob_type->tp_as_buffer; const char *pp; int len; if ( pb == NULL || pb->bf_getcharbuffer == NULL || pb->bf_getsegcount == NULL ) { PyErr_SetString(PyExc_TypeError, "expected a character buffer object"); goto onError; } if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) { PyErr_SetString(PyExc_TypeError, "expected a single-segment buffer object"); goto onError; } len = (*pb->bf_getcharbuffer)(obj,0,&pp); if (len < 0) goto onError; *buffer = pp; *buffer_len = len; return 0; onError: return -1; } /* Same as PyObject_AsCharBuffer() except that this API expects a readable (single segment) buffer interface and returns a pointer to a read-only memory location which can contain arbitrary data. 
buffer and buffer_len are only set in case no error occurs. Otherwise, -1 is returned and an exception set. */ static int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len) { PyBufferProcs *pb = obj->ob_type->tp_as_buffer; void *pp; int len; if ( pb == NULL || pb->bf_getreadbuffer == NULL || pb->bf_getsegcount == NULL ) { PyErr_SetString(PyExc_TypeError, "expected a readable buffer object"); goto onError; } if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) { PyErr_SetString(PyExc_TypeError, "expected a single-segment buffer object"); goto onError; } len = (*pb->bf_getreadbuffer)(obj,0,&pp); if (len < 0) goto onError; *buffer = pp; *buffer_len = len; return 0; onError: return -1; } /* Takes an arbitrary object which must support the writeable (single segment) buffer interface and returns a pointer to a writeable memory location in buffer of size buffer_len. buffer and buffer_len are only set in case no error occurs. Otherwise, -1 is returned and an exception set. */ static int PyObject_AsWriteBuffer(PyObject *obj, void **buffer, int *buffer_len) { PyBufferProcs *pb = obj->ob_type->tp_as_buffer; void *pp; int len; if ( pb == NULL || pb->bf_getwritebuffer == NULL || pb->bf_getsegcount == NULL ) { PyErr_SetString(PyExc_TypeError, "expected a writeable buffer object"); goto onError; } if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) { PyErr_SetString(PyExc_TypeError, "expected a single-segment buffer object"); goto onError; } len = (*pb->bf_getwritebuffer)(obj,0,&pp); if (len < 0) goto onError; *buffer = pp; *buffer_len = len; return 0; onError: return -1; } -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 11:53:39 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 11:53:39 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by "M.-A.
Lemburg" , Tue, 03 Aug 1999 09:50:33 +0200 , <37A69F49.3575AE85@lemburg.com> Message-ID: <19990803095339.E02CE303120@snelboot.oratrix.nl> Why not pass the index to the As*Buffer routines as well and make getsegcount available too? Then you could code things like for(i=0; i < getsegcount(obj); i++) { if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 ) return -1; write(fp, buf, count); } -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein at lyra.org (Greg Stein) Date: Tue, 03 Aug 1999 03:25:11 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6C387.7360D792@lyra.org> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like > for(i=0; i < getsegcount(obj); i++) { > if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 ) > return -1; > write(fp, buf, count); > } Simply because multiple segments hasn't been seen. All objects supporting the buffer interface have a single segment. IMO, it is best to drop the argument to make typical usage easier. For handling multiple segments, a caller can use the raw interface rather than the handy functions. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at digicool.com Tue Aug 3 12:58:54 1999 From: jim at digicool.com (Jim Fulton) Date: Tue, 03 Aug 1999 06:58:54 -0400 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A6CB6E.C990F561@digicool.com> Mark Hammond wrote: > > Hi all, > Im trying to slowly wean myself over to the buffer interfaces. OK, I'll bite. Where is the buffer interface documented? I found references to it in various places (e.g. built-in buffer()) but didn't find the interface itself. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Tue Aug 3 13:06:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 03 Aug 1999 13:06:46 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6CD46.642A9C6D@lemburg.com> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like > for(i=0; i < getsegcount(obj); i++) { > if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 ) > return -1; > write(fp, buf, count); > } Well, just like Greg said, this is not much different than using the buffer interface directly. While the above would be a handy PyObject_WriteAsBuffer() kind of helper, I don't think that this is really used all that much. E.g. in mxODBC I use the APIs for accessing the raw char data in a buffer: the pointer is passed directly to the ODBC APIs without copying, which makes things quite fast. IMHO, this is the greatest advantage of the buffer interface. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at cnri.reston.va.us Tue Aug 3 15:07:44 1999 From: fdrake at cnri.reston.va.us (Fred L. Drake) Date: Tue, 3 Aug 1999 09:07:44 -0400 (EDT) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A64B2F.3386F0A9@lyra.org> References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <14246.59808.561395.761772@weyr.cnri.reston.va.us> Greg Stein writes: > Until then: use the bufferprocs :-) Greg, On the topic of the buffer interface: Have you written documentation for this that I can include in the API reference? Bugging you about this is on my to-do list. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Tue Aug 3 13:29:43 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 13:29:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A6CB6E.C990F561@digicool.com> Message-ID: <37A6D2A7.27F27554@lemburg.com> Jim Fulton wrote: > > Mark Hammond wrote: > > > > Hi all, > > Im trying to slowly wean myself over to the buffer interfaces. > > OK, I'll bite. Where is the buffer interface documented? I found references > to it in various places (e.g. built-in buffer()) but didn't find the interface > itself. I guess it's a read-the-source feature :-) Objects/bufferobject.c and Include/object.h provide a start. Objects/stringobject.c has a "sample" implementation. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 16:45:25 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 16:45:25 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by Greg Stein , Tue, 03 Aug 1999 03:25:11 -0700 , <37A6C387.7360D792@lyra.org> Message-ID: <19990803144526.6B796303120@snelboot.oratrix.nl> > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > available too? > > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. Hmm. And I went out of my way to include this stupid multi-buffer stuff because the NumPy folks said they couldn't live without it (and one of the reasons for the buffer stuff was to allow NumPy arrays, which may be discontiguous, to be written efficiently). Can someone confirm that the Numeric stuff indeed doesn't use this? 
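[A present-day editorial aside, not part of the original thread: the segment-count API debated here was eventually superseded by the buffer protocol of PEP 3118, which describes discontiguous data with shape and strides rather than multiple segments. A small sketch of how that looks from Python today:]

```python
import array

# A one-dimensional array.array is one flat block of memory,
# so a view of it is contiguous -- the "single segment" case above.
flat = memoryview(array.array("f", [1.0, 2.0, 3.0]))

# Slicing a view with a step produces a strided, non-contiguous view --
# the modern analogue of the multi-segment case NumPy needed.
strided = memoryview(bytes(range(10)))[::2]

print(flat.contiguous, strided.contiguous)
```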
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From da at ski.org Tue Aug 3 18:19:19 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:19:19 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <37A6B22F.7A14BA2C@lemburg.com> Message-ID: On Tue, 3 Aug 1999, M.-A. Lemburg wrote: > Hmm, types can register their own pickling/unpickling functions > via copy_reg, so they can access the self.write method in pickle.py > to implement the write to file interface. Are you sure? My understanding of copy_reg is, as stated in the doc: pickle (type, function[, constructor]) Declares that function should be used as a ``reduction'' function for objects of type or class type. function should return either a string or a tuple. The optional constructor parameter, if provided, is a callable object which can be used to reconstruct the object when called with the tuple of arguments returned by function at pickling time. How does one access the 'self.write method in pickle.py'? > Perhaps some lazy pickling wrapper would help fix this in general: > an object which calls back into the to-be-pickled object to > access the data rather than store the data in a huge string. Right. That's an idea. > Yet another idea would be using memory mapped files instead > of strings as temporary storage (but this is probably hard to implement > right and not as portable). That's a very interesting idea! I'll try that -- it might just be the easiest way to do this. I think that portability isn't a huge concern -- the folks who are coming up with the speed issue are on platforms which have mmap support. Thanks for the suggestions. 
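[Another present-day aside: the copy_reg mechanism discussed above, spelled copyreg in today's Python, registers a reduce function that must return a (constructor, args) tuple -- which is exactly why the whole payload gets materialized as one big object before it reaches the pickle stream. A minimal sketch, with BigArray as a made-up stand-in for a NumPy-style array:]

```python
import copyreg
import pickle

class BigArray:
    # Toy stand-in for an array type whose payload is one large blob.
    def __init__(self, data):
        self.data = data

def reduce_bigarray(arr):
    # The reduction must return (callable, args); the args tuple holds
    # the full payload, so the pickler copies it wholesale into its
    # output -- the memory overhead this thread is complaining about.
    return (BigArray, (arr.data,))

copyreg.pickle(BigArray, reduce_bigarray)

payload = b"\x00" * 1000
blob = pickle.dumps(BigArray(payload))
restored = pickle.loads(blob)
```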
--david From da at ski.org Tue Aug 3 18:20:37 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:20:37 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A6C387.7360D792@lyra.org> Message-ID: On Tue, 3 Aug 1999, Greg Stein wrote: > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. IMO, it is best FYI, if/when NumPy objects support the buffer API, they will require multiple-segments. From da at ski.org Tue Aug 3 18:23:31 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:23:31 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <19990803144526.6B796303120@snelboot.oratrix.nl> Message-ID: On Tue, 3 Aug 1999, Jack Jansen wrote: > > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > > available too? > > > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. > > Hmm. And I went out of my way to include this stupid multi-buffer stuff > because the NumPy folks said they couldn't live without it (and one of the > reasons for the buffer stuff was to allow NumPy arrays, which may be > discontiguous, to be written efficiently). > > Can someone confirm that the Numeric stuff indeed doesn't use this? /usr/LLNLDistribution/Numerical/Include$ grep buffer *.h /usr/LLNLDistribution/Numerical/Include$ Yes. =) See the other thread on low-overhead pickling. But again, *if* multiarrays supported the buffer interface, they'd have to use the multi-segment feature (repeating myself). --david From mal at lemburg.com Tue Aug 3 21:17:16 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 21:17:16 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A7403C.3BC05D02@lemburg.com> David Ascher wrote: > > On Tue, 3 Aug 1999, M.-A. 
Lemburg wrote: > > > Hmm, types can register their own pickling/unpickling functions > > via copy_reg, so they can access the self.write method in pickle.py > > to implement the write to file interface. > > Are you sure? My understanding of copy_reg is, as stated in the doc: > > pickle (type, function[, constructor]) > Declares that function should be used as a ``reduction'' function for > objects of type or class type. function should return either a string > or a tuple. The optional constructor parameter, if provided, is a > callable object which can be used to reconstruct the object when > called with the tuple of arguments returned by function at pickling > time. > > How does one access the 'self.write method in pickle.py'? Ooops. Sorry, that doesn't work... well at least not using "normal" Python ;-) You could of course simply go up one stack frame and then grab the self object and then... well, you know... -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Tue Aug 3 22:47:04 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 3 Aug 1999 15:47:04 -0500 (CDT) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: References: Message-ID: <14247.21628.225029.392711@dolphin.mojam.com> David> An issue which has dogged the NumPy project is that there is (to David> my knowledge) no way to pickle very large arrays without creating David> strings which contain all of the data. This can be a problem David> given that NumPy arrays tend to be very large -- often several David> megabytes, sometimes much bigger. This slows things down, David> sometimes a lot, depending on the platform. It seems that it David> should be possible to do something more efficient. David, Using __getstate__/__setstate__, could you create a compressed representation using zlib or some other scheme? 
I don't know how well numeric data compresses in general, but that might help. Also, I trust you use cPickle when it's available, yes? Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-475-3758 From da at ski.org Tue Aug 3 22:58:23 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 13:58:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <14247.21628.225029.392711@dolphin.mojam.com> Message-ID: On Tue, 3 Aug 1999, Skip Montanaro wrote: > Using __getstate__/__setstate__, could you create a compressed > representation using zlib or some other scheme? I don't know how well > numeric data compresses in general, but that might help. Also, I trust you > use cPickle when it's available, yes? I *really* hate to admit it, but I've found the source of the most massive problem in the pickling process that I was using. I didn't use binary mode, which meant that the huge strings were written & read one-character-at-a-time. I think I'll put a big fat note in the NumPy doc to that effect. (note that luckily this just affected my usage, not all NumPy users). --da From gstein at lyra.org Wed Aug 4 21:15:27 1999 From: gstein at lyra.org (Greg Stein) Date: Wed, 04 Aug 1999 12:15:27 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex References: <199908041313.JAA26344@weyr.cnri.reston.va.us> Message-ID: <37A8914F.6F5B9971@lyra.org> Fred L. Drake wrote: > > Update of /projects/cvsroot/python/dist/src/Doc/api > In directory weyr:/home/fdrake/projects/python/Doc/api > > Modified Files: > api.tex > Log Message: > > Started documentation on buffer objects & types. Very preliminary. > > Greg Stein: Please help with this; it's your baby! > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins All righty. I'll send some doc on this stuff. 
Somebody else did the initial buffer interface, but it seems that it has fallen to me now :-) Please give me a little while to get to this, though. I'm in and out of town for the next four weeks. I'm in the process of moving into a new house in Palo Alto, CA, and I'm travelling back and forth until Anni and I move for real in September. I should be able to get to this by the weekend, or possibly in a couple weeks. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Wed Aug 4 23:00:26 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 4 Aug 1999 17:00:26 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: <37A8914F.6F5B9971@lyra.org> References: <199908041313.JAA26344@weyr.cnri.reston.va.us> <37A8914F.6F5B9971@lyra.org> Message-ID: <14248.43498.664539.597656@weyr.cnri.reston.va.us> Greg Stein writes: > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I was not aware that you were not the origin of this work; feel free to pass it to the right person. > Please give me a little while to get to this, though. I'm in and out of > town for the next four weeks. I'm in the process of > moving into a new house in Palo Alto, CA, and I'm travelling back and > forth until Anni and I move for real in September. Cool! > I should be able to get to this by the weekend, or possibly in a couple > weeks. That's good enough for me. I expect it may be a couple of months or more before I try and get another release out with various fixes and additions. There's not a huge need to update the released doc set, other than a few embarrassing editorial...er, "oversights" (!). -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From jack at oratrix.nl Thu Aug 5 11:57:33 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 11:57:33 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: Message by Greg Stein , Wed, 04 Aug 1999 12:15:27 -0700 , <37A8914F.6F5B9971@lyra.org> Message-ID: <19990805095733.69D90303120@snelboot.oratrix.nl> > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I think I did, but I gladly bequeath it to you. (Hmm, that's the first time I typed "bequeath", I think). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Thu Aug 5 17:46:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 5 Aug 1999 17:46:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: Message-ID: <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. IMO, it is best > > FYI, if/when NumPy objects support the buffer API, they will require > multiple-segments. same goes for PIL. in the worst case, there's one segment per line. ... on the other hand, I think something is missing from the buffer design; I definitely don't like that people can write and marshal objects that happen to implement the buffer interface, only to find that Python didn't do what they expected... >>> import unicode >>> import marshal >>> u = unicode.unicode >>> s = u("foo") >>> data = marshal.dumps(s) >>> marshal.loads(data) 'f\000o\000o\000' >>> type(marshal.loads(data)) <type 'string'> as for PIL, I would also prefer if the exported buffer corresponded to what you get from im.tostring().
iirc, that cannot be done -- I cannot export via a temporary memory buffer, since there's no way to know when to get rid of it... From jack at oratrix.nl Thu Aug 5 22:59:46 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 22:59:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "Fredrik Lundh" , Thu, 5 Aug 1999 17:46:43 +0200 , <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> Message-ID: <19990805205952.531B9E267A@oratrix.oratrix.nl> Recently, "Fredrik Lundh" said: > on the other hand, I think something is missing from > the buffer design; I definitely don't like that people > can write and marshal objects that happen to > implement the buffer interface, only to find that > Python didn't do what they expected... > > >>> import unicode > >>> import marshal > >>> u = unicode.unicode > >>> s = u("foo") > >>> data = marshal.dumps(s) > >>> marshal.loads(data) > 'f\000o\000o\000' > >>> type(marshal.loads(data)) > <type 'string'> Hmm. Looking at the code there is a catchall at the end, with a comment explicitly saying "Write unknown buffer-style objects as a string". IMHO this is an incorrect design, but that's a bit philosophical (so I'll gladly defer to Our Great Philosopher if he has anything to say on the matter:-). Unless, of course, there are buffer-style non-string objects around that are better read back as strings than not read back at all. Hmm again, I think I'd like it better if marshal.dumps() would barf on attempts to write unrepresentable data. Currently unrepresentable objects are written as TYPE_UNKNOWN (unless they have bufferness (or should I call that "a buffer-aspect"? :-)), which means you think you are writing correctly marshalled data but you'll be in for an exception when you try to read it back...
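[Editorial aside: the behaviour Jack wishes for here -- refusing at dump time instead of writing TYPE_UNKNOWN -- is what later versions of CPython's marshal adopted. A quick sketch against a modern interpreter:]

```python
import marshal

# Functions are not marshallable; modern marshal.dumps() raises
# ValueError at write time instead of emitting an unreadable record.
try:
    marshal.dumps(lambda: None)
    refused = False
except ValueError:
    refused = True
print("refused at write time:", refused)
```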
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Fri Aug 6 00:24:03 1999 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 5 Aug 1999 18:24:03 -0400 (EDT) Subject: [Python-Dev] mmapfile module Message-ID: <199908052224.SAA24159@amarok.cnri.reston.va.us> A while back the suggestion was made that the mmapfile module be added to the core distribution, and there was a guardedly positive reaction. Should I go ahead and do that? No one reported any problems when I asked for bug reports, but that was probably because no one tried it; putting it in the core would cause more people to try it. I suppose this leads to a more important question: at what point should we start checking 1.6-only things into the CVS tree? For example, once the current alphas of the re module are up to it (they're not yet), when should they be checked in? -- A.M. Kuchling http://starship.python.net/crew/amk/ Kids! Bringing about Armageddon can be dangerous. Do not attempt it in your home. -- Terry Pratchett & Neil Gaiman, _Good Omens_ From bwarsaw at cnri.reston.va.us Fri Aug 6 04:10:18 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 5 Aug 1999 22:10:18 -0400 (EDT) Subject: [Python-Dev] mmapfile module References: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <14250.17418.781127.684009@anthem.cnri.reston.va.us> >>>>> "AMK" == Andrew M Kuchling writes: AMK> I suppose this leads to a more important question: at what AMK> point should we start checking 1.6-only things into the CVS AMK> tree? For example, once the current alphas of the re module AMK> are up to it (they're not yet), when should they be checked AMK> in? Good question. 
I've had a bunch of people ask about the string methods branch, which I'm assuming will be a 1.6 feature, and I'd like to get that checked in at some point too. I think what's holding this up is that Guido hasn't decided whether there will be a patch release to 1.5.2 or not. -Barry From tim_one at email.msn.com Fri Aug 6 04:26:06 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 5 Aug 1999 22:26:06 -0400 Subject: [Python-Dev] mmapfile module In-Reply-To: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <000201bedfb3$09a99000$98a22299@tim> [Andrew M. Kuchling] > ... > I suppose this leads to a more important question: at what point > should we start checking 1.6-only things into the CVS tree? For > example, once the current alphas of the re module are up to it > (they're not yet), when should they be checked in? I'd like to see a bugfix release of 1.5.2 put out first, then have at it. There are several bugfixes that ought to go out ASAP. Thread tstate races, the cpickle/cookie.py snafu, and playing nice with current Tcl/Tk pop to mind immediately. I'm skeptical that anyone other than Guido could decide what *needs* to go out, so it's a good thing he's got nothing to do <wink>. one-boy's-opinion-ly y'rs - tim From mhammond at skippinet.com.au Fri Aug 6 05:30:55 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 6 Aug 1999 13:30:55 +1000 Subject: [Python-Dev] mmapfile module In-Reply-To: <000201bedfb3$09a99000$98a22299@tim> Message-ID: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> [Tim laments] > mind immediately. I'm skeptical that anyone other than Guido > could decide > what *needs* to go out, so it's a good thing he's got nothing > to do <wink>. He has been very quiet recently - where are you hiding, Guido? > one-boy's-opinion-ly y'rs - tim Here is another. Let's take a different tack - what has been checked in since 1.5.2 that should _not_ go out - ie, is too controversial?
If nothing else, makes a good starting point, and may help Guido out: Below is a summary of the CVS diff I just did, categorized by my opinion. It turns out that most of the changes would appear candidates. While not actually "bug-fixes", many have better documentation, removal of unused imports etc, so would definitely not hurt to get out. Looks like some build issues have been fixed too. Apart from possibly Tim's recent "UnboundLocalError" (which is the only serious behaviour change) I can't see anything that should obviously be omitted. Hopefully this is of interest... [Disclaimer - lots of files here - it is quite possible I missed something...] Mark. UNCONTROVERSIAL: ---------------- RCS file: /projects/cvsroot/python/dist/src/README,v RCS file: /projects/cvsroot/python/dist/src/Lib/cgi.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ftplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/poplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/re.py,v RCS file: /projects/cvsroot/python/dist/src/Tools/audiopy/README,v Doc changes. RCS file: /projects/cvsroot/python/dist/src/Lib/SimpleHTTPServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmd.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/htmllib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/netrc.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pipes.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pty.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/shlex.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/urlparse.py,v Remove unused imports RCS file: /projects/cvsroot/python/dist/src/Lib/pdb.py,v Remove unused globals RCS file: /projects/cvsroot/python/dist/src/Lib/popen2.py,v Change to cleanup RCS file: /projects/cvsroot/python/dist/src/Lib/profile.py,v Remove unused imports and changes to comments. RCS file: /projects/cvsroot/python/dist/src/Lib/pyclbr.py,v Better doc, and support for module level functions.
RCS file: /projects/cvsroot/python/dist/src/Lib/repr.py,v self.maxlist changed to self.maxdict RCS file: /projects/cvsroot/python/dist/src/Lib/rfc822.py,v Doc changes, and better date handling. RCS file: /projects/cvsroot/python/dist/src/configure,v RCS file: /projects/cvsroot/python/dist/src/configure.in,v Looks like FreeBSD build flag changes. RCS file: /projects/cvsroot/python/dist/src/Demo/classes/bitvec.py,v RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v Whitespace fixes. RCS file: /projects/cvsroot/python/dist/src/Demo/scripts/makedir.py,v Check we have passed a non empty string RCS file: /projects/cvsroot/python/dist/src/Include/patchlevel.h,v 1.5.2+ RCS file: /projects/cvsroot/python/dist/src/Lib/BaseHTTPServer.py,v Remove import rfc822 and more robust errors. RCS file: /projects/cvsroot/python/dist/src/Lib/CGIHTTPServer.py,v Support for HTTP_COOKIE RCS file: /projects/cvsroot/python/dist/src/Lib/fpformat.py,v NotANumber supports class exceptions. RCS file: /projects/cvsroot/python/dist/src/Lib/macpath.py,v Use constants from stat module RCS file: /projects/cvsroot/python/dist/src/Lib/macurl2path.py,v Minor changes to path parsing RCS file: /projects/cvsroot/python/dist/src/Lib/mimetypes.py,v Recognise '.js': 'application/x-javascript', RCS file: /projects/cvsroot/python/dist/src/Lib/sunau.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/wave.py,v Support for binary files. RCS file: /projects/cvsroot/python/dist/src/Lib/whichdb.py,v Reads file header to check for bsddb format. RCS file: /projects/cvsroot/python/dist/src/Lib/xmllib.py,v XML may be at the start of the string, instead of the whole string. RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/tkSimpleDialog.py,v Destroy method added. RCS file: /projects/cvsroot/python/dist/src/Modules/cPickle.c,v As in the log :-) RCS file: /projects/cvsroot/python/dist/src/Modules/cStringIO.c,v No longer a Py_FatalError on module init failure. 
RCS file: /projects/cvsroot/python/dist/src/Modules/fpectlmodule.c,v Support for OSF in #ifdefs RCS file: /projects/cvsroot/python/dist/src/Modules/makesetup,v # to handle backslashes for sh's that don't automatically # continue a read when the last char is a backslash RCS file: /projects/cvsroot/python/dist/src/Modules/posixmodule.c,v Better error handling RCS file: /projects/cvsroot/python/dist/src/Modules/timemodule.c,v #ifdef changes for __GNU_LIBRARY__/_GLIBC_ RCS file: /projects/cvsroot/python/dist/src/Python/errors.c,v Better error messages on Win32 RCS file: /projects/cvsroot/python/dist/src/Python/getversion.c,v Bigger buffer and strings. RCS file: /projects/cvsroot/python/dist/src/Python/pystate.c,v Threading bug RCS file: /projects/cvsroot/python/dist/src/Objects/floatobject.c,v Tim Peters writes: 1. Fixes float divmod etc. RCS file: /projects/cvsroot/python/dist/src/Objects/listobject.c,v Doc changes, and when deallocating a list, DECREF the items from the end back to the start. RCS file: /projects/cvsroot/python/dist/src/Objects/stringobject.c,v Bug fix to do with the width of a format specifier RCS file: /projects/cvsroot/python/dist/src/Objects/tupleobject.c,v Appropriate overflow checks so that things like sys.maxint*(1,) can't dump core. RCS file: /projects/cvsroot/python/dist/src/Lib/tempfile.py,v don't cache attributes of type int RCS file: /projects/cvsroot/python/dist/src/Lib/urllib.py,v Number of revisions. RCS file: /projects/cvsroot/python/dist/src/Lib/aifc.py,v Chunk moved to new module. RCS file: /projects/cvsroot/python/dist/src/Lib/audiodev.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dbhash.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dis.py,v Changes in comments. RCS file: /projects/cvsroot/python/dist/src/Lib/cmpcache.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmp.py,v New "shallow" arg. RCS file: /projects/cvsroot/python/dist/src/Lib/dumbdbm.py,v Coerce f.tell() to int.
RCS file: /projects/cvsroot/python/dist/src/Modules/main.c,v Fix to tracebacks off by a line with -x RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/Tkinter.py,v Number of changes you can review! OTHERS: -------- RCS file: /projects/cvsroot/python/dist/src/Lib/asynchat.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/asyncore.py,v Latest versions from Sam??? RCS file: /projects/cvsroot/python/dist/src/Lib/smtplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/sched.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/SocketServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/calendar.py,v Sorry - out of time to detail RCS file: /projects/cvsroot/python/dist/src/Python/bltinmodule.c,v Unbound local, docstring, and better support for ExtensionClasses. Freeze: Few changes IDLE: Lotsa changes :-) Number of .h files have #ifdef changes for CE I won't detail (but would be great to get a few of these in - and I have more :-) Tools directory: Number of changes - outa time to detail From mal at lemburg.com Fri Aug 6 10:54:20 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 06 Aug 1999 10:54:20 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> Message-ID: <37AAA2BC.466750B5@lemburg.com> Jack Jansen wrote: > > Recently, "Fredrik Lundh" said: > > on the other hand, I think something is missing from > > the buffer design; I definitely don't like that people > > can write and marshal objects that happen to > > implement the buffer interface, only to find that > > Python didn't do what they expected... > > > > >>> import unicode > > >>> import marshal > > >>> u = unicode.unicode > > >>> s = u("foo") > > >>> data = marshal.dumps(s) > > >>> marshal.loads(data) > > 'f\000o\000o\000' > > >>> type(marshal.loads(data)) > > <type 'string'> Why do Unicode objects implement the bf_getcharbuffer slot ?
I thought that unicode objects use a two-byte character representation. Note that implementing the char buffer interface will also give you strange results with other code that uses PyArg_ParseTuple(...,"s#",...), e.g. you could search through Unicode strings as if they were normal 1-byte/char strings (and most certainly not find what you're looking for, I guess). > Hmm again, I think I'd like it better if marshal.dumps() would barf on > attempts to write unrepresentable data. Currently unrepresentable > objects are written as TYPE_UNKNOWN (unless they have bufferness (or > should I call that "a buffer-aspect"? :-)), which means you think you > are writing correctly marshalled data but you'll be in for an > exception when you try to read it back... I'd prefer an exception on write too. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 147 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Aug 6 16:44:35 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Aug 1999 10:44:35 -0400 (EDT) Subject: [Python-Dev] mmapfile module In-Reply-To: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> References: <000201bedfb3$09a99000$98a22299@tim> <00a801bedfbc$1871a7e0$1101a8c0@bobcat> Message-ID: <14250.62675.807129.878242@weyr.cnri.reston.va.us> Mark Hammond writes: > Apart from possibly Tim's recent "UnboundLocalError" (which is the only > serious behaviour change) I can't see anything that should obviously be Since UnboundLocalError is a subclass of NameError (what you got before) normally, and they are the same string when -X is used, this only represents a new name in the __builtin__ module for legacy code. This should not be a problem; the only real difference is that, using class exceptions for built-in exceptions, you get more useful information in your tracebacks. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives

From fredrik at pythonware.com Sat Aug 7 12:51:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 7 Aug 1999 12:51:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> Message-ID: <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com>

> > > >>> import unicode
> > > >>> import marshal
> > > >>> u = unicode.unicode
> > > >>> s = u("foo")
> > > >>> data = marshal.dumps(s)
> > > >>> marshal.loads(data)
> > > 'f\000o\000o\000'
> > > >>> type(marshal.loads(data))
> > >
> > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought
> that unicode objects use a two-byte character representation.

>>> import array
>>> import marshal
>>> a = array.array
>>> s = a("f", [1, 2, 3])
>>> data = marshal.dumps(s)
>>> marshal.loads(data)
'\000\000\200?\000\000\000@\000\000@@'

looks like the various implementors haven't really understood the intentions of whoever designed the buffer interface...

From mal at lemburg.com Sat Aug 7 18:14:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 07 Aug 1999 18:14:56 +0200 Subject: [Python-Dev] Some more constants for the socket module Message-ID: <37AC5B80.56F740DD@lemburg.com> Following the recent discussion on c.l.p about socket options, I found that the socket module does not define all constants defined in the (Linux) socket header file. Below is a patch that adds a few more (note that the SOL_* constants should be used for the setsockopt() level, not the IPPROTO_* constants).
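For readers following along with a current Python, the level/option distinction Marc points out can be sketched with today's socket module (the names used here all exist in the modern module; TCP options are nowadays usually set with IPPROTO_TCP, which equals SOL_TCP on Linux):

```python
import socket

# setsockopt(level, option, value): the *level* says which protocol
# layer interprets the option; SOL_* / IPPROTO_* values name layers.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Socket-level option: level SOL_SOCKET, option SO_REUSEADDR.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)

# TCP-level option: disable Nagle's algorithm.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

s.close()
```

Passing an IPPROTO_* value where a SOL_* level is expected happens to work on Linux precisely because the numbers coincide there, which is what makes the mix-up Marc warns about so easy to miss.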
--- socketmodule.c~ Sat Aug 7 17:56:05 1999
+++ socketmodule.c Sat Aug 7 18:10:07 1999
@@ -2005,14 +2005,48 @@ initsocket()
  PySocketSock_Type.tp_doc = sockettype_doc;
  Py_INCREF(&PySocketSock_Type);
  if (PyDict_SetItemString(d, "SocketType",
    (PyObject *)&PySocketSock_Type) != 0)
   return;
+
+ /* Address families (we only support AF_INET and AF_UNIX) */
+#ifdef AF_UNSPEC
+ insint(moddict, "AF_UNSPEC", AF_UNSPEC);
+#endif
  insint(d, "AF_INET", AF_INET);
 #ifdef AF_UNIX
  insint(d, "AF_UNIX", AF_UNIX);
 #endif /* AF_UNIX */
+#ifdef AF_AX25
+ insint(moddict, "AF_AX25", AF_AX25); /* Amateur Radio AX.25 */
+#endif
+#ifdef AF_IPX
+ insint(moddict, "AF_IPX", AF_IPX); /* Novell IPX */
+#endif
+#ifdef AF_APPLETALK
+ insint(moddict, "AF_APPLETALK", AF_APPLETALK); /* Appletalk DDP */
+#endif
+#ifdef AF_NETROM
+ insint(moddict, "AF_NETROM", AF_NETROM); /* Amateur radio NetROM */
+#endif
+#ifdef AF_BRIDGE
+ insint(moddict, "AF_BRIDGE", AF_BRIDGE); /* Multiprotocol bridge */
+#endif
+#ifdef AF_AAL5
+ insint(moddict, "AF_AAL5", AF_AAL5); /* Reserved for Werner's ATM */
+#endif
+#ifdef AF_X25
+ insint(moddict, "AF_X25", AF_X25); /* Reserved for X.25 project */
+#endif
+#ifdef AF_INET6
+ insint(moddict, "AF_INET6", AF_INET6); /* IP version 6 */
+#endif
+#ifdef AF_ROSE
+ insint(moddict, "AF_ROSE", AF_ROSE); /* Amateur Radio X.25 PLP */
+#endif
+
+ /* Socket types */
  insint(d, "SOCK_STREAM", SOCK_STREAM);
  insint(d, "SOCK_DGRAM", SOCK_DGRAM);
 #ifndef __BEOS__
 /* We have incomplete socket support. */
  insint(d, "SOCK_RAW", SOCK_RAW);
@@ -2048,11 +2082,10 @@ initsocket()
  insint(d, "SO_OOBINLINE", SO_OOBINLINE);
 #endif
 #ifdef SO_REUSEPORT
  insint(d, "SO_REUSEPORT", SO_REUSEPORT);
 #endif
-
 #ifdef SO_SNDBUF
  insint(d, "SO_SNDBUF", SO_SNDBUF);
 #endif
 #ifdef SO_RCVBUF
  insint(d, "SO_RCVBUF", SO_RCVBUF);
@@ -2111,14 +2144,43 @@ initsocket()
 #ifdef MSG_ETAG
  insint(d, "MSG_ETAG", MSG_ETAG);
 #endif

 /* Protocol level and numbers, usable for [gs]etsockopt */
-/* Sigh -- some systems (e.g. Linux) use enums for these. */
 #ifdef SOL_SOCKET
  insint(d, "SOL_SOCKET", SOL_SOCKET);
 #endif
+#ifdef SOL_IP
+ insint(moddict, "SOL_IP", SOL_IP);
+#else
+ insint(moddict, "SOL_IP", 0);
+#endif
+#ifdef SOL_IPX
+ insint(moddict, "SOL_IPX", SOL_IPX);
+#endif
+#ifdef SOL_AX25
+ insint(moddict, "SOL_AX25", SOL_AX25);
+#endif
+#ifdef SOL_ATALK
+ insint(moddict, "SOL_ATALK", SOL_ATALK);
+#endif
+#ifdef SOL_NETROM
+ insint(moddict, "SOL_NETROM", SOL_NETROM);
+#endif
+#ifdef SOL_ROSE
+ insint(moddict, "SOL_ROSE", SOL_ROSE);
+#endif
+#ifdef SOL_TCP
+ insint(moddict, "SOL_TCP", SOL_TCP);
+#else
+ insint(moddict, "SOL_TCP", 6);
+#endif
+#ifdef SOL_UDP
+ insint(moddict, "SOL_UDP", SOL_UDP);
+#else
+ insint(moddict, "SOL_UDP", 17);
+#endif
 #ifdef IPPROTO_IP
  insint(d, "IPPROTO_IP", IPPROTO_IP);
 #else
  insint(d, "IPPROTO_IP", 0);
 #endif
@@ -2266,10 +2328,32 @@ initsocket()
 #ifdef IP_ADD_MEMBERSHIP
  insint(d, "IP_ADD_MEMBERSHIP", IP_ADD_MEMBERSHIP);
 #endif
 #ifdef IP_DROP_MEMBERSHIP
  insint(d, "IP_DROP_MEMBERSHIP", IP_DROP_MEMBERSHIP);
+#endif
+#ifdef IP_DEFAULT_MULTICAST_TTL
+ insint(moddict, "IP_DEFAULT_MULTICAST_TTL", IP_DEFAULT_MULTICAST_TTL);
+#endif
+#ifdef IP_DEFAULT_MULTICAST_LOOP
+ insint(moddict, "IP_DEFAULT_MULTICAST_LOOP", IP_DEFAULT_MULTICAST_LOOP);
+#endif
+#ifdef IP_MAX_MEMBERSHIPS
+ insint(moddict, "IP_MAX_MEMBERSHIPS", IP_MAX_MEMBERSHIPS);
+#endif
+
+ /* TCP options */
+#ifdef TCP_NODELAY
+ insint(moddict, "TCP_NODELAY", TCP_NODELAY);
+#endif
+#ifdef TCP_MAXSEG
+ insint(moddict, "TCP_MAXSEG", TCP_MAXSEG);
+#endif
+
+ /* IPX options */
+#ifdef IPX_TYPE
+ insint(moddict, "IPX_TYPE", IPX_TYPE);
 #endif

 /* Initialize gethostbyname lock */
 #ifdef USE_GETHOSTBYNAME_LOCK
  gethostbyname_lock = PyThread_allocate_lock();

-- 
Marc-Andre Lemburg ______________________________________________________________________ Y2000: 146 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From gstein at lyra.org Sat Aug 7 22:15:08 1999 From: gstein at lyra.org
(Greg Stein) Date: Sat, 07 Aug 1999 13:15:08 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> Message-ID: <37AC93CC.53982F3F@lyra.org>

Fredrik Lundh wrote:
>
> > > > >>> import unicode
> > > > >>> import marshal
> > > > >>> u = unicode.unicode
> > > > >>> s = u("foo")
> > > > >>> data = marshal.dumps(s)
> > > > >>> marshal.loads(data)
> > > > 'f\000o\000o\000'
> > > > >>> type(marshal.loads(data))
> > > >

This was a "nicety" that was put in during a round of patches that I submitted to Guido. We both had questions about it but figured that it couldn't hurt since it at least let some things be marshalled out that couldn't be marshalled before. I would suggest backing out the marshalling of buffer-interface objects and adding a mechanism for arbitrary type objects to marshal themselves. Without the second part, arrays and Unicode objects aren't marshallable at all (seems bad).

> > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought
> > that unicode objects use a two-byte character representation.

Unicode objects should *not* implement the getcharbuffer slot. Only read, write, and segcount.

> >>> import array
> >>> import marshal
> >>> a = array.array
> >>> s = a("f", [1, 2, 3])
> >>> data = marshal.dumps(s)
> >>> marshal.loads(data)
> '\000\000\200?\000\000\000@\000\000@@'
>
> looks like the various implementors haven't
> really understood the intentions of whoever
> designed the buffer interface...

Arrays can/should support both the getreadbuffer and getcharbuffer interface. The former: definitely. The latter: only if the contents are byte-sized. The loading back as a string is a different matter, as pointed out above.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Sun Aug 8 22:20:52 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 08 Aug 1999 22:20:52 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by Greg Stein , Sat, 07 Aug 1999 13:15:08 -0700 , <37AC93CC.53982F3F@lyra.org> Message-ID: <19990808202057.DB803E267A@oratrix.oratrix.nl> Recently, Greg Stein said: > I would suggest backing out the marshalling of buffer-interface objects > and adding a mechanism for arbitrary type objects to marshal themselves. This sounds like the right approach. It would require 2 slots in the tp_ structure and a little extra glue for the typecodes (currently marshal knows all the 1-letter typecodes for all object types it can handle, but types marshalling their own objects would require a centralized registry of object types). For the time being it would probably suffice to have the mapping of type<->letter be hardcoded in marshal.h, but eventually you probably want a more extensible scheme, where Joe R. Extension-Writer could add a marshaller to his objects and know it won't collide with someone else's. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 9 10:56:30 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 10:56:30 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990808202057.DB803E267A@oratrix.oratrix.nl> Message-ID: <37AE97BE.2CADF48E@lemburg.com> Jack Jansen wrote: > > Recently, Greg Stein said: > > I would suggest backing out the marshalling of buffer-interface objects > > and adding a mechanism for arbitrary type objects to marshal themselves.
> > Without the second part, arrays and Unicode objects aren't marshallable > > at all (seems bad). > > This sounds like the right approach. It would require 2 slots in the > tp_ structure and a little extra glue for the typecodes (currently > marshal knows all the 1-letter typecodes for all object types it can > handle, but types marshalling their own objects would require a > centralized registry of object types). For the time being it would > probably suffice to have the mapping of type<->letter be hardcoded in > marshal.h, but eventually you probably want a more extensible scheme, > where Joe R. Extension-Writer could add a marshaller to his objects > and know it won't collide with someone else's. This registry should ideally be reachable via C APIs. Then a module writer could call these APIs in the init function of his module and he'd be set. Since marshal won't be able to handle imports on the fly (like pickle et al.), these modules will have to be imported before unmarshalling. Aside: wouldn't it make sense to move from marshal to pickle and deprecate marshal altogether ? cPickle is quite fast and much more flexible than marshal, plus it already provides mechanisms for registering new types. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Mon Aug 9 15:49:44 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 09 Aug 1999 15:49:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "M.-A. Lemburg" , Mon, 09 Aug 1999 10:56:30 +0200 , <37AE97BE.2CADF48E@lemburg.com> Message-ID: <19990809134944.BB2FC303120@snelboot.oratrix.nl> > Aside: wouldn't it make sense to move from marshal to pickle and > deprecate marshal altogether ?
> cPickle is quite fast and much more > flexible than marshal, plus it already provides mechanisms for > registering new types. This is probably the best idea so far. Just remove the buffer-workaround in marshal, keep it functioning for the things it is used for now (like pyc files) and refer people to (c)Pickle for new development. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 9 16:50:46 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 10:50:46 -0400 Subject: [Python-Dev] Some more constants for the socket module In-Reply-To: Your message of "Sat, 07 Aug 1999 18:14:56 +0200." <37AC5B80.56F740DD@lemburg.com> References: <37AC5B80.56F740DD@lemburg.com> Message-ID: <199908091450.KAA29179@eric.cnri.reston.va.us> Thanks for the socketmodule patch, Marc. This was on my mental TO-DO list for a long time! I've checked it in. (One note: I had a bit of trouble applying the patch; apparently your mailer expanded all tabs to spaces. Perhaps you could use attachments to mail diffs? Also, you seem to have renamed 'd' to 'moddict' but you didn't send the patch for that...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 9 18:26:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 12:26:28 -0400 Subject: [Python-Dev] preferred conference date? Message-ID: <199908091626.MAA29411@eric.cnri.reston.va.us> I need your input about the date of the next Python conference. Foretec is close to a deal for a Python conference in January 2000 at the Alexandria Old Town Hilton hotel. Given our requirement of a good location in the DC area, this is a very good deal (it's a brand new hotel).
The prices are high (they tell me that the whole conference will cost $900, with a room rate of $129) but it's a class A location (metro, tons of restaurants, close to National Airport, etc.) and we have found no cheaper DC hotel suitable for our purposes (even in drab suburban locations). I'm worried that I'll be flamed to hell for this by the PSA members, but I don't think we can get the price any lower without starting all over in a different location, probably causing several months of delay. If people won't come, Foretec (and I) will have learned a valuable lesson and we'll rethink the issue for the 2001 conference. Anyway, given that Foretec is likely to go with this hotel, we have a choice of two dates: January 16-19, or 23-26 (both starting on a Sunday with the tutorials). This is where I need your help: which date would you prefer? Please mail me personally. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Mon Aug 9 18:31:43 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 9 Aug 1999 11:31:43 -0500 (CDT) Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> References: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <14255.557.474160.824877@dolphin.mojam.com> Guido> The prices are high (they tell me that the whole conference will Guido> cost $900, with a room rate of $129) but it's a class A location No way I (or my company) can afford to plunk down $900 for me to attend... Skip From mal at lemburg.com Mon Aug 9 18:40:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 18:40:45 +0200 Subject: [Python-Dev] Some more constants for the socket module References: <37AC5B80.56F740DD@lemburg.com> <199908091450.KAA29179@eric.cnri.reston.va.us> Message-ID: <37AF048D.FC0A540@lemburg.com> Guido van Rossum wrote: > > Thanks for the socketmodule patch, Marc. This was on my mental TO-DO > list for a long time! I've checked it in. 
Cool, thanks. > (One note: I had a bit of trouble applying the patch; apparently your > mailer expanded all tabs to spaces. Perhaps you could use attachments > to mail diffs? Ok. > Also, you seem to have renamed 'd' to 'moddict' but > you didn't send the patch for that...) Oops, sorry... my "#define to insint" script uses 'd' as moddict, that's the reason why. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at CNRI.Reston.VA.US Mon Aug 9 19:30:36 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 13:30:36 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: Your message of "Mon, 09 Aug 1999 11:31:43 CDT." <14255.557.474160.824877@dolphin.mojam.com> References: <199908091626.MAA29411@eric.cnri.reston.va.us> <14255.557.474160.824877@dolphin.mojam.com> Message-ID: <199908091730.NAA29559@eric.cnri.reston.va.us> > Guido> The prices are high (they tell me that the whole conference will > Guido> cost $900, with a room rate of $129) but it's a class A location > > No way I (or my company) can afford to plunk down $900 for me to attend... Let me clarify this. The $900 is for the whole 4-day conference, including a day of tutorials and developers' day. I don't know what the exact price breakdown will be, but the tutorials will probably be $300. Last year the total price was $700, with $250 for tutorials. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 14:04:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 13:04:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts Message-ID: <199908101204.NAA29572@pukapuka.inrialpes.fr> Currently, dictionaries always grow until they are deallocated from memory. 
This happens in PyDict_SetItem according to the following code (before inserting the new item into the dict):

    /* if fill >= 2/3 size, double in size */
    if (mp->ma_fill*3 >= mp->ma_size*2) {
        if (dictresize(mp, mp->ma_used*2) != 0) {
            if (mp->ma_fill+1 > mp->ma_size)
                return -1;
        }
    }

The symmetric case is missing and this has intrigued me for a long time, but I've never had the courage to look deeply into this portion of code and try to propose a solution. Which is: reduce the size of the dict by half when the nb of used items <= 1/6 the size. This situation occurs far less frequently than dict growing, but anyways, it seems useful for the degenerate cases where a dict has a peak usage, then most of the items are deleted. This is usually the case for global dicts holding dynamic object collections, etc. A bonus effect of shrinking big dicts with deleted items is that the lookup speed may be improved, because of the cleaned entries and the reduced overall size (resulting in a better hit ratio). The (only) solution I could come up with for this pb is the appended patch. It is not immediately obvious, but in practice, it seems to work fine. (inserting a print statement after the condition, showing the dict size and current usage helps in monitoring what's going on). Any other ideas on how to deal with this? Thoughts, comments? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

-------------------------------[ cut here ]---------------------------
*** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999
--- dictobject.c Tue Aug 10 12:21:15 1999
***************
*** 417,423 ****
      ep->me_value = NULL;
      mp->ma_used--;
      Py_DECREF(old_value);
!     Py_DECREF(old_key);

      return 0;
  }
--- 417,430 ----
      ep->me_value = NULL;
      mp->ma_used--;
      Py_DECREF(old_value);
!     Py_DECREF(old_key);
!     /* For bigger dictionaries, if used <= 1/6 size, half the size */
!     if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) {
!         if (dictresize(mp, mp->ma_used*2) != 0) {
!             if (mp->ma_fill > mp->ma_size)
!                 return -1;
!         }
!     }

      return 0;
  }

From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 15:20:36 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 14:20:36 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 10, 99 01:04:27 pm" Message-ID: <199908101320.OAA21986@pukapuka.inrialpes.fr> I wrote: > > The (only) solution I could come up with for this pb is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? > To clarify a bit what the patch does "as is", here's a short description: The code is triggered in PyDict_DelItem only for sizes which are > MINSIZE*4, i.e. greater than 4*4 = 16. Therefore, resizing will occur for a min size of 32 items.

    one third    32 / 3   = 10
    two thirds   32 * 2/3 = 21
    one sixth    32 / 6   = 5

So the shrinking will happen for a dict size of 32, of which 5 items are used (the sixth was just deleted). After the dictresize, the size will be 16, of which 5 items are used, i.e. one third. The threshold is fixed by the first condition of the patch. It could be made 64, instead of 32. This is subject to discussion... Obviously, this is most useful for bigger dicts, not for small ones. A threshold of 32 items seemed to me to be a reasonable compromise. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Tue Aug 10 14:35:33 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 10 Aug 1999 14:35:33 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> Message-ID: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > > > > >>> import unicode > > > > > >>> import marshal > > > > > >>> u = unicode.unicode > > > > > >>> s = u("foo") > > > > > >>> data = marshal.dumps(s) > > > > > >>> marshal.loads(data) > > > > > 'f\000o\000o\000' > > > > > >>> type(marshal.loads(data)) > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > that unicode objects use a two-byte character representation. > > Unicode objects should *not* implement the getcharbuffer slot. Only > read, write, and segcount.

unicode objects do not implement the getcharbuffer slot. here's the relevant descriptor:

    static PyBufferProcs unicode_as_buffer = {
        (getreadbufferproc) unicode_buffer_getreadbuf,
        (getwritebufferproc) unicode_buffer_getwritebuf,
        (getsegcountproc) unicode_buffer_getsegcount
    };

the array module uses a similar descriptor. maybe the unicode class shouldn't implement the buffer interface at all? sure looks like the best way to avoid trivial mistakes (the current behaviour of fp.write(unicodeobj) is even more serious than the marshal glitch...) or maybe the buffer design needs an overhaul? From guido at CNRI.Reston.VA.US Tue Aug 10 16:12:23 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Tue, 10 Aug 1999 10:12:23 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Tue, 10 Aug 1999 14:35:33 +0200."
<000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <199908101412.KAA02065@eric.cnri.reston.va.us> > Greg Stein wrote: > > > > > > >>> import unicode > > > > > > >>> import marshal > > > > > > >>> u = unicode.unicode > > > > > > >>> s = u("foo") > > > > > > >>> data = marshal.dumps(s) > > > > > > >>> marshal.loads(data) > > > > > > 'f\000o\000o\000' > > > > > > >>> type(marshal.loads(data)) > > > > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > > that unicode objects use a two-byte character representation. > > > > Unicode objects should *not* implement the getcharbuffer slot. Only > > read, write, and segcount. > > unicode objects do not implement the getcharbuffer slot. > here's the relevant descriptor: > > static PyBufferProcs unicode_as_buffer = { > (getreadbufferproc) unicode_buffer_getreadbuf, > (getwritebufferproc) unicode_buffer_getwritebuf, > (getsegcountproc) unicode_buffer_getsegcount > }; > > the array module uses a similar descriptor. > > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) > > or maybe the buffer design needs an overhaul? I think most places that should use the charbuffer interface actually use the readbuffer interface. This is what should be fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Aug 10 19:53:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:53:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <37B06734.4339D3BF@lemburg.com> Fredrik Lundh wrote: > > unicode objects do not implement the getcharbuffer slot. >... > or maybe the buffer design needs an overhaul? I think its usage does. The character slot should be used whenever character data is needed, not the read buffer slot. The latter one is for passing around raw binary data (without reinterpretation !), if I understood Greg correctly back when I gave those abstract APIs a try. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 10 19:39:29 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:39:29 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <37B063D1.29F3106A@lemburg.com> Vladimir Marangozov wrote: > > Currently, dictionaries always grow until they are deallocated from > memory. This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } > > The symmetric case is missing and this has intrigued me for a long time, > but I've never had the courage to look deeply into this portion of code > and try to propose a solution. Which is: reduce the size of the dict by > half when the nb of used items <= 1/6 the size. 
> > This situation occurs far less frequently than dict growing, but anyways, > it seems useful for the degenerate cases where a dict has a peak usage, > then most of the items are deleted. This is usually the case for global > dicts holding dynamic object collections, etc. > > A bonus effect of shrinking big dicts with deleted items is that > the lookup speed may be improved, because of the cleaned entries > and the reduced overall size (resulting in a better hit ratio). > > The (only) solution I could come up with for this pb is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? I think that integrating this into the C code is not really that effective since the situation will not occur that often and then it is often better to let the programmer decide rather than integrate an automatic downsize. You can call dict.update({}) to force an internal resize (the empty dictionary can be made global since it is not manipulated in any way and thus does not cause creation overhead). Perhaps a new method .resize(approx_size) would make this even clearer. This would also have the benefit of allowing a programmer to force allocation of the wanted size, e.g.

    d = {}
    d.resize(10000)
    # Insert 10000 items in a batch insert

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 21:58:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 20:58:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B063D1.29F3106A@lemburg.com> from "M.-A.
Lemburg" at "Aug 10, 99 07:39:29 pm" Message-ID: <199908101958.UAA22028@pukapuka.inrialpes.fr> M.-A. Lemburg wrote: > > [me] > > Any other ideas on how to deal with this? Thoughts, comments? > > I think that integrating this into the C code is not really that > effective since the situation will not occur that often and then > it is often better to let the programmer decide rather than integrate > an automatic downsize. Agreed that the situation is rare. But if it occurs, it's Python's responsibility to manage its data structures (and system resources) efficiently. As a programmer, I really don't want to be bothered with internals -- I trust the interpreter for that. Moreover, how could I decide that at some point, some dict needs to be resized in my fairly big app, say IDLE? > > You can call dict.update({}) to force an internal > resize (the empty dictionary can be made global since it is not > manipulated in any way and thus does not cause creation overhead). I know that I can force the resize in other ways, but this is not the point. I'm usually against the idea of changing the programming logic because of my advanced knowledge of the internals. > > Perhaps a new method .resize(approx_size) would make this even > clearer. This would also have the benefit of allowing a programmer > to force allocation of the wanted size, e.g.
>
> d = {}
> d.resize(10000)
> # Insert 10000 items in a batch insert

This is interesting, but the two ideas are not mutually exclusive. Python has to downsize dicts automatically (just the same way it doubles the size automatically). Offering more through an API is a plus for hackers. ;-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Tue Aug 10 22:19:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 10 Aug 1999 22:19:46 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101958.UAA22028@pukapuka.inrialpes.fr> Message-ID: <37B08962.6DFB3F0@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > [me] > > > Any other ideas on how to deal with this? Thoughts, comments? > > > > I think that integrating this into the C code is not really that > > effective since the situation will not occur that often and then > > it is often better to let the programmer decide rather than integrate > > an automatic downsize. > > Agreed that the situation is rare. But if it occurs, it's Python's > responsibility to manage its data structures (and system resources) > efficiently. As a programmer, I really don't want to be bothered with > internals -- I trust the interpreter for that. Moreover, how could > I decide that at some point, some dict needs to be resized in my > fairly big app, say IDLE? You usually don't ;-) because "normal" dicts only grow (well, more or less). The downsizing thing will only become a problem if you use dictionaries in certain algorithms and there you handle the problem manually. My stack implementation uses the same trick, BTW. Memory is cheap and with an extra resize method (which the mxStack implementation has), problems can be dealt with explicitly for everyone to see in the code. > > You can call dict.update({}) to force an internal > > resize (the empty dictionary can be made global since it is not > > manipulated in any way and thus does not cause creation overhead). > > I know that I can force the resize in other ways, but this is not > the point. I'm usually against the idea of changing the programming > logic because of my advanced knowledge of the internals. True, that's why I mentioned... > > > > Perhaps a new method .resize(approx_size) would make this even > > clearer. This would also have the benefit of allowing a programmer > > to force allocation of the wanted size, e.g.
> > > > d = {} > > d.resize(10000) > > # Insert 10000 items in a batch insert > > This is interesting, but the two ideas are not mutually exclusive. > Python has to downsize dicts automatically (just the same way it doubles > the size automatically). Offering more through an API is a plus for > hackers. ;-) It's not really for hackers: the point is that it makes the technique visible and understandable (as opposed to the hack above). The same could be useful for lists too (the hack there is l = [None] * size, which I find rather difficult to understand at first sight...). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Wed Aug 11 00:39:30 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 11 Aug 1999 08:39:30 +1000 Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B08962.6DFB3F0@lemburg.com> Message-ID: <010901bee381$36ee5d30$1101a8c0@bobcat> Looking over the messages from Marc and Vladimir, I'm going to add my 2c worth. IMO, Marc's position is untenable iff it can be demonstrated that the "average" program is likely to see "sparse" dictionaries, and such dictionaries have an adverse effect on either speed or memory. The analogy is quite simple - you don't need to manually resize lists or dicts before inserting (to allocate more storage - an internal implementation issue) so neither should you need to manually resize when deleting (to reclaim that storage - still internal implementation). Suggesting that the allocation of resources should be automatic, but the recycling of them not be automatic flies in the face of everything else - e.g., you don't need to delete each object - when it is no longer referenced, its memory is reclaimed automatically.
Marc's position is only reasonable if the specific case we are talking about is very very rare, and unlikely to be hit by anyone with normal, real-world requirements or programs. In this case, exposing the implementation detail is reasonable. So, the question comes down to: "What is the benefit of Vladimir's patch?" Maybe we need some metrics on some dictionaries. For example, maybe a doctored Python that kept stats for each dictionary and logged this info. The output of this should be able to tell you what savings you could possibly expect. If you find that the average program really would not benefit at all (say only a few K from a small number of dicts) then the horse was probably dead well before we started flogging it. If however you can demonstrate serious benefits could be achieved, then interest may pick up and I too would lobby for automatic downsizing. Mark. From tim_one at email.msn.com Wed Aug 11 07:30:20 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 01:30:20 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <000001bee3ba$9b226f60$8d2d2399@tim> [Vladimir] > Currently, dictionaries always grow until they are deallocated from > memory. It's more accurate to say they never shrink <0.9 wink>. Even that has exceptions, though, starting with: > This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } This code can shrink the dict too. The load factor computation is based on "fill", but the resize is based on "used". If you grow a huge dict, then delete all the entries one by one, "used" falls to 0 but "fill" stays at its high-water mark.
At least 1/3rd of the entries are NULL, so "fill" continues to climb as keys are added again: when the load factor computation triggers again, "used" may be as small as 1, and dictresize can shrink the dict dramatically. The only clear a priori return I see in your patch is that I might save memory if I delete gobs of stuff from a dict and then neither get rid of it nor add keys to it again. But my programs generally grow dicts forever, grow then delete them entirely, or cycle through fat and lean times (in which case the code above already shrinks them from time to time). So I don't expect that your patch would buy me anything I want, but would cost me more on every delete. > ... > Any other ideas on how to deal with this? Thoughts, comments? Just that slowing the expected case to prevent theoretical bad cases is usually a net loss -- I think the onus is on you to demonstrate that this change is an exception to that rule. I do recall one real-life complaint about it on c.l.py a couple years ago: the poster had a huge dict, eventually deleted most of the items, and then kept it around purely for lookups. They were happy enough to copy the dict into a fresh one a key+value pair at a time; today they could just do d = d.copy() or even d.update({}) to shrink the dict. It would certainly be good to document these tricks! if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim From tim_one at email.msn.com Wed Aug 11 08:45:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 02:45:49 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <000201bee3c5$25b47b00$8d2d2399@tim> [Guido] > ... > The prices are high (they tell me that the whole conference will cost > $900, with a room rate of $129) Is room rental in addition to, or included in, that $900? > ...
> I'm worried that I'll be flamed to hell for this by the PSA members, So have JulieK announce it . > ... > Anyway, given that Foretec is likely to go with this hotel, we have a > choice of two dates: January 16-19, or 23-26 (both starting on a > Sunday with the tutorials). This is where I need your help: which > date would you prefer? 23-26 for me; 16-19 may not be doable. or-everyone-can-switch-to-windows-and-we'll-do-the-conference-via-netmeeting-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Wed Aug 11 16:33:17 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 11 Aug 1999 15:33:17 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000001bee3ba$9b226f60$8d2d2399@tim> from "Tim Peters" at "Aug 11, 99 01:30:20 am" Message-ID: <199908111433.PAA31842@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir] > > Currently, dictionaries always grow until they are deallocated from > > memory. > > It's more accurate to say they never shrink <0.9 wink>. Even that has > exceptions, though, starting with: > > > This happens in PyDict_SetItem according to the following > > code (before inserting the new item into the dict): > > > > /* if fill >= 2/3 size, double in size */ > > if (mp->ma_fill*3 >= mp->ma_size*2) { > > if (dictresize(mp, mp->ma_used*2) != 0) { > > if (mp->ma_fill+1 > mp->ma_size) > > return -1; > > } > > } > > This code can shrink the dict too. The load factor computation is based on > "fill", but the resize is based on "used". If you grow a huge dict, then > delete all the entries one by one, "used" falls to 0 but "fill" stays at its > high-water mark. At least 1/3rd of the entries are NULL, so "fill" > continues to climb as keys are added again: when the load factor > computation triggers again, "used" may be as small as 1, and dictresize can > shrink the dict dramatically. Thanks for clarifying this! > [snip] > > > ... > > Any other ideas on how to deal with this? Thoughts, comments?
> > Just that slowing the expected case to prevent theoretical bad cases is > usually a net loss -- I think the onus is on you to demonstrate that this > change is an exception to that rule. I won't, because this case is rare in practice, classifying it already as an exception. A real exception. I'll have to think a bit more about all this. Adding 1/3 new entries to trigger the next resize sounds suboptimal (if it happens at all). > I do recall one real-life complaint > about it on c.l.py a couple years ago: the poster had a huge dict, > eventually deleted most of the items, and then kept it around purely for > lookups. They were happy enough to copy the dict into a fresh one a > key+value pair at a time; today they could just do > > d = d.copy() > > or even > > d.update({}) > > to shrink the dict. > > It would certainly be good to document these tricks! I think that officializing these tricks in the documentation is a bad idea. > > if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to- > see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, but don't forget that you've been around, teaching people various tricks. And 1999 is special -- we just had a solar eclipse today, the next being scheduled for 2081. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Wed Aug 11 16:07:44 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 11 Aug 1999 16:07:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. ok. btw, how about adding support for buffer access to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible (like "virtual" and "abstract" image buffers in PIL 1.1). something like: int initbuffer(PyObject* obj, void** context); int exitbuffer(PyObject* obj, void* context); and corresponding context arguments to the rest of the functions... From guido at CNRI.Reston.VA.US Wed Aug 11 16:42:10 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 11 Aug 1999 10:42:10 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Wed, 11 Aug 1999 16:07:44 +0200." <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> Message-ID: <199908111442.KAA04423@eric.cnri.reston.va.us> > btw, how about adding support for buffer access > to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible
> something like: > > int initbuffer(PyObject* obj, void** context); > int exitbuffer(PyObject* obj, void* context); > > and corresponding context arguments to the > rest of the functions... Can you explain this idea more? Without more understanding of PIL I have no idea what you're talking about... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Thu Aug 12 07:15:39 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 01:15:39 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908111433.PAA31842@pukapuka.inrialpes.fr> Message-ID: <000301bee481$b78ae5c0$4e2d2399@tim> [Tim] >> ...slowing the expected case to prevent theoretical bad cases is >> usually a net loss -- I think the onus is on you to demonstrate >> that this change is an exception to that rule. [Vladimir Marangozov] > I won't, because this case is rare in practice, classifying it already > as an exception. A real exception. I'll have to think a bit more about > all this. Adding 1/3 new entries to trigger the next resize sounds > suboptimal (if it happens at all). "Suboptimal" with respect to which specific cost model? Exhibiting a specific bad case isn't compelling, and especially not when it's considered to be "a real exception". Adding new expense to every delete is an obvious new burden -- where's the payback, and is the expected net effect amortized across all dict usage a win or loss? Offhand it sounds like a small loss to me, although I haven't worked up a formal cost model either . > ... > I think that officializing these tricks in the documentation is a > bad idea. It's rarely a good idea to keep truths secret, although implementation-du-jour tricks don't belong in the current doc set. Probably in a HowTo. 
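The shrink-by-copying trick Tim mentions is easy to see by measuring the dict itself; a minimal sketch, assuming a modern CPython where `copy()` rebuilds the table around the live entries (the `d.update({})` variant relied on the 1999-era resize trigger and isn't assumed to work here):

```python
import sys

# Grow a dict, then delete nearly everything, leaving 10 live entries.
d = {i: None for i in range(100_000)}
for i in range(99_990):
    del d[i]

# Deletion alone leaves the table at its high-water-mark allocation...
before = sys.getsizeof(d)

# ...while copying key/value pairs into a fresh dict -- the c.l.py
# poster's trick -- rebuilds the table around the surviving entries.
d = d.copy()
after = sys.getsizeof(d)

print(before, "->", after)  # the copy is far smaller
```

`sys.getsizeof` counts only the dict's own table, not the objects it refers to, which is exactly the storage this thread is arguing about.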
>> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard- >> to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, I actually hate that kind of argument -- it's one of *Guido's* favorites, and in his current silent state I'm simply channeling him . > but don't forget that you've been around, teaching people various > tricks. As I said, this particular trick has come up only once in real life in my experience; it's never come up in my own code; it's an anti-FAQ. People are 100x more likely to whine about theoretical quadratic-time list growth nobody has ever encountered (although it looks like they may finally get it under an out-of-the-box BDW collector!). > And 1999 is special -- we just had a solar eclipse today, the next being > scheduled for 2081. Ya, like any of us will survive Y2K to see it . 1999-is-special-cuz-it's-the-end-of-civilization-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Thu Aug 12 20:22:06 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 12 Aug 1999 19:22:06 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908121822.TAA40444@pukapuka.inrialpes.fr> Just curious: Is python with vs. without "-O" equivalent today regarding line numbers? Are SET_LINENO opcodes a plus in some situations or not? Next, I see quite often several SET_LINENO in a row in the beginning of code objects due to doc strings, etc. Since I don't think that folding them into one SET_LINENO would be an optimisation (it would rather be avoiding the redundancy), is it possible and/or reasonable to do something in this direction? A trivial example: >>> def f(): ... "This is a comment about f" ... a = 1 ... 
>>> import dis >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 LOAD_CONST 1 (1) 12 STORE_FAST 0 (a) 15 LOAD_CONST 2 (None) 18 RETURN_VALUE >>> Can the above become something like this instead: 0 SET_LINENO 3 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 LOAD_CONST 2 (None) 12 RETURN_VALUE -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Fri Aug 13 00:02:06 1999 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 00:02:06 +0200 Subject: [Python-Dev] about line numbers In-Reply-To: Message by Vladimir Marangozov , Thu, 12 Aug 1999 19:22:06 +0100 (NFT) , <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <19990812220211.B3CED993@oratrix.oratrix.nl> The only possible problem I can see with folding linenumbers is if someone sets a breakpoint on such a line. And I think it'll be difficult to explain the missing line numbers to pdb, so there isn't an easy workaround (at least, it takes more than my 30 seconds of brainpower to come up with one:-). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:10:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:10:26 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000301bee481$b78ae5c0$4e2d2399@tim> from "Tim Peters" at "Aug 12, 99 01:15:39 am" Message-ID: <199908122310.AAA29618@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Tim] > >> ...slowing the expected case to prevent theoretical bad cases is > >> usually a net loss -- I think the onus is on you to demonstrate > >> that this change is an exception to that rule.
> > [Vladimir Marangozov] > > I won't, because this case is rare in practice, classifying it already > > as an exception. A real exception. I'll have to think a bit more about > > all this. Adding 1/3 new entries to trigger the next resize sounds > > suboptimal (if it happens at all). > > "Suboptimal" with respect to which specific cost model? Exhibiting a > specific bad case isn't compelling, and especially not when it's considered > to be "a real exception". Adding new expense to every delete is an obvious > new burden -- where's the payback, and is the expected net effect amortized > across all dict usage a win or loss? Offhand it sounds like a small loss to > me, although I haven't worked up a formal cost model either . C'mon Tim, don't try to impress me with cost models. I'm already impressed :-) Anyways, I've looked at some traces. As expected, the conclusion is that this case is extremely rare wrt the average dict usage. There are 3 reasons: (1) dicts are usually deleted entirely, (2) del d[key] is rare in practice, and (3) often d[key] = None is used instead of (2). There is, however, a small percentage of dicts which are used below 1/3 of their size. I must say, below 1/3 of their peak size, because downsizing is also rare. To trigger a downsize, 1/3 new entries of the peak size must be inserted. Besides these observations, after looking at the code one more time, I can't really understand why the resize logic is based on the "fill" watermark and not on "used". fill = used + dummy, but since lookdict returns the first free slot (null or dummy), I don't really see what's the point of using a fill watermark... Perhaps you can enlighten me on this. Using only the "used" metrics seems fine to me. I even deactivated "fill" and replaced it with "used" to see what happens -- no visible changes, except a tiny speedup I'm willing to neglect.
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:21:48 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:21:48 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <19990812220211.B3CED993@oratrix.oratrix.nl> from "Jack Jansen" at "Aug 13, 99 00:02:06 am" Message-ID: <199908122321.AAA29572@pukapuka.inrialpes.fr> Jack Jansen wrote: > > > The only possible problem I can see with folding linenumbers is if > someone sets a breakpoint on such a line. And I think it'll be > difficult to explain the missing line numbers to pdb, so there isn't > an easy workaround (at least, it takes more than my 30 seconds of > brainpoewr to come up with one:-). > Eek! We can set a breakpoint on a doc string? :-) There's no code in there. It should be treated as a comment by pdb. I can't set a breakpoint on a comment line even in C ;-) There must be something deeper about it... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Fri Aug 13 02:07:32 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 20:07:32 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <000101bee51f$d7601de0$fb2d2399@tim> [Vladimir Marangozov] > Is python with vs. without "-O" equivalent today regarding > line numbers? > > Are SET_LINENO opcodes a plus in some situations or not? In theory it should make no difference, except that the trace mechanism makes a callback on each SET_LINENO, and that's how the debugger implements line-number breakpoints. Under -O, there are no SET_LINENOs, so debugger line-number breakpoints don't work under -O. 
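The trace callback Tim describes is the durable part of this machinery and can be watched from Python itself; a sketch in modern terms (today `sys.settrace` reports a 'line' event at each point where a 1999-era SET_LINENO would have fired, the opcode itself being an implementation detail):

```python
import sys

line_events = []

def tracer(frame, event, arg):
    # The interpreter calls this with 'call', 'line', 'return', ... events;
    # 'line' is the one a debugger hangs line-number breakpoints on, and is
    # where SET_LINENO made its callback in 1999-era bytecode.
    if event == "line" and frame.f_code.co_name == "f":
        line_events.append(frame.f_lineno)
    return tracer

def f():
    a = 1
    b = 2
    return a + b

sys.settrace(tracer)
f()
sys.settrace(None)

print(len(line_events))  # one 'line' event per executed line of f()
```

This also shows why line-triggered callbacks are costly: every executed line becomes a full Python-level call into the trace function.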
I think there's also a sporadic buglet, which I've never bothered to track down: sometimes a line number reported in a traceback under -O (&, IIRC, it's always the topmost line number) comes out as a senseless negative value. > Next, I see quite often several SET_LINENO in a row in the beginning > of code objects due to doc strings, etc. Since I don't think that > folding them into one SET_LINENO would be an optimisation (it would > rather be avoiding the redundancy), is it possible and/or reasonable > to do something in this direction? All opcodes consume time, although a wasted trip or two around the eval loop at the start of a function isn't worth much effort to avoid. Still, it's a legitimate opportunity for provable speedup, even if unmeasurable speedup . Would be more valuable to rethink the debugger's breakpoint approach so that SET_LINENO is never needed (line-triggered callbacks are expensive because called so frequently, turning each dynamic SET_LINENO into a full-blown Python call; if I used the debugger often enough to care , I'd think about munging in a new opcode to make breakpoint sites explicit). immutability-is-made-to-be-violated-ly y'rs - tim From tim_one at email.msn.com Fri Aug 13 06:53:38 1999 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 13 Aug 1999 00:53:38 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908122307.AAA06018@pukapuka.inrialpes.fr> Message-ID: <000101bee547$cffaa020$992d2399@tim> [Vladimir Marangozov, *almost* seems ready to give up on a counterproductive dict pessimization ] > ... > There is, however, a small percentage of dicts which are used > below 1/3 of their size. I must say, below 1/3 of their peak size, > because downsizing is also rare. To trigger a downsize, 1/3 new > entries of the peak size must be inserted. Not so, although "on average" 1/6 may be correct. Look at an extreme: Say a dict has size 333 (it can't, but it makes the math obvious ...). Say it contains 221 items.
Now someone deletes them all, one at a time. used==0 and fill==221 at this point. They insert one new key that happens to hit one of the 333-221 = 112 remaining NULL keys. Then used==1 and fill==222. They insert a 2nd key, and before the dict is searched the new fill of 222 triggers the 2/3rds load-factor resizing -- which asks for a new size of 1*2 == 2. For the minority of dicts that go up and down in size wildly many times, the current behavior is fine. > Besides these observations, after looking at the code one more > time, I can't really understand why the resize logic is based on > the "fill" watermark and not on "used". fill = used + dummy, but > since lookdict returns the first free slot (null or dummy), I don't > really see what's the point of using a fill watermark... Let's just consider an unsuccessful search. Then it does return "the first" free slot, but not necessarily at the time it *sees* the first free slot. So long as it sees a dummy, it has to keep searching; the search doesn't end until it finds a NULL. So consider this, assuming the resize triggered only on "used": d = {} for i in xrange(50000): d[random.randrange(1000000)] = 1 for k in d.keys(): del d[k] # now there are 50000 dummy dict keys, and some number of NULLs # loop invariant: used == 0 for i in xrange(sys.maxint): j = random.randrange(10000000) d[j] = 1 del d[j] assert not d.has_key(i) However many NULL slots remained, the last loop eventually transforms them *all* into dummies. The dummies act exactly like "real keys" with respect to expected time for an unsuccessful search, which is why it's thoroughly appropriate to include dummies in the load factor computation. The loop will run slower and slower as the percentage of dummies approaches 100%, and each failing has_key approaches O(N) time. 
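The dummy-slot mechanics behind this can be sketched with a toy open-addressing table (an illustrative sketch only: linear probing stands in for CPython's perturbed probe sequence, and `NULL`/`DUMMY`/`ToyTable` are this sketch's names, not dictobject.c's):

```python
NULL = object()   # never-used slot: an unsuccessful probe may stop here
DUMMY = object()  # deleted slot: a probe must keep going past it

class ToyTable:
    def __init__(self, size=8):
        self.slots = [NULL] * size

    def _find(self, key):
        # Remember the first free slot for a later insert, but only a
        # NULL slot ends an unsuccessful search -- dummies don't.
        size = len(self.slots)
        i = hash(key) % size
        freeslot = None
        for _ in range(size):
            slot = self.slots[i]
            if slot is NULL:
                return i if freeslot is None else freeslot
            if slot is DUMMY:
                if freeslot is None:
                    freeslot = i
            elif slot[0] == key:
                return i
            i = (i + 1) % size
        # Every slot live or dummy: the probe would wrap around forever,
        # which is the case CPython avoids by counting dummies in "fill".
        raise RuntimeError("table is all live+dummy slots")

    def insert(self, key, value):
        self.slots[self._find(key)] = (key, value)

    def delete(self, key):
        i = self._find(key)
        assert self.slots[i] is not NULL and self.slots[i] is not DUMMY
        # Can't write NULL here: that would cut other keys' probe chains.
        self.slots[i] = DUMMY

    def lookup(self, key):
        slot = self.slots[self._find(key)]
        return None if slot is NULL or slot is DUMMY else slot[1]

t = ToyTable()
t.insert("key", "val")
t.delete("key")
print(t.lookup("key"))  # the probe passes the dummy and stops at a NULL
```

Cycling inserts and deletes long enough turns every slot live-or-dummy, at which point even misses cost a full scan of the table.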
In most hash table implementations that's the worst that can happen (and it's a disaster), but under Python's implementation it's worse: Python never checks to see whether the probe sequence "wraps around", so the first search after the last NULL has been changed to a dummy never ends. Counting the dummies in the load-factor computation prevents all that: no matter how many inserts and deletes are intermixed, the "effective load factor" stays under 2/3rds so gives excellent expected-case behavior; and it also protects against an all-dummy dict, making the lack of an expensive inner-loop "wrapped around?" check safe. > Perhaps you can enlighten me on this. Using only the "used" metrics > seems fine to me. I even deactivated "fill" and replaced it with "used" > to see what happens -- no visible changes, except a tiny speedup I'm > willing to neglect. You need a mix of deletes and inserts for the dummies to make a difference; dicts that always grow don't have dummies, so they're not likely to have any dummy-related problems either . Try this (untested): import time from random import randrange N = 1000 thatmany = [None] * N d = {} while 1: start = time.clock() for i in thatmany: d[randrange(10000000)] = 1 for i in d.keys(): del d[i] finish = time.clock() print round(finish - start, 3) Succeeding iterations of the outer loop should grow dramatically slower, and finally get into an infinite loop, despite that "used" never exceeds N. Short course rewording: for purposes of predicting expected search time, a dummy is the same as a live key, because finding a dummy doesn't end a search -- it has to press on until either finding the key it was looking for, or finding a NULL. And with a mix of insertions and deletions, and if the hash function is doing a good job, then over time all the slots in the table will become either live or dummy, even if "used" stays within a very small range. So, that's why .
dictobject-may-be-the-subtlest-object-there-is-ly y'rs - tim From gstein at lyra.org Fri Aug 13 11:13:55 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 02:13:55 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: On Tue, 10 Aug 1999, Fredrik Lundh wrote: >... > unicode objects do not implement the getcharbuffer slot. This is Goodness. All righty. >... > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way It is needed for fp.write(unicodeobj) ... It is also very handy for C functions to deal with Unicode strings. > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) What's wrong with fp.write(unicodeobj)? It should write the unicode value to the file. Are you suggesting that it will need to be done differently? Icky. > or maybe the buffer design needs an overhaul? Not that I know of. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Aug 13 12:36:13 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 03:36:13 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: On Tue, 10 Aug 1999, Guido van Rossum wrote: >... > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. I believe that I properly changed all of these within the core distribution. Per your requested design, third-party extensions must switch from "s#" to "t#" to move to the charbuffer interface, as needed. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 15:47:05 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 14:47:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908131347.OAA30740@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir Marangozov, *almost* seems ready to give up on a counter- > productive dict pessimization ] > Of course I will! Now everything is perfectly clear. Thanks. > ... > So, that's why . > Now, *this* one explanation of yours should go into a HowTo/BecauseOf for developers. I timed your scripts and a couple of mine which attest (again) the validity of the current implementation. My patch is out of bounds. It even disturbs from time to time the existing harmony in the results ;-) because of early resizing. All in all, for performance reasons, dicts remain an exception to the rule of releasing memory ASAP. They have been designed to tolerate caching because of their dynamics, which is the main reason for the rare case addressed by my patch. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Aug 13 19:27:19 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 19:27:19 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B45577.7772CAA1@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Guido van Rossum wrote: > >... > > > or maybe the buffer design needs an overhaul? > > > > I think most places that should use the charbuffer interface actually > > use the readbuffer interface. This is what should be fixed. > > I believe that I properly changed all of these within the core > distribution. 
Per your requested design, third-party extensions must > switch from "s#" to "t#" to move to the charbuffer interface, as needed. Shouldn't this be the other way around ? After all, extensions using "s#" do expect character data and not arbitrary binary encodings of information. IMHO, the latter should be special cased, not the former. E.g. it doesn't make sense to use the re module to scan over 2-byte Unicode with single character based search patterns. Aside: Is the buffer interface reachable in any way from within Python ? Why isn't the interface exposed via __XXX__ methods on normal Python instances (could be implemented by returning a buffer object) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Aug 13 17:32:40 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Aug 1999 11:32:40 -0400 (EDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B45577.7772CAA1@lemburg.com> References: <37B45577.7772CAA1@lemburg.com> Message-ID: <14260.15000.398399.840716@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Aside: Is the buffer interface reachable in any way from within > Python ? Why isn't the interface exposed via __XXX__ methods > on normal Python instances (could be implemented by returning a > buffer object) ? Would it even make sense? I thought a large part of the intent was for performance, avoiding memory copies. Perhaps there should be an .__as_buffer__() which returned an object that supports the C buffer interface. I'm not sure how useful it would be; perhaps for classes that represent image data? They could return a buffer object created from a string/array/NumPy array. -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 17:59:12 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 17:59:12 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies. looks like there's some confusion here over what the buffer interface is all about. time for a new GvR essay, perhaps? From fdrake at acm.org Fri Aug 13 18:22:09 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Aug 1999 12:22:09 -0400 (EDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> Message-ID: <14260.17969.497916.382752@weyr.cnri.reston.va.us> Fredrik Lundh writes: > looks like there's some confusion here over > what the buffer interface is all about. time > for a new GvR essay, perhaps? If he'll write something about it, I'll be glad to adapt it to the extending & embedding manual. It seems important that it be included in the standard documentation since it will be important for extension writers to understand when they should implement it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 18:34:46 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 18:34:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> Message-ID: <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Guido van Rossum wrote: > > btw, how about adding support for buffer access > > to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible > > (like "virtual" and "abstract" image buffers in PIL 1.1). > > something like: > > > > int initbuffer(PyObject* obj, void** context); > > int exitbuffer(PyObject* obj, void* context); > > > > and corresponding context arguments to the > > rest of the functions... > > Can you explain this idea more? Without more understanding of PIL I > have no idea what you're talking about... in code:

    void* context;

    // this can be done at any time
    segments = pb->getsegcount(obj, NULL, context);

    if (!pb->bf_initbuffer(obj, &context))
        ... failed to initialise buffer api ...

    ... allocate segment size buffer ...
    pb->getsegcount(obj, &bytes, context);
    ... calculate total buffer size and allocate buffer ...

    for (i = offset = 0; i < segments; i++) {
        n = pb->getreadbuffer(obj, i, &p, context);
        if (n < 0)
            ... failed to fetch a given segment ...
        memcpy(buf + offset, p, n); // or write to file, or whatever
        offset = offset + n;
    }

    pb->bf_exitbuffer(obj, context);

in other words, this would give the target object a chance to keep some local context (like a temporary buffer) during a sequence of buffer operations... for PIL, this would make it possible to 1) store required metadata (size, mode, palette) along with the actual buffer contents.
2) possibly pack formats that use extra internal storage for performance reasons -- RGB pixels are stored as 32-bit integers, for example. 3) access virtual image memories (that can only be accessed via a buffer-like interface in themselves -- given an image object, you acquire an access handle, and use a getdata method to access the actual data. without initbuffer, there's no way to do two buffer accesses in parallel. without exitbuffer, there's no way to release the access handle. without the context variable, there's nowhere to keep the access handle between calls.) 4) access abstract image memories (like virtual memories, but they reside outside PIL, like on a remote server, or inside another image processing library, or on a hardware device). 5) convert to external formats on the fly: fp.write(im.buffer("JPEG")) and probably a lot more. as far as I can tell, nothing of this can be done using the current design... ... besides, what about buffers and threads? if you return a pointer from getreadbuf, wouldn't it be good to know exactly when Python doesn't need that pointer any more? explicit initbuffer/exitbuffer calls around each sequence of buffer operations would make that a lot safer... From mal at lemburg.com Fri Aug 13 21:16:44 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 21:16:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <37B46F1C.1A513F33@lemburg.com> Fred L. Drake, Jr. wrote: > > M.-A. Lemburg writes: > > Aside: Is the buffer interface reachable in any way from within > > Python ? Why isn't the interface exposed via __XXX__ methods > > on normal Python instances (could be implemented by returning a > > buffer object) ? > > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies.
Perhaps there should be > an .__as_buffer__() which returned an object that supports the C > buffer interface. I'm not sure how useful it would be; perhaps for > classes that represent image data? They could return a buffer object > created from a string/array/NumPy array. That's what I had in mind.

    def __getreadbuffer__(self):
        return buffer(self.data)

    def __getcharbuffer__(self):
        return buffer(self.string_data)

    def __getwritebuffer__(self):
        return buffer(self.mmaped_file)

Note that buffer() does not copy the data, it only adds a reference to the object being used. Hmm, how about adding a writeable binary object to the core ? This would be useful for the __getwritebuffer__() API because currently, I think, only mmap'ed files are useable as write buffers -- no other in-memory type. Perhaps buffer objects could be used for this purpose too, e.g. by having them allocate the needed memory chunk in case you pass None as object. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Fri Aug 13 23:48:12 1999 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 23:48:12 +0200 Subject: [Python-Dev] Quick-and-dirty weak references Message-ID: <19990813214817.5393C1C4742@oratrix.oratrix.nl> This week again I was bitten by the fact that Python doesn't have any form of weak references, and while I was toying with some ideas I came up with the following quick-and-dirty scheme that I thought I'd bounce off this list. I might even volunteer to implement it, if people agree it is worth it:-) We add a new builtin function (or a module with that function) weak(). This returns a weak reference to the object passed as a parameter. A weak object has one method: strong(), which returns the corresponding real object or raises an exception if the object doesn't exist anymore.
For convenience we could add a method exists() that returns true if the real object still exists. Now comes the bit that I'm unsure about: to implement this I need to add a pointer to every object. This pointer is either NULL or points to the corresponding weak object (so for every object there is either no weak reference object or exactly one). But, for the price of 4 bytes extra in every object we get the nicety that there is little cpu-overhead: refcounting macros work identically to the way they do now, the only thing to take care of is that during object deallocation we have to zero the weak pointer. (actually: we could make do with a single bit in every object, with the bit meaning "this object has an associated weak object". We could then use a global dictionary indexed by object address to find the weak object) From mal at lemburg.com Sat Aug 14 01:15:39 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 01:15:39 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B4A71B.2073875F@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > maybe the unicode class shouldn't implement the > > buffer interface at all? sure looks like the best way > > It is needed for fp.write(unicodeobj) ... > > It is also very handy for C functions to deal with Unicode strings. Wouldn't a special C API be (even) more convenient ? > > to avoid trivial mistakes (the current behaviour of > > fp.write(unicodeobj) is even more serious than the > > marshal glitch...) > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > to the file. Are you suggesting that it will need to be done differently? > Icky. Would this also write some kind of Unicode encoding header ? [Sorry, this is my Unicode ignorance shining through... I only remember lots of talk about these things on the string-sig.] Since fp.write() uses "s#" this would use the getreadbuffer slot in 1.5.2...
I think what it *should* do is use the getcharbuffer slot instead (see my other post), since dumping the raw unicode data would lose too much information. Again, such things should be handled by extra methods, e.g. fp.rawwrite(). Hmm, I guess the philosophy behind the interface is not really clear. Binary data is fetched via getreadbuffer and then interpreted as character data... I always thought that the getcharbuffer should be used for such an interpretation. Or maybe, we should dump the getcharbuffer slot again and use the getreadbuffer information just as we would a void* pointer in C: with no explicit or implicit type information. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 14 10:53:04 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 01:53:04 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> Message-ID: <37B52E70.2D957546@lyra.org> M.-A. Lemburg wrote: > > Greg Stein wrote: > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > maybe the unicode class shouldn't implement the > > > buffer interface at all? sure looks like the best way > > > > It is needed for fp.write(unicodeobj) ... > > > > It is also very handy for C functions to deal with Unicode strings. > > Wouldn't a special C API be (even) more convenient ? Why? Accessing the Unicode values as a series of bytes matches exactly to the semantics of the buffer interface. Why throw in Yet Another Function? Your abstract.c functions make it quite simple. > > > to avoid trivial mistakes (the current behaviour of > > > fp.write(unicodeobj) is even more serious than the > > > marshal glitch...) > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > to the file.
Are you suggesting that it will need to be done differently? > > Icky. > > Would this also write some kind of Unicode encoding header ? > [Sorry, this is my Unicode ignorance shining through... I only > remember lots of talk about these things on the string-sig.] Absolutely not. Placing the Byte Order Mark (BOM) into an output stream is an application-level task. It should never be done by any subsystem. There are no other "encoding headers" that would go into the output stream. The output would simply be UTF-16 (2-byte values in host byte order). > Since fp.write() uses "s#" this would use the getreadbuffer > slot in 1.5.2... I think what it *should* do is use the > getcharbuffer slot instead (see my other post), since dumping > the raw unicode data would loose too much information. Again, I very much disagree. To me, fp.write() is not about writing characters to a stream. I think it makes much more sense as "writing bytes to a stream" and the buffer interface fits that perfectly. There is no loss of data. You could argue that the byte order is lost, but I think that is incorrect. The application defines the semantics: the file might be defined as using host-order, or the application may be writing a BOM at the head of the file. > such things should be handled by extra methods, e.g. fp.rawwrite(). I believe this would be a needless complication of the interface. > Hmm, I guess the philosophy behind the interface is not > really clear. I didn't design or implement it initially, but (as you may have guessed) I am a proponent of its existence. > Binary data is fetched via getreadbuffer and then > interpreted as character data... I always thought that the > getcharbuffer should be used for such an interpretation. The former is bad behavior. That is why getcharbuffer was added (by me, for 1.5.2). It was a preventative measure for the introduction of Unicode strings. Using getreadbuffer for characters would break badly given a Unicode string.
Therefore, "clients" that want (8-bit) characters from an object supporting the buffer interface should use getcharbuffer. The Unicode object doesn't implement it, implying that it cannot provide 8-bit characters. You can get the raw bytes thru getreadbuffer. > Or maybe, we should dump the getcharbufer slot again and > use the getreadbuffer information just as we would a > void* pointer in C: with no explicit or implicit type information. Nope. That path is fraught with failure :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Sat Aug 14 12:21:51 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 12:21:51 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <37B5433F.61CE6F76@lemburg.com> Jack Jansen wrote: > > This week again I was bitten by the fact that Python doesn't have any > form of weak references, and while I was toying with some ideas I came > up with the following quick-and-dirty scheme that I thought I'd bounce > off this list. I might even volunteer to implement it, if people agree > it is worth it:-) Have you checked the weak reference dictionary implementation by Dieter Maurer ? It's at: http://www.handshake.de/~dieter/weakdict.html While I like the idea of having weak references in the core, I think 4 extra bytes for *every* object is just a little too much. The flag bit idea (with the added global dictionary of weak referenced objects) looks promising though. BTW, how would this be done in JPython ? I guess it doesn't make much sense there because cycles are no problem for the Java VM GC. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Aug 14 14:30:45 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Sat, 14 Aug 1999 14:30:45 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> Message-ID: <37B56175.23ABB350@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > > > > Greg Stein wrote: > > > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > > maybe the unicode class shouldn't implement the > > > > buffer interface at all? sure looks like the best way > > > > > > It is needed for fp.write(unicodeobj) ... > > > > > > It is also very handy for C functions to deal with Unicode strings. > > > > Wouldn't a special C API be (even) more convenient ? > > Why? Accessing the Unicode values as a series of bytes matches exactly > to the semantics of the buffer interface. Why throw in Yet Another > Function? I meant PyUnicode_* style APIs for dealing with all the aspects of Unicode objects -- much like the PyString_* APIs available. > Your abstract.c functions make it quite simple. BTW, do we need an extra set of those with buffer index or not ? Those would really be one-liners for the sake of hiding the type slots from applications. > > > > to avoid trivial mistakes (the current behaviour of > > > > fp.write(unicodeobj) is even more serious than the > > > > marshal glitch...) > > > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > > to the file. Are you suggesting that it will need to be done differently? > > > Icky. > > > > Would this also write some kind of Unicode encoding header ? > > [Sorry, this is my Unicode ignorance shining through... I only > > remember lots of talk about these things on the string-sig.] > > Absolutely not. Placing the Byte Order Mark (BOM) into an output stream > is an application-level task. It should never by done by any subsystem. > > There are no other "encoding headers" that would go into the output > stream. The output would simply be UTF-16 (2-byte values in host byte > order). Ok. 
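Greg's point about the BOM being an application-level concern can be illustrated with present-day Python, where the codecs module keeps the mark separate from the raw code units (a modern sketch using today's stdlib, not anything proposed in this thread):

```python
import codecs

# The raw UTF-16 code units and the Byte Order Mark are separate
# things; whether to emit the BOM is the application's decision,
# which is the division of labour argued for above.
raw = "abc".encode("utf-16-le")           # raw 2-byte units, no BOM
with_bom = codecs.BOM_UTF16_LE + raw      # application prepends the BOM

assert raw == b"a\x00b\x00c\x00"
assert with_bom.decode("utf-16") == "abc" # decoder consumes the BOM
```

With the explicit-endian codec the subsystem writes only the code units, exactly the "no encoding headers" behaviour Greg describes.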
> > Since fp.write() uses "s#" this would use the getreadbuffer > > slot in 1.5.2... I think what it *should* do is use the > > getcharbuffer slot instead (see my other post), since dumping > > the raw unicode data would loose too much information. Again, > > I very much disagree. To me, fp.write() is not about writing characters > to a stream. I think it makes much more sense as "writing bytes to a > stream" and the buffer interface fits that perfectly. This is perfectly ok, but shouldn't the behaviour of fp.write() mimic that of previous Python versions ? How does JPython write the data ? Inlined different subject: I think the internal semantics of "s#" using the getreadbuffer slot and "t#" the getcharbuffer slot should be switched; see my other post. In previous Python versions "s#" had the semantics of string data with possibly embedded NULL bytes. Now it suddenly has the meaning of binary data and you can't simply change extensions to use the new "t#" because people are still using them with older Python versions. > There is no loss of data. You could argue that the byte order is lost, > but I think that is incorrect. The application defines the semantics: > the file might be defined as using host-order, or the application may be > writing a BOM at the head of the file. The problem here is that many applications were not written to handle these kinds of objects. Previously they could only handle strings, now they can suddenly handle any object having the buffer interface and then fail when the data gets read back in. > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > I believe this would be a needless complication of the interface. It would clarify things and make the interface 100% backward compatible again. > > Hmm, I guess the philosophy behind the interface is not > really clear. > > I didn't design or implement it initially, but (as you may have guessed) > I am a proponent of its existence.
> > > Binary data is fetched via getreadbuffer and then > > interpreted as character data... I always thought that the > > getcharbuffer should be used for such an interpretation. > > The former is bad behavior. That is why getcharbuffer was added (by me, > for 1.5.2). It was a preventative measure for the introduction of > Unicode strings. Using getreadbuffer for characters would break badly > given a Unicode string. Therefore, "clients" that want (8-bit) > characters from an object supporting the buffer interface should use > getcharbuffer. The Unicode object doesn't implement it, implying that it > cannot provide 8-bit characters. You can get the raw bytes thru > getreadbuffer. I agree 100%, but did you add the "t#" instead of having "s#" use the getcharbuffer interface ? E.g. my mxTextTools package uses "s#" on many APIs. Now someone could stick in a Unicode object and get pretty strange results without any notice about mxTextTools and Unicode being incompatible. You could argue that I change to "t#", but that doesn't work since many people out there still use Python versions <1.5.2 and those didn't have "t#", so mxTextTools would then fail completely for them. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 14 13:34:17 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 04:34:17 -0700 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <37B55439.683272D2@lyra.org> M.-A. Lemburg wrote: >... > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. Sure, these could be added as necessary. 
For raw access to the bytes, I would refer people to the abstract buffer functions, tho. > > Your abstract.c functions make it quite simple. > > BTW, do we need an extra set of those with buffer index or not ? > Those would really be one-liners for the sake of hiding the > type slots from applications. It sounds like NumPy and PIL would need it, which makes the landscape quite a bit different from the last time we discussed this (when we didn't imagine anybody needing those). >... > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > slot in 1.5.2... I think what it *should* do is use the > > > getcharbuffer slot instead (see my other post), since dumping > > > the raw unicode data would loose too much information. Again, > > > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? fp.write() had no semantics for writing Unicode objects since they didn't exist. Therefore, we are not breaking or changing any behavior. > Inlined different subject: > I think the internal semantics of "s#" using the getreadbuffer slot > and "t#" the getcharbuffer slot should be switched; see my other post. 1) Too late 2) The use of "t#" ("text") for the getcharbuffer slot was decided by the Benevolent Dictator. 3) see (2) > In previous Python versions "s#" had the semantics of string data > with possibly embedded NULL bytes. Now it suddenly has the meaning > of binary data and you can't simply change extensions to use the > new "t#" because people are still using them with older Python > versions. Guido and I had a pretty long discussion on what the best approach here was. I think we even pulled in Tim as a final arbiter, as I recall. 
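The boundary being negotiated here is the one modern Python eventually settled on: byte-oriented objects expose their raw bytes, while Unicode strings refuse to until an encoding is chosen. A small illustration in today's Python (modern semantics standing in for the getreadbuffer/getcharbuffer split, not 1999 behaviour):

```python
# Raw-byte access works for byte-oriented objects but is refused for
# Unicode strings -- the same split the "s#"/"t#" design was reaching
# for: raw bytes versus declared character data.
ok = memoryview(b"abc")          # bytes: buffer interface available
assert ok.nbytes == 3

try:
    memoryview("abc")            # str: no raw-byte view without encoding
    raise AssertionError("str unexpectedly exposed a buffer")
except TypeError:
    pass
```

In other words, the "exceptions in code that didn't raise exceptions beforehand" concern is precisely what a text type that refuses the raw-bytes interface produces.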
I believe "s#" remained getreadbuffer simply because it *also* meant "give me the bytes of that object". If it changed to getcharbuffer, then you could see exceptions in code that didn't raise exceptions beforehand. (more below) > > There is no loss of data. You could argue that the byte order is lost, > > but I think that is incorrect. The application defines the semantics: > > the file might be defined as using host-order, or the application may be > > writing a BOM at the head of the file. > > The problem here is that many application were not written > to handle these kind of objects. Previously they could only > handle strings, now they can suddenly handle any object > having the buffer interface and then fail when the data > gets read back in. An application is a complete unit. How are you suddenly going to manifest Unicode objects within that application? The only way is if the developer goes in and changes things; let them deal with the issues and fallout of their change. The other is external changes such as an upgrade to the interpreter or a module. Again, (IMO) if you're perturbing a system, then you are responsible for also correcting any problems you introduce. In any case, Guido's position was that things can easily switch over to the "t#" interface to prevent the class of error where you pass a Unicode string to a function that expects a standard string. > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > I believe this would be a needless complication of the interface. > > It would clarify things and make the interface 100% backward > compatible again. No. "s#" used to pull bytes from any buffer-capable object. Your suggestion for "s#" to use the getcharbuffer could introduce exceptions into currently-working code. (this was probably Guido's prime motivation for the current meaning of "t#"...
I can dig up the mail thread if people need an authoritative commentary on the decision that was made) > > > Hmm, I guess the philosophy behind the interface is not > > > really clear. > > > > I didn't design or implement it initially, but (as you may have guessed) > > I am a proponent of its existence. > > > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > for 1.5.2). It was a preventative measure for the introduction of > > Unicode strings. Using getreadbuffer for characters would break badly > > given a Unicode string. Therefore, "clients" that want (8-bit) > > characters from an object supporting the buffer interface should use > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > cannot provide 8-bit characters. You can get the raw bytes thru > > getreadbuffer. > > I agree 100%, but did you add the "t#" instead of having > "s#" use the getcharbuffer interface ? Yes. For reasons detailed above. > E.g. my mxTextTools > package uses "s#" on many APIs. Now someone could stick > in a Unicode object and get pretty strange results without > any notice about mxTextTools and Unicode being incompatible. They could also stick in an array of integers. That supports the buffer interface, meaning the "s#" in your code would extract the bytes from it. In other words, people can already stick bogus stuff into your code. This seems to be a moot argument. > You could argue that I change to "t#", but that doesn't > work since many people out there still use Python versions > <1.5.2 and those didn't have "t#", so mxTextTools would then > fail completely for them. If support for the older versions is needed, then use an #ifdef to set up the appropriate macro in some header. Use that throughout your code. 
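Greg's array-of-integers point translates directly to modern Python: any buffer-capable object hands out its raw machine bytes, which is exactly what an "s#" consumer received (memoryview below stands in for the old getreadbuffer slot; the snippet is illustrative, not 1.5.x code):

```python
import array

# An array of integers supports the buffer interface, so "bogus"
# non-string data would flow straight into an "s#"-style consumer
# as raw bytes rather than raising an error.
ints = array.array("i", [65, 66])
raw = bytes(memoryview(ints))    # what an "s#" argument would receive

assert raw == ints.tobytes()     # raw machine bytes, not characters
assert len(raw) == 2 * ints.itemsize
```

This is why the objection is "moot": the permissive raw-bytes behaviour predates Unicode objects entirely.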
In any case: yes -- I would argue that you should absolutely be using "t#". Cheers, -g -- Greg Stein, http://www.lyra.org/ From fredrik at pythonware.com Sat Aug 14 15:19:07 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 15:19:07 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <003101bee657$972d1550$f29b12c2@secret.pythonware.com> M.-A. Lemburg wrote: > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. it's already there, of course. see unicode.h in the unicode distribution (Mark is hopefully adding this to 1.6 in this very moment...) > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? the crucial point is how an average user expects things to work. the current design is quite asymmetric -- you can easily *write* things that implement the buffer interface to a stream, but how the heck do you get them back? (as illustrated by the marshal buglet...) From fredrik at pythonware.com Sat Aug 14 17:21:48 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 17:21:48 +0200 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > E.g. my mxTextTools > > package uses "s#" on many APIs.
Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Except that people may expect unicode strings to work just like any other kind of string, while arrays are surely a different thing. I'm beginning to suspect that the current buffer design is partially broken; it tries to work around at least two problems at once: a) the current use of "string" objects for two purposes: as strings of 8-bit characters, and as buffers containing arbitrary binary data. b) performance issues when reading/writing certain kinds of data to/from streams. and fails to fully address either of them. From mal at lemburg.com Sat Aug 14 18:30:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:30:21 +0200 Subject: [Python-Dev] Re: buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <37B5999D.201EA88C@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > >... > > I meant PyUnicode_* style APIs for dealing with all the aspects > > of Unicode objects -- much like the PyString_* APIs available. > > Sure, these could be added as necessary. For raw access to the bytes, I > would refer people to the abstract buffer functions, tho. I guess that's up to them... PyUnicode_AS_WCHAR() could also be exposed I guess (are C's wchar strings useable as Unicode basis ?). > > > Your abstract.c functions make it quite simple. > > > > BTW, do we need an extra set of those with buffer index or not ? > > Those would really be one-liners for the sake of hiding the > > type slots from applications. 
> > It sounds like NumPy and PIL would need it, which makes the landscape > quite a bit different from the last time we discussed this (when we > didn't imagine anybody needing those). Ok, then I'll add them and post the new set next week. > >... > > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > > slot in 1.5.2... I think what it *should* do is use the > > > > getcharbuffer slot instead (see my other post), since dumping > > > > the raw unicode data would loose too much information. Again, > > > > > > I very much disagree. To me, fp.write() is not about writing characters > > > to a stream. I think it makes much more sense as "writing bytes to a > > > stream" and the buffer interface fits that perfectly. > > > > This is perfectly ok, but shouldn't the behaviour of fp.write() > > mimic that of previous Python versions ? How does JPython > > write the data ? > > fp.write() had no semantics for writing Unicode objects since they > didn't exist. Therefore, we are not breaking or changing any behavior. The problem is hidden in polymorphic functions and tools: previously they could not handle anything but strings, now they also work on arbitrary buffers without raising exceptions. That's what I'm concerned about. > > Inlined different subject: > > I think the internal semantics of "s#" using the getreadbuffer slot > > and "t#" the getcharbuffer slot should be switched; see my other post. > > 1) Too late > 2) The use of "t#" ("text") for the getcharbuffer slot was decided by > the Benevolent Dictator. > 3) see (2) 1) It's not too late: most people aren't even aware of the buffer interface (except maybe the small crowd on this list). 2) A mistake in a patchlevel release of Python can easily be undone in the next minor release. No big deal. 3) To remain compatible with 1.5.2 even in future revisions, a new explicit marker, e.g. "r#" for raw data, could be added to hold the code for getreadbuffer. "s#" and "z#" should then switch to using getcharbuffer.
> > In previous Python versions "s#" had the semantics of string data > > with possibly embedded NULL bytes. Now it suddenly has the meaning > > of binary data and you can't simply change extensions to use the > > new "t#" because people are still using them with older Python > > versions. > > Guido and I had a pretty long discussion on what the best approach here > was. I think we even pulled in Tim as a final arbiter, as I recall. What was the final argument then ? (I guess the discussion was held *before* the addition of getcharbuffer, right ?) > I believe "s#" remained getreadbuffer simply because it *also* meant > "give me the bytes of that object". If it changed to getcharbuffer, then > you could see exceptions in code that didn't raise exceptions > beforehand. > > (more below) "s#" historically always meant "give me char* data with length". It did not mean: "give me a pointer to the data area and its length". That interpretation is new in 1.5.2. Even integers and lists could provide buffer access with the new interpretation... (sounds evil ;-) > > > There is no loss of data. You could argue that the byte order is lost, > > > but I think that is incorrect. The application defines the semantics: > > > the file might be defined as using host-order, or the application may be > > > writing a BOM at the head of the file. > > > > The problem here is that many applications were not written > > to handle these kinds of objects. Previously they could only > > handle strings, now they can suddenly handle any object > > having the buffer interface and then fail when the data > > gets read back in. > > An application is a complete unit. How are you suddenly going to > manifest Unicode objects within that application? The only way is if the > developer goes in and changes things; let them deal with the issues and > fallout of their change. The other is external changes such as an > upgrade to the interpreter or a module.
Again, (IMO) if you're > perturbing a system, then you are responsible for also correcting any > problems you introduce. Well, ok, if you're talking about standalone apps. I was referring to applications which interact with other applications, e.g. via files or sockets. You could pass a Unicode obj to a socket and have it transfer the data to the other end without getting an exception on the sending part of the connection. The receiver would read the data as string and most probably fail. The whole application sitting in between and dealing with the protocol and connection management wouldn't even notice that you've just tried to extend its capabilities. > In any case, Guido's position was that things can easily switch over to > the "t#" interface to prevent the class of error where you pass a > Unicode string to a function that expects a standard string. Strange, why should code that relies on 8-bit character data be changed because a new unsupported object type pops up ? Code supporting the new type will have to be rewritten anyway, but why break existing extensions in unpredicted ways ? > > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > > > I believe this would be a needless complication of the interface. > > > > It would clarify things and make the interface 100% backward > > compatible again. > > No. "s#" used to pull bytes from any buffer-capable object. Your > suggestion for "s#" to use the getcharbuffer could introduce exceptions > into currently-working code. The buffer objects were introduced in 1.5.1, AFAIR. Changing the semantics back to the original ones would only break extensions relying on the behaviour you describe -- the distribution can easily be adapted to use some other marker, such as "r#". > (this was probably Guido's prime motivation for the current meaning of > "t#"...
I can dig up the mail thread if people need an authoritative > commentary on the decision that was made) > > > > > Hmm, I guess the philosophy behind the interface is not > > > > really clear. > > > > > > I didn't design or implement it initially, but (as you may have guessed) > > > I am a proponent of its existence. > > > > > > > Binary data is fetched via getreadbuffer and then > > > > interpreted as character data... I always thought that the > > > > getcharbuffer should be used for such an interpretation. > > > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > > for 1.5.2). It was a preventative measure for the introduction of > > > Unicode strings. Using getreadbuffer for characters would break badly > > > given a Unicode string. Therefore, "clients" that want (8-bit) > > > characters from an object supporting the buffer interface should use > > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > > cannot provide 8-bit characters. You can get the raw bytes thru > > > getreadbuffer. > > > > I agree 100%, but did you add the "t#" instead of having > > "s#" use the getcharbuffer interface ? > > Yes. For reasons detailed above. > > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Right now they can with 1.5.1 and 1.5.2 which is unfortunate. I'd rather have the parsing function raise an exception. > This seems to be a moot argument. Not really when you have to support extensions across three different patchlevels of Python. 
> > You could argue that I change to "t#", but that doesn't > > work since many people out there still use Python versions > > <1.5.2 and those didn't have "t#", so mxTextTools would then > > fail completely for them. > > If support for the older versions is needed, then use an #ifdef to set > up the appropriate macro in some header. Use that throughout your code. > > In any case: yes -- I would argue that you should absolutely be using > "t#". I can easily change my code, no big deal, but what about the dozens of other extensions I don't want to bother diving into ? I'd rather see an exception than complete garbage written to a file or a socket. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Aug 14 18:53:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:53:45 +0200 Subject: [Python-Dev] buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Message-ID: <37B59F19.45C1D23B@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > > E.g. my mxTextTools > > > package uses "s#" on many APIs. Now someone could stick > > > in a Unicode object and get pretty strange results without > > > any notice about mxTextTools and Unicode being incompatible. > > > > They could also stick in an array of integers. That supports the buffer > > interface, meaning the "s#" in your code would extract the bytes from > > it. In other words, people can already stick bogus stuff into your code. > > Except that people may expect unicode strings > to work just like any other kind of string, while > arrays are surely a different thing.
> > I'm beginning to suspect that the current buffer > design is partially broken; it tries to work around > at least two problems at once: > > a) the current use of "string" objects for two purposes: > as strings of 8-bit characters, and as buffers containing > arbitrary binary data. > > b) performance issues when reading/writing certain kinds > of data to/from streams. > > and fails to fully address either of them. True, a higher level interface for those two objectives would certainly address them much better than what we are trying to do at bit level. Buffers should probably only be treated as pointers to abstract memory areas and nothing more. BTW, what about my suggestion to extend buffers to also allocate memory (in case you pass None as object) ? Or should array be used for that purpose (its an undocumented feature of arrays) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sun Aug 15 04:59:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 19:59:25 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Message-ID: <37B62D0D.6EC24240@lyra.org> Fredrik Lundh wrote: >... > besides, what about buffers and threads? if you > return a pointer from getreadbuf, wouldn't it be > good to know exactly when Python doesn't need > that pointer any more? 
explicit initbuffer/exitbuffer > calls around each sequence of buffer operations > would make that a lot safer... This is a pretty obvious one, I think: it lasts only as long as the object. PyString_AS_STRING is similar. Nothing new or funny here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 05:09:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 20:09:19 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> Message-ID: <37B62F5E.30C62070@lyra.org> M.-A. Lemburg wrote: > > Fred L. Drake, Jr. wrote: > > > > M.-A. Lemburg writes: > > > Aside: Is the buffer interface reachable in any way from within > > > Python ? Why isn't the interface exposed via __XXX__ methods > > > on normal Python instances (could be implemented by returning a > > > buffer object) ? > > > > Would it even make sense? I though a large part of the intent was > > to for performance, avoiding memory copies. Perhaps there should be > > an .__as_buffer__() which returned an object that supports the C > > buffer interface. I'm not sure how useful it would be; perhaps for > > classes that represent image data? They could return a buffer object > > created from a string/array/NumPy array. There is no way to do this. The buffer interface only returns pointers to memory. There would be no place to return an intermediary object, nor a way to retain the reference for it. For example, your class instance quickly sets up a PyBufferObject with the relevant data and returns that. The underlying C code must now hold that reference *and* return a pointer to the calling code. Impossible. Fredrik's open/close concept for buffer accesses would make this possible, as long as clients are aware that any returned pointer is valid only until the buffer_close call. 
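With hindsight, the explicit bracketing Fredrik asks for is essentially what later buffer designs adopted: today's memoryview (a much later addition to the language, well after this thread) acquires a pointer on creation and invalidates it on an explicit release() call. A rough modern sketch of the open/close idea under discussion:

```python
data = bytearray(b"hello world")

view = memoryview(data)   # "buffer_open": a pointer into data's storage is acquired
assert bytes(view[0:5]) == b"hello"

view.release()            # "buffer_close": the pointer is explicitly dropped here
try:
    view.tobytes()        # later access is refused rather than left dangling
    refused = False
except ValueError:
    refused = True
assert refused
```

memoryview also works as a context manager (`with memoryview(data) as view: ...`), which gives exactly the paired open/close calls proposed here.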
The context argument he proposes would hold the object reference. Having class instances respond to the buffer interface is interesting, but until more code attempts to *use* the interface, I'm not quite sure of the utility... >... > Hmm, how about adding a writeable binary object to the core ? > This would be useful for the __getwritebuffer__() API because > currently, I think, only mmap'ed files are useable as write > buffers -- no other in-memory type. Perhaps buffer objects > could be used for this purpose too, e.g. by having them > allocate the needed memory chunk in case you pass None as > object. Yes, this would be very good. I would recommend that you pass an integer, however, rather than None. You need to tell it the size of the buffer to allocate. Since buffer(5) has no meaning at the moment, altering the semantics to include this form would not be a problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 08:10:59 1999 From: da at ski.org (David Ascher) Date: Sat, 14 Aug 1999 23:10:59 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B62F5E.30C62070@lyra.org> Message-ID: On Sat, 14 Aug 1999, Greg Stein wrote: > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, here's an example from my work today. Maybe someone can suggest an alternative that I haven't seen. I'm using buffer objects to pass pointers to structs back and forth between Python and Windows (Win32's GUI scheme involves sending messages to functions with, oftentimes, addresses of structs as arguments, and expect the called function to modify the struct directly -- similarly, I must call Win32 functions w/ pointers to memory that Windows will modify, and be able to read the modified memory).
With 'raw' buffer object manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to Python), this works fine [*]. So far, no instances. I also have a class which allows the user to describe the buffer memory layout in a natural way given the C struct, and manipulate the buffer layout w/ getattr/setattr. For example:

class Win32MenuItemStruct(AutoStruct):
    #
    # for each slot, specify type (maps to a struct.pack specifier),
    # name (for setattr/getattr behavior) and optional defaults.
    #
    table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
             (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
             (UINT, 'fType', MFT_STRING),
             (UINT, 'fState', MFS_ENABLED),
             (UINT, 'wID', None),
             (HANDLE, 'hSubMenu', 0),
             (HANDLE, 'hbmpChecked', 0),
             (HANDLE, 'hbmpUnchecked', 0),
             (DWORD, 'dwItemData', 0),
             (LPSTR, 'name', None),
             (UINT, 'cch', 0)]

AutoStruct has machinery which allows setting of buffer slices by slot name, conversion of numeric types, etc. This is working well. The only hitch is that to send the buffer to the SWIG'ed function call, I have three options, none ideal:

1) define a __str__ method which makes a string of the buffer and pass that to the function which expects an "s#" argument. This sends a copy of the data, not the address. As a result, this works well for structs which I create from scratch as long as I don't need to see any changes that Windows might have performed on the memory.

2) send the instance but make up my own 'get-the-instance-as-buffer' API -- complicates extension module code.

3) send the buffer attribute of the instance instead of the instance -- complicates Python code, and the C code isn't trivial because there is no 'buffer' typecode for PyArg_ParseTuple().

If I could define an

    def __aswritebuffer__

and if there was a PyArg_ParseTuple() typecode associated with read/write buffers (I nominate 'w'!), I believe things would be simpler -- I could then send the instance, specify in the PyArg_ParseTuple that I want a pointer to memory, and I'd be golden.
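For readers wanting to see the shape of the idea, here is a minimal sketch of an AutoStruct-like class built on the struct module. All names and the example struct are invented for illustration; this is not David's actual code, which handled buffer slices, Win32 types, and more:

```python
import struct

class AutoStruct:
    # subclasses define: table = [(format_code, field_name, default), ...]
    table = []

    def __init__(self, **fields):
        self._values = {name: default for _, name, default in self.table}
        self._values.update(fields)

    @property
    def _format(self):
        # concatenate the slot format codes in table order, little-endian
        return "<" + "".join(fmt for fmt, _, _ in self.table)

    def __setattr__(self, name, value):
        if name.startswith("_"):
            object.__setattr__(self, name, value)
        else:
            self._values[name] = value      # field writes go into the table

    def __getattr__(self, name):
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

    def pack(self):
        # lay the slots out contiguously, as a C struct would
        return struct.pack(self._format,
                           *(self._values[name] for _, name, _ in self.table))

class Point(AutoStruct):                    # invented example, not a Win32 struct
    table = [("i", "x", 0), ("i", "y", 0)]

p = Point(x=3)
p.y = 4
assert p.pack() == struct.pack("<ii", 3, 4)
```

The real class wrote fields into a buffer slice-by-slice rather than repacking the whole struct on each access, but the table-driven layout is the same idea.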
What did I miss? --david [*] I feel naughty modifying random bits of memory from Python, but Bill Gates made me do it! From mal at lemburg.com Sun Aug 15 10:47:00 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:47:00 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> <37B62F5E.30C62070@lyra.org> Message-ID: <37B67E84.6BBC8136@lemburg.com> Greg Stein wrote: > > [me suggesting new __XXX__ methods on Python instances to provide > the buffer slots to Python programmers] > > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, there already is lots of code supporting the interface, e.g. fp.write(), socket.write() etc. Basically all streaming interfaces I guess. So these APIs could be used to "write" the object directly into a file. > >... > > Hmm, how about adding a writeable binary object to the core ? > > This would be useful for the __getwritebbuffer__() API because > > currently, I think, only mmap'ed files are useable as write > > buffers -- no other in-memory type. Perhaps buffer objects > > could be used for this purpose too, e.g. by having them > > allocate the needed memory chunk in case you pass None as > > object. > > Yes, this would be very good. I would recommend that you pass an > integer, however, rather than None. You need to tell it the size of the > buffer to allocate. Since buffer(5) has no meaning at the moment, > altering the semantics to include this form would not be a problem. I was thinking of using the existing buffer(object,offset,size) constructor... that's why I took None as object. offset would then always be 0 and size gives the size of the memory chunk to allocate. 
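For modern readers: the writeable in-memory object being designed in this exchange eventually arrived in the language as bytearray, with memoryview as the wrapper type, so the buffer(None, 0, size) / buffer(size) idea can be approximated today like this (both types postdate this thread):

```python
import struct

size = 12
chunk = bytearray(size)      # allocate a zero-filled, writable memory chunk
view = memoryview(chunk)     # buffer-style wrapper over that same memory

# in-place writes through the view, as a C extension would do via the pointer:
view[0:4] = struct.pack("<I", 0xDEADBEEF)
view[4:8] = struct.pack("<I", 42)

assert struct.unpack_from("<I", chunk, 4)[0] == 42
assert not view.readonly                   # writable, unlike a view of bytes
assert view.tobytes()[8:] == b"\x00" * 4   # the tail is still zero-filled
```

mv.readonly and mv.tobytes() also correspond roughly to the .writeable and .asstring() extensions proposed a little later in this thread.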
Of course, buffer(size) would look nicer, but it seems a rather peculiar interface definition to say: ok, if you pass a real Python integer, we'll take that as size. Who knows, maybe at some point in the future, you want to "write" integers via the buffer interface too... then you'd probably also want to write None... so how about a new builtin writebuffer(size) ? Also, I think it would make sense to extend buffers to have methods and attributes:

  .writeable  - attribute that tells whether the buffer is writeable
  .chardata   - true iff the getcharbuffer slot is available
  .asstring() - return the buffer as Python string object

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Aug 15 10:59:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:59:21 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B68169.73E03C84@lemburg.com> David Ascher wrote: > > On Sat, 14 Aug 1999, Greg Stein wrote: > > > Having class instances respond to the buffer interface is interesting, > > but until more code attempts to *use* the interface, I'm not quite sure > > of the utility... > > Well, here's an example from my work today. Maybe someone can suggest an > alternative that I haven't seen. > > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances.
So that's why you were suggesting that struct.pack returns a buffer rather than a string ;-) Actually, I think you could use arrays to do the trick right now, because they are writeable (unlike strings). Until creating writeable buffer objects becomes possible that is... > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. For example: > > class Win32MenuItemStruct(AutoStruct): > # > # for each slot, specify type (maps to a struct.pack specifier), > # name (for setattr/getattr behavior) and optional defaults. > # > table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct), > (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID), > (UINT, 'fType', MFT_STRING), > (UINT, 'fState', MFS_ENABLED), > (UINT, 'wID', None), > (HANDLE, 'hSubMenu', 0), > (HANDLE, 'hbmpChecked', 0), > (HANDLE, 'hbmpUnchecked', 0), > (DWORD, 'dwItemData', 0), > (LPSTR, 'name', None), > (UINT, 'cch', 0)] > > AutoStruct has machinery which allows setting of buffer slices by slot > name, conversion of numeric types, etc. This is working well. > > The only hitch is that to send the buffer to the SWIG'ed function call, I > have three options, none ideal: > > 1) define a __str__ method which makes a string of the buffer and pass > that to the function which expects an "s#" argument. This send > a copy of the data, not the address. As a result, this works > well for structs which I create from scratch as long as I don't need > to see any changes that Windows might have performed on the memory. > > 2) send the instance but make up my own 'get-the-instance-as-buffer' > API -- complicates extension module code. > > 3) send the buffer attribute of the instance instead of the instance -- > complicates Python code, and the C code isn't trivial because there > is no 'buffer' typecode for PyArg_ParseTuple(). 
> > If I could define an > > def __aswritebuffer__ > > and if there was a PyArg_ParseTuple() typecode associated with read/write > buffers (I nominate 'w'!), I believe things would be simpler -- I could > then send the instance, specify in the PyArgParse_Tuple that I want a > pointer to memory, and I'd be golden. > > What did I miss? Just a naming thingie: __getwritebuffer__ et al. would map to the C interfaces more directly. The new typecode "w#" for writeable buffer style objects is a good idea (it should only work on single segment buffers). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Sun Aug 15 12:32:59 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 15 Aug 1999 12:32:59 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> Message-ID: <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> > Fredrik Lundh wrote: > >... > > besides, what about buffers and threads? if you > > return a pointer from getreadbuf, wouldn't it be > > good to know exactly when Python doesn't need > > that pointer any more? explicit initbuffer/exitbuffer > > calls around each sequence of buffer operations > > would make that a lot safer... > > This is a pretty obvious one, I think: it lasts only as long as the > object. PyString_AS_STRING is similar. Nothing new or funny here. 
well, I think the buffer behaviour is both new and pretty funny:

    from array import array

    a = array("f", [0]*8192)

    b = buffer(a)

    for i in range(1000):
        a.append(1234)

    print b

in other words, the buffer interface should be redesigned, or removed. (though I'm sure AOL would find some interesting use for this ;-) "Confusing? Yes, but this is a lot better than allowing arbitrary pointers!" -- GvR on assignment operators, November 91 From da at ski.org Sun Aug 15 18:54:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B68169.73E03C84@lemburg.com> Message-ID: On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > Actually, I think you could use arrays to do the trick right now, > because they are writeable (unlike strings). Until creating > writeable buffer objects becomes possible that is... No, because I can't make an array around existing memory which Win32 allocates before I get to it. > Just a naming thingie: __getwritebuffer__ et al. would map to the > C interfaces more directly. Whatever. > The new typecode "w#" for writeable buffer style objects is a good idea > (it should only work on single segment buffers). Indeed. --david From gstein at lyra.org Sun Aug 15 22:27:57 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:27:57 -0700 Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: Message-ID: <37B722CD.383A2A9E@lyra.org> David Ascher wrote: > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > ... > > The new typecode "w#" for writeable buffer style objects is a good idea > > (it should only work on single segment buffers). > > Indeed. I just borrowed Guido's time machine. That typecode is already in 1.5.2.
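The failure mode Fredrik demonstrates above -- a buffer still pointing into storage that the array has since reallocated -- is exactly the hole that later buffer designs closed: while a view is exported, the exporting object must refuse to resize. A sketch with today's stand-ins for buffer/array (memoryview/bytearray, both later additions to the language):

```python
ba = bytearray(b"x" * 8192)
view = memoryview(ba)        # exports a pointer into ba's current storage

try:
    ba.append(0)             # would reallocate underneath the live view
    resized = True
except BufferError:          # the exporter refuses instead of dangling
    resized = False
assert not resized

view.release()               # once the view is released, resizing works again
ba.append(0)
assert len(ba) == 8193
```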
:-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 22:35:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:35:25 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B7248D.31E5D2BF@lyra.org> Fredrik Lundh wrote: >... > well, I think the buffer behaviour is both > new and pretty funny: I think the buffer interface was introduced in 1.5 (by Jack?). I added the 8-bit character buffer slot and buffer objects in 1.5.2. > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. I don't understand what you believe is weird here. Also, are you saying the buffer *interface* is weird, or the buffer *object* ? thx, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 22:49:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 13:49:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) In-Reply-To: <37B722CD.383A2A9E@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. 
That typecode is already in 1.5.2. Ha. Cool. --da From gstein at lyra.org Sun Aug 15 22:53:51 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:53:51 -0700 Subject: [Python-Dev] instances as buffers References: Message-ID: <37B728DF.2CA2A20A@lyra.org> David Ascher wrote: >... > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances. How do you manage the lifetimes of the memory and objects? PyBuffer_FromReadWriteMemory() creates a buffer object that points to memory. You need to ensure that the memory exists as long as the buffer does. Would it make more sense to use PyBuffer_New(size)? Note: PyBuffer_FromMemory() (read-only) was built primarily for the case where you have static constants in an extension module (strings, code objects, etc) and want to expose them to Python without copying them into the heap. Currently, stuff like this must be copied into a dynamic string object to be exposed to Python. The PyBuffer_FromReadWriteMemory() is there for symmetry, but it can be very dangerous to use because of the lifetime problem. PyBuffer_New() allocates its own memory, so the lifetimes are managed properly. PyBuffer_From*Object maintains a reference to the target object so that the target object can be kept around at least as long as the buffer. > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. 
For example: This is a very cool class. Mark and I had discussed doing something just like this (a while back) for some of the COM stuff. Basically, we'd want to generate these structures from type libraries. >... > The only hitch is that to send the buffer to the SWIG'ed function call, I > have three options, none ideal: > > 1) define a __str__ method which makes a string of the buffer and pass > that to the function which expects an "s#" argument. This send > a copy of the data, not the address. As a result, this works > well for structs which I create from scratch as long as I don't need > to see any changes that Windows might have performed on the memory. Note that "s#" can be used directly against the buffer object. You could pass it directly rather than via __str__. > 2) send the instance but make up my own 'get-the-instance-as-buffer' > API -- complicates extension module code. > > 3) send the buffer attribute of the instance instead of the instance -- > complicates Python code, and the C code isn't trivial because there > is no 'buffer' typecode for PyArg_ParseTuple(). > > If I could define an > > def __aswritebuffer__ > > and if there was a PyArg_ParseTuple() typecode associated with read/write > buffers (I nominate 'w'!), I believe things would be simpler -- I could > then send the instance, specify in the PyArgParse_Tuple that I want a > pointer to memory, and I'd be golden. > > What did I miss? You can do #3 today since there is a buffer typecode present ("w" or "w#"). It will complicate Python code a bit since you need to pass the buffer, but it is the simplest of the three options. Allowing instances to return buffers does seem to make sense, although it exposes a lot of underlying machinery at the Python level. It might be nicer to find a better semantic for this than just exposing the buffer interface slots. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 23:07:35 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 14:07:35 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Re: instances as buffers In-Reply-To: <37B728DF.2CA2A20A@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > How do you manage the lifetimes of the memory and objects? > PyBuffer_FromReadWriteMemory() creates a buffer object that points to > memory. You need to ensure that the memory exists as long as the buffer > does. For those cases where I use PyBuffer_FromReadWriteMemory, I have no control over the memory involved. Windows allocates the memory, lets me use it for a little while, and it cleans it up whenever it feels like it. It hasn't been a problem yet, but I agree that it's possibly a problem. I'd call it a problem w/ the win32 API, though. > Would it make more sense to use PyBuffer_New(size)? Again, I can't because I am given a pointer and am expected to modify e.g. bytes 10-12 starting from that memory location. > This is a very cool class. Mark and I had discussed doing something just > like this (a while back) for some of the COM stuff. Basically, we'd want > to generate these structures from type libraries. I know zilch about type libraries. This is for CE work, although nothing about this class is CE-specific. Do type libraries give the same kind of info? > You can do #3 today since there is a buffer typecode present ("w" or > "w#"). It will complicate Python code a bit since you need to pass the > buffer, but it is the simplest of the three options. Ok. Time to patch SWIG again!
--david From Vladimir.Marangozov at inrialpes.fr Mon Aug 16 01:35:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 16 Aug 1999 00:35:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908152335.AAA55842@pukapuka.inrialpes.fr> Tim Peters wrote: > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; if I used the debugger often enough to care , I'd think > about munging in a new opcode to make breakpoint sites explicit). > > immutability-is-made-to-be-violated-ly y'rs - tim > Could you elaborate a bit more on this? Do you mean setting breakpoints on a per opcode basis (for example by exchanging the original opcode with a new BREAKPOINT opcode in the code object) and use the lineno tab for breakpoints based on the source listing? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Mon Aug 16 04:31:16 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 15 Aug 1999 22:31:16 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908152335.AAA55842@pukapuka.inrialpes.fr> Message-ID: <000101bee78f$6aa217e0$f22d2399@tim> [Vladimir Marangozov] > Could you elaborate a bit more on this? No time for this now -- sorry. > Do you mean setting breakpoints on a per opcode basis (for example > by exchanging the original opcode with a new BREAKPOINT opcode in > the code object) and use the lineno tab for breakpoints based on > the source listing? Something like that. The classic way to implement positional breakpoints is to perturb the code; the classic problem is how to get back the effect of the code that was overwritten. 
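The classic code-perturbing breakpoint that Tim alludes to can be sketched in a few lines. The BREAKPOINT value and the bytearray standing in for a code object's bytecode are purely illustrative assumptions, not CPython internals:

```python
BREAKPOINT = 255                   # hypothetical opcode value

def set_breakpoint(code, offset):
    """Overwrite the opcode at offset, returning the displaced byte."""
    saved = code[offset]
    code[offset] = BREAKPOINT
    return saved

def clear_breakpoint(code, offset, saved):
    """Restore the displaced opcode so execution can resume."""
    code[offset] = saved

code = bytearray([9, 23, 100, 1])  # pretend bytecode stream
saved = set_breakpoint(code, 2)
assert code[2] == BREAKPOINT
clear_breakpoint(code, 2, saved)
assert list(code) == [9, 23, 100, 1]
```

The "classic problem" is exactly the saved byte: the debugger must re-execute or restore the displaced instruction before letting execution continue past the breakpoint.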
From gstein at lyra.org Mon Aug 16 06:42:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 21:42:19 -0700 Subject: [Python-Dev] Re: why References: Message-ID: <37B796AB.34F6F93@lyra.org> David Ascher wrote: > > Why does buffer(array('c', 'test')) return a read-only buffer? Simply because the buffer() builtin always creates a read-only object, rather than selecting read/write when possible. Shouldn't be hard to alter the semantics of buffer() to do so. Maybe do this at the same time as updating it to create read/write buffers out of the blue. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Mon Aug 16 08:42:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 16 Aug 1999 02:42:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <000b01bee7b2$7c62d780$f22d2399@tim> [Jack Jansen] > ... A long time ago, Dianne Hackborn actually implemented a scheme like this, under the name VREF (for "virtual reference", or some such). IIRC, differences from your scheme were mainly that: 1) There was an elaborate proxy mechanism to avoid having to explicitly strengthen the weak. 2) Each object contained a pointer to a linked list of associated weak refs. This predates DejaNews, so may be a pain to find. > ... > We add a new builtin function (or a module with that function) > weak(). This returns a weak reference to the object passed as a > parameter. A weak object has one method: strong(), which returns the > corresponding real object or raises an exception if the object doesn't > exist anymore. 
This interface appears nearly isomorphic to MIT Scheme's "hash" and "unhash" functions, except that their hash returns an (unbounded) int and guarantees that hash(o1) != hash(o2) for any distinct objects o1 and o2 (this is a stronger guarantee than Python's "id", which may return the same int for objects with disjoint lifetimes; the other reason object address isn't appropriate for them is that objects can be moved by garbage collection, but hash is an object invariant). Of course unhash(hash(o)) is o, unless o has been gc'ed; then unhash raises an exception. By most accounts (I haven't used it seriously myself), it's a usable interface. > ... > to implement this I need to add a pointer to every object. That's unattractive, of course. > ... > (actually: we could make do with a single bit in every object, with > the bit meaning "this object has an associated weak object". We could > then use a global dictionary indexed by object address to find the > weak object) Is a single bit actually smaller than a pointer? For example, on most machines these days #define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; is two 4-byte fields packed solid already, and structure padding prevents adding anything less than a 4-byte increment in reality. I guess on Alpha there's a 4-byte hole here, but I don't want weak pointers enough to switch machines . OTOH, sooner or later Guido is going to want a mark bit too, so the other way to view this is that 32 new flag bits are as cheap as one . There's one other thing I like about this: it can get rid of the dicey > Strong() checks that self->object->weak == self and returns > self->object (INCREFfed) if it is. check. If object has gone away, you're worried that self->object may (on some systems) point to a newly-invalid address. But worse than that, its memory may get reused, and then self->object may point into the *middle* of some other object where the bit pattern at the "weak" offset just happens to equal self. 
Let's try a sketch in pseudo-Python, where __xxx are secret functions that do the obvious things (and glossing over thread safety since these are presumably really implemented in C):

# invariant: __is_weak_bit_set(obj) == id2weak.has_key(id(obj))
# So "the weak bit" is simply an optimization, sparing most objects
# from a dict lookup when they die.
# The invariant is delicate in the presence of threads.

id2weak = {}

class _Weak:
    def __init__(self, obj):
        self.id = id(obj)  # obj's refcount not bumped
        __set_weak_bit(obj)
        id2weak[self.id] = self
        # note that "the system" (see below) sets self.id
        # to None if obj dies

    def strong(self):
        if self.id is None:
            raise DeadManWalkingError(self.id)
        return __id2obj(self.id)  # will bump obj's refcount

    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

def weak(obj):
    return id2weak.get(id(obj), None) or _Weak(obj)

and then whenever an object of any kind is deleted the system does:

if __is_weak_bit_set(obj):
    objid = id(obj)
    id2weak[objid].id = None
    del id2weak[objid]

In my current over-tired state, I think that's safe (modulo threads), portable and reasonably fast; I do think the extra bit costs 4 bytes, though. > ... > The weak object isn't transparent, because you have to call strong() > before you can do anything with it, but this is an advantage (says he, > aspiring to a career in politics or sales:-): with a transparent weak > object the object could disappear at unexpected moments and with this > scheme it can't, because when you have the object itself in hand you > have a refcount too. Explicit is better than implicit for me. [M.-A. Lemburg] > Have you checked the weak reference dictionary implementation > by Dieter Maurer ? It's at: > > http://www.handshake.de/~dieter/weakdict.html A project where I work is using it; it blows up a lot .
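For the record, later Pythons grew a weakref module (in 2.1) whose primitives make the weak()/strong() interface sketched above a few lines of pure Python. This is a modern rendition, not anything that existed in 1999:

```python
import weakref

class Weak:
    """weak()/strong() in the style sketched above, via weakref.ref."""
    def __init__(self, obj):
        self._ref = weakref.ref(obj)   # does not bump obj's refcount
    def strong(self):
        obj = self._ref()              # returns None once obj is gone
        if obj is None:
            raise ReferenceError("referent has been collected")
        return obj

class Thing:
    pass

t = Thing()
w = Weak(t)
assert w.strong() is t
del t                                  # CPython frees it immediately
try:
    w.strong()
    collected = False
except ReferenceError:
    collected = True
assert collected
```

The "weak bit plus global dict" bookkeeping Tim sketches is essentially what weakref hides behind weakref.ref.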
While some form of weak dict is what most people want in the end, I'm not sure Dieter's decision to support weak dicts with only weak values (not weak keys) is sufficient. For example, the aforementioned project wants to associate various computed long strings with certain hashable objects, and for some reason or other (ain't my project ...) these objects can't be changed. So they can't store the strings in the objects. So they'd like to map the objects to the strings via assorted dicts. But using the object as a dict key keeps it (and, via the dicts, also its associated strings) artificially alive; they really want a weakdict with weak *keys*. I'm not sure I know of a clear & fast way to implement a weakdict building only on the weak() function. Jack? Using weak objects as values (or keys) with an ordinary dict can prevent their referents from being kept artificially alive, but that doesn't get the dict itself cleaned up by magic. Perhaps "the system" should notify a weak object when its referent goes away; that would at least give the WO a chance to purge itself from structures it knows it's in ... > ... > BTW, how would this be done in JPython ? I guess it doesn't > make much sense there because cycles are no problem for the > Java VM GC. Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of "hard", "soft", "weak", and "phantom" references. See java.lang.ref for details. I stopped paying attention to Java, so it's up to you to tell us what you learn about it . 
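Postscript from the future: the weak-keyed mapping asked for above is precisely what the later weakref module ships as WeakKeyDictionary — entries evaporate when their key dies, so the dict no longer keeps the hashable objects (or their associated strings) artificially alive. A brief sketch, again with today's stdlib rather than anything available in 1999:

```python
import weakref

class Key:
    pass                               # stands in for the unmodifiable hashable objects

cache = weakref.WeakKeyDictionary()
k = Key()
cache[k] = "computed long string"      # illustrative payload
assert cache[k] == "computed long string"
del k                                  # CPython collects the key at once...
assert len(cache) == 0                 # ...and its entry vanishes with it
```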
From fredrik at pythonware.com Mon Aug 16 09:06:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 16 Aug 1999 09:06:43 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> Message-ID: <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2.

> > from array import array
> >
> > a = array("f", [0]*8192)
> >
> > b = buffer(a)
> >
> > for i in range(1000):
> >     a.append(1234)
> >
> > print b
> >
> > in other words, the buffer interface should
> > be redesigned, or removed.

> I don't understand what you believe is weird here. did you run that code? it may work, it may bomb, or it may generate bogus output. all depending on your memory allocator, the phase of the moon, etc. just like back in the C/C++ days... imo, that's not good enough for a core feature.
From gstein at lyra.org Mon Aug 16 09:15:54 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 00:15:54 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> Message-ID: <37B7BAAA.1E6EE4CA@lyra.org> Fredrik Lundh wrote: > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > from array import array > > > > > > a = array("f", [0]*8192) > > > > > > b = buffer(a) > > > > > > for i in range(1000): > > > a.append(1234) > > > > > > print b > > > > > > in other words, the buffer interface should > > > be redesigned, or removed. > > > > I don't understand what you believe is weird here. > > did you run that code? Yup. It printed nothing. > it may work, it may bomb, or it may generate bogus > output. all depending on your memory allocator, the > phase of the moon, etc. just like back in the C/C++ > days... It probably appeared as an empty string because the construction of the array filled it with zeroes (at least the first byte). Regardless, I'd be surprised if it crashed the interpreter. The print command is supposed to do a str() on the object, which creates a PyStringObject from the buffer contents. Shouldn't be a crash there. > imo, that's not good enough for a core feature. If it crashed, then sure. 
But I'd say that indicates a bug rather than a design problem. Do you have a stack trace from a crash? Ah. I just worked through, in my head, what is happening here. The buffer object caches the pointer returned by the array object. The append on the array does a realloc() somewhere, thereby invalidating the pointer inside the buffer object. Icky. Gotta think on this one... As an initial thought, it would seem that the buffer would have to re-query the pointer for each operation. There are performance implications there, of course, but that would certainly fix the problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Mon Aug 16 11:42:42 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:42:42 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by David Ascher , Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) , Message-ID: <19990816094243.3CE83303120@snelboot.oratrix.nl> > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > > Actually, I think you could use arrays to do the trick right now, > > because they are writeable (unlike strings). Until creating > > writeable buffer objects becomes possible that is... > > No, because I can't make an array around existing memory which Win32 > allocates before I get to it. Would adding a buffer interface to cobject solve your problem? Cobject is described as being used for passing C objects between Python modules, but I've always thought of it as passing C objects from one C routine to another C routine through Python, which doesn't necessarily understand what the object is all about. That latter description seems to fit your bill quite nicely. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon Aug 16 11:49:41 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:49:41 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Greg Stein , Sun, 15 Aug 1999 13:35:25 -0700 , <37B7248D.31E5D2BF@lyra.org> Message-ID: <19990816094941.83BE2303120@snelboot.oratrix.nl> > >... > > well, I think the buffer behaviour is both > > new and pretty funny: > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2. Ah, now I understand why I didn't understand some of the previous conversation: I had never come across the buffer *objects* (as opposed to the buffer *interface*) until Fredrik's example. I've just looked at it, and I'm not sure I understand the full intentions of the buffer object. Buffer objects can either behave as the "buffer-aspect" of the object behind them (without the rest of their functionality) or as array objects, and if they start out life as the first they can evolve into the second, is that right? Is there a rationale behind this design, or is it just something that happened? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein at lyra.org Mon Aug 16 11:56:31 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 02:56:31 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990816094941.83BE2303120@snelboot.oratrix.nl> Message-ID: <37B7E04F.3843004@lyra.org> Jack Jansen wrote: >...
> I've just looked at it, and I'm not sure I understand the full intentions of the > buffer object. Buffer objects can either behave as the "buffer-aspect" of the > object behind them (without the rest of their functionality) or as array > objects, and if they start out life as the first they can evolve into the > second, is that right? > > Is there a rationale behind this design, or is it just something that > happened? The object doesn't change. You create it as a reference to an existing object's buffer (as exported via the buffer interface), or you create it as a reference to some arbitrary memory. The buffer object provides (optionally read/write) string-like behavior to any object that supports buffer behavior. It can also be used to make lightweight slices of another object. For example:

>>> a = "abcdefghi"
>>> b = buffer(a, 3, 3)
>>> print b
def
>>>

In the above example, there is only one copy of "def" (the portion inside of the string object referenced by a). The string-like behavior can be quite nice for memory-mapped files. Andrew's mmapfile module's file objects export the buffer interface. This means that you can open a file, wrap a buffer around it, and perform quick and easy random-access on the thing. You could even select slices of the file and pass them around as if they were strings, without loading anything into the process heap. (I want to try mmap'ing a .pyc and create code objects that have buffer-based bytecode streams; it will be interesting to see if this significantly reduces memory consumption (in terms of the heap size; the mmap'd .pyc can be shared across processes)).
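Greg's lightweight-slice example translates directly to the modern memoryview, buffer()'s successor: the slice shares the original storage rather than copying it. Offered as a present-day aside, not as the 1.5.2 API:

```python
# A sliced memoryview is a view, not a copy; .obj exposes the object
# whose storage it shares.
a = b"abcdefghi"
b = memoryview(a)[3:6]
assert bytes(b) == b"def"
assert b.obj is a                 # the slice still points into the original
```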
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at digicool.com Mon Aug 16 14:30:41 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 08:30:41 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B80471.F0F467C9@digicool.com> Fredrik Lundh wrote: > > > Fredrik Lundh wrote: > > >... > > > besides, what about buffers and threads? if you > > > return a pointer from getreadbuf, wouldn't it be > > > good to know exactly when Python doesn't need > > > that pointer any more? explicit initbuffer/exitbuffer > > > calls around each sequence of buffer operations > > > would make that a lot safer... > > > > This is a pretty obvious one, I think: it lasts only as long as the > > object. PyString_AS_STRING is similar. Nothing new or funny here. > > well, I think the buffer behaviour is both > new and pretty funny:

> from array import array
>
> a = array("f", [0]*8192)
>
> b = buffer(a)
>
> for i in range(1000):
>     a.append(1234)
>
> print b
>
> in other words, the buffer interface should
> be redesigned, or removed.

A while ago I asked for some documentation on the Buffer interface. I basically got silence. At this point, I don't have a good idea what buffers are for and I don't see a lot of evidence that there *is* a design. I assume that there was a design, but I can't see it. This whole discussion makes me very queasy. I'm probably just out of it, since I don't have time to read the Python list anymore.
Presumably the buffer interface was proposed and discussed there at some distant point in the past. (I can't pay as much attention to this discussion as I suspect I should, due to time constraints and due to a basic understanding of the rationale for the buffer interface. Just now I caught a sniff of something I find kinda repulsive. I think I hear you all talking about beasties that hold a reference to some object's internal storage and that have write operations so you can write directly to the object's storage bypassing the object interfaces. I probably just imagined it.) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gstein at lyra.org Mon Aug 16 14:41:23 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 05:41:23 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> Message-ID: <37B806F3.2C5EDC44@lyra.org> Jim Fulton wrote: >... > A while ago I asked for some documentation on the Buffer > interface. I basically got silence.
At this point, I I think the silence was caused by the simple fact that the documentation does not (yet) exist. That's all... nothing nefarious. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Aug 16 14:05:35 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:05:35 +0200 Subject: [Python-Dev] Re: w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: <37B722CD.383A2A9E@lyra.org> Message-ID: <37B7FE8F.30C35284@lemburg.com> Greg Stein wrote: > > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. That typecode is already in 1.5.2. > > :-) Ah, cool :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:29:31 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:29:31 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000b01bee7b2$7c62d780$f22d2399@tim> Message-ID: <37B8042B.21DE6053@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Have you checked the weak reference dictionary implementation > > by Dieter Maurer ? It's at: > > > > http://www.handshake.de/~dieter/weakdict.html > > A project where I work is using it; it blows up a lot . > > While some form of weak dict is what most people want in the end, I'm not > sure Dieter's decision to support weak dicts with only weak values (not weak > keys) is sufficient. For example, the aforementioned project wants to > associate various computed long strings with certain hashable objects, and > for some reason or other (ain't my project ...) these objects can't be > changed. So they can't store the strings in the objects. 
So they'd like to > map the objects to the strings via assorted dicts. But using the object as > a dict key keeps it (and, via the dicts, also its associated strings) > artificially alive; they really want a weakdict with weak *keys*. > > I'm not sure I know of a clear & fast way to implement a weakdict building > only on the weak() function. Jack? > > Using weak objects as values (or keys) with an ordinary dict can prevent > their referents from being kept artificially alive, but that doesn't get the > dict itself cleaned up by magic. Perhaps "the system" should notify a weak > object when its referent goes away; that would at least give the WO a chance > to purge itself from structures it knows it's in ... Perhaps one could fiddle something out of the Proxy objects in mxProxy (you know where...). These support a special __cleanup__ protocol that I use a lot to work around circular garbage: the __cleanup__ method of the referenced object is called prior to destroying the proxy; even if the reference count on the object has not yet gone down to 0. This makes direct circles possible without problems: the parent can reference a child through the proxy and the child can reference the parent directly. As soon as the parent is cleaned up, the reference to the proxy is deleted which then automagically makes the back reference in the child disappear, allowing the parent to be deallocated after cleanup without leaving a circular reference around. > > ... > > BTW, how would this be done in JPython ? I guess it doesn't > > make much sense there because cycles are no problem for the > > Java VM GC. > > Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of > "hard", "soft", "weak", and "phantom" references. See java.lang.ref for > details. I stopped paying attention to Java, so it's up to you to tell us > what you learn about it . Thanks for the reference... 
but I guess this will remain a weak one for some time since the latter is currently a limited resource :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:41:51 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:41:51 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> <37B7BAAA.1E6EE4CA@lyra.org> Message-ID: <37B8070F.763C3FF8@lemburg.com> Greg Stein wrote: > > Fredrik Lundh wrote: > > > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > > > from array import array > > > > > > > > a = array("f", [0]*8192) > > > > > > > > b = buffer(a) > > > > > > > > for i in range(1000): > > > > a.append(1234) > > > > > > > > print b > > > > > > > > in other words, the buffer interface should > > > > be redesigned, or removed. > > > > > > I don't understand what you believe is weird here. > > > > did you run that code? > > Yup. It printed nothing. > > > it may work, it may bomb, or it may generate bogus > > output. all depending on your memory allocator, the > > phase of the moon, etc. just like back in the C/C++ > > days... 
> > It probably appeared as an empty string because the construction of the > array filled it with zeroes (at least the first byte). > > Regardless, I'd be surprised if it crashed the interpreter. The print > command is supposed to do a str() on the object, which creates a > PyStringObject from the buffer contents. Shouldn't be a crash there. > > > imo, that's not good enough for a core feature. > > If it crashed, then sure. But I'd say that indicates a bug rather than a > design problem. Do you have a stack trace from a crash? > > Ah. I just worked through, in my head, what is happening here. The > buffer object caches the pointer returned by the array object. The > append on the array does a realloc() somewhere, thereby invalidating the > pointer inside the buffer object. > > Icky. Gotta think on this one... As an initial thought, it would seem > that the buffer would have to re-query the pointer for each operation. > There are performance implications there, of course, but that would > certainly fix the problem. I guess that's the way to go. I wouldn't want to think about those details when using buffer objects and a function call is still better than a copy... it would do the init/exit wrapping implicitly: init at the time the getreadbuffer call is made and exit next time a thread switch is done - provided that the functions using the memory pointer also keep a reference to the buffer object alive (but that should be natural as this is always done when dealing with references in a safe way). 
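As a footnote, the rule modern CPython eventually adopted is stricter than re-querying the pointer: while a buffer is exported, the owning object simply refuses to reallocate it. A small demonstration in today's Python, with bytearray/memoryview standing in for the array/buffer pair of Fredrik's example:

```python
ba = bytearray(b"abcd")
m = memoryview(ba)                # exports ba's buffer
try:
    ba.append(0x65)               # would realloc under the live export
except BufferError:
    resized = False               # modern CPython refuses to resize
else:
    resized = True
m.release()                       # drop the export...
ba.append(0x65)                   # ...and resizing works again
assert resized is False
assert ba == bytearray(b"abcde")
```

This makes the dangling-pointer scenario in Fredrik's example impossible by construction, at the cost of raising an error in code that mixes exports with resizes.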
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at digicool.com Mon Aug 16 15:26:40 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 09:26:40 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> <37B806F3.2C5EDC44@lyra.org> Message-ID: <37B81190.165C373E@digicool.com> Greg Stein wrote: > > Jim Fulton wrote: > >... > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > I think the silence was caused by the simple fact that the documentation > does not (yet) exist. That's all... nothing nefarious. I didn't mean to suggest anything nefarious. I do think that a change that affects something as basic as the standard object type layout and that generates this much discussion really should be documented before it becomes part of the core. I'd especially like to see some kind of document that includes information like: - A problem statement that describes the problem the change is solving, - How does the solution solve the problem, - When and how should people writing new types support the new interfaces? We're not talking about a new library module here. There's been a change to the core object interface. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jack at oratrix.nl Mon Aug 16 15:45:31 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 15:45:31 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Jim Fulton , Mon, 16 Aug 1999 08:30:41 -0400 , <37B80471.F0F467C9@digicool.com> Message-ID: <19990816134531.C30B5303120@snelboot.oratrix.nl> > A while ago I asked for some documentation on the Buffer > interface. I basically got silence. At this point, I > don't have a good idea what buffers are for and I don't see a lot > of evidence that there *is* a design. I assume that there was > a design, but I can't see it. This whole discussion makes me > very queasy. Okay, as I'm apparently not the only one who is queasy let's start from scratch. First, there is the old buffer _interface_. This is a C interface that allows extension (and builtin) modules and functions a unified way to access objects if they want to write the object to file and similar things. It is also what the PyArg_ParseTuple "s#" returns. This is, in C, the getreadbuffer/getwritebuffer interface. Second, there's the extension to the buffer interface as of 1.5.2. This is again only available in C, and it allows C programmers to get an object _as an ASCII string_. This is meant for things like regexp modules, to access any "textual" object as an ASCII string. This is the getcharbuffer interface, and bound to the "t#" specifier in PyArg_ParseTuple. Third, there is the buffer _object_, also new in 1.5.2.
This sort-of exports the functionality of the buffer interface to Python, but it does a bit more as well, because the buffer objects have a sort of copy-on-write semantics that means they may or may not be "attached" to a python object through the buffer interface. I think that the C interface and the object should be treated completely separately. I definitely want the C interface, but I personally don't use the Python buffer objects, so I don't really care all that much about those. Also, I think that the buffer objects might become easier to understand if we don't think of it as "the buffer interface exported to python", but as "Python buffer objects, that may share memory with other Python objects as an optimization". -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jim at digicool.com Mon Aug 16 18:03:54 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 12:03:54 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990816134531.C30B5303120@snelboot.oratrix.nl> Message-ID: <37B8366A.82B305C7@digicool.com> Jack Jansen wrote: > > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > don't have a good idea what buffers are for and I don't see alot > > of evidence that there *is* a design. I assume that there was > > a design, but I can't see it. This whole discussion makes me > > very queasy. > > Okay, as I'm apparently not the only one who is queasy let's start from > scratch. Yee ha! > First, there is the old buffer _interface_. This is a C interface that allows > extension (and builtin) modules and functions a unified way to access objects > if they want to write the object to file and similar things. Is this serialization? 
What does this achieve that, say, the pickling protocols don't achieve? What other problems does it solve? > It is also what > the PyArg_ParseTuple "s#" returns. This is, in C, the > getreadbuffer/getwritebuffer interface. Huh? "s#" doesn't return a string? Or are you saying that you can pass a non-string object to a C function that uses "s#" and have it bufferized and then stringized? In either case, this is not consistent with the documentation (interface) of PyArg_ParseTuple. > Second, there's the extension to the buffer interface as of 1.5.2. This is > again only available in C, and it allows C programmers to get an object _as an > ASCII string_. This is meant for things like regexp modules, to access any > "textual" object as an ASCII string. This is the getcharbuffer interface, and > bound to the "t#" specifier in PyArg_ParseTuple. Hm. So this is making a little more sense. So, there is a notion that there are "textual" objects that want to provide a method for getting their "text". How does this text differ from what you get from __str__ or __repr__? > Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports > the functionality of the buffer interface to Python, How so? Maybe I'm at sea because I still don't get what the C buffer interface is for. > but it does a bit more as > well, because the buffer objects have a sort of copy-on-write semantics that > means they may or may not be "attached" to a python object through the buffer > interface. What is this thing used for? Where does the slot in tp_as_buffer come into all of this? Why does this need to be a slot in the first place? Are these "textual" objects really common? Is the presence of this slot a flag for "textualness"? It would help a lot, at least for me, if there was a clearer description of what motivates these things. What problems are they trying to solve? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered!
From da at ski.org Mon Aug 16 18:45:47 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8366A.82B305C7@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: > > Second, there's the extension to the buffer interface as of 1.5.2. This is > > again only available in C, and it allows C programmers to get an object _as an > > ASCII string_. This is meant for things like regexp modules, to access any > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > bound to the "t#" specifier in PyArg_ParseTuple. > > Hm. So this is making a little more sense. So, there is a notion that > there are "textual" objects that want to provide a method for getting > their "text". How does this text differ from what you get from __str__ > or __repr__? I'll let others give a well thought out rationale. Here are some examples of use which I think worthwhile: * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile objects fit this aspect of the buffer interface allows you to do regexp searches on it w/o ever building a twelve gigabyte PyString. * Consider a non-contiguous NumPy array. If the array type supported the multi-segment buffer interface, extension module writers could manipulate the data within this array w/o having to worry about the non-contiguous nature of the data. They'd still have to worry about the multi-byte nature of the data, but it's still a win.
In other words, I think that the buffer interface could be useful even w/ non-textual data. * If NumPy was modified to have arrays with data stored in buffer objects as opposed to the current "char *", and if PIL was modified to have images stored in buffer objects as opposed to whatever it uses, one could have arrays and images which shared data. I think all of these provide examples of motivations which are appealing to at least some Python users. I make no claim that they motivate the specific interface. In all the cases I can think of, one or both of two features are the key asset: - access to subset of huge data regions w/o creation of huge temporary variables. - sharing of data space. Yes, it's a power tool, and as a such should come with safety goggles. But then again, the same is true for ExtensionClasses =). leaving-out-the-regexp-on-NumPy-arrays-example, --david PS: I take back the implicit suggestion that buffer() return read-write buffers when possible. From jim at digicool.com Mon Aug 16 19:06:19 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:06:19 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B8450B.C5D308E4@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > > > Second, there's the extension the the buffer interface as of 1.5.2. This is > > > again only available in C, and it allows C programmers to get an object _as an > > > ASCII string_. This is meant for things like regexp modules, to access any > > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > > bound to the "t#" specifier in PyArg_ParseTuple. > > > > Hm. So this is making a little more sense. So, there is a notion that > > there are "textual" objects that want to provide a method for getting > > their "text". How does this text differ from what you get from __str__ > > or __repr__? > > I'll let others give a well thought out rationale. I eagerly await this. 
:) > Here are some examples > of use which I think worthwhile: > > * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile > objects fit this aspect of the buffer interface allows you to do regexp > searches on it w/o ever building a twelve gigabyte PyString. This seems reasonable, if a bit exotic. :) > * Consider a non-contiguous NumPy array. If the array type supported the > multi-segment buffer interface, extension module writers could > manipulate the data within this array w/o having to worry about the > non-contiguous nature of the data. They'd still have to worry about > the multi-byte nature of the data, but it's still a win. In other > words, I think that the buffer interface could be useful even w/ > non-textual data. Why is this a good thing? Why should extension module writers worry about the non-contiguous nature of the data now? Does the NumPy C API somehow expose this now? Will multi-segment buffers make it go away somehow? > * If NumPy was modified to have arrays with data stored in buffer objects > as opposed to the current "char *", and if PIL was modified to have > images stored in buffer objects as opposed to whatever it uses, one > could have arrays and images which shared data. Uh, and this would be a good thing? Maybe PIL should just be modified to use NumPy arrays. > I think all of these provide examples of motivations which are appealing > to at least some Python users. Perhaps, although Guido knows how they'd find out about them. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered!
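The data sharing being debated here can be sketched in modern Python, where memoryview (a descendant of this buffer machinery; it did not exist in 1999 and is used purely as a stand-in for the buffer object under discussion) lets two names view one block of memory:

```python
# Sketch of the "shared data" idea: two views of one memory block, so a
# write through one is visible through the other, with no copying.
pixels = bytearray(8)             # stand-in for image/array storage
region = memoryview(pixels)[2:6]  # zero-copy slice: no data is duplicated

region[0] = 255                   # mutate through the view...
print(pixels[2])                  # ...and the original sees it: prints 255
```

This is the upside Jim is questioning: an "array" and an "image" built over the same block would stay in sync for free, at the cost of the information hiding he worries about.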
From da at ski.org Mon Aug 16 19:18:46 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 10:18:46 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8450B.C5D308E4@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: >> [regexps on gigabyte files] > > This seems reasonable, if a bit exotic. :) In the bioinformatics world, I think it's everyday stuff. > Why is this a good thing? Why should extension module writers worry > about the non-contiguous nature of the data now? Does the NumPy C API > somehow expose this now? Will multi-segment buffers make it go away > somehow? A NumPy extension module writer needs to create and modify NumPy arrays. These arrays may be non-contiguous (if e.g. they are the result of slicing). The NumPy C API exposes the non-contiguous nature, but it's hard enough to deal with it that I suspect most extension writers require contiguous arrays, which means unnecessary copies. Multi-segment buffers won't make the API go away necessarily (backwards compatibility and all that), but it could make it unnecessary for many extension writers.
Given the size of the images, it's a prodigious waste of time, and kills the use of Python in many a project. > Perhaps, although Guido knows how they'd find out about them. ;) Uh? These issues have been discussed in the NumPy/PIL world for a while, with no solution in sight. Recently, I and others saw mentions of buffers in the source, and they seemed like a reasonable approach, which could be done w/o a rewrite of either PIL or NumPy. Don't get me wrong -- I'm all for better documentation of the buffer stuff, design guidelines, warnings and protocols. I stated as much on June 15: http://www.python.org/pipermail/python-dev/1999-June/000338.html --david From jim at digicool.com Mon Aug 16 19:38:22 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:38:22 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B84C8E.46885C8E@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > >> [regexps on gigabyte files] > > > > This seems reasonable, if a bit exotic. :) > > In the bioinformatics world, I think it's everyday stuff. Right, in some (exotic ;) domains it's not exotic at all. > > Why is this a good thing? Why should extension module writes worry > > abot the non-contiguous nature of the data now? Does the NumPy C API > > somehow expose this now? Will multi-segment buffers make it go away > > somehow? > > A NumPy extension module writer needs to create and modify NumPy arrays. > These arrays may be non-contiguous (if e.g. they are the result of > slicing). The NumPy C API exposes the non-contiguous nature, but it's > hard enough to deal with it that I suspect most extension writers require > contiguous arrays, which means unnecessary copies. Hm. This sounds like an API problem to me. > Multi-segment buffers won't make the API go away necessarily (backwards > compatibility and all that), but it could make it unnecessary for many > extension writers. 
Multi-segment buffers don't make the multi-segmented nature of the memory go away. Do they really simplify the API that much? They seem to strip away an awful lot of information hiding. > > > * If NumPy was modified to have arrays with data stored in buffer objects > > > as opposed to the current "char *", and if PIL was modified to have > > > images stored in buffer objects as opposed to whatever it uses, one > > > could have arrays and images which shared data. > > > > Uh, and this would be a good thing? Maybe PIL should just be modified > > to use NumPy arrays. > > Why? PIL was designed for image processing, and made design decisions > appropriate to that domain. NumPy was designed for multidimensional > numeric array processing, and made design decisions appropriate to that > domain. The intersection of interests exists (e.g. in the medical imaging > world), and I know people who spend a lot of their CPU time moving data > between images and arrays with "stupid" tostring/fromstring operations. > Given the size of the images, it's a prodigious waste of time, and kills > the use of Python in many a project. It seems to me that NumPy is broad enough to encompass image processing. My main concern is having two systems rely on some low-level "shared memory" mechanism to achieve efficient communication. > > Perhaps, although Guido knows how they'd find out about them. ;) > > Uh? These issues have been discussed in the NumPy/PIL world for a while, > with no solution in sight. Recently, I and others saw mentions of buffers > in the source, and they seemed like a reasonable approach, which could be > done w/o a rewrite of either PIL or NumPy. My point was that people would be lucky to find out about buffers or about how to use them as things stand. > Don't get me wrong -- I'm all for better documentation of the buffer > stuff, design guidelines, warnings and protocols.
I stated as much on > June 15: > > http://www.python.org/pipermail/python-dev/1999-June/000338.html Yes, that was quite a jihad you launched. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! From da at ski.org Mon Aug 16 20:25:54 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 11:25:54 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B84C8E.46885C8E@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: [ Aside: > It seems to me that NumPy is broad enough to encompass > image processing. Well, I'll just say that you could have been right, but w/ the current NumPy, I don't blame F/ for having developed his own data structures. NumPy is messy, and some of its design decisions are wrong for image things (memory handling, casting rules, etc.). It's all water under the bridge at this point. ] Back to the main topic: You say: > [Multi-segment buffers] seem to strip away an awful lot of information > hiding. My impression of the buffer notion was that it is intended to *provide* information hiding, by giving a simple API to byte arrays which could be stored in various ways. I do agree that whether those bytes should be shared or not is a decision which should be weighed carefully. > My main concern is having two systems rely on some low-level "shared > memory" mechanism to achieve efficient communication. I don't particularly care about the specific buffer interface (the low-level nature of which is what I think you object to).
I do care about having a well-defined mechanism for sharing memory between objects, and I think there is value in defining such an interface generically. Maybe the notion of segmented arrays of bytes is too low-level, and instead we should think of the data spaces as segmented arrays of chunks, where a chunk can be one or more bytes? Or do you object to any 'generic' interface? Just for fun, here's the list of things which either currently do or have been talked about possibly in the future supporting some sort of buffer interface, and my guesses as to chunk size, segmented status and writeability:
- strings (1 byte, single-segment, r/o)
- unicode strings (2 bytes, single-segment, r/o)
- struct.pack() things (1 byte, single-segment, r/o)
- arrays (1-4? bytes, single-segment, r/w)
- NumPy arrays (1-8 bytes, multi-segment, r/w)
- PIL images (1-? bytes, multi-segment, r/w)
- CObjects (1-byte, single-segment, r/?)
- mmapfiles (1-byte, multi-segment?, r/w)
- non-python-owned memory (1-byte, single-segment, r/w)
--david From jack at oratrix.nl Mon Aug 16 21:36:40 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 21:36:40 +0200 Subject: [Python-Dev] Buffer interface and multiple threads Message-ID: <19990816193645.9E5B5CF320@oratrix.oratrix.nl> Hmm, something that just struck me: the buffer _interface_ (i.e. the C routines, not the buffer object stuff) is potentially thread-unsafe. In the "old world", where "s#" only worked on string objects, you could be sure that the C pointer returned remained valid as long as you had a reference to the python string object in hand, as strings are immutable. In the "new world", where "s#" also works on, say, array objects, this doesn't hold anymore. So, potentially, while one thread is in a write() system call writing the contents of the array to a file, another thread could come in and change the data. Is this a problem?
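The hazard Jack describes can be made concrete without any C: a buffer-style pointer is a live alias into the object's storage, not a snapshot, so a mutation from elsewhere (another thread, in his write() scenario) changes what the pointer sees. A hedged single-threaded sketch in modern Python, with memoryview standing in for the raw C pointer that "s#" hands out:

```python
buf = bytearray(b"stable data")   # a mutable object passed via "s#"
alias = memoryview(buf)           # like the C pointer: a live alias
snapshot = bytes(buf)             # like the old immutable-string guarantee

buf[0:6] = b"mutate"              # "another thread" rewrites the data mid-write
print(bytes(alias[:6]))           # the alias sees the new bytes: b'mutate'
print(snapshot[:6])               # the copy still holds the old: b'stable'
```

The "old world" safety came from strings being immutable copies (the snapshot); the "new world" pointer is the alias, which is exactly why the data can change under a thread that is still using it.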
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 16 22:22:12 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 22:22:12 +0200 Subject: [Python-Dev] New htmlentitydefs.py file Message-ID: <37B872F4.1C3F5D39@lemburg.com> Attached you find a new HTML entity definitions file taken and parsed from: http://www.w3.org/TR/1998/REC-html40-19980424/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLspecial.ent The latter two contain Unicode charcodes which obviously cannot (yet) be mapped to Unicode strings... perhaps Fredrik wants to include a spiced up version in with his Unicode type. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- """ Entity definitions for HTML4.0. 
Taken and parsed from: http://www.w3.org/TR/1998/REC-html40/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40/HTMLspecial.ent """ entitydefs = { 'AElig': chr(198), # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1 'Aacute': chr(193), # latin capital letter A with acute, U+00C1 ISOlat1 'Acirc': chr(194), # latin capital letter A with circumflex, U+00C2 ISOlat1 'Agrave': chr(192), # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1 'Alpha': 'Α', # greek capital letter alpha, U+0391 'Aring': chr(197), # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1 'Atilde': chr(195), # latin capital letter A with tilde, U+00C3 ISOlat1 'Auml': chr(196), # latin capital letter A with diaeresis, U+00C4 ISOlat1 'Beta': 'Β', # greek capital letter beta, U+0392 'Ccedil': chr(199), # latin capital letter C with cedilla, U+00C7 ISOlat1 'Chi': 'Χ', # greek capital letter chi, U+03A7 'Dagger': '‡', # double dagger, U+2021 ISOpub 'Delta': 'Δ', # greek capital letter delta, U+0394 ISOgrk3 'ETH': chr(208), # latin capital letter ETH, U+00D0 ISOlat1 'Eacute': chr(201), # latin capital letter E with acute, U+00C9 ISOlat1 'Ecirc': chr(202), # latin capital letter E with circumflex, U+00CA ISOlat1 'Egrave': chr(200), # latin capital letter E with grave, U+00C8 ISOlat1 'Epsilon': 'Ε', # greek capital letter epsilon, U+0395 'Eta': 'Η', # greek capital letter eta, U+0397 'Euml': chr(203), # latin capital letter E with diaeresis, U+00CB ISOlat1 'Gamma': 'Γ', # greek capital letter gamma, U+0393 ISOgrk3 'Iacute': chr(205), # latin capital letter I with acute, U+00CD ISOlat1 'Icirc': chr(206), # latin capital letter I with circumflex, U+00CE ISOlat1 'Igrave': chr(204), # latin capital letter I with grave, U+00CC ISOlat1 'Iota': 'Ι', # greek capital letter iota, U+0399 'Iuml': chr(207), # latin capital letter I with diaeresis, U+00CF ISOlat1 'Kappa': 'Κ', # greek 
capital letter kappa, U+039A 'Lambda': 'Λ', # greek capital letter lambda, U+039B ISOgrk3 'Mu': 'Μ', # greek capital letter mu, U+039C 'Ntilde': chr(209), # latin capital letter N with tilde, U+00D1 ISOlat1 'Nu': 'Ν', # greek capital letter nu, U+039D 'Oacute': chr(211), # latin capital letter O with acute, U+00D3 ISOlat1 'Ocirc': chr(212), # latin capital letter O with circumflex, U+00D4 ISOlat1 'Ograve': chr(210), # latin capital letter O with grave, U+00D2 ISOlat1 'Omega': 'Ω', # greek capital letter omega, U+03A9 ISOgrk3 'Omicron': 'Ο', # greek capital letter omicron, U+039F 'Oslash': chr(216), # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1 'Otilde': chr(213), # latin capital letter O with tilde, U+00D5 ISOlat1 'Ouml': chr(214), # latin capital letter O with diaeresis, U+00D6 ISOlat1 'Phi': 'Φ', # greek capital letter phi, U+03A6 ISOgrk3 'Pi': 'Π', # greek capital letter pi, U+03A0 ISOgrk3 'Prime': '″', # double prime = seconds = inches, U+2033 ISOtech 'Psi': 'Ψ', # greek capital letter psi, U+03A8 ISOgrk3 'Rho': 'Ρ', # greek capital letter rho, U+03A1 'Sigma': 'Σ', # greek capital letter sigma, U+03A3 ISOgrk3 'THORN': chr(222), # latin capital letter THORN, U+00DE ISOlat1 'Tau': 'Τ', # greek capital letter tau, U+03A4 'Theta': 'Θ', # greek capital letter theta, U+0398 ISOgrk3 'Uacute': chr(218), # latin capital letter U with acute, U+00DA ISOlat1 'Ucirc': chr(219), # latin capital letter U with circumflex, U+00DB ISOlat1 'Ugrave': chr(217), # latin capital letter U with grave, U+00D9 ISOlat1 'Upsilon': 'Υ', # greek capital letter upsilon, U+03A5 ISOgrk3 'Uuml': chr(220), # latin capital letter U with diaeresis, U+00DC ISOlat1 'Xi': 'Ξ', # greek capital letter xi, U+039E ISOgrk3 'Yacute': chr(221), # latin capital letter Y with acute, U+00DD ISOlat1 'Zeta': 'Ζ', # greek capital letter zeta, U+0396 'aacute': chr(225), # latin small letter a with acute, U+00E1 ISOlat1 'acirc': chr(226), # latin small letter a with circumflex, 
U+00E2 ISOlat1 'acute': chr(180), # acute accent = spacing acute, U+00B4 ISOdia 'aelig': chr(230), # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 'agrave': chr(224), # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 'alefsym': 'ℵ', # alef symbol = first transfinite cardinal, U+2135 NEW 'alpha': 'α', # greek small letter alpha, U+03B1 ISOgrk3 'and': '∧', # logical and = wedge, U+2227 ISOtech 'ang': '∠', # angle, U+2220 ISOamso 'aring': chr(229), # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 'asymp': '≈', # almost equal to = asymptotic to, U+2248 ISOamsr 'atilde': chr(227), # latin small letter a with tilde, U+00E3 ISOlat1 'auml': chr(228), # latin small letter a with diaeresis, U+00E4 ISOlat1 'bdquo': '„', # double low-9 quotation mark, U+201E NEW 'beta': 'β', # greek small letter beta, U+03B2 ISOgrk3 'brvbar': chr(166), # broken bar = broken vertical bar, U+00A6 ISOnum 'bull': '•', # bullet = black small circle, U+2022 ISOpub 'cap': '∩', # intersection = cap, U+2229 ISOtech 'ccedil': chr(231), # latin small letter c with cedilla, U+00E7 ISOlat1 'cedil': chr(184), # cedilla = spacing cedilla, U+00B8 ISOdia 'cent': chr(162), # cent sign, U+00A2 ISOnum 'chi': 'χ', # greek small letter chi, U+03C7 ISOgrk3 'clubs': '♣', # black club suit = shamrock, U+2663 ISOpub 'cong': '≅', # approximately equal to, U+2245 ISOtech 'copy': chr(169), # copyright sign, U+00A9 ISOnum 'crarr': '↵', # downwards arrow with corner leftwards = carriage return, U+21B5 NEW 'cup': '∪', # union = cup, U+222A ISOtech 'curren': chr(164), # currency sign, U+00A4 ISOnum 'dArr': '⇓', # downwards double arrow, U+21D3 ISOamsa 'dagger': '†', # dagger, U+2020 ISOpub 'darr': '↓', # downwards arrow, U+2193 ISOnum 'deg': chr(176), # degree sign, U+00B0 ISOnum 'delta': 'δ', # greek small letter delta, U+03B4 ISOgrk3 'diams': '♦', # black diamond suit, U+2666 ISOpub 'divide': chr(247), # division sign, U+00F7 ISOnum 'eacute': 
chr(233), # latin small letter e with acute, U+00E9 ISOlat1 'ecirc': chr(234), # latin small letter e with circumflex, U+00EA ISOlat1 'egrave': chr(232), # latin small letter e with grave, U+00E8 ISOlat1 'empty': '∅', # empty set = null set = diameter, U+2205 ISOamso 'emsp': ' ', # em space, U+2003 ISOpub 'ensp': ' ', # en space, U+2002 ISOpub 'epsilon': 'ε', # greek small letter epsilon, U+03B5 ISOgrk3 'equiv': '≡', # identical to, U+2261 ISOtech 'eta': 'η', # greek small letter eta, U+03B7 ISOgrk3 'eth': chr(240), # latin small letter eth, U+00F0 ISOlat1 'euml': chr(235), # latin small letter e with diaeresis, U+00EB ISOlat1 'exist': '∃', # there exists, U+2203 ISOtech 'fnof': 'ƒ', # latin small f with hook = function = florin, U+0192 ISOtech 'forall': '∀', # for all, U+2200 ISOtech 'frac12': chr(189), # vulgar fraction one half = fraction one half, U+00BD ISOnum 'frac14': chr(188), # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum 'frac34': chr(190), # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum 'frasl': '⁄', # fraction slash, U+2044 NEW 'gamma': 'γ', # greek small letter gamma, U+03B3 ISOgrk3 'ge': '≥', # greater-than or equal to, U+2265 ISOtech 'hArr': '⇔', # left right double arrow, U+21D4 ISOamsa 'harr': '↔', # left right arrow, U+2194 ISOamsa 'hearts': '♥', # black heart suit = valentine, U+2665 ISOpub 'hellip': '…', # horizontal ellipsis = three dot leader, U+2026 ISOpub 'iacute': chr(237), # latin small letter i with acute, U+00ED ISOlat1 'icirc': chr(238), # latin small letter i with circumflex, U+00EE ISOlat1 'iexcl': chr(161), # inverted exclamation mark, U+00A1 ISOnum 'igrave': chr(236), # latin small letter i with grave, U+00EC ISOlat1 'image': 'ℑ', # blackletter capital I = imaginary part, U+2111 ISOamso 'infin': '∞', # infinity, U+221E ISOtech 'int': '∫', # integral, U+222B ISOtech 'iota': 'ι', # greek small letter iota, U+03B9 ISOgrk3 'iquest': chr(191), # inverted question mark = turned question mark, 
U+00BF ISOnum
    'isin': '∈', # element of, U+2208 ISOtech
    'iuml': chr(239), # latin small letter i with diaeresis, U+00EF ISOlat1
    'kappa': 'κ', # greek small letter kappa, U+03BA ISOgrk3
    'lArr': '⇐', # leftwards double arrow, U+21D0 ISOtech
    'lambda': 'λ', # greek small letter lambda, U+03BB ISOgrk3
    'lang': '〈', # left-pointing angle bracket = bra, U+2329 ISOtech
    'laquo': chr(171), # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum
    'larr': '←', # leftwards arrow, U+2190 ISOnum
    'lceil': '⌈', # left ceiling = apl upstile, U+2308 ISOamsc
    'ldquo': '“', # left double quotation mark, U+201C ISOnum
    'le': '≤', # less-than or equal to, U+2264 ISOtech
    'lfloor': '⌊', # left floor = apl downstile, U+230A ISOamsc
    'lowast': '∗', # asterisk operator, U+2217 ISOtech
    'loz': '◊', # lozenge, U+25CA ISOpub
    'lrm': '‎', # left-to-right mark, U+200E NEW RFC 2070
    'lsaquo': '‹', # single left-pointing angle quotation mark, U+2039 ISO proposed
    'lsquo': '‘', # left single quotation mark, U+2018 ISOnum
    'macr': chr(175), # macron = spacing macron = overline = APL overbar, U+00AF ISOdia
    'mdash': '—', # em dash, U+2014 ISOpub
    'micro': chr(181), # micro sign, U+00B5 ISOnum
    'middot': chr(183), # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum
    'minus': '−', # minus sign, U+2212 ISOtech
    'mu': 'μ', # greek small letter mu, U+03BC ISOgrk3
    'nabla': '∇', # nabla = backward difference, U+2207 ISOtech
    'nbsp': chr(160), # no-break space = non-breaking space, U+00A0 ISOnum
    'ndash': '–', # en dash, U+2013 ISOpub
    'ne': '≠', # not equal to, U+2260 ISOtech
    'ni': '∋', # contains as member, U+220B ISOtech
    'not': chr(172), # not sign, U+00AC ISOnum
    'notin': '∉', # not an element of, U+2209 ISOtech
    'nsub': '⊄', # not a subset of, U+2284 ISOamsn
    'ntilde': chr(241), # latin small letter n with tilde, U+00F1 ISOlat1
    'nu': 'ν', # greek small letter nu, U+03BD ISOgrk3
    'oacute': chr(243), # latin small letter o with acute, U+00F3 ISOlat1
    'ocirc': chr(244), # latin small letter o with circumflex, U+00F4 ISOlat1
    'ograve': chr(242), # latin small letter o with grave, U+00F2 ISOlat1
    'oline': '‾', # overline = spacing overscore, U+203E NEW
    'omega': 'ω', # greek small letter omega, U+03C9 ISOgrk3
    'omicron': 'ο', # greek small letter omicron, U+03BF NEW
    'oplus': '⊕', # circled plus = direct sum, U+2295 ISOamsb
    'or': '∨', # logical or = vee, U+2228 ISOtech
    'ordf': chr(170), # feminine ordinal indicator, U+00AA ISOnum
    'ordm': chr(186), # masculine ordinal indicator, U+00BA ISOnum
    'oslash': chr(248), # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1
    'otilde': chr(245), # latin small letter o with tilde, U+00F5 ISOlat1
    'otimes': '⊗', # circled times = vector product, U+2297 ISOamsb
    'ouml': chr(246), # latin small letter o with diaeresis, U+00F6 ISOlat1
    'para': chr(182), # pilcrow sign = paragraph sign, U+00B6 ISOnum
    'part': '∂', # partial differential, U+2202 ISOtech
    'permil': '‰', # per mille sign, U+2030 ISOtech
    'perp': '⊥', # up tack = orthogonal to = perpendicular, U+22A5 ISOtech
    'phi': 'φ', # greek small letter phi, U+03C6 ISOgrk3
    'pi': 'π', # greek small letter pi, U+03C0 ISOgrk3
    'piv': 'ϖ', # greek pi symbol, U+03D6 ISOgrk3
    'plusmn': chr(177), # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum
    'pound': chr(163), # pound sign, U+00A3 ISOnum
    'prime': '′', # prime = minutes = feet, U+2032 ISOtech
    'prod': '∏', # n-ary product = product sign, U+220F ISOamsb
    'prop': '∝', # proportional to, U+221D ISOtech
    'psi': 'ψ', # greek small letter psi, U+03C8 ISOgrk3
    'rArr': '⇒', # rightwards double arrow, U+21D2 ISOtech
    'radic': '√', # square root = radical sign, U+221A ISOtech
    'rang': '〉', # right-pointing angle bracket = ket, U+232A ISOtech
    'raquo': chr(187), # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum
    'rarr': '→', # rightwards arrow, U+2192 ISOnum
    'rceil': '⌉', # right ceiling, U+2309 ISOamsc
    'rdquo': '”', # right double quotation mark, U+201D ISOnum
    'real': 'ℜ', # blackletter capital R = real part symbol, U+211C ISOamso
    'reg': chr(174), # registered sign = registered trade mark sign, U+00AE ISOnum
    'rfloor': '⌋', # right floor, U+230B ISOamsc
    'rho': 'ρ', # greek small letter rho, U+03C1 ISOgrk3
    'rlm': '‏', # right-to-left mark, U+200F NEW RFC 2070
    'rsaquo': '›', # single right-pointing angle quotation mark, U+203A ISO proposed
    'rsquo': '’', # right single quotation mark, U+2019 ISOnum
    'sbquo': '‚', # single low-9 quotation mark, U+201A NEW
    'sdot': '⋅', # dot operator, U+22C5 ISOamsb
    'sect': chr(167), # section sign, U+00A7 ISOnum
    'shy': chr(173), # soft hyphen = discretionary hyphen, U+00AD ISOnum
    'sigma': 'σ', # greek small letter sigma, U+03C3 ISOgrk3
    'sigmaf': 'ς', # greek small letter final sigma, U+03C2 ISOgrk3
    'sim': '∼', # tilde operator = varies with = similar to, U+223C ISOtech
    'spades': '♠', # black spade suit, U+2660 ISOpub
    'sub': '⊂', # subset of, U+2282 ISOtech
    'sube': '⊆', # subset of or equal to, U+2286 ISOtech
    'sum': '∑', # n-ary sumation, U+2211 ISOamsb
    'sup': '⊃', # superset of, U+2283 ISOtech
    'sup1': chr(185), # superscript one = superscript digit one, U+00B9 ISOnum
    'sup2': chr(178), # superscript two = superscript digit two = squared, U+00B2 ISOnum
    'sup3': chr(179), # superscript three = superscript digit three = cubed, U+00B3 ISOnum
    'supe': '⊇', # superset of or equal to, U+2287 ISOtech
    'szlig': chr(223), # latin small letter sharp s = ess-zed, U+00DF ISOlat1
    'tau': 'τ', # greek small letter tau, U+03C4 ISOgrk3
    'there4': '∴', # therefore, U+2234 ISOtech
    'theta': 'θ', # greek small letter theta, U+03B8 ISOgrk3
    'thetasym': 'ϑ', # greek small letter theta symbol, U+03D1 NEW
    'thinsp': ' ', # thin space, U+2009 ISOpub
    'thorn': chr(254), # latin small letter thorn with, U+00FE ISOlat1
    'times': chr(215), # multiplication sign, U+00D7 ISOnum
    'trade': '™', # trade mark sign, U+2122 ISOnum
    'uArr': '⇑', # upwards double arrow, U+21D1 ISOamsa
    'uacute': chr(250), # latin small letter u with acute, U+00FA ISOlat1
    'uarr': '↑', # upwards arrow, U+2191 ISOnum
    'ucirc': chr(251), # latin small letter u with circumflex, U+00FB ISOlat1
    'ugrave': chr(249), # latin small letter u with grave, U+00F9 ISOlat1
    'uml': chr(168), # diaeresis = spacing diaeresis, U+00A8 ISOdia
    'upsih': 'ϒ', # greek upsilon with hook symbol, U+03D2 NEW
    'upsilon': 'υ', # greek small letter upsilon, U+03C5 ISOgrk3
    'uuml': chr(252), # latin small letter u with diaeresis, U+00FC ISOlat1
    'weierp': '℘', # script capital P = power set = Weierstrass p, U+2118 ISOamso
    'xi': 'ξ', # greek small letter xi, U+03BE ISOgrk3
    'yacute': chr(253), # latin small letter y with acute, U+00FD ISOlat1
    'yen': chr(165), # yen sign = yuan sign, U+00A5 ISOnum
    'yuml': chr(255), # latin small letter y with diaeresis, U+00FF ISOlat1
    'zeta': 'ζ', # greek small letter zeta, U+03B6 ISOgrk3
    'zwj': '‍', # zero width joiner, U+200D NEW RFC 2070
    'zwnj': '‌', # zero width non-joiner, U+200C NEW RFC 2070
}

From tim_one at email.msn.com  Tue Aug 17 09:30:17 1999
From: tim_one at email.msn.com (Tim Peters)
Date: Tue, 17 Aug 1999 03:30:17 -0400
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37B8042B.21DE6053@lemburg.com>
Message-ID: <000001bee882$5b7d8da0$112d2399@tim>

[about weakdicts and the possibility of building them on weak
references; the obvious way doesn't clean up the dict itself by
magic; maybe a weak object should be notified when its referent
goes away
]

[M.-A. Lemburg]
> Perhaps one could fiddle something out of the Proxy objects
> in mxProxy (you know where...). These support a special __cleanup__
> protocol that I use a lot to work around circular garbage:
> the __cleanup__ method of the referenced object is called prior
> to destroying the proxy; even if the reference count on the
> object has not yet gone down to 0.
>
> This makes direct circles possible without problems: the parent
> can reference a child through the proxy and the child can reference the
> parent directly.
What you just wrote is:

    parent --> proxy --> child -->+
      ^                           v
      +<--------------------------+

Looks like a plain old cycle to me!

> As soon as the parent is cleaned up, the reference to
> the proxy is deleted which then automagically makes the
> back reference in the child disappear, allowing the parent
> to be deallocated after cleanup without leaving a circular
> reference around.

M-A, this is making less sense by the paragraph : skipping the middle, this says "as soon as the parent is cleaned up ... allowing the parent to be deallocated after cleanup". If we presume that the parent gets cleaned up explicitly (since the reference from the child is keeping it alive, it's not going to get cleaned up by magic, right?), then the parent could just as well call the __cleanup__ methods of the things it references directly without bothering with a proxy. For that matter, if it's the straightforward

    parent <-> child

kind of cycle, the parent's cleanup method can just do

    self.__dict__.clear()

and the cycle is broken without writing a __cleanup__ method anywhere (that's what I usually do, and in this kind of cycle that clears the last reference to the child, which then goes away, which in turn automagically clears its back reference to the parent).

So, offhand, I don't see that the proxy protocol could help here. In a sense, what's really needed is the opposite: notifying the *proxy* when the *real* object goes away (which makes no sense in the context of what your proxy objects were designed to do).

[about Java and its four reference strengths]

Found a good introductory writeup at (sorry, my mailer will break this URL, so I'll break it myself at a sensible place):

http://developer.java.sun.com/developer/
technicalArticles//ALT/RefObj/index.html

They have a class for each of the three "not strong" flavors of references. For all three you pass the referenced object to the constructor, and all three accept (optional in two of the flavors) a second ReferenceQueue argument.
In the latter case, when the referenced object goes away the weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn, is a thread-safe queue with various put, get, and timeout-limited polling functions. So you have to write code to look at the queue from time to time, to find the proxies whose referents have gone away.

The three flavors may (or may not ...) have these motivations:

soft: an object reachable at strongest by soft references can go away at any time, but the garbage collector strives to keep it intact until it can't find any other way to get enough memory

weak: an object reachable at strongest by weak references can go away at any time, and the collector makes no attempt to delay its death

phantom: an object reachable at strongest by phantom references can get *finalized* at any time, but won't get *deallocated* before its phantom proxy does something or other (goes away? wasn't clear). This is the flavor that requires passing a queue argument to the constructor. Seems to be a major hack to worm around Java's notorious problems with order of finalization -- along the lines that you give phantom referents trivial finalizers, and put the real cleanup logic in the phantom proxy. This lets your program take responsibility for running the real cleanup code in the order-- and in the thread! --where it makes sense.

Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with under-the-cover weak keys (unlike Dieter's flavor with weak values), and where the key+value pairs vanish by magic when the key object goes away. The details and the implementation of these guys weren't clear to me, but then I didn't download the code, just scanned the online docs.

Ah, a correction to my last post:

class _Weak:
    ...
    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

Root of all evil: this method is useless, since the id2weak dict keeps each _Weak object alive until its referent goes away (at which time self.id gets set to None, so _Weak.__del__ doesn't do anything). Even if it did do something, it's no cheaper to do it here than in the system cleanup code ("greater expense" was wrong).

weakly y'rs - tim

PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and called them "limp dicts". I'm not entirely sure it was an innocent Freudian slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds like American slang for "flaccid one-eyed trouser snake" ...).

From fredrik at pythonware.com  Tue Aug 17 09:23:03 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 09:23:03 +0200
Subject: [Python-Dev] buffer interface considered harmful
References:
Message-ID: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>

David Ascher wrote:
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

as an aside, PIL 1.1 (*) introduces "virtual image memories" which are, as I mentioned in an earlier post, accessed via an API rather than via direct pointers. it'll also include an adapter allowing you to use NumPy objects as image memories.
unfortunately, the buffer interface is not good enough to use on top of the virtual image memory interface...

*) 1.1 is our current development thread, which will be released to plus customers in a number of weeks...

From mal at lemburg.com  Tue Aug 17 10:50:01 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 17 Aug 1999 10:50:01 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim>
Message-ID: <37B92239.4076841E@lemburg.com>

Tim Peters wrote:
>
> [about weakdicts and the possibility of building them on weak
> references; the obvious way doesn't clean up the dict itself by
> magic; maybe a weak object should be notified when its referent
> goes away
> ]
>
> [M.-A. Lemburg]
> > Perhaps one could fiddle something out of the Proxy objects
> > in mxProxy (you know where...). These support a special __cleanup__
> > protocol that I use a lot to work around circular garbage:
> > the __cleanup__ method of the referenced object is called prior
> > to destroying the proxy; even if the reference count on the
> > object has not yet gone down to 0.
> >
> > This makes direct circles possible without problems: the parent
> > can reference a child through the proxy and the child can reference the
> > parent directly.
>
> What you just wrote is:
>
>     parent --> proxy --> child -->+
>       ^                           v
>       +<--------------------------+
>
> Looks like a plain old cycle to me!

Sure :-) That was the intention. I'm using this to implement acquisition without turning to ExtensionClasses. [Nice picture, BTW]

> > As soon as the parent is cleaned up, the reference to
> > the proxy is deleted which then automagically makes the
> > back reference in the child disappear, allowing the parent
> > to be deallocated after cleanup without leaving a circular
> > reference around.
>
> M-A, this is making less sense by the paragraph : skipping the
> middle, this says "as soon as the parent is cleaned up ... allowing the
> parent to be deallocated after cleanup". If we presume that the parent gets
> cleaned up explicitly (since the reference from the child is keeping it
> alive, it's not going to get cleaned up by magic, right?), then the parent
> could just as well call the __cleanup__ methods of the things it references
> directly without bothering with a proxy. For that matter, if it's the
> straightforward
>
>     parent <-> child
>
> kind of cycle, the parent's cleanup method can just do
>
>     self.__dict__.clear()
>
> and the cycle is broken without writing a __cleanup__ method anywhere
> (that's what I usually do, and in this kind of cycle that clears the last
> reference to the child, which then goes away, which in turn automagically
> clears its back reference to the parent).
>
> So, offhand, I don't see that the proxy protocol could help here. In a
> sense, what's really needed is the opposite: notifying the *proxy* when the
> *real* object goes away (which makes no sense in the context of what your
> proxy objects were designed to do).

All true :-). The nice thing about the proxy is that it takes care of the process automagically. And yes, the parent is used via a proxy too. So the picture looks like this:

    --> proxy --> parent --> proxy --> child -->+
                    ^                           v
                    +<--------------------------+

Since the proxy isn't noticed by the referencing objects (well, at least if they don't fiddle with internals), the picture for the objects looks like this:

    --> parent --> child -->+
          ^                 v
          +<----------------+

You could of course do the same via explicit invocation of the __cleanup__ method, but the object references involved could be hidden in some other structure, so they might be hard to find.

And there's another feature about Proxies (as defined in mxProxy): they allow you to control access in a much more strict way than Python does.
You can actually hide attributes and methods you don't want exposed in a way that doesn't even let you access them via some dict or pass me the frame object trick. This is very useful when you program multi-user application host servers where you don't want users to access internal structures of the server.

> [about Java and its four reference strengths]
>
> Found a good introductory writeup at (sorry, my mailer will break this URL,
> so I'll break it myself at a sensible place):
>
> http://developer.java.sun.com/developer/
> technicalArticles//ALT/RefObj/index.html

Thanks for the reference... and for the summary ;-)

> They have a class for each of the three "not strong" flavors of references.
> For all three you pass the referenced object to the constructor, and all
> three accept (optional in two of the flavors) a second ReferenceQueue
> argument. In the latter case, when the referenced object goes away the
> weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn,
> is a thread-safe queue with various put, get, and timeout-limited polling
> functions. So you have to write code to look at the queue from time to
> time, to find the proxies whose referents have gone away.
>
> The three flavors may (or may not ...) have these motivations:
>
> soft: an object reachable at strongest by soft references can go away at
> any time, but the garbage collector strives to keep it intact until it can't
> find any other way to get enough memory

So there is a possibility of reviving these objects, right ?

I've just recently added a hackish function to my mxTools which allows me to regain access to objects via their address (no, not thread safe, not even necessarily correct).

sys.makeref(id)
    Provided that id is a valid address of a Python object (id(object)
    returns this address), this function returns a new reference to it.
    Only objects that are "alive" can be referenced this way, ones with
    zero reference count cause an exception to be raised.

You can use this function to reaccess objects lost during garbage collection.

USE WITH CARE: this is an expert-only function since it can cause instant core dumps and many other strange things -- even ruin your system if you don't know what you're doing !

SECURITY WARNING: This function can provide you with access to objects that are otherwise not visible, e.g. in restricted mode, and thus be a potential security hole.

I use it for tracking objects via id-key based dictionary and hooks in the create/del mechanisms of Python instances. It helps finding those memory eating cycles.

> weak: an object reachable at strongest by weak references can go away at
> any time, and the collector makes no attempt to delay its death
>
> phantom: an object reachable at strongest by phantom references can get
> *finalized* at any time, but won't get *deallocated* before its phantom
> proxy does something or other (goes away? wasn't clear). This is the flavor
> that requires passing a queue argument to the constructor. Seems to be a
> major hack to worm around Java's notorious problems with order of
> finalization -- along the lines that you give phantom referents trivial
> finalizers, and put the real cleanup logic in the phantom proxy. This lets
> your program take responsibility for running the real cleanup code in the
> order-- and in the thread! --where it makes sense.

Wouldn't these flavors be possible using the following setup ? Note that it's quite similar to your _Weak class except that I use a proxy without the need to first get a strong reference for the object and that it doesn't use a weak bit.

    --> proxy --> object
                    ^
                    |
          all_managed_objects

all_managed_objects is a dictionary indexed by address (its id) and keeps a strong reference to the objects. The proxy does not keep a strong reference to the object, but only the address as integer and checks the ref-count on the object in the all_managed_objects dictionary prior to every dereferencing action.
In case this refcount falls down to 1 (only the all_managed_objects dict references it), the proxy takes appropriate action, e.g. raises an exception and deletes the reference in all_managed_objects to mimic a weak reference. The same check is done prior to garbage collection of the proxy.

Add to this some queues, pepper and salt and place it in an oven at 220° for 20 minutes... plus take a look every 10 seconds or so...

The downside is obvious: the zombified object will not get inspected (and then GCed) until the next time a weak reference to it is used.

> Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
> under-the-cover weak keys (unlike Dieter's flavor with weak values), and
> where the key+value pairs vanish by magic when the key object goes away.
> The details and the implementation of these guys weren't clear to me, but
> then I didn't download the code, just scanned the online docs.

Would the above help in creating such beasts ?

> Ah, a correction to my last post:
>
> class _Weak:
>     ...
>     def __del__(self):
>         # this is purely an optimization: if self gets nuked,
>         # exempt its referent from greater expense when *it*
>         # dies
>         if self.id is not None:
>             __clear_weak_bit(__id2obj(self.id))
>             del id2weak[self.id]
>
> Root of all evil: this method is useless, since the id2weak dict keeps each
> _Weak object alive until its referent goes away (at which time self.id gets
> set to None, so _Weak.__del__ doesn't do anything). Even if it did do
> something, it's no cheaper to do it here than in the system cleanup code
> ("greater expense" was wrong).
>
> weakly y'rs - tim
>
> PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and
> called them "limp dicts". I'm not entirely sure it was an innocent Freudian
> slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds
> like American slang for "flaccid one-eyed trouser snake" ...).
:-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 136 days left
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From mhammond at skippinet.com.au  Tue Aug 17 18:05:40 1999
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 18 Aug 1999 02:05:40 +1000
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>
Message-ID: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>

Fredrik,

Care to elaborate? Statements like "buffer interface needs a redesign" or "the buffer interface is not good enough to use on top of the virtual image memory interface" really only give me the impression you have a bee in your bonnet over these buffer interfaces.

If you could actually stretch these statements out to provide even _some_ background, problem statement or potential solution it would help. All I know is "Fredrik doesn't like it for some unexplained reason". You found an issue with array reallocation - great - but that's a bug rather than a design flaw. Can you tell us why it's not good enough, and an off-the-cuff design that would solve it? Or are you suggesting it is unsolvable? I really don't have a clue what your issue is.

Jim (for example) has made his position and reasoning clear. You have only made your position clear, but your reasoning is still a mystery.

Mark.

>
> unfortunately, the buffer interface is not good enough to use
> on top of the virtual image memory interface...

From fredrik at pythonware.com  Tue Aug 17 18:48:31 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 18:48:31 +0200
Subject: [Python-Dev] buffer interface considered harmful
References: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>
Message-ID: <005201bee8d0$9b4737d0$f29b12c2@secret.pythonware.com>

> Care to elaborate? Statements like "buffer interface needs a redesign" or
> "the buffer interface is not good enough to use on top of the virtual image
> memory interface" really only give me the impression you have a bee in your
> bonnet over these buffer interfaces.

re "good enough":
http://www.python.org/pipermail/python-dev/1999-August/000650.html

re "needs a redesign":
http://www.python.org/pipermail/python-dev/1999-August/000659.html

and to some extent:
http://www.python.org/pipermail/python-dev/1999-August/000658.html

> Jim (for example) has made his position and reasoning clear.

among other things, Jim said: "At this point, I don't have a good idea what buffers are for and I don't see a lot of evidence that there *is* a design. I assume that there was a design, but I can't see it".

which pretty much echoes my concerns in:

http://www.python.org/pipermail/python-dev/1999-August/000612.html
http://www.python.org/pipermail/python-dev/1999-August/000648.html

> You found an issue with array reallocation - great - but that's
> a bug rather than a design flaw.

for me, that bug (and the marshal glitch) indicates that the design isn't as crystal-clear as it needs to be, for such a fundamental feature. otherwise, Greg would never have made that mistake, and Guido would have spotted it when he added the "buffer" built-in...

so what are you folks waiting for? could someone who thinks he understands exactly what this thing is spend an hour on writing that design document, so Jim and I can put this entire thing behind us?

PS. btw, was it luck or careful analysis behind the decision to make buffer() always return read-only buffers, also for objects implementing the read/write protocol?

From da at ski.org  Wed Aug 18 00:41:14 1999
From: da at ski.org (David Ascher)
Date: Tue, 17 Aug 1999 15:41:14 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
)
In-Reply-To: <19990816094243.3CE83303120@snelboot.oratrix.nl>
Message-ID:

On Mon, 16 Aug 1999, Jack Jansen wrote:

> Would adding a buffer interface to cobject solve your problem? Cobject is
> described as being used for passing C objects between Python modules, but I've
> always thought of it as passing C objects from one C routine to another C
> routine through Python, which doesn't necessarily understand what the object
> is all about.
>
> That latter description seems to fit your bill quite nicely.

It's an interesting idea, but it wouldn't do as it is, as I'd need the ability to create a CObject given a memory location and a size. Also, I am not expected to free() the memory, which would happen when the CObject got GC'ed.

(BTW: I am *not* arguing that PyBuffer_FromReadWriteMemory() should be exposed by default. I'm happy with exposing it in my little extension module for my exotic needs.)

--david

From mal at lemburg.com  Wed Aug 18 11:02:02 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 11:02:02 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim> <37B92239.4076841E@lemburg.com>
Message-ID: <37BA768A.50DF5574@lemburg.com>

[about weakdicts and the possibility of building them on weak
references; the obvious way doesn't clean up the dict itself by
magic; maybe a weak object should be notified when its referent
goes away
]

Here is a new version of my Proxy package which includes a self-managing weak reference mechanism without the need to add extra bits or bytes to all Python objects:

http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip

The docs and an explanation of how the thingie works are included in the archive's Doc subdir.
Basically it builds upon the idea I posted earlier on in this thread -- with a few extra kicks to get it right in the end ;-)

Usage is pretty simple:

from Proxy import WeakProxy
object = []
wr = WeakProxy(object)
wr.append(8)
del object

>>> wr[0]
Traceback (innermost last):
File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected

I have checked the ref counts pretty thoroughly, but before going public I would like the Python-Dev crowd to run some tests as well: after all, the point is for the weak references to be weak and that's sometimes a bit hard to check.

Hope you have as much fun with it as I had writing it ;-)

Ah yes, for the raw details have a look at the code. The code uses a list of back references to the weak Proxies and notifies them when the object goes away... would it be useful to add a hook to the Proxies so that they can apply some other action as well ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From Vladimir.Marangozov at inrialpes.fr  Wed Aug 18 13:42:08 1999
From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov)
Date: Wed, 18 Aug 1999 12:42:08 +0100 (NFT)
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37BA768A.50DF5574@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 11:02:02 am"
Message-ID: <199908181142.MAA22596@pukapuka.inrialpes.fr>

M.-A. Lemburg wrote:
>
> Usage is pretty simple:
>
> from Proxy import WeakProxy
> object = []
> wr = WeakProxy(object)
> wr.append(8)
> del object
>
> >>> wr[0]
> Traceback (innermost last):
> File "", line 1, in ?
> mxProxy.LostReferenceError: object already garbage collected
>
> I have checked the ref counts pretty thoroughly, but before
> going public I would like the Python-Dev crowd to run some
> tests as well: after all, the point is for the weak references
> to be weak and that's sometimes a bit hard to check.

It's even harder to implement them without side effects. I used the same hack for the __heirs__ class attribute some time ago. But I knew that a parent class cannot be garbage collected before all of its descendants. That allowed me to keep weak refs in the parent class, and preserve the existing strong refs in the subclasses. On every dealloc of a subclass, the corresponding weak ref in the parent class' __heirs__ is removed.

In your case, the lifetime of the objects cannot be predicted, so implementing weak refs by messing with refcounts or checking mem pointers is a dead end. I don't know whether this is the case with mxProxy as I just browsed the code quickly, but here's a scenario where your scheme (or implementation) is not working:

>>> from Proxy import WeakProxy
>>> o = []
>>> p = WeakProxy(o)
>>> d = WeakProxy(o)
>>> p

>>> d

>>> print p
[]
>>> print d
[]
>>> del o
>>> p

>>> d

>>> print p
Illegal instruction (core dumped)

-- 
Vladimir MARANGOZOV                 | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jack at oratrix.nl  Wed Aug 18 13:02:13 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 18 Aug 1999 13:02:13 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com>
Message-ID: <19990818110213.A558F303120@snelboot.oratrix.nl>

The one thing I'm not thrilled by in mxProxy is that a call to CheckWeakReferences() is needed before an object is cleaned up.
I guess this boils down to the same problem I had with my weak reference scheme: you somehow want the Python core to tell the proxy stuff that the object can be cleaned up (although the details are different: in my scheme this would be triggered by refcount==0 and in mxProxy by refcount==1). And because objects are created and destroyed in Python at a tremendous rate you don't want to do this call for every object, only if you have a hint that the object has a weak reference (or a proxy).

-- 
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com  Wed Aug 18 13:46:45 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 13:46:45 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <19990818110213.A558F303120@snelboot.oratrix.nl>
Message-ID: <37BA9D25.95E46EA@lemburg.com>

Jack Jansen wrote:
>
> The one thing I'm not thrilled by in mxProxy is that a call to
> CheckWeakReferences() is needed before an object is cleaned up. I guess this
> boils down to the same problem I had with my weak reference scheme: you
> somehow want the Python core to tell the proxy stuff that the object can be
> cleaned up (although the details are different: in my scheme this would be
> triggered by refcount==0 and in mxProxy by refcount==1). And because objects
> are created and destroyed in Python at a tremendous rate you don't want to do
> this call for every object, only if you have a hint that the object has a weak
> reference (or a proxy).

Well, the check is done prior to every action using a proxy to the object and also when a proxy to it is deallocated. The additional checkweakrefs() API is only included to enable additional explicit checking of the whole weak refs dictionary, e.g. every 10 seconds or so (just like you would with a mark&sweep GC).
But yes, GC of the phantom object is delayed a bit depending on how you set up the proxies. Still, I think most usages won't have this problem, since the proxies themselves are usually temporary objects. It may sometimes even make sense to have the phantom object around as long as possible, e.g. to implement the soft references Tim quoted from the Java paper.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From mal at lemburg.com  Wed Aug 18 13:33:18 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 13:33:18 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <199908181142.MAA22596@pukapuka.inrialpes.fr>
Message-ID: <37BA99FE.45D582AD@lemburg.com>

Vladimir Marangozov wrote:
>
> M.-A. Lemburg wrote:
> > I have checked the ref counts pretty thoroughly, but before
> > going public I would like the Python-Dev crowd to run some
> > tests as well: after all, the point is for the weak references
> > to be weak and that's sometimes a bit hard to check.
>
> It's even harder to implement them without side effects. I used
> the same hack for the __heirs__ class attribute some time ago.
> But I knew that a parent class cannot be garbage collected before
> all of its descendants. That allowed me to keep weak refs in
> the parent class, and preserve the existing strong refs in the
> subclasses. On every dealloc of a subclass, the corresponding
> weak ref in the parent class' __heirs__ is removed.
>
> In your case, the lifetime of the objects cannot be predicted,
> so implementing weak refs by messing with refcounts or checking
> mem pointers is a dead end.
> I don't know whether this is the > case with mxProxy as I just browsed the code quickly, but here's > a scenario where your scheme (or implementation) is not working: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = WeakProxy(o) > >>> d = WeakProxy(o) > >>> p > > >>> d > > >>> print p > [] > >>> print d > [] > >>> del o > >>> p > > >>> d > > >>> print p > Illegal instruction (core dumped) Could you tell me where the core dump originates ? Also, it would help to compile the package with the -DMAL_DEBUG switch turned on (edit Setup) and then run the same things using 'python -d'. The package will then print a pretty complete list of things it is doing to mxProxy.log, which would help track down errors like these. BTW, I get: >>> print p Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> [Don't know why the print statement prints an empty line, though.] Thanks for trying it, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 15:12:14 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 14:12:14 +0100 (NFT) Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37BA99FE.45D582AD@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 01:33:18 pm" Message-ID: <199908181312.OAA20542@pukapuka.inrialpes.fr> [about mxProxy, WeakProxy] M.-A. Lemburg wrote: > > Could you tell me where the core dump originates ? Also, it would > help to compile the package with the -DMAL_DEBUG switch turned > on (edit Setup) and then run the same things using 'python -d'. > The package will then print a pretty complete list of things it > is doing to mxProxy.log, which would help track down errors like > these. 
> > BTW, I get: > >>> print p > > Traceback (innermost last): > File "", line 1, in ? > mxProxy.LostReferenceError: object already garbage collected > >>> > > [Don't know why the print statement prints an empty line, though.] > The previous example now *seems* to work fine in a freshly launched interpreter, so it's not a good example, but this shorter one definitely doesn't: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = q = WeakProxy(o) >>> del o >>> print p or q Illegal instruction (core dumped) Or even shorter: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = WeakProxy(o) >>> del o >>> print p Illegal instruction (core dumped) It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). I can mail you a complete trace in private, if you still need it. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Wed Aug 18 14:50:08 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 14:50:08 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> Message-ID: <37BAAC00.27A34FF7@lemburg.com> Vladimir Marangozov wrote: > > [about mxProxy, WeakProxy] > > M.-A. Lemburg wrote: > > > > Could you tell me where the core dump originates ? Also, it would > > help to compile the package with the -DMAL_DEBUG switch turned > > on (edit Setup) and then run the same things using 'python -d'. > > The package will then print a pretty complete list of things it > > is doing to mxProxy.log, which would help track down errors like > > these. > > > > BTW, I get: > > >>> print p > > > > Traceback (innermost last): > > File "", line 1, in ? > > mxProxy.LostReferenceError: object already garbage collected > > >>> > > > > [Don't know why the print statement prints an empty line, though.] 
> > > > The previous example now *seems* to work fine in a freshly launched > interpreter, so it's not a good example, but this shorter one > definitely doesn't: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = q = WeakProxy(o) > >>> p = q = WeakProxy(o) > >>> del o > >>> print p or q > Illegal instruction (core dumped) > > It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). > I can mail you a complete trace in private, if you still need it. That would be nice (please also include the log-file), because I get: >>> print p or q Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> Thank you, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Wed Aug 18 16:47:23 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 09:47:23 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart Message-ID: <199908181447.JAA05151@dolphin.mojam.com> I posted a note to the main list yesterday in response to Dan Connolly's complaint that the os module isn't very portable. I saw no followups (it's amazing how fast a thread can die out :-), but I think it's a reasonable idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback from people more interested in long-term Python developments. The basic premise is that for each platform on which Python runs there are portable and nonportable interfaces to the underlying operating system. The term POSIX has some portability connotations, so let's assume that the posix module exposes the portable subset of the OS interface. To keep things simple, let's also assume there are only three supported general OS platforms: unix, nt and mac.
The proposal then is that importing the platform's module by name will import both the portable and non-portable interface elements. Importing the posix module will import just that portion of the interface that is truly portable across all platforms. To add new functionality to the posix interface it would have to be added to all three platforms. The posix module will be able to ferret out the platform it is running on and import the correct OS-independent posix implementation: import sys _plat = sys.platform del sys if _plat == "mac": from posixmac import * elif _plat == "nt": from posixnt import * else: from posixunix import * # some unix variant The platform-dependent module would simply import everything it could, e.g.: from posixunix import * from nonposixunix import * The os module would vanish or be deprecated with its current behavior intact. The documentation would be modified so that the posix module documents the portable interface and the OS-dependent module's documentation documents the rest and just refers users to the posix module docs for the portable stuff. In theory, this could be done for 1.6, however as I've proposed it, the semantics of importing the posix module would change. Dan Connolly probably isn't going to have a problem with that, though I suppose Guido might... If this idea is good enough for 1.6, perhaps we leave os and posix module semantics alone and add a module named "portable", "portableos" or "portableposix" or something equally arcane. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From guido at CNRI.Reston.VA.US Wed Aug 18 16:54:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 10:54:28 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 09:47:23 CDT." 
<199908181447.JAA05151@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <199908181454.KAA07692@eric.cnri.reston.va.us> > I posted a note to the main list yesterday in response to Dan Connolly's > complaint that the os module isn't very portable. I saw no followups (it's > amazing how fast a thread can die out :-), but I think it's a reasonable > idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback > from people more interesting in long-term Python developments. > > The basic premise is that for each platform on which Python runs there are > portable and nonportable interfaces to the underlying operating system. The > term POSIX has some portability connotations, so let's assume that the posix > module exposes the portable subset of the OS interface. To keep things > simple, let's also assume there are only three supported general OS > platforms: unix, nt and mac. The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. To > add new functionality to the posix interface it would have to be added to > all three platforms. The posix module will be able to ferret out the > platform it is running on and import the correct OS-independent posix > implementation: > > import sys > _plat = sys.platform > del sys > > if _plat == "mac": from posixmac import * > elif _plat == "nt": from posixnt import * > else: from posixunix import * # some unix variant > > The platform-dependent module would simply import everything it could, e.g.: > > from posixunix import * > from nonposixunix import * > > The os module would vanish or be deprecated with its current behavior > intact. 
The documentation would be modified > documents the portable interface and the OS-dependent module's documentation > documents the rest and just refers users to the posix module docs for the > portable stuff. > > In theory, this could be done for 1.6, however as I've proposed it, the > semantics of importing the posix module would change. Dan Connolly probably > isn't going to have a problem with that, though I suppose Guido might... If > this idea is good enough for 1.6, perhaps we leave os and posix module > semantics alone and add a module named "portable", "portableos" or > "portableposix" or something equally arcane. And the advantage of this would be...? Basically, it seems you're just renaming the functionality of os to posix. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Wed Aug 18 17:10:41 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:10:41 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <199908181454.KAA07692@eric.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> Message-ID: <14266.51743.904066.470431@dolphin.mojam.com> Guido> And the advantage of this would be...? Guido> Basically, it seems you're just renaming the functionality of os Guido> to posix. I see a few advantages. 1. We will get the meaning of the noun "posix" more or less right. Programmers coming from other languages are used to thinking of programming to a POSIX API or the "POSIX subset of the OS API". Witness all the "#ifdef _POSIX" in the header files on my Linux box. In Python, the exact opposite is true. Importing the posix module is documented to be the non-portable way to interface to Unix platforms. 2. You would make it clear on all platforms when you expect to be programming in a non-portable fashion, by importing the platform-specific os (unix, nt, mac).
"import unix" would mean I expect this code to only run on Unix machines. You could argue that you are declaring your non-portability by importing the posix module today, but to the casual user or to a new Python programmer with a C or C++ background, that won't be obvious. 3. If Dan Connolly's contention is correct, importing the os module today is not all that portable. I can't really say one way or the other, because I'm lucky enough to be able to confine my serious programming to Unix. I'm sure there's someone out there that can try the following on a few platforms: import os dir(os) and compare the output. Skip From jack at oratrix.nl Wed Aug 18 17:33:20 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 17:33:20 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <19990818153320.D61F6303120@snelboot.oratrix.nl> > The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. There's one slight problem with this: when you use functionality that is partially portable, i.e. a call that is available on Windows and Unix but not on the Mac. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Wed Aug 18 17:39:30 1999 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Wed, 18 Aug 1999 11:39:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <14266.54194.715887.808096@amarok.cnri.reston.va.us> Skip Montanaro writes: > 2. You would make it clear on all platforms when you expect to be > programming in a non-portable fashion, by importing the > platform-specific os (unix, nt, mac). "import unix" would mean I To my mind, POSIX == Unix; other platforms may have bits of POSIX-ish functionality, but most POSIX functions will only be found on Unix systems. One of my projects for 1.6 is to go through the O'Reilly POSIX book and add all the missing calls to the posix modules. Practically none of those functions would exist on Windows or Mac. Perhaps it's really a documentation fix: the os module should document only those features common to all of the big 3 platforms (Unix, Windows, Mac), and have pointers to a section for each of the platform-specific modules, listing the platform-specific functions. -- A.M. Kuchling http://starship.python.net/crew/amk/ Setting loose on the battlefield weapons that are able to learn may be one of the biggest mistakes mankind has ever made. It could also be one of the last. 
-- Richard Forsyth, "Machine Learning for Expert Systems" From skip at mojam.com Wed Aug 18 17:52:20 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:52:20 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54194.715887.808096@amarok.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> Message-ID: <14266.54907.143970.101594@dolphin.mojam.com> Andrew> Perhaps it's really a documentation fix: the os module should Andrew> document only those features common to all of the big 3 Andrew> platforms (Unix, Windows, Mac), and have pointers to a section Andrew> for each of the platform-specific modules, listing the Andrew> platform-specific functions. Perhaps. Should that read ... the os module should *expose* only those features common to all of the big 3 platforms ... ? Skip From skip at mojam.com Wed Aug 18 17:54:11 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:54:11 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> Message-ID: <14266.54991.27912.12075@dolphin.mojam.com> >>>>> "Jack" == Jack Jansen writes: >> The proposal then is that importing the >> platform's module by name will import both the portable and non-portable >> interface elements. Importing the posix module will import just that >> portion of the interface that is truly portable across all platforms. Jack> There's one slight problem with this: when you use functionality that is Jack> partially portable, i.e. a call that is available on Windows and Unix but not Jack> on the Mac. Agreed. I'm not sure what to do there. 
Is the intersection of the common OS calls on Unix, Windows and Mac so small as to be useless (or are there some really gotta have functions not in the intersection because they are missing only on the Mac)? Skip From guido at CNRI.Reston.VA.US Wed Aug 18 18:16:27 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 12:16:27 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 10:52:20 CDT." <14266.54907.143970.101594@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> <14266.54907.143970.101594@dolphin.mojam.com> Message-ID: <199908181616.MAA07901@eric.cnri.reston.va.us> > ... the os module should *expose* only those features common to all of > the big 3 platforms ... Why? My experience has been that functionality that was thought to be Unix specific has gradually become available on other platforms, which makes it hard to decide in which module a function should be placed. The proper test for portability of a program is not whether it imports certain module names, but whether it uses certain functions from those modules (and whether it uses them in a portable fashion). As platforms evolve, a program that was previously thought to be non-portable might become more portable. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 19:33:44 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 18:33:44 +0100 (NFT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54991.27912.12075@dolphin.mojam.com> from "Skip Montanaro" at "Aug 18, 99 10:54:11 am" Message-ID: <199908181733.SAA08434@pukapuka.inrialpes.fr> Everybody's right in this debate. 
I have to type a lot to express objectively my opinion, but better filter my reasoning and just say the conclusion. Having in mind: - what POSIX is - what an OS is - that an OS may or may not comply w/ the POSIX standard, and if it doesn't, it may do so in a couple of years (Windows 3K and PyOS come to mind ;-) - that the os module claims portability amongst the different OSes, mainly regarding their filesystem & process management services, hence it's exposing only a *subset* of the os specific services - the current state of Python It would be nice: - to leave the os module as a common denominator - to have a "unix" module (which could further incorporate the different brands of unix) - to have the posix module capture the fraction of posix functionality, exported from a particular OS specific module, and add the appropriate POSIX propaganda in the docs - to manage to do this, or argue what's wrong with the above -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Thu Aug 19 12:02:26 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 19 Aug 1999 12:02:26 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> <37BAAC00.27A34FF7@lemburg.com> Message-ID: <37BBD632.3F66419C@lemburg.com> [about weak references and a sample implementation in mxProxy] With the help of Vladimir, I have solved the problem and uploaded a modified version of the prerelease: http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip The archive now also contains a precompiled Win32 PYD file for those on WinXX platforms. Please give it a try and tell me what you think. 
Cheers, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 134 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Thu Aug 19 16:06:01 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 19 Aug 1999 16:06:01 +0200 Subject: [Python-Dev] Optimization idea Message-ID: <19990819140602.433BC303120@snelboot.oratrix.nl> I just had yet another idea for optimizing Python that looks so plausible that I guess someone else must have looked into it already (and, hence, probably rejected it:-): We add to the type structure a "type identifier" number, a small integer for the common types (int=1, float=2, string=3, etc) and 0 for everything else. When eval_code2 sees, for instance, a MULTIPLY operation it does something like the following: case BINARY_MULTIPLY: w = POP(); v = POP(); code = (BINARY_MULTIPLY << 8) | ((v->ob_type->tp_typeid) << 4) | (w->ob_type->tp_typeid); x = (binopfuncs[code])(v, w); .... etc ... The idea is that all the 256 BINARY_MULTIPLY entries would be filled with PyNumber_Multiply, except for a few common cases. The int*int field could point straight to int_mul(), etc. Assuming the common cases are really more common than the uncommon cases the fact that they jump straight out to the implementation function instead of mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset the added overhead of shifts, ors and indexing. Any thoughts?
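[Editor's note: Jack's C sketch translates almost directly into a small Python model. The type ids, the 4-bit packing and the table layout mirror his proposal; the function names are illustrative, not CPython's.]

```python
# Rough Python model of the per-type-pair dispatch table sketched above.

NUM_TYPE_IDS = 16
TYPE_ID = {int: 1, float: 2, str: 3}        # 0 = "everything else"

def generic_multiply(v, w):
    # stands in for PyNumber_Multiply: the slow, fully general path
    return v * w

def int_mul(v, w):
    # specialized fast path reached only through the int*int slot
    return v * w

# one table per operator, indexed by (left_id << 4) | right_id
multiply_table = [generic_multiply] * (NUM_TYPE_IDS * NUM_TYPE_IDS)
multiply_table[(TYPE_ID[int] << 4) | TYPE_ID[int]] = int_mul

def binary_multiply(v, w):
    code = (TYPE_ID.get(type(v), 0) << 4) | TYPE_ID.get(type(w), 0)
    return multiply_table[code](v, w)

assert binary_multiply(6, 7) == 42          # hits the int_mul slot
assert binary_multiply([1], 2) == [1, 1]    # list has id 0: generic path
```

The point of the scheme is visible here: once the table is built, dispatch is two shifts, an or, and one indexed call, with no per-operation type tests.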
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Thu Aug 19 16:05:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:05:28 -0400 Subject: [Python-Dev] Localization expert needed Message-ID: <199908191405.KAA10401@eric.cnri.reston.va.us> My contact at HP is asking for expert advice on localization and multi-byte characters. I have little to share except pointing to Martin von Loewis and Pythonware. Does anyone on this list have a suggestion besides those? Don't hesitate to recommend yourself -- there's money in it! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 18 Aug 1999 23:15:55 -0700 From: JOE_ELLSWORTH To: guido at CNRI.Reston.VA.US Subject: Localization efforts and state in Python. Hi Guido. Can you give me some references to the best references currently available for using Python in CGI applications when multi-byte localization is known to be needed? Who is the expert in this in the Python area? Can you recommend that they work with us in this area? Thanks, Joe E. ------- End of Forwarded Message From guido at CNRI.Reston.VA.US Thu Aug 19 16:15:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:15:28 -0400 Subject: [Python-Dev] Optimization idea In-Reply-To: Your message of "Thu, 19 Aug 1999 16:06:01 +0200."
<19990819140602.433BC303120@snelboot.oratrix.nl> References: <19990819140602.433BC303120@snelboot.oratrix.nl> Message-ID: <199908191415.KAA10432@eric.cnri.reston.va.us> > I just had yet another idea for optimizing Python that looks so > plausible that I guess someone else must have looked into it already > (and, hence, probably rejected it:-): > > We add to the type structure a "type identifier" number, a small integer for > the common types (int=1, float=2, string=3, etc) and 0 for everything else. > > When eval_code2 sees, for instance, a MULTIPLY operation it does something > like the following: > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > code = (BINARY_MULTIPLY << 8) | > ((v->ob_type->tp_typeid) << 4) | > (w->ob_type->tp_typeid); > x = (binopfuncs[code])(v, w); > .... etc ... > > The idea is that all the 256 BINARY_MULTIPLY entries would be filled with > PyNumber_Multiply, except for a few common cases. The int*int field could > point straight to int_mul(), etc. > > Assuming the common cases are really more common than the uncommon cases the > fact that they jump straight out to the implementation function instead of > mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset > the added overhead of shifts, ors and indexing. You're assuming that arithmetic operations are a major time sink. I doubt that; much of my code contains hardly any arithmetic these days. Of course, if you *do* have a piece of code that does a lot of basic arithmetic, it might pay off -- but even then I would guess that the majority of opcodes are things like list accessors and variable accesses. But we needn't speculate. It's easy enough to measure the speedup: you can use tp_xxx5 in the type structure and plug a typecode into it for the int and float types. (Note that you would need a separate table of binopfuncs per operator.)
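[Editor's note: Guido's hunch about opcode frequencies can be spot-checked statically with the dis module. The sample function below is an arbitrary stand-in, and these are static counts over the compiled bytecode, not dynamic execution counts, but they are indicative.]

```python
# Count arithmetic opcodes versus load/store opcodes in a typical function.
import dis
from collections import Counter

def sample(seq):
    total = 0
    for item in seq:
        total = total + item * 2
    return total

counts = Counter(ins.opname for ins in dis.get_instructions(sample))
arith = sum(n for op, n in counts.items() if op.startswith('BINARY'))
moves = sum(n for op, n in counts.items() if op.startswith(('LOAD', 'STORE')))
print('arithmetic:', arith, 'loads/stores:', moves)
```

Even in this arithmetic-heavy loop, the loads and stores comfortably outnumber the binary operations, which supports the "measure before specializing" point.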
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Aug 19 21:09:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 19 Aug 1999 20:09:26 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908191909.UAA20618@pukapuka.inrialpes.fr> [Tim, in an earlier msg] > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; Ok. In the meantime I think that folding the redundant SET_LINENO doesn't hurt. I ended up with a patchlet that seems to have no side effects, that updates the lnotab as it should and that even makes pdb a bit more clever, IMHO. Consider an extreme case for the function f (listed below). Currently, we get the following: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 SET_LINENO 4 12 SET_LINENO 5 15 LOAD_CONST 1 (1) 18 STORE_FAST 0 (a) 21 SET_LINENO 6 24 SET_LINENO 7 27 SET_LINENO 8 30 LOAD_CONST 2 (None) 33 RETURN_VALUE >>> pdb.runcall(f) > test.py(1)f() -> def f(): (Pdb) list 1, 20 1 -> def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(2)f() -> """Comment about f""" (Pdb) step > test.py(3)f() -> """Another one""" (Pdb) step > test.py(4)f() -> """A third one""" (Pdb) step > test.py(5)f() -> a = 1 (Pdb) step > test.py(6)f() -> """Forth""" (Pdb) step > test.py(7)f() -> "and pdb can set a breakpoint on this one (simple quotes)" (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple 
quotes...""" (Pdb) >>> ------------------------------------------- With folded SET_LINENO, we have this: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 5 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 SET_LINENO 8 12 LOAD_CONST 2 (None) 15 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 20 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) break 7 Breakpoint 1 at test.py:7 (Pdb) break 8 *** Blank or comment (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> ------------------------------------------- i.e., pdb stops at (points to) the first real instruction and doesn't step through the doc strings. Or is there something I'm missing here? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** compile.c-orig Thu Aug 19 19:27:13 1999 --- compile.c Thu Aug 19 19:00:31 1999 *************** *** 615,620 **** --- 615,623 ---- int arg; { if (op == SET_LINENO) { + if (!Py_OptimizeFlag && c->c_last_addr == c->c_nexti - 3) + /* Hack for folding several SET_LINENO in a row. */ + c->c_nexti -= 3; com_set_lineno(c, arg); if (Py_OptimizeFlag) return; From guido at CNRI.Reston.VA.US Thu Aug 19 23:10:33 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 17:10:33 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Thu, 19 Aug 1999 20:09:26 BST."
<199908191909.UAA20618@pukapuka.inrialpes.fr> References: <199908191909.UAA20618@pukapuka.inrialpes.fr> Message-ID: <199908192110.RAA12755@eric.cnri.reston.va.us> Earlier, you argued that this is "not an optimization," but rather avoiding redundancy. I should have responded right then that I disagree, or at least I'm lukewarm about your patch. Either you're not using -O, and then you don't care much about this; or you care, and then you should be using -O. Rather than encrusting the code with more and more ad-hoc micro optimizations, I'd prefer to have someone look into Tim's suggestion of supporting more efficient breakpoints... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 14:45:46 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 13:45:46 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908192110.RAA12755@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 19, 99 05:10:33 pm" Message-ID: <199908201245.NAA27098@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I haven't argued so much; I asked whether this would be reasonable. Probably I should have said that I don't see the purpose of emitting SET_LINENO instructions for those nodes for which the compiler generates no code, mainly because (as I learned subsequently) SET_LINENO serve no other purpose but debugging. As I haven't payed much attention to this aspect of the code, I thought thay they might still be used for tracebacks. But I couldn't have said that because I didn't know it. > I should have responded right then that I disagree, ... Although I agree this is a minor issue, I'm interested in your argument here, if it's something else than the dialectic: "we're more interested in long term improvements" which is also my opinion. > ... or at least I'm lukewarm about your patch. 
No surprise here :-) But I haven't found another way of not generating SET_LINENO for doc strings other than backpatching. > Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. Neither of those. I don't really care, frankly. I was just intrigued by the consecutive SET_LINENO in my disassemblies, so I started to think and ask questions about it. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... This is *the* real issue with the real potential solution. I'm willing to have a look at this (although I don't know pdb/bdb in its finest details). All suggestions and thoughts are welcome. We would probably leave the SET_LINENO opcode as is and (eventually) introduce a new opcode (instead of transforming/renaming it) for compatibility reasons, methinks. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990818110213.A558F303120@snelboot.oratrix.nl> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <1276961301-70195@hypernet.com> In reply to no one in particular: I've often wished that the instance type object had an (optimized) __decref__ slot. With nothing but hand-waving to support it, I'll claim that would enable all these games. 
- Gordon From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <1276961295-70552@hypernet.com> Jack Jansen wrote: > There's one slight problem with this: when you use functionality > that is partially portable, i.e. a call that is available on Windows > and Unix but not on the Mac. It gets worse, I think. How about the inconsistencies in POSIX support among *nixes? How about NT being a superset of Win9x? How about NTFS having capabilities that FAT does not? I'd guess there are inconsistencies between Mac flavors, too. The Java approach (if you can't do it everywhere, you can't do it) sucks. In some cases you could probably have the missing functionality (in os) fail silently, but in other cases that would be a disaster. "Least-worst"-is-not-necessarily-"good"-ly y'rs - Gordon From tismer at appliedbiometrics.com Fri Aug 20 17:05:47 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Fri, 20 Aug 1999 17:05:47 +0200 Subject: [Python-Dev] about line numbers References: <199908191909.UAA20618@pukapuka.inrialpes.fr> <199908192110.RAA12755@eric.cnri.reston.va.us> Message-ID: <37BD6ECB.9DD17460@appliedbiometrics.com> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I should have responded right then that I > disagree, or at least I'm lukewarm about your patch. Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. 
> > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... I didn't think of this before, but I just realized that I have something like that already in Stackless Python. It is possible to set a breakpoint at every opcode, for every frame. Adding an extra opcode for breakpoints is a good thing as well. The former are good for tracing, conditional breakpoints and such, and cost a little more time since there is always one extra function call. The latter would be a quick, less versatile thing. Inserting extra breakpoint opcodes into running code turns out to be easy to implement, if the running frame gets a local extra copy of its code object, with the breakpoints replacing the original opcodes. The breakpoint handler would then simply look into the original code object. Inserting breakpoints on the source level gives us breakpoints per procedure. Doing it in a running frame gives "instance" level debugging of code. Checking a monitor function on every opcode is slightly more expensive but most general. We can have it all; what do you think? I'm going to finish and publish the stackless/continuation package and submit a paper by the end of September. Should I include this debugging feature? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido at CNRI.Reston.VA.US Fri Aug 20 17:09:32 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:09:32 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT."
<1276961301-70195@hypernet.com> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> <1276961301-70195@hypernet.com> Message-ID: <199908201509.LAA14726@eric.cnri.reston.va.us> > In reply to no one in particular: > > I've often wished that the instance type object had an (optimized) > __decref__ slot. With nothing but hand-waving to support it, I'll > claim that would enable all these games. Without context, I don't know when this would be called. If you want this called on all DECREFs (regardless of the refcount value), realize that this is a huge slowdown because it would mean the DECREF macro has to inspect the type object, which means several indirections. This would slow down *every* DECREF operation, not just those on instances with a __decref__ slot, because the DECREF macro doesn't know the type of the object! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Fri Aug 20 17:13:16 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:13:16 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961295-70552@hypernet.com> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> <1276961295-70552@hypernet.com> Message-ID: <199908201513.LAA14741@eric.cnri.reston.va.us> From: "Gordon McMillan" > Jack Jansen wrote: > > > There's one slight problem with this: when you use functionality > > that is partially portable, i.e. a call that is available on Windows > > and Unix but not on the Mac. > > It gets worse, I think. How about the inconsistencies in POSIX > support among *nixes? How about NT being a superset of Win9x? How > about NTFS having capabilities that FAT does not? I'd guess there are > inconsistencies between Mac flavors, too. 
> > The Java approach (if you can't do it everywhere, you can't do it) > sucks. In some cases you could probably have the missing > functionality (in os) fail silently, but in other cases that would > be a disaster. The Python policy has always been "if it's available, there's a standard name and API for it; if it's not available, the function is not defined or will raise an exception; you can use hasattr(os, ...) or catch exceptions to cope if you can live without it." There are a few cases where unavailable calls are emulated, a few where they are made no-ops, and a few where they are made to raise an exception unconditionally, but in most cases the function will simply not exist, so it's easy to test. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 22:54:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 21:54:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BD6ECB.9DD17460@appliedbiometrics.com> from "Christian Tismer" at "Aug 20, 99 05:05:47 pm" Message-ID: <199908202054.VAA26970@pukapuka.inrialpes.fr> I'll try to sketch here the scheme I'm thinking of for the callback/breakpoint issue (without SET_LINENO), although some technical details are still missing. I'm assuming the following, in this order: 1) No radical changes in the current behavior, i.e. preserve the current architecture / strategy as much as possible. 2) We don't have breakpoints per opcode, but per source line. For that matter, we have sys.settrace (and for now, we don't aim to have sys.settracei that would be called on every opcode, although we might want this in the future) 3) SET_LINENO disappears. Actually, SET_LINENO are conditional breakpoints, used for callbacks from C to Python.
If any of the above is not an appropriate assumption and we want a radical change in the strategy of setting breakpoints / generating callbacks, then this post is invalid. The solution I'm thinking of: a) Currently, we have a function PyCode_Addr2Line which computes the source line from the opcode's address. I hereby assume that we can write the reverse function PyCode_Line2Addr that returns the address from a given source line number. I don't have the implementation, but it should be doable. Furthermore, we can compute, having the co_lnotab table and co_firstlineno, the source line range for a code object. As a consequence, even with the dumbest of all algorithms, by looping through this source line range, we can enumerate with PyCode_Line2Addr the sequence of addresses for the source lines of this code object. b) As Chris pointed out, in case sys.settrace is defined, we can allocate and keep a copy of the original code string per frame. We can further dynamically overwrite the original code string with a new (internal, one byte) CALL_TRACE opcode at the addresses we have enumerated in a). The CALL_TRACE opcodes will trigger the callbacks from C to Python, just as the current SET_LINENO does. c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger the callback and if it returns successfully, we'll fetch the original opcode for the current location from the copy of the original co_code. Then we directly jump to the arg fetch code (or in case we fetch the entire original opcode in CALL_TRACE - we jump to the dispatch code). Hmm. I think that's all. At the heart of this scheme is the PyCode_Line2Addr function, which is the only blob in my head, for now. Christian Tismer wrote: > > I didn't think of this before, but I just realized that > I have something like that already in Stackless Python. > It is possible to set a breakpoint at every opcode, for every > frame. Adding an extra opcode for breakpoints is a good thing > as well.
The former are good for tracing, conditional breakpoints > and such, and cost a little more time since there is always one extra > function call. The latter would be a quick, less versatile thing. I don't think I understand clearly the difference you're talking about, and why the one thing is better than the other, probably because I'm a bit far from stackless python. > I'm going to finish and publish the stackless/continuation package > and submit a paper by end of September. Should I include this debugging > feature? Write the paper first; you have more than enough material to talk about already ;-). Then if you have time to implement some debugging support, you could always add another section, but it won't be a central point of your paper. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido at CNRI.Reston.VA.US Fri Aug 20 21:59:24 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 15:59:24 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Fri, 20 Aug 1999 21:54:10 BST." <199908202054.VAA26970@pukapuka.inrialpes.fr> References: <199908202054.VAA26970@pukapuka.inrialpes.fr> Message-ID: <199908201959.PAA16105@eric.cnri.reston.va.us> > I'll try to sketch here the scheme I'm thinking of for the > callback/breakpoint issue (without SET_LINENO), although some > technical details are still missing. > > I'm assuming the following, in this order: > > 1) No radical changes in the current behavior, i.e. preserve the > current architecture / strategy as much as possible. > > 2) We don't have breakpoints per opcode, but per source line. For that > matter, we have sys.settrace (and for now, we don't aim to have > sys.settracei that would be called on every opcode, although we might > want this in the future) > > 3) SET_LINENO disappears. Actually, SET_LINENO are conditional breakpoints, > used for callbacks from C to Python.
So the basic problem is to generate > these callbacks. They used to be the only mechanism by which the traceback code knew the current line number (long before the debugger hooks existed), but with the lnotab, that's no longer necessary. > If any of the above is not an appropriate assumption and we want a radical > change in the strategy of setting breakpoints / generating callbacks, then > this post is invalid. Sounds reasonable. > The solution I'm thinking of: > > a) Currently, we have a function PyCode_Addr2Line which computes the source > line from the opcode's address. I hereby assume that we can write the > reverse function PyCode_Line2Addr that returns the address from a given > source line number. I don't have the implementation, but it should be > doable. Furthermore, we can compute, having the co_lnotab table and > co_firstlineno, the source line range for a code object. > > As a consequence, even with the dumbest of all algorithms, by looping > through this source line range, we can enumerate with PyCode_Line2Addr > the sequence of addresses for the source lines of this code object. > > b) As Chris pointed out, in case sys.settrace is defined, we can allocate > and keep a copy of the original code string per frame. We can further > dynamically overwrite the original code string with a new (internal, > one byte) CALL_TRACE opcode at the addresses we have enumerated in a). > > The CALL_TRACE opcodes will trigger the callbacks from C to Python, > just as the current SET_LINENO does. > > c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger > the callback and if it returns successfully, we'll fetch the original > opcode for the current location from the copy of the original co_code. > Then we directly jump to the arg fetch code (or in case we fetch the > entire original opcode in CALL_TRACE - we jump to the dispatch code). Tricky, but doable. > Hmm. I think that's all.
> > At the heart of this scheme is the PyCode_Line2Addr function, which is > the only blob in my head, for now. I'm pretty sure that this would be straightforward. I'm a little anxious about modifying the code, and was thinking myself of a way to specify a bitvector of addresses where to break. But that would still cause some overhead for code without breakpoints, so I guess you're right (and it's certainly a long-standing tradition in breakpoint-setting!) --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 23:22:12 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 22:22:12 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908201959.PAA16105@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 20, 99 03:59:24 pm" Message-ID: <199908202122.WAA26956@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > > I'm a little anxious about modifying the code, and was thinking myself > of a way to specify a bitvector of addresses where to break. But that > would still cause some overhead for code without breakpoints, so I > guess you're right (and it's certainly a long-standing tradition in > breakpoint-setting!) > Hm. You're probably right, especially if someone wants to inspect a code object from the debugger or something. But I believe that we can manage to redirect the instruction pointer in the beginning of eval_code2 to the *copy* of co_code, and modify the copy with CALL_TRACE, preserving the original intact.
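What Guido and Vladimir are converging on here is the classic patch-and-restore breakpoint technique used by machine-level debuggers. A toy Python sketch of the control flow, to make the scheme concrete (all names and the trap value are illustrative, not CPython internals; opcode arguments are ignored):

```python
CALL_TRACE = 76  # hypothetical one-byte trap opcode (value is arbitrary)

def patch_code(co_code, break_addrs):
    """Return a patched copy of the code string; the original is untouched."""
    patched = bytearray(co_code)
    for addr in break_addrs:
        patched[addr] = CALL_TRACE
    return bytes(patched)

def run(patched, original, trace):
    """Toy dispatch loop: on CALL_TRACE, fire the trace callback, then
    fall back to the pristine co_code for the real opcode at that address."""
    executed = []
    for addr, op in enumerate(patched):
        if op == CALL_TRACE:
            trace(addr)            # the callback from C into Python
            op = original[addr]    # fetch the untouched opcode
        executed.append(op)
    return executed
```

Note that the saved copy is consulted by address, not by value, so the scheme still works even if a legitimate opcode happens to share the trap's numeric value.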
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Fri Aug 20 22:25:25 1999 From: skip at mojam.com (Skip Montanaro) Date: Fri, 20 Aug 1999 15:25:25 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <1276961295-70552@hypernet.com> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> <1276961295-70552@hypernet.com> Message-ID: <14269.47443.192469.525132@dolphin.mojam.com> Gordon> It gets worse, I think. How about the inconsistencies in POSIX Gordon> support among *nixes? How about NT being a superset of Win9x? Gordon> How about NTFS having capabilities that FAT does not? I'd guess Gordon> there are inconsistencies between Mac flavors, too. To a certain degree I think the C module(s) that interface to the underlying OS's API can iron out differences. In other cases you may have to document minor (known) differences. In still other cases you may have to relegate particular functionality to the OS-dependent modules. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From gmcm at hypernet.com Sat Aug 21 00:38:14 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 17:38:14 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <199908201509.LAA14726@eric.cnri.reston.va.us> References: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> Message-ID: <1276937670-1491544@hypernet.com> [me] > > > > I've often wished that the instance type object had an (optimized) > > __decref__ slot. With nothing but hand-waving to support it, I'll > > claim that would enable all these games. [Guido] > Without context, I don't know when this would be called. 
If you > want this called on all DECREFs (regardless of the refcount value), > realize that this is a huge slowdown because it would mean the > DECREF macro has to inspect the type object, which means several > indirections. This would slow down *every* DECREF operation, not > just those on instances with a __decref__ slot, because the DECREF > macro doesn't know the type of the object! This was more 2.0-ish speculation, and really thinking of classic C++ ref counting where decref would be a function call, not a macro. Still a slowdown, of course, but not quite so massive. The upside is opening up all kinds of tricks at the type object and user class levels, (such as weak refs and copy on write etc). Worth it? I'd think so, but I'm not a speed demon. - Gordon From tim_one at email.msn.com Sat Aug 21 10:09:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 04:09:17 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <000201beebac$776d32e0$0c2d2399@tim> [Skip Montanaro] > ... > 3. If Dan Connolly's contention is correct, importing the os module > today is not all that portable. I can't really say one way or the > other, because I'm lucky enough to be able to confine my serious > programming to Unix. I'm sure there's someone out there that > can try the following on a few platforms: > > import os > dir(os) > > and compare the output. There's no need to, Skip. Just read the os module docs; where a function says, e.g., "Availability: Unix.", it doesn't show up on a Windows or Mac box. In that sense using (some) os functions is certainly unportable. But I have no sympathy for the phrasing of Dan's complaint: if he calls os.getegid(), *he* knows perfectly well that's a Unix-specific function, and expressing outrage over it not working on NT is disingenuous. 
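The usual way to cope, per the policy described elsewhere in this thread, is to feature-test rather than assume the platform. A minimal sketch of the idiom:

```python
import os

def effective_gid():
    """Return the effective group id where the platform provides it,
    and None elsewhere (e.g. Windows), via the hasattr(os, ...) idiom."""
    if hasattr(os, "getegid"):
        return os.getegid()
    return None
```

The same test can equally be spelled as a try/except AttributeError around the call.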
OTOH, I don't think you're going to find anything in the OS module documented as available only on Windows or only on Macs, and some semi-portable functions (notoriously chmod) are documented in ways that make sense only to Unixheads. This certainly gives a strong impression of Unix-centricity to non-Unix weenies, and has got to baffle true newbies completely. So 'twould be nice to have a basic os module all of whose functions "run everywhere", whose interfaces aren't copies of cryptic old Unixisms, and whose docs are platform neutral. If Guido is right that the os functions tend to get more portable over time, fine, that module can grow over time too. In the meantime, life would be easier for everyone except Python's implementers. From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 17:34:32 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 16:34:32 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908202122.WAA26956@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 20, 99 10:22:12 pm" Message-ID: <199908211534.QAA22392@pukapuka.inrialpes.fr> [me] > > Guido van Rossum wrote: > > > > > > I'm a little anxious about modifying the code, and was thinking myself > > of a way to specify a bitvector of addresses where to break. But that > > would still cause some overhead for code without breakpoints, so I > > guess you're right (and it's certainly a long-standing tradition in > > breakpoint-setting!) > > > > Hm. You're probably right, especially if someone wants to inspect > a code object from the debugger or something. But I believe that > we can manage to redirect the instruction pointer in the beginning > of eval_code2 to the *copy* of co_code, and modify the copy with > CALL_TRACE, preserving the original intact. > I wrote a very rough first implementation of this idea.
The files are at: http://sirac.inrialpes.fr/~marangoz/python/lineno/ Basically, what I did is: 1) what I said :-) 2) No more SET_LINENO 3) In tracing mode, a copy of the original code is put in an additional slot (co_tracecode) of the code object. Then it's overwritten with CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. The VM is routed to execute this code, and not the original one. 4) When tracing is off (i.e. sys_tracefunc is NULL) the VM falls back to normal execution of the original code. A couple of things that need finalization: a) how to deallocate the modified code string when tracing is off b) the value of CALL_TRACE (I almost randomly picked 76) c) I don't handle the cases where sys_tracefunc is enabled or disabled within the same code object. Tracing or not is determined before the main loop. d) update pdb, so that it does not allow setting breakpoints on lines with no code. To achieve this, I think that python versions of PyCode_Addr2Line & PyCode_Line2Addr have to be integrated into pdb as helper functions. e) correct bugs and design flaws f) something else?
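For point d), pure-Python equivalents of the two helpers might look like this. This is only a sketch, assuming the classic co_lnotab encoding, i.e. a byte string of (address increment, line increment) pairs accumulated relative to co_firstlineno, with boundary cases simplified:

```python
def addr2line(firstlineno, lnotab, addr):
    """Map a bytecode address to its source line (cf. PyCode_Addr2Line)."""
    line, cur = firstlineno, 0
    for i in range(0, len(lnotab), 2):
        cur += lnotab[i]          # address increment
        if cur > addr:
            break
        line += lnotab[i + 1]     # line increment
    return line

def line2addr(firstlineno, lnotab, target):
    """Inverse mapping: first bytecode address of a given source line."""
    line, addr = firstlineno, 0
    if target <= line:
        return 0
    for i in range(0, len(lnotab), 2):
        addr += lnotab[i]
        line += lnotab[i + 1]
        if line >= target:
            return addr
    return addr
```

With co_firstlineno == 1 and a table of bytes([6, 1, 6, 1]) (two increments of six bytes / one line each), address 6 maps to line 2 and line 3 starts at address 12.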
And here's the sample session of my lousy function f with this 'proof of concept' code: >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 LOAD_CONST 1 (1) 3 STORE_FAST 0 (a) 6 LOAD_CONST 2 (None) 9 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 10 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 19:10:50 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 19:10:50 +0200 Subject: [Python-Dev] about line numbers References: <199908211534.QAA22392@pukapuka.inrialpes.fr> Message-ID: <37BEDD9A.DBA817B1@appliedbiometrics.com> Vladimir Marangozov wrote: ... > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > Basically, what I did is: > > 1) what I said :-) > 2) No more SET_LINENO > 3) In tracing mode, a copy of the original code is put in an additional > slot (co_tracecode) of the code object. Then it's overwritten with > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. I'd rather keep the original code object as it is, create a copy with inserted breakpoints and put that into the frame slot. Pointing back to the original from there. Then I'd redirect the code from the CALL_TRACE opcode completely to a user-defined function. Getting rid of the extra code object would be done by this function when tracing is off. It also vanishes automatically when the frame is released. 
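Christian's arrangement, a frame-local patched copy that points back at the pristine original, can be pictured with a small stand-in object (CodeCopy and co_back are names used only for this sketch; 76 is the trap value the prototype in this thread happens to pick):

```python
CALL_TRACE = 76  # illustrative trap value, as in the prototype above

class CodeCopy:
    """Patched working copy of a code string that keeps a pointer back
    to the untouched original, so it can be dropped or restored at will."""
    def __init__(self, original, breakpoints):
        self.co_back = original              # pristine original
        self.co_code = bytearray(original)   # frame-local patched copy
        for addr in breakpoints:
            self.co_code[addr] = CALL_TRACE
    def original_opcode(self, addr):
        # The breakpoint handler looks the real opcode up in the original.
        return self.co_back[addr]
```

When the frame dies, its CodeCopy goes with it; permanent tracing would instead restore co_code from co_back and drop the pointer.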
> a) how to deallocate the modified code string when tracing is off By making the copy a frame property which is temporary, I think. Or, if tracing should work for all frames, by pushing the original in the back of the modified. Both work. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 23:40:05 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 22:40:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BEDD9A.DBA817B1@appliedbiometrics.com> from "Christian Tismer" at "Aug 21, 99 07:10:50 pm" Message-ID: <199908212140.WAA51054@pukapuka.inrialpes.fr> Chris, could you please repeat that step by step in more detail? I'm not sure I understand your suggestions. Christian Tismer wrote: > > Vladimir Marangozov wrote: > ... > > I wrote a very rough first implementation of this idea. The files are at: > > > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > > > Basically, what I did is: > > > > 1) what I said :-) > > 2) No more SET_LINENO > > 3) In tracing mode, a copy of the original code is put in an additional > > slot (co_tracecode) of the code object. Then it's overwritten with > > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. > > I'd rather keep the original code object as it is, create a copy > with inserted breakpoints and put that into the frame slot. You seem to suggest duplicating the entire code object, right? And referencing the modified duplicate from the current frame? I actually duplicate only the opcode string (that is, the co_code string object) and I don't see the point of duplicating the entire code object.
Keeping a reference from the current frame makes sense, but won't it deallocate the modified version on every frame release (then redo all the code duplication work for every frame)? > Pointing back to the original from there. I don't understand this. What points back where? > > Then I'd redirect the code from the CALL_TRACE opcode completely > to a user-defined function. What user-defined function? I don't understand that either... Except the sys_tracefunc, what other (user-defined) function do we have here? Is it a Python or a C function? > Getting rid of the extra code object would be done by this function > when tracing is off. How exactly? This seems to be obvious for you, but obviously, not for me ;-) > It also vanishes automatically when the frame is released. The function or the extra code object? > > a) how to deallocate the modified code string when tracing is off > By making the copy a frame property which is temporary, I think. I understood that the frame lifetime could be exploited "somehow"... > Or, if tracing should work for all frames, by pushing the original > in the back of the modified. Both work. Tracing is done for all frames, if sys_tracefunc is not NULL, which is a function that usually ends up in the f_trace slot. > > ciao - chris I'm confused. I didn't understand your idea. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 23:23:10 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 23:23:10 +0200 Subject: [Python-Dev] about line numbers References: <199908212140.WAA51054@pukapuka.inrialpes.fr> Message-ID: <37BF18BE.B3D58836@appliedbiometrics.com> Vladimir Marangozov wrote: > > Chris, could you please repeat that step by step in more detail? > I'm not sure I understand your suggestions. I think I was too quick. I thought of copying the whole code object, of course. ...
> > I'd rather keep the original code object as it is, create a copy > > with inserted breakpoints and put that into the frame slot. > > You seem to suggest duplicating the entire code object, right? > And referencing the modified duplicate from the current frame? Yes. > I actually duplicate only the opcode string (that is, the co_code string > object) and I don't see the point of duplicating the entire code object. > > Keeping a reference from the current frame makes sense, but won't it > deallocate the modified version on every frame release (then redo all the > code duplication work for every frame)? You get two options by that. 1) Permanently modifying one code object to be traceable means pushing a copy of the original "behind" by means of some co_back pointer. This keeps the patched one where the original was, and makes a global debugging version. 2) Creating a copy for one frame, and putting the original into a co_back pointer. This gives debugging just for this one frame. ... > > Then I'd redirect the code from the CALL_TRACE opcode completely > > to a user-defined function. > > What user-defined function? I don't understand that either... > Except the sys_tracefunc, what other (user-defined) function do we have here? > Is it a Python or a C function? I would suggest a Python function, of course. > > Getting rid of the extra code object would be done by this function > > when tracing is off. > > How exactly? This seems to be obvious for you, but obviously, not for me ;-) If the permanent tracing "1)" is used, just restore the code object's contents from the original in co_back, and drop co_back. In the "2)" version, just pull the co_back into the frame's code pointer and lose the reference to the copy. Occurs automatically on frame release. > > It also vanishes automatically when the frame is released. > > The function or the extra code object? The extra code object. ... > I'm confused. I didn't understand your idea.
Forget it, it isn't more than another brain fart :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tim_one at email.msn.com Sun Aug 22 03:25:22 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 21:25:22 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908131347.OAA30740@pukapuka.inrialpes.fr> Message-ID: <000001beec3d$348f0160$cb2d2399@tim> [going back a week here, to dict resizing ...] [Vladimir Marangozov] > ... > All in all, for performance reasons, dicts remain an exception > to the rule of releasing memory ASAP. Yes, except I don't think there is such a rule! The actual rule is a balancing act between the cost of keeping memory around "just in case", and the expense of getting rid of it. Resizing a dict is extraordinarily expensive because the entire table needs to be rearranged, but lists make this tradeoff too (when you del a list element or list slice, it still goes thru NRESIZE, which still keeps space for as many as 100 "extra" elements around). The various internal caches for int and frame objects (etc) also play this sort of game; e.g., if I happen to have a million ints sitting around at some time, Python effectively assumes I'll never want to reuse that int storage for anything other than ints again. 
python-rarely-releases-memory-asap-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Sun Aug 22 21:41:59 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 22 Aug 1999 20:41:59 +0100 (NFT) Subject: [Python-Dev] Memory (was: about line numbers, which was shrinking dicts) In-Reply-To: <000001beec3d$348f0160$cb2d2399@tim> from "Tim Peters" at "Aug 21, 99 09:25:22 pm" Message-ID: <199908221941.UAA54480@pukapuka.inrialpes.fr> Tim Peters wrote: > > [going back a week here, to dict resizing ...] Yes, and the subject line does not correspond to the contents because at the moment I sent this message, I ran out of disk space and the mailer picked a random header after destroying half of the messages in this mailbox. > > [Vladimir Marangozov] > > ... > > All in all, for performance reasons, dicts remain an exception > > to the rule of releasing memory ASAP. > > Yes, except I don't think there is such a rule! The actual rule is a > balancing act between the cost of keeping memory around "just in case", and > the expense of getting rid of it. Good point. > > Resizing a dict is extraordinarily expensive because the entire table needs > to be rearranged, but lists make this tradeoff too (when you del a list > element or list slice, it still goes thru NRESIZE, which still keeps space > for as many as 100 "extra" elements around). > > The various internal caches for int and frame objects (etc) also play this > sort of game; e.g., if I happen to have a million ints sitting around at > some time, Python effectively assumes I'll never want to reuse that int > storage for anything other than ints again. > > python-rarely-releases-memory-asap-ly y'rs - tim Yes, and I'm somewhat sensitive to this issue after spending 6 years in a team which deals a lot with memory management (mainly DSM). In other words, you say that Python tolerates *virtual* memory fragmentation (a funny term :-).
In the case of dicts and strings, we tolerate "internal fragmentation" (a contiguous chunk is allocated, then partially used). In the case of ints, floats or frames, we tolerate "external fragmentation". And as you said, Python tolerates this because of the speed/space tradeoff. Hopefully, all we deal with at this level is virtual memory, so even if you have zillions of ints, it's the OS VMM that will help you more with its long-term scheduling than Python's wild guesses about a hypothetical usage of zillions of ints later. I think that some OS concepts can give us hints on how to reduce our virtual fragmentation (which, as we all know, is not a very good thing). A few keywords: compaction, segmentation, paging, sharing. We can't do much about our internal fragmentation, except changing the algorithms of dicts & strings (which is not appealing anyways). But it would be nice to think about the external fragmentation of Python's caches. Or even try to reduce the internal fragmentation in combination with the internal caches... BTW, this is the whole point of PyMalloc: in a virtual memory world, try to reduce the distance between the user view and the OS view on memory. PyMalloc addresses the fragmentation problem at a lower level of granularity than an OS (that is, *within* a page), because most of Python's objects are very small. However, it can't handle efficiently large chunks like the int/float caches. Basically what it does is: segmentation of the virtual space and sharing of the cached free space. I think that Python could improve on sharing its internal caches, without significant slowdowns... The bottom line is that there's still plenty of room for exploring alternate mem mgt strategies that better fit Python's memory needs as a whole.
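The "segmentation of the virtual space and sharing of the cached free space" idea can be caricatured in a few lines. Everything below is a toy illustration of the principle, not the real PyMalloc design; all names are made up:

```python
class SizeClassAllocator:
    """Toy allocator: requests are rounded up to a size class, and freed
    blocks go on one shared free list per class -- so any object type may
    reuse the cached space, unlike per-type caches (int, frame, ...)."""
    GRAIN = 8   # round requests up to a multiple of 8 bytes

    def __init__(self):
        self.free = {}       # size class -> list of recycled blocks
        self.next_id = 0     # stands in for "fresh memory from the OS"

    def _size_class(self, nbytes):
        return (nbytes + self.GRAIN - 1) // self.GRAIN * self.GRAIN

    def alloc(self, nbytes):
        cls = self._size_class(nbytes)
        pool = self.free.setdefault(cls, [])
        if pool:
            return pool.pop()        # shared cached space, any caller
        self.next_id += 1
        return (cls, self.next_id)   # "fresh" block

    def free_block(self, block):
        cls, _ = block
        self.free.setdefault(cls, []).append(block)   # cache for reuse
```

A 5-byte and a 7-byte request fall in the same 8-byte class, so freeing one makes it immediately reusable for the other -- the sharing that a per-type free list forbids.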
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Sun Aug 22 23:25:56 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 22 Aug 1999 23:25:56 +0200 Subject: [Python-Dev] Converting C objects to Python objects and back Message-ID: <19990822212601.2D4BE18BA0D@oratrix.oratrix.nl> Here's another silly idea, not having to do with optimization. On the Mac, and as far as I know on Windows as well, there are quite a few OS API structures that have a Python Object representation that is little more than the PyObject boilerplate plus a pointer to the C API object. (And, of course, lots of methods to operate on the object). To convert these from Python to C I always use boilerplate code like

    WindowPtr *win;
    PyArg_ParseTuple(args, "O&", PyWin_Convert, &win);

where PyWin_Convert is the function that takes a PyObject * and a void **, does the typecheck and sets the pointer. A similar way is used to convert C pointers back to Python objects in Py_BuildValue. What I was thinking is that it would be nice (if you are _very_ careful) if this functionality was available in struct. So, if I would somehow obtain (in a Python string) a C structure that contained, say, a WindowPtr and two ints, I would be able to say

    win, x, y = struct.unpack("Ohh", Win.WindowType)

and struct would be able, through the WindowType type object, to get at the PyWin_Convert and PyWin_New functions. A nice side issue is that you can add an option to PyArg_ParseTuple so you can say

    PyArg_ParseTuple(args, "O+", Win_WinObject, &win)

and you don't have to remember the different names the various types use for their conversion routines. Is this worth pursuing or is it just too dangerous? And, if it is worth pursuing, I have to stash away the two function pointers somewhere in the TypeObject, should I grab one of the tp_xxx fields for this or is there a better place?
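The shape of Jack's proposal can be sketched in pure Python (every name here is hypothetical): each wrapped C type registers a converter pair -- the analog of PyWin_Convert/PyWin_New -- and an extended unpack routes raw pointer-sized fields through the registered "new" converter, much as Py_BuildValue("O&", ...) would in C:

```python
import struct

class Window:
    """Stand-in for a C WindowPtr wrapper object (hypothetical)."""
    def __init__(self, handle):
        self.handle = handle

def win_new(raw):            # C value -> Python object (cf. PyWin_New)
    return Window(raw)

def win_convert(obj):        # Python object -> C value (cf. PyWin_Convert)
    return obj.handle

# format tag -> (underlying struct code, "new" fn, "convert" fn)
REGISTRY = {'W': ('I', win_new, win_convert)}

def unpack_mixed(fmt, data):
    """Like struct.unpack, but registered tags become wrapper objects.
    Assumes every format character encodes exactly one value."""
    real_fmt = ''.join(REGISTRY[c][0] if c in REGISTRY else c for c in fmt)
    out = []
    for c, v in zip(fmt, struct.unpack(real_fmt, data)):
        out.append(REGISTRY[c][1](v) if c in REGISTRY else v)
    return tuple(out)

# a fake "WindowPtr plus two ints" structure, as in Jack's example
data = struct.pack('Ihh', 0xCAFE, 3, 4)
win, x, y = unpack_mixed('Whh', data)
print(win.handle == 0xCAFE, x, y)    # → True 3 4
```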
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fdrake at acm.org Mon Aug 23 16:54:07 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 23 Aug 1999 10:54:07 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000201beebac$776d32e0$0c2d2399@tim> References: <14266.51743.904066.470431@dolphin.mojam.com> <000201beebac$776d32e0$0c2d2399@tim> Message-ID: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Tim Peters writes: > OTOH, I don't think you're going to find anything in the OS module > documented as available only on Windows or only on Macs, and some Tim, Actually, the spawn*() functions are included in os and are documented as Windows-only, along with the related P_* constants. These are provided by the nt module. > everywhere", whose interfaces aren't copies of cryptic old Unixisms, and > whose docs are platform neutral. I'm always glad to see documentation patches, or even pointers to specific problems. Being a Unix-weenie myself, making the documentation more readable to Windows-weenies can be difficult at times. But given useful pointers, I can usually pull it off, or at least drive someone who can to do so. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Tue Aug 24 08:32:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 24 Aug 1999 02:32:49 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Message-ID: <000701beedfa$7c5c8e40$902d2399@tim> [Fred L. Drake, Jr.] > Actually, the spawn*() functions are included in os and are > documented as Windows-only, along with the related P_* constants. > These are provided by the nt module.
I stand corrected, Fred -- so how do the Unix dweebs like this Windows crap cluttering "their" docs ? [Tim, pitching a portable sane interface to a portable sane subset of os functionality] > I'm always glad to see documentation patches, or even pointers to > specific problems. Being a Unix-weenie myself, making the > documentation more readable to Windows-weenies can be difficult at > times. But given useful pointers, I can usually pull it off, or at > least drive someone who can to do so. ;-) No, it's deeper than that. Some of the inherited Unix interfaces are flatly incomprehensible to anyone other than a Unix-head, but the functionality is supplied only in that form (docs may ease the pain, but the interfaces still suck); for example,

    mkdir (path[, mode])
        Create a directory named path with numeric mode mode. The default
        mode is 0777 (octal). On some systems, mode is ignored. Where it
        is used, the current umask value is first masked out.
        Availability: Macintosh, Unix, Windows.

If you have a sister or parent or 3-year-old child (they're all equivalent for this purpose ), just imagine them reading that. If you can't, I'll have my sister call you . Raw numeric permission modes, octal mode notation, and the "umask" business are Unix-specific -- and even Unices supply symbolic ways to specify permissions. chmod is likely the one I hear the most gripes about. Windows heads are looking to change "file attributes", the name "chmod" is gibberish to them, most of the Unix mode bits make no sense under Windows (& contra Guido's optimism, never will) even if you know the secret octal code, and Windows has several attributes (hidden bit, system bit, archive bit) chmod can't get at. The only portable functionality here is the write bit, but no non-Unix person could possibly guess either that chmod is the function they need, or what to type after someone tells them it's chmod.
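For what it's worth, the "umask value is first masked out" sentence Tim quotes can at least be made concrete. A modern-Python demonstration of the Unix behavior (the requested mode is ANDed with the complement of the process umask):

```python
import os
import stat
import tempfile

old = os.umask(0o022)                # a common default umask
try:
    parent = tempfile.mkdtemp()
    target = os.path.join(parent, 'demo')
    os.mkdir(target, 0o777)          # ask for rwxrwxrwx ...
    mode = stat.S_IMODE(os.stat(target).st_mode)
    print(oct(mode))                 # → 0o755, i.e. 0o777 & ~0o022
finally:
    os.umask(old)                    # restore the previous umask
```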
So this is less a doc issue than that more of os needs to become more like os.path (i.e., intelligently named functions with intelligently abstracted interfaces). never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - tim From skip at mojam.com Tue Aug 24 19:21:53 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 24 Aug 1999 12:21:53 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000701beedfa$7c5c8e40$902d2399@tim> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> Message-ID: <14274.53860.210265.71990@dolphin.mojam.com> Tim> chmod is likely the one I hear the most gripes about. Windows Tim> heads are looking to change "file attributes", the name "chmod" is Tim> gibberish to them Well, we could confuse everyone and rename "chmod" to "chfat" (is that like file system liposuction?). Windows probably has an equivalent function whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Tim> most of the Unix mode bits make no sense under Windows (& contra Tim> Guido's optimism, never will) even if you know the secret octal Tim> code ... It beats a secret handshake. Imagine all the extra peripherals we'd have to make available for everyone's computer. ;-) Tim> So this is less a doc issue than that more of os needs to become Tim> more like os.path (i.e., intelligently named functions with Tim> intelligently abstracted interfaces). Hasn't Guido's position been that the interface modules like os, posix, etc are just a thin layer over the underlying API (Guido: note how I cleverly attributed this position to you but also placed the responsibility for correctness on your head!)? If that's the case, perhaps we should provide a slightly higher level module that abstracts the file system as objects, and adopts a more user-friendly approach to the secret octal codes. 
Those of us worried about job security could continue to use the lower level module and leave the higher level interface for former Visual Basic programmers. Tim> never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - maybe-the-"e"-key-stuck-on-his-TTY-ly y'rs... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Tue Aug 24 20:21:44 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 24 Aug 1999 14:21:44 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Skip Montanaro writes: > whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Just 17? ;-) > Tim> So this is less a doc issue than that more of os needs to become > Tim> more like os.path (i.e., intelligently named functions with > Tim> intelligently abstracted interfaces). Sounds like some doc improvements can really help improve things, at least in the short term. > correctness on your head!)? If that's the case, perhaps we should provide a > slightly higher level module that abstracts the file system as objects, and > adopts a more user-friendly approach to the secret octal codes. Those of us I'm all for an object interface to a logical filesystem; having had to write just such a thing in Java not long ago, and we have a similar construct in Python (not by me, though), that we use in our Knowbot work. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tim_one at email.msn.com Wed Aug 25 09:02:21 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:02:21 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <000801beeec7$c6f06b20$fc2d153f@tim> [Skip Montanaro] > Well, we could confuse everyone and rename "chmod" to "chfat" ... I don't want to rename anything, nor do I want to use MS-specific names. chmod is both the wrong spelling & the wrong functionality for all non-Unix systems. os.path did a Good Thing by, e.g., introducing getmtime(), despite that everyone knows it's just os.stat()[8]. New isreadonly(path) and setreadonly(path) are more what I'm after; nothing beyond that is portable, & never will be. > Windows probably has an equivalent function whose name is 17 > characters long Indeed, SetFileAttributes is exactly 17 characters long (you moonlighting on NT, Skip?!). But while Windows geeks would like to use that, it's both the wrong spelling & the wrong functionality for all non-Windows systems. > ... > Hasn't Guido's position been that the interface modules like os, > posix, etc are just a thin layer over the underlying API (Guido: > note how I cleverly attributed this position to you but also placed > the responsibility for correctness on your head!)? If that's the > case, perhaps we should provide a slightly higher level module that > abstracts the file system as objects, and adopts a more user-friendly > approach to the secret octal codes. Like that, yes. > Those of us worried about job security could continue to use the > lower level module and leave the higher level interface for former > Visual Basic programmers. You're just *begging* Guido to make the Python2 os module take all of its names from the Win32 API . 
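Tim's isreadonly()/setreadonly() never existed in the os module; here is one hedged sketch of what they might look like on top of what does exist, taking "read-only" to mean "no write-permission bit set" -- the one portable bit, and the one Python maps the Windows read-only attribute onto:

```python
import os
import stat

# hypothetical functions, sketched with os.stat()/os.chmod()
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def isreadonly(path):
    """True if no write-permission bit is set on 'path'."""
    return not (os.stat(path).st_mode & WRITE_BITS)

def setreadonly(path):
    """Clear every write-permission bit on 'path', leaving the rest."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)
```

This checks and changes the permission bits only; it deliberately says nothing about group schemes, ACLs, or the other Windows attribute bits, which is exactly the point of keeping the portable interface small.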
it's-no-lamer-to-be-ignorant-of-unix-names-than-it-is- to-be-ignorant-of-chinese-ly y'rs - tim From tim_one at email.msn.com Wed Aug 25 09:05:31 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:05:31 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Message-ID: <000901beeec8$380d05c0$fc2d153f@tim> [Fred L. Drake, Jr.] > ... > I'm all for an object interface to a logical filesystem; having > had to write just such a thing in Java not long ago, and we have > a similar construct in Python (not by me, though), that we use in > our Knowbot work. Well, don't read anything unintended into this, but Guido *is* out of town, and you *do* have the power to check in code outside the doc subtree ... barry-will-help-he's-been-itching-to-revolt-too-ly y'rs - tim From bwarsaw at cnri.reston.va.us Wed Aug 25 13:20:16 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 25 Aug 1999 07:20:16 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14275.53616.585669.890621@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Well, don't read anything unintended into this, but Guido *is* TP> out of town, and you *do* have the power to check in code TP> outside the doc subtree ... TP> barry-will-help-he's-been-itching-to-revolt-too-ly y'rs I'll bring the pitchforks if you bring the torches! 
-Barry From skip at mojam.com Wed Aug 25 17:17:35 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 25 Aug 1999 10:17:35 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.2229.983969.228891@dolphin.mojam.com> > I'm all for an object interface to a logical filesystem; having had to > write just such a thing in Java not long ago, and we have a similar > construct in Python (not by me, though), that we use in our Knowbot > work. Fred, Since this is the dev group, how about showing us the Knowbot's logical filesystem API, and let's do some dev-ing... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Wed Aug 25 18:22:52 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:22:52 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000801beeec7$c6f06b20$fc2d153f@tim> References: <14274.53860.210265.71990@dolphin.mojam.com> <000801beeec7$c6f06b20$fc2d153f@tim> Message-ID: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Tim Peters writes: > os.path did a Good Thing by, e.g., introducing getmtime(), despite that > everyone knows it's just os.stat()[8]. New isreadonly(path) and > setreadonly(path) are more what I'm after; nothing beyond that is portable, Tim, I think we can simply declare that isreadonly() checks that the file doesn't allow the user to read it, but setreadonly() sounds to me like it wouldn't be portable to Unix. There's more than one (reasonable) way to make a file unreadable to a user just by manipulating permission bits, and which is best will vary according to both the user and the file's existing permissions. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 18:26:25 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:26:25 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.6449.428851.402955@weyr.cnri.reston.va.us> Tim Peters writes: > Well, don't read anything unintended into this, but Guido *is* out > of town, and you *do* have the power to check in code outside the > doc subtree ... Good thing I turned off the python-checkins list when I added the curly bracket patch I've been working on! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 20:46:30 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 14:46:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.2229.983969.228891@dolphin.mojam.com> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> <14276.2229.983969.228891@dolphin.mojam.com> Message-ID: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Skip Montanaro writes: > Since this is the dev group, how about showing us the Knowbot's logical > filesystem API, and let's do some dev-ing... Well, I took a look at it, and I must confess it's just not really different from the set of interfaces in the os module; the important point is that they are methods instead of functions (other than a few data items: sep, pardir, curdir). The path attribute provided the same interface as os.path. Its only user-visible state is the current-directory setting, which may or may not be that useful. We left off chmod(), which would make Tim happy, but that was only because it wasn't meaningful in context.
We'd have to add it (or something equivalent) for a general purpose filesystem object. So Tim's only happy if he can come up with a general interface that is actually portable (consider my earlier comments on setreadonly()). On the other hand, you don't need chmod() or anything like it for most situations where a filesystem object would be useful. An FTPFilesystem class would not be hard to write! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jack at oratrix.nl Wed Aug 25 23:43:16 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 25 Aug 1999 23:43:16 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by "Fred L. Drake, Jr." , Wed, 25 Aug 1999 12:22:52 -0400 (EDT) , <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <19990825214321.D50AD18BA0F@oratrix.oratrix.nl> But in Python, with its nice high-level datastructures, couldn't we design the Mother Of All File Attribute Calls, which would optionally map functionality from one platform to another? As an example consider the Mac resource fork size. If on unix I did fattrs = os.getfileattributes(filename) rfsize = fattrs.get('resourceforksize') it would raise an exception. If, however, I did rfsize = fattrs.get('resourceforksize', compat=1) I would get a "close approximation", 0. Note that you want some sort of a compat parameter, not a default value, as for some attributes (the various atime/mtime/ctimes, permission bits, etc) you'd get a default based on other file attributes that do exist on the current platform. Hmm, the file-attribute-object idea has the added advantage that you can then use setfileattributes(filename, fattrs) to be sure that you've copied all relevant attributes, independent of the platform you're on. 
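os.getfileattributes() is purely hypothetical -- nothing like it exists -- but the shape of Jack's idea is easy to sketch: a mapping of attribute names whose get() either raises for attributes the platform lacks or, with compat=1, synthesizes a close approximation:

```python
class FileAttributes:
    """Sketch of Jack's hypothetical file-attribute object."""

    def __init__(self, attrs, compat_defaults):
        self._attrs = attrs              # what this platform reports
        self._compat = compat_defaults   # approximations for the rest

    def get(self, name, compat=0):
        if name in self._attrs:
            return self._attrs[name]
        if compat and name in self._compat:
            return self._compat[name]    # e.g. resource fork size -> 0
        raise KeyError(name)             # attribute unknown here

# e.g. on Unix, where a Mac resource fork does not exist:
fattrs = FileAttributes({'size': 1024}, {'resourceforksize': 0})
print(fattrs.get('resourceforksize', compat=1))   # → 0
```

Without compat=1 the same lookup raises, matching the two behaviors Jack distinguishes.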
Mapping permissions takes a bit more (design-) work, with unix having user/group/other only and Windows having full-fledged ACLs (or nothing at all, depending how you look at it:-), but should also be doable. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Thu Aug 26 08:10:01 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 26 Aug 1999 07:10:01 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908211534.QAA22392@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 21, 99 04:34:32 pm" Message-ID: <199908260610.HAA20304@pukapuka.inrialpes.fr> [me, dropping SET_LINENO] > > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > ... > > A couple of things that need finalization: > > ... An updated version is available at the same location. I think that this one does The Right Thing (tm). a) Everything is internal to the VM and totally hidden, as it should be. b) No modifications of the code and frame objects (no additional slots) c) The modified code string (used for tracing) is allocated dynamically when the 1st frame pointing to its original switches in trace mode, and is deallocated automatically when the last frame pointing to its original dies. I feel better with this code so I can stop thinking about it and move on :-) (leaving it to your appreciation). What's next? File attributes? ;-) It's not easy to weight what kind of common interface would be easy to grasp, intuitive and unambiguous for the average user. I think that the people on this list (being core developers) are more or less biased here (I'd say more than less). Perhaps some input from the community (c.l.py) would help? 
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Thu Aug 26 07:06:57 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:57 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Message-ID: <000301beef80$d26158c0$522d153f@tim> [Fred L. Drake, Jr.] > ... > We left off chmod(), which would make Tim happy, but that was only > because it wasn't meaningful in context. I'd be appalled to see chmod go away; for many people it's comfortable and useful. I want *another* way, to do what little bit is portable in a way that doesn't require first mastering a badly designed interface from a dying OS . > We'd have to add it (or something equivalent) for a general purpose > filesystem object. So Tim's only happy if he can come up with a > general interface that is actually portable (consider my earlier > comments on setreadonly()). I don't care about general here; making up a general new way to spell everything that everyone may want to do under every OS would create an interface even worse than chmod's. My sister doesn't want to create files that are read-only to the world but writable to her group -- she just wants to mark certain precious files as read-only to minimize the chance of accidental destruction. What she wants is easy to do under Windows or Unix, and I expect she's the norm rather than the exception. > On the other hand, you don't need chmod() or anything like it for > most situations where a filesystem object would be useful. An > FTPFilesystem class would not be hard to write! An OO filesystem object with a .makereadonly method suits me fine . 
From tim_one at email.msn.com Thu Aug 26 07:06:54 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:54 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <000201beef80$d072f640$522d153f@tim> [Fred L. Drake, Jr.] > I think we can simply declare that isreadonly() checks that the > file doesn't allow the user to read it, Had more in mind that the file doesn't allow the user to write it . > but setreadonly() sounds to me like it wouldn't be portable to Unix. > There's more than one (reasonable) way to make a file unreadable to > a user just by manipulating permission bits, and which is best will > vary according to both the user and the file's existing permissions. "Portable" implies least common denominator, and the plain meaning of read-only is that nobody (whether owner, group or world in Unix) has write permission. People wanting something beyond that are going beyond what's portable, and that's fine -- I'm not suggesting getting rid of chmod for Unix dweebs. But by the same token, Windows dweebs should get some other (as non-portable as chmod) way to fiddle the bits important on *their* OS (only one of which chmod can affect). Billions of newbies will delightedly stick to the portable interface with the name that makes sense. the-percentage-of-programmers-doing-systems-programming-shrinks-by- the-millisecond-ly y'rs - tim From mal at lemburg.com Sat Aug 28 16:37:50 1999 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 28 Aug 1999 16:37:50 +0200 Subject: [Python-Dev] Iterating over dictionaries and objects in general References: <990826114149.ZM59302@rayburn.hcs.tl> <199908261702.NAA01866@eric.cnri.reston.va.us> <37C57E01.2ADC02AE@digicool.com> <990826150216.ZM60002@rayburn.hcs.tl> <37C5BAF1.4D6C1031@lemburg.com> <37C5C320.CF11BC7C@digicool.com> <37C643B0.7ECA586@lemburg.com> <37C69FB3.9CB279C7@digicool.com> Message-ID: <37C7F43E.67EEAB98@lemburg.com> [Followup to a discussion on psa-members about iterating over dictionaries without creating intermediate lists] Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > > > Jim Fulton wrote: > > > > > > > The problem with the PyDict_Next() approach is that it will only > > > > work reliably from within a single C call. You can't return > > > > to Python between calls to PyDict_Next(), because those could > > > > modify the dictionary causing the next PyDict_Next() call to > > > > fail or core dump. > > > > > > I do this all the time without problem. Basically, you provide an > > > index and if the index is out of range, you simply get an end-of-data return. > > > The only downside of this approach is that you might get "incorrect" > > > results if the dictionary is modified between calls. This isn't > > > all that different from iterating over a list with an index. > > > > Hmm, that's true... but what if the dictionary gets resized > > in between iterations ? The item layout is then likely to > > change, so you could potentially get completely bogus results. > > I think I said that. :) Just wanted to verify my understanding ;-) > > Even iterating over items twice may then occur, I guess. > > Yup. > > Again, this is not so different from iterating over > a list using a range:
>
>     l=range(10)
>     for i in range(len(l)):
>         l.insert(0,'Bruce')
>         print l[i]
>
> This always outputs 'Bruce'. :) Ok, so the "risk" is under user control. Fine with me...
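Jim's example behaves exactly as advertised; here it is runnable in modern Python:

```python
# Every insert at index 0 shifts the original elements right, so by the
# time the loop reads l[i] there are i+1 copies of 'Bruce' at the front
# and l[i] is always the value just inserted.
l = list(range(10))
seen = []
for i in range(len(l)):       # the index range is fixed at 10 up front
    l.insert(0, 'Bruce')
    seen.append(l[i])

print(set(seen))              # → {'Bruce'}
```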
> > Or perhaps via a special dictionary iterator, so that the following > > works: > > > > for item in dictrange(d): > > ... > > Yup. > > > The iterator could then also take some extra actions to insure > > that the dictionary hasn't been resized. > I don't think it should do that. It should simply > stop when it has run out of items. I think I'll give such an iterator a spin. Would be a nice extension to mxTools. BTW, a generic type slot for iterating over types would probably be a nice feature too. The type slot could provide hooks of the form it_first, it_last, it_next, it_prev which all work integer index based, e.g. in pseudo code:

    int i;
    PyObject *item;

    /* set up i and item to point to the first item */
    if (obj.it_first(&i,&item) < 0)
        goto onError;
    while (1) {
        PyObject_Print(item);
        /* move i and item to the next item; an IndexError is
           raised in case there are no more items */
        if (obj.it_next(&i,&item) < 0) {
            PyErr_Clear();
            break;
        }
    }

These slots would cover all problem instances where iteration over non-sequences or non-uniform sequences (i.e. sequences like objects which don't provide convex index sets, e.g. 1,2,3,6,7,8,11,12) is required, e.g. dictionaries, multi-segment buffers -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 127 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gward at cnri.reston.va.us Mon Aug 30 21:02:22 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:02:22 -0400 Subject: [Python-Dev] Portable "spawn" module for core? Message-ID: <19990830150222.B428@cnri.reston.va.us> Hi all -- it recently occurred to me that the 'spawn' module I wrote for the Distutils (and which Perry Stoll extended to handle NT), could fit nicely in the core library. On Unix, it's just a front-end to fork-and-exec; on NT, it's a front-end to spawnv().
In either case, it's just enough code (and just tricky enough code) that not everybody should have to duplicate it for their own uses. The basic idea is this:

    from spawn import spawn
    ...
    spawn (['cmd', 'arg1', 'arg2'])
    # or
    spawn (['cmd'] + args)

you get the idea: it takes a *list* representing the command to spawn: no strings to parse, no shells to get in the way, no sneaky meta-characters ruining your day, draining your efficiency, or compromising your security. (Conversely, no pipelines, redirection, etc.) The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' depending on os.name. Additionally, it takes a couple of optional keyword arguments (all booleans): 'search_path', 'verbose', and 'dry_run', which do pretty much what you'd expect. The module as it's currently in the Distutils code is attached. Let me know what you think... Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:11:50 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:11:50 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <14282.54880.922571.792484@dolphin.mojam.com> Greg> it recently occurred to me that the 'spawn' module I wrote for the Greg> Distutils (and which Perry Stoll extended to handle NT), could fit Greg> nicely in the core library. How's spawn.spawn semantically different from the Windows-dependent os.spawn? How are stdout/stdin/stderr connected to the child process - just like fork+exec or something slightly higher level like os.popen?
If it's semantically like os.spawn and a little bit higher level abstraction than fork+exec, I'd vote for having the os module simply import it: from spawn import spawn and thus make that function more widely available... Greg> The module as it's currently in the Distutils code is attached. Not in the message I saw... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From gward at cnri.reston.va.us Mon Aug 30 21:14:57 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:14:57 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us>; from Greg Ward on Mon, Aug 30, 1999 at 03:02:22PM -0400 References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <19990830151457.C428@cnri.reston.va.us> On 30 August 1999, To python-dev at python.org said: > The module as it's currently in the Distutils code is attached. Let me > know what you think... New definition of "attached": I'll just reply to my own message with the code I meant to attach. D'oh! ------------------------------------------------------------------------

"""distutils.spawn

Provides the 'spawn()' function, a front-end to various platform-specific
functions for launching another program in a sub-process."""

# created 1999/07/24, Greg Ward

__rcsid__ = "$Id: spawn.py,v 1.2 1999/08/29 18:20:56 gward Exp $"

import sys, os, string
from distutils.errors import *


def spawn (cmd, search_path=1, verbose=0, dry_run=0):
    """Run another program, specified as a command list 'cmd', in a new
    process.  'cmd' is just the argument list for the new process, ie.
    cmd[0] is the program to run and cmd[1:] are the rest of its
    arguments.  There is no way to run a program with a name different
    from that of its executable.
If 'search_path' is true (the default), the system's executable search path will be used to find the program; otherwise, cmd[0] must be the exact path to the executable. If 'verbose' is true, a one-line summary of the command will be printed before it is run. If 'dry_run' is true, the command will not actually be run. Raise DistutilsExecError if running the program fails in any way; just return on success.""" if os.name == 'posix': _spawn_posix (cmd, search_path, verbose, dry_run) elif os.name in ( 'nt', 'windows' ): # ??? _spawn_nt (cmd, search_path, verbose, dry_run) else: raise DistutilsPlatformError, \ "don't know how to spawn programs on platform '%s'" % os.name # spawn () def _spawn_nt ( cmd, search_path=1, verbose=0, dry_run=0): import string executable = cmd[0] if search_path: paths = string.split( os.environ['PATH'], os.pathsep) base,ext = os.path.splitext(executable) if (ext != '.exe'): executable = executable + '.exe' if not os.path.isfile(executable): paths.reverse() # go over the paths and keep the last one for p in paths: f = os.path.join( p, executable ) if os.path.isfile ( f ): # the file exists, we have a shot at spawn working executable = f if verbose: print string.join ( [executable] + cmd[1:], ' ') if not dry_run: # spawn for NT requires a full path to the .exe rc = os.spawnv (os.P_WAIT, executable, cmd) if rc != 0: raise DistutilsExecError("command failed: %d" % rc) def _spawn_posix (cmd, search_path=1, verbose=0, dry_run=0): if verbose: print string.join (cmd, ' ') if dry_run: return exec_fn = search_path and os.execvp or os.execv pid = os.fork () if pid == 0: # in the child try: #print "cmd[0] =", cmd[0] #print "cmd =", cmd exec_fn (cmd[0], cmd) except OSError, e: sys.stderr.write ("unable to execute %s: %s\n" % (cmd[0], e.strerror)) os._exit (1) sys.stderr.write ("unable to execute %s for unknown reasons" % cmd[0]) os._exit (1) else: # in the parent # Loop until the child either exits or is terminated by a signal # (ie. 
keep waiting if it's merely stopped) while 1: (pid, status) = os.waitpid (pid, 0) if os.WIFSIGNALED (status): raise DistutilsExecError, \ "command %s terminated by signal %d" % \ (cmd[0], os.WTERMSIG (status)) elif os.WIFEXITED (status): exit_status = os.WEXITSTATUS (status) if exit_status == 0: return # hey, it succeeded! else: raise DistutilsExecError, \ "command %s failed with exit status %d" % \ (cmd[0], exit_status) elif os.WIFSTOPPED (status): continue else: raise DistutilsExecError, \ "unknown error executing %s: termination status %d" % \ (cmd[0], status) # _spawn_posix () ------------------------------------------------------------------------ -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From gward at cnri.reston.va.us Mon Aug 30 21:31:55 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:31:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <14282.54880.922571.792484@dolphin.mojam.com>; from Skip Montanaro on Mon, Aug 30, 1999 at 02:11:50PM -0500 References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> Message-ID: <19990830153155.D428@cnri.reston.va.us> On 30 August 1999, Skip Montanaro said: > > Greg> it recently occured to me that the 'spawn' module I wrote for the > Greg> Distutils (and which Perry Stoll extended to handle NT), could fit > Greg> nicely in the core library. > > How's spawn.spawn semantically different from the Windows-dependent > os.spawn? My understanding (purely from reading Perry's code!) is that the Windows spawnv() and spawnve() calls require the full path of the executable, and there is no spawnvp(). Hence, the bulk of Perry's '_spawn_nt()' function is code to search the system path if the 'search_path' flag is true. 
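[Editorial sketch: the path search that makes up the bulk of '_spawn_nt()' can be pulled out into a standalone helper, sketched here in present-day Python. 'find_executable' is an illustrative name, not part of the posted module; this version stops at the first hit instead of reversing the list:]

```python
import os

def find_executable(name, path=None):
    """Search the directories in 'path' (default: os.environ['PATH'])
    for 'name'; return the first full path found, or None.

    Like the posted _spawn_nt(), this only checks that a regular file
    exists -- it does not check execute permission.  On NT you would
    also try appending '.exe' before searching."""
    if path is None:
        path = os.environ.get('PATH', os.defpath)
    if os.sep in name:
        # Already a path: no search needed.
        return name if os.path.isfile(name) else None
    for p in path.split(os.pathsep):
        candidate = os.path.join(p, name)
        if os.path.isfile(candidate):
            return candidate  # first hit wins
    return None
```

[With a helper like this, both _spawn_nt() and a dry_run mode can resolve the executable up front and report exactly what would be run.]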
In '_spawn_posix()', I just use either 'execv()' or 'execvp()' for this. The bulk of my code is the complicated dance required to wait for a fork'ed child process to finish. > How are stdout/stdin/stderr connected to the child process - just > like fork+exec or something slightly higher level like os.popen? Just like fork 'n exec -- '_spawn_posix()' is just a front end to fork and exec (either execv or execvp). In a previous life, I *did* implement a spawning module for a certain other popular scripting language that handles redirection and capturing (backticks in the shell and that other scripting language). It was a lot of fun, but pretty hairy. Took three attempts gradually developed over two years to get it right in the end. In fact, it does all the easy stuff that a Unix shell does in spawning commands, ie. search the path, fork 'n exec, and redirection and capturing. Doesn't handle the tricky stuff, ie. pipelines and job control. The documentation for this module is 22 pages long; the code is 600+ lines of somewhat tricky Perl (1300 lines if you leave in comments and blank lines). That's why the Distutils spawn module doesn't do anything with std{out,err,in}. > If it's semantically like os.spawn and a little bit higher level > abstraction than fork+exec, I'd vote for having the os module simply > import it: So os.spawnv and os.spawnve would be Windows-specific, but os.spawn portable? Could be confusing. And despite the recent extended discussion of the os module, I'm not sure if this fits the model. BTW, is there anything like this on the Mac? On what other OSs does it even make sense to talk about programs spawning other programs? (Surely those GUI user interfaces have to do *something*...) 
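[Editorial sketch: the "complicated dance" Greg mentions comes down to decoding the status word that waitpid() returns. Just that step, isolated, assuming the POSIX status encoding; 'decode_status' is an illustrative name, and the branch order matches the posted _spawn_posix():]

```python
import os

def decode_status(status):
    """Classify a raw waitpid() status word using the os.WIF* macros,
    as _spawn_posix() does inline."""
    if os.WIFSIGNALED(status):
        return ('signaled', os.WTERMSIG(status))   # terminated by a signal
    elif os.WIFEXITED(status):
        return ('exited', os.WEXITSTATUS(status))  # normal exit, with status
    elif os.WIFSTOPPED(status):
        return ('stopped', os.WSTOPSIG(status))    # merely stopped: keep waiting
    else:
        return ('unknown', status)

print(decode_status(0))  # -> ('exited', 0): a zero status word is a clean exit
```

[The loop in _spawn_posix() is then just: wait, decode, and either return, raise, or (if stopped) wait again.]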
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:52:49 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:52:49 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> <19990830153155.D428@cnri.reston.va.us> Message-ID: <14282.57574.918011.54595@dolphin.mojam.com> Greg> BTW, is there anything like this on the Mac? There will be, once Jack Jansen contributes _spawn_mac... ;-) Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From jack at oratrix.nl Mon Aug 30 23:25:04 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:25:04 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 15:31:55 -0400 , <19990830153155.D428@cnri.reston.va.us> Message-ID: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Recently, Greg Ward said: > BTW, is there anything like this on the Mac? On what other OSs does it > even make sense to talk about programs spawning other programs? (Surely > those GUI user interfaces have to do *something*...) Yes, but the interface is quite a bit more high-level, so it's pretty difficult to reconcile with the Unix and Windows "every argument is a string" paradigm. You start the process and pass along an AppleEvent (basically an RPC-call) that will be presented to the program upon startup. 
So on the mac there's a serious difference between (inventing the API interface here, cut down to make it understandable to non-macheads:-) spawn("netscape", ("Open", "file.html")) and spawn("netscape", ("OpenURL", "http://foo.com/file.html")) The mac interface is (of course:-) infinitely more powerful, allowing you to talk to running apps, addressing stuff in it as COM/OLE does, etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is impossible to represent in a meaningful way. Add to that the fact that there's no stdin/stdout/stderr and there's little common ground. The one area of common ground is "run program X on files Y and Z and wait (or don't wait) for completion", so that is something that could maybe have a special method that could be implemented on all three mentioned platforms (and probably everything else as well). And even then it'll be surprising to Mac users that they have to _exit_ their editor (if you specify wait), not something people commonly do. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 30 23:29:55 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:29:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 23:25:04 +0200." <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Message-ID: <199908302129.RAA08442@eric.cnri.reston.va.us> > Recently, Greg Ward said: > > BTW, is there anything like this on the Mac? On what other OSs does it > > even make sense to talk about programs spawning other programs? (Surely > > those GUI user interfaces have to do *something*...)
> > Yes, but the interface is quite a bit more high-level, so it's pretty > difficult to reconcile with the Unix and Windows "every argument is a > string" paradigm. You start the process and pass along an AppleEvent > (basically an RPC-call) that will be presented to the program upon > startup. > > So on the mac there's a serious difference between (inventing the API > interface here, cut down to make it understandable to non-macheads:-) > spawn("netscape", ("Open", "file.html")) > and > spawn("netscape", ("OpenURL", "http://foo.com/file.html")) > > The mac interface is (of course:-) infinitely more powerful, allowing > you to talk to running apps, addressing stuff in it as COM/OLE does, > etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is > impossible to represent in a meaningful way. > > Add to that the fact that there's no stdin/stdout/stderr and there's > little common ground. The one area of common ground is "run program X > on files Y and Z and wait (or don't wait) for completion", so that is > something that could maybe have a special method that could be > implemented on all three mentioned platforms (and probably everything > else as well). And even then it'll be surprising to Mac users that > they have to _exit_ their editor (if you specify wait), not something > people commonly do. Indeed. I'm guessing that Greg wrote his code specifically to drive compilers, not so much to invoke an editor on a specific file. It so happens that the Windows compilers have command lines that look sufficiently like the Unix compilers that this might actually work. On the Mac, driving the compilers is best done using AppleEvents, so it's probably better not to try to abuse the spawn() interface for that... (Greg, is there a higher level where the compiler actions are described without referring to specific programs, but perhaps just to compiler actions and input and output files?)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 30 23:35:45 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:35:45 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 15:02:22 EDT." <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <199908302135.RAA08467@eric.cnri.reston.va.us> > it recently occurred to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). In either case, > it's just enough code (and just tricky enough code) that not everybody > should have to duplicate it for their own uses. > > The basic idea is this: > > from spawn import spawn > ... > spawn (['cmd', 'arg1', 'arg2']) > # or > spawn (['cmd'] + args) > > you get the idea: it takes a *list* representing the command to spawn: > no strings to parse, no shells to get in the way, no sneaky > meta-characters ruining your day, draining your efficiency, or > compromising your security. (Conversely, no pipelines, redirection, > etc.) > > The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' > depending on os.name. Additionally, it takes a couple of optional > keyword arguments (all booleans): 'search_path', 'verbose', and > 'dry_run', which do pretty much what you'd expect. > > The module as it's currently in the Distutils code is attached. Let me > know what you think... I'm not sure that the verbose and dry_run options belong in the standard library. When both are given, this does something semi-useful; for Posix that's basically just printing the arguments, while for NT it prints the exact command that will be executed. Not sure if that's significant though.
Perhaps it's better to extract the code that runs the path to find the right executable and make that into a separate routine. (Also, rather than reversing the path, I would break out of the loop at the first hit.) --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Mon Aug 30 23:38:36 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 17:38:36 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <199908302129.RAA08442@eric.cnri.reston.va.us>; from Guido van Rossum on Mon, Aug 30, 1999 at 05:29:55PM -0400 References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> <199908302129.RAA08442@eric.cnri.reston.va.us> Message-ID: <19990830173836.F428@cnri.reston.va.us> On 30 August 1999, Guido van Rossum said: > Indeed. I'm guessing that Greg wrote his code specifically to drive > compilers, not so much to invoke an editor on a specific file. It so > happens that the Windows compilers have command lines that look > sufficiently like the Unix compilers that this might actually work. Correct, but the spawn module I posted should work for any case where you want to run an external command synchronously without redirecting I/O. (And it could probably be extended to handle those cases, but a) I don't need them for Distutils [yet!], and b) I don't know how to do it portably.) > On the Mac, driving the compilers is best done using AppleEvents, so > it's probably better to to try to abuse the spawn() interface for > that... (Greg, is there a higher level where the compiler actions are > described without referring to specific programs, but perhaps just to > compiler actions and input and output files?) [off-topic alert... probably belongs on distutils-sig, but there you go] Yes, my CCompiler class is all about providing a (hopefully) compiler- and platform-neutral interface to a C/C++ compiler. 
Currently there're only two concrete subclasses of this: UnixCCompiler and MSVCCompiler, and they both obviously use spawn, because Unix C compilers and MSVC both provide that kind of interface. A hypothetical sibling class that provides an interface to some Mac C compiler might use a souped-up spawn that "knows about" Apple Events, or it might use some other interface to Apple Events. If Jack's simplified summary of what passing Apple Events to a command looks like is accurate, maybe spawn can be souped up to work on the Mac. Or we might need a dedicated module for running Mac programs. So does anybody have code to run external programs on the Mac using Apple Events? Would it be possible/reasonable to add that as '_spawn_mac()' to my spawn module? Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From jack at oratrix.nl Mon Aug 30 23:52:29 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:52:29 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 17:38:36 -0400 , <19990830173836.F428@cnri.reston.va.us> Message-ID: <19990830215234.ED4E718B9FB@oratrix.oratrix.nl> Hmm, if we're talking a "Python Make" or some such here the best way would probably be to use Tool Server. Tool Server is a thing that is based on Apple's old MPW programming environment, that is still supported by compiler vendors like MetroWerks. The nice thing of Tool Server for this type of work is that it _is_ command-line based, so you can probably send it things like spawn("cc", "-O", "test.c") But, although I know it is possible to do this with ToolServer, I haven't a clue on how to do it... 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim_one at email.msn.com Tue Aug 31 07:44:18 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 31 Aug 1999 01:44:18 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> Message-ID: <000101bef373$de2974c0$932d153f@tim> [Greg Ward] > ... > In a previous life, I *did* implement a spawning module for > a certain other popular scripting language that handles > redirection and capturing (backticks in the shell and that other > scripting language). It was a lot of fun, but pretty hairy. Took > three attempts gradually developed over two years to get it right > in the end. In fact, it does all the easy stuff that a Unix shell > does in spawning commands, ie. search the path, fork 'n exec, and > redirection and capturing. Doesn't handle the tricky stuff, ie. > pipelines and job control. > > The documentation for this module is 22 pages long; the code > is 600+ lines of somewhat tricky Perl (1300 lines if you leave > in comments and blank lines). That's why the Distutils spawn > module doesn't do anything with std{out,err,in}. Note that win/tclWinPipe.c-- which contains the Windows-specific support for Tcl's "exec" cmd --is about 3,200 lines of C. It does handle pipelines and redirection, and even fakes pipes as needed with temp files when it can identify a pipeline component as belonging to the 16-bit subsystem. Even so, the Tcl help page for "exec" bristles with hilarious caveats under the Windows subsection; e.g., When redirecting from NUL:, some applications may hang, others will get an infinite stream of "0x01" bytes, and some will actually correctly get an immediate end-of-file; the behavior seems to depend upon something compiled into the application itself. 
When redirecting greater than 4K or so to NUL:, some applications will hang. The above problems do not happen with 32-bit applications. Still, people seem very happy with Tcl's exec, and I'm certain no language tries harder to provide a portable way to "do command lines". Two points to that: 1) If Python ever wants to do something similar, let's steal the Tcl code (& unlike stealing Perl's code, stealing Tcl's code actually looks possible -- it's very much better organized and written). 2) For all its heroic efforts to hide platform limitations, int Tcl_ExecObjCmd(dummy, interp, objc, objv) ClientData dummy; /* Not used. */ Tcl_Interp *interp; /* Current interpreter. */ int objc; /* Number of arguments. */ Tcl_Obj *CONST objv[]; /* Argument objects. */ { #ifdef MAC_TCL Tcl_AppendResult(interp, "exec not implemented under Mac OS", (char *)NULL); return TCL_ERROR; #else ... a-generalized-spawn-is-a-good-start-ly y'rs - tim From fredrik at pythonware.com Tue Aug 31 08:39:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 31 Aug 1999 08:39:56 +0200 Subject: [Python-Dev] Portable "spawn" module for core? References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <005101bef37b$b0415070$f29b12c2@secret.pythonware.com> Greg Ward wrote: > it recently occurred to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). any reason this couldn't go into the os module instead? just add parts of it to os.py, and change the docs to say that spawn* are supported on Windows and Unix... (supporting the full set of spawn* primitives would of course be nice, btw. just like os.py provides all exec variants...)