From arigo at tunes.org Sat Nov 1 07:46:28 2003 From: arigo at tunes.org (Armin Rigo) Date: Sat Nov 1 07:50:30 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mad7h72sr.fsf@starship.python.net> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> Message-ID: <20031101124628.GA26463@vicky.ecs.soton.ac.uk> Hello Michael, On Fri, Oct 31, 2003 at 05:08:36PM +0000, Michael Hudson wrote: > > be getting a 12-25% decrease in memory use for the base object, > > though. > > More than that in the good cases. Something I forgot was that you'd > probably have to knock variable length types on the head. Why? Armin From nas-python at python.ca Sat Nov 1 22:36:57 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Sat Nov 1 22:35:25 2003 Subject: [Python-Dev] Deprecate the buffer object? In-Reply-To: <0b8e01c39f34$1d31e1f0$0500a8c0@eden> References: <200310300230.h9U2UId08398@oma.cosc.canterbury.ac.nz> <0b8e01c39f34$1d31e1f0$0500a8c0@eden> Message-ID: <20031102033657.GA8137@mems-exchange.org> On Fri, Oct 31, 2003 at 09:21:06AM +1100, Mark Hammond wrote: > Thus, my preference is to fix the buffer object by fixing the interface as > much as possible. > > Here is a sketch of a solution, incorporating both Neil and Greg's ideas: > > * Type object gets a new flag - TP_HAS_BUFFER_INFO, corresponding to a new > 'getbufferinfoproc' slot in the PyBufferProcs structure (note - a function > pointer, not static flags as Neil suggested) > > * New function 'getbufferinfoproc' returns a bitmask - Py_BUFFER_FIXED is > one (and currently the only) flag that can be returned. What does this flag mean? 
To my mind, there are several different types of memory buffers and the buffer interface does not distinguish between all of them. Is the size and position of the buffer fixed? Is the buffer immutable (it may be read-only through the buffer object but writable via some other mechanism)? The first question can be avoided by using Greg's idea of always refreshing the size and position. The second question cannot be answered using the current interface. I suppose if the buffer is immutable then it is implied that its size and position are fixed. > * New buffer functions PyObject_AsFixedCharBuffer, etc. These check the new > flag (and a type lacking TP_HAS_BUFFER_INFO is assumed to *not* be fixed) > > * Buffer object keeps a reference to the existing object (as it does now). > Its getbufferinfoproc delegates to the underlying object. > > * Buffer object *never* keeps a pointer to the buffer - only to the object. > Functions like tp_hash always re-fetch the buffer on demand. The buffer > returned by the buffer object is then guaranteed to be as reliable as the > underlying object. (This may be a semantic issue with hash(), but > conceptually seems fine. Potential solution here - add Py_BUFFER_READONLY > as a buffer flag, then hash() semantics could do the right thing) You can't use the base object's hash if the buffer has an explicit size or offset. Neil From nas-python at python.ca Sat Nov 1 22:49:24 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Sat Nov 1 22:47:47 2003 Subject: [Python-Dev] Deprecate the buffer object? 
In-Reply-To: <200310300230.h9U2UId08398@oma.cosc.canterbury.ac.nz> References: <087001c39e73$70333e60$0500a8c0@eden> <200310300230.h9U2UId08398@oma.cosc.canterbury.ac.nz> Message-ID: <20031102034924.GB8137@mems-exchange.org> On Thu, Oct 30, 2003 at 03:30:18PM +1300, Greg Ewing wrote: > That's completely different from what I had in mind, which was: > > (1) Keep a reference to the base object in the buffer object, and > > (2) Use the buffer API to fetch a fresh pointer from the > base object each time it's needed. I've just uploaded a (rough) patch that implements your idea. http://www.python.org/sf/832058 Neil From greg at electricrain.com Sun Nov 2 01:20:50 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 2 01:20:56 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <3FA06DC5.70407@ocf.berkeley.edu> References: <338366A6D2E2CA4C9DAEAE652E12A1DED6B3F8@au3010avexu1.global.avaya.com> <3FA06DC5.70407@ocf.berkeley.edu> Message-ID: <20031102062050.GA5805@zot.electricrain.com> > >>How about re-engineering the interpreter to make it more MP > >>friendly? (This is probably a bigger task than a Masters thesis.) > >>The current interpreter serializes on the global interpreter lock > >>(GIL) and blocks everything. ... > I will still consider this, though. > > -Brett If you take this on there is no doubt you'll receive many-a-beer from people on this list! :) From greg at electricrain.com Sun Nov 2 04:25:17 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 2 04:25:22 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed Message-ID: <20031102092517.GB5805@zot.electricrain.com> I just committed the fixes necessary for test_bsddb.py to complete without deadlocking (yay!). 
It should remove all possibility of a bsddb deadlock in single threaded applications as well as allow for multiple iterator/generator objects to operate properly on a database at once and make the _DBWithCursor __iter__ implementation more efficient by not asking for the values from the db since it only returns the keys. I believe there are still race conditions that could lead to a deadlock in the bsddb interface due to a current lack of locking around its internal open|closed DBCursor management. I'm opening a SF bug to track that. A test case to prove the theory is needed. Let me know if you see any problems. I'm sorry about allowing the deadlock to be committed in the first place. I routinely run the large bsddb test suite when doing bsddb development but test_bsddb.py contained additional coverage for the recent iterator interface that is not present in the large test suite; now I run both. - Greg From greg at electricrain.com Sun Nov 2 05:00:06 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 2 05:00:12 2003 Subject: [Python-Dev] Re: test_bsddb blocks testing popitem - reason In-Reply-To: <200310281112.21162.aleaxit@yahoo.com> References: <200310251232.55044.aleaxit@yahoo.com> <200310271125.16879.aleaxit@yahoo.com> <20031027215648.GM3929@zot.electricrain.com> <200310281112.21162.aleaxit@yahoo.com> Message-ID: <20031102100006.GA17328@zot.electricrain.com> On Tue, Oct 28, 2003 at 11:12:21AM +0100, Alex Martelli wrote: > On Monday 27 October 2003 10:56 pm, Gregory P. Smith wrote: > > What about the behaviour of multiple iterators for the same dict being > > used at once (either interleaved or by multiple threads; it shouldn't > > matter)? I expect that works fine in python. > > If the dict is not being modified, or if the only modifications on it are > assigning different values for already-existing keys, multiple iterators > on the same unchanging dict do work fine in one or more threads. 
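A minimal sketch of the behaviour described above, assuming CPython's usual dict semantics: interleaved iterators over an unchanging dict are fine, while changing the set of keys during iteration is detected and raises RuntimeError rather than crashing.

```python
d = {"a": 1, "b": 2, "c": 3}

# Interleaved iterators over an unchanging dict work fine, even if
# values (not keys) are reassigned along the way.
it1, it2 = iter(d), iter(d)
seen = [next(it1), next(it2), next(it1)]
assert all(k in d for k in seen)

# Changing the *set of keys* while iterating is not supported;
# CPython raises RuntimeError instead of misbehaving silently.
it = iter(d)
next(it)
d["new"] = 4
changed_detected = False
try:
    next(it)
except RuntimeError:
    changed_detected = True
assert changed_detected
```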
> But note that iterators only "read" the dict, don't change it. If any > change to the set of keys in the dict happens, all bets are off. ... > > This is something the _DBWithCursor iteration interface does not currently > > support due to its use of a single DBCursor internally. > > > > _DBWithCursor is currently written such that the cursor is never closed > > once created. This leaves tons of potential for deadlock even in single > > threaded apps. Reworking _DBWithCursor into a _DBThatUsesCursorsSafely > > such that each iterator creates its own cursor in an internal pool > > and other non cursor methods that would write to the db destroy all > > cursors after saving their current() position so that the iterators can > > reopen+reposition them is a solution. > > Woof. I think I understand what you're saying. However, writing to a > dict (in the sense of changing the sets of keys) while one is iterating > on the dict is NOT supported in Python -- basically "undefined behavior" > (which does NOT include possibilities of crashes and deadlocks, though). > So, maybe, we could get away with something a bit less rich here? I just implemented and committed something about that rich. I believe I could simplify it: have __iter__() and iteritems() return if their cursor was closed out from underneath them instead of the current attempt to reopen a cursor, reposition themselves, and keep going [which could still have unpredictable results since a db modification could rearrange the keys in some types of databases]. > So, maybe I _should_ just fix popitem that way and see if all tests pass? > I dunno -- it feels a bit like fixing the symptoms and leaving some deep > underlying problems intact... My commit fixed the deadlock problem for the single threaded case and wrote a test case to prove it. I opened a SF bug to track fixing the deadlock possibilities in the multithreaded case (and a memory leak i believe i added). 
-g From aleaxit at yahoo.com Sun Nov 2 06:11:33 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 2 06:11:40 2003 Subject: [Python-Dev] Re: test_bsddb blocks testing popitem - reason In-Reply-To: <20031102100006.GA17328@zot.electricrain.com> References: <200310251232.55044.aleaxit@yahoo.com> <200310281112.21162.aleaxit@yahoo.com> <20031102100006.GA17328@zot.electricrain.com> Message-ID: <200311021211.33462.aleaxit@yahoo.com> On Sunday 02 November 2003 11:00 am, Gregory P. Smith wrote: ... > > So, maybe, we could get away with something a bit less rich here? > > I just implemented and committed something about that rich. Super! I've just updated, built, and re-run all tests (on 2.4), and they all go smoothly. > My commit fixed the deadlock problem for the single threaded case and > wrote a test case to prove it. I opened a SF bug to track fixing the > deadlock possibilities in the multithreaded case (and a memory leak i > believe i added). OK, I understand this fix isn't the be-all end-all, but still, it makes things much better than they were before. *THANKS*! 
Alex From skip at manatee.mojam.com Sun Nov 2 08:00:59 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 2 08:01:09 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311021300.hA2D0x8Y004595@manatee.mojam.com> Bug/Patch Summary ----------------- 548 open / 4296 total bugs (+48) 190 open / 2438 total patches (-1) New Bugs -------- python 2.3.2 make test segfault (2003-10-26) http://python.org/sf/830573 httplib.HTTPConnection._send_request header parsing bug (2003-10-27) http://python.org/sf/831271 Solaris term.h needs curses.h (2003-10-27) http://python.org/sf/831574 httplib hardcodes Accept-Encoding (2003-10-28) http://python.org/sf/831747 Docstring for pyclbr.readmodule() is incorrect (2003-10-28) http://python.org/sf/831969 C++ extensions using SWIG and MinGW (2003-10-28) http://python.org/sf/832159 Build fails in ossaudiodev.c with missing macros (2003-10-29) http://python.org/sf/832236 Wrong reference for specific minidom methods (2003-10-29) http://python.org/sf/832251 Bad Security Advice in CGI Documentation (2003-10-29) http://python.org/sf/832515 Inconsitent line numbering in traceback (2003-10-29) http://python.org/sf/832535 Please link modules with shared lib (2003-10-29) http://python.org/sf/832799 urllib.urlencode doesn't work for output from cgi.parse_qs (2003-10-30) http://python.org/sf/833405 Incorrect priority 'in' and '==' (2003-10-31) http://python.org/sf/833905 Ctrl+key combos stop working in IDLE (2003-10-31) http://python.org/sf/833957 Mouse wheel crashes program (2003-11-01) http://python.org/sf/834351 python and lithuanian locales (2003-11-02) http://python.org/sf/834452 simple bsddb interface potential for deadlock with threads (2003-11-02) http://python.org/sf/834461 New Patches ----------- deprecate or fix buffer object (2003-10-28) http://python.org/sf/832058 Implementation PEP 322: Reverse Iteration (2003-11-01) http://python.org/sf/834422 Closed Bugs ----------- test_signal hangs -- signal 
broken on OpenBSD? (2002-04-26) http://python.org/sf/549081 urllib2 and proxy (2003-01-02) http://python.org/sf/661042 os.popen with mode "rb" fails on Unix (2003-03-13) http://python.org/sf/703198 Test failures on Linux, Python 2.3b1 tarball (2003-04-26) http://python.org/sf/728051 Memory leak on open() only in 2.3? (2003-08-15) http://python.org/sf/789402 urllib.urlopen for https doesn't always provide readlines (2003-08-20) http://python.org/sf/792101 gc.get_referrers() is inherently dangerous (2003-08-23) http://python.org/sf/793822 dis.disassemble_string() broken (2003-09-23) http://python.org/sf/811294 int ("ffffffd3", 16) gives error (2003-09-24) http://python.org/sf/811898 Email.message example missing arg (2003-10-03) http://python.org/sf/817178 httplib.SSLFile lacks readlines() method (2003-10-07) http://python.org/sf/819510 Package Manager Scrolling Behavior (2003-10-15) http://python.org/sf/824430 dict.__init__ doesn't call subclass's __setitem__. (2003-10-16) http://python.org/sf/824854 wrong error message of islice indexing (2003-10-20) http://python.org/sf/827190 ctime is not creation time (2003-10-21) http://python.org/sf/827902 setattr(obj, BADNAME, value) does not raises exception (2003-10-24) http://python.org/sf/829458 python-mode.el: py-b-of-def-or-class looks inside strings (2003-10-25) http://python.org/sf/830347 Closed Patches -------------- Add isxxx() methods to string objects (2002-05-30) http://python.org/sf/562501 Enhanced file constructor (2002-09-11) http://python.org/sf/608182 Experimental Inno Setup Win32 installer (2002-10-24) http://python.org/sf/628301 terminal type option subnegotiation in telnetlib (2003-04-17) http://python.org/sf/723364 Allows os.forkpty to work on more platforms (Solaris!) 
(2003-05-04) http://python.org/sf/732401 fix problem in about dialog (2003-07-21) http://python.org/sf/775057 pydoc's usage should use basename (2003-08-08) http://python.org/sf/785689 termios module on IRIX (2003-08-11) http://python.org/sf/787189 ignore "b" and "t" mode modifiers in posix_popen (2003-08-13) http://python.org/sf/788404 POP3 over SSL support for poplib (2003-08-19) http://python.org/sf/791706 [_ssl.c] SSL_write() called with -1 as size (2003-09-10) http://python.org/sf/803998 socket.ssl should check certificates (2003-09-22) http://python.org/sf/810754 sprout more file operations in SSLFile, fixes 792101 (2003-10-04) http://python.org/sf/817854 let's get rid of cyclic object comparison (2003-10-17) http://python.org/sf/825639 Add list.copysort() (2003-10-17) http://python.org/sf/825814 itertoolsmodule.c: islice error messages (827190) (2003-10-25) http://python.org/sf/830070 python-mode.el: (py-point 'bod) doesn't quite work (2003-10-25) http://python.org/sf/830341 From martin at v.loewis.de Sun Nov 2 14:05:55 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sun Nov 2 14:06:11 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <20031101124628.GA26463@vicky.ecs.soton.ac.uk> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> Message-ID: Armin Rigo writes: > On Fri, Oct 31, 2003 at 05:08:36PM +0000, Michael Hudson wrote: > > > be getting a 12-25% decrease in memory use for the base object, > > > though. > > > > More than that in the good cases. 
Something I forgot was that you'd > > probably have to knock variable length types on the head. > > Why? Assuming "to knock on the head" means "to put an end to": If you put all objects of the same type into a pool, you really want all objects to have the same size inside a pool. With that assumption, garbage objects can be reallocated without causing fragmentation. If objects in a pool have different sizes, it is not possible to have an efficient reallocation strategy. Of course, you could try to make a compacting garbage collector, but that would break the current programming model even more (as object references would stop being pointers). Regards, Martin From martin at v.loewis.de Sun Nov 2 14:10:33 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sun Nov 2 14:10:47 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary In-Reply-To: <200311021300.hA2D0x8Y004595@manatee.mojam.com> References: <200311021300.hA2D0x8Y004595@manatee.mojam.com> Message-ID: Skip Montanaro writes: > Bug/Patch Summary > ----------------- > > 548 open / 4296 total bugs (+48) > 190 open / 2438 total patches (-1) How do you compute the deltas? On Oct 26, in http://mail.python.org/pipermail/python-dev/2003-October/039559.html you write 547 open / 4276 total bugs (+42) 205 open / 2432 total patches (+7) Regards, Martin From aleaxit at yahoo.com Sun Nov 2 17:19:42 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 2 17:19:51 2003 Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes Message-ID: <200311022319.42725.aleaxit@yahoo.com> 1. Shouldn't class UserString.UserString inherit from basestring? After all, basestring exists specifically in order to encourage typetests of the form isinstance(x, basestring) -- wouldn't it be better if such tests could also catch "user-tweaked strings" derived from UserString ... ? 2. 
If we do want to encourage such typetest idioms, it might be a good idea to provide some other such abstract basetypes for the purpose. For example, I see quite a few cases of isinstance(x, (int,long,gmpy.mpz)) in my code -- and that, despite the fact that I'm not enamoured of typetesting as a general idea and that I'm quite aware that this kind of check could miss some other kind of user-coded "integeroid number". If there was an abstract basetype, say "baseinteger", from which int and long derived, I'd be happy to tweak gmpy to make mpz subclass it (in 2.4 and later versions of Python only, of course) and allow such typetests to happen more smoothly, faster and with more generality too. 3. And perhaps baseinteger (and float and complex) should all subclass yet another basetype, say "basenumber"? Why not? I admit that right now I have no use cases where I _do_ want to accept complex numbers as well as int, long, float, and gmpy thingies (so, maybe there should be a more specific "basereal" keeping complex out...?), but apart from this detail such an abstract basetype would be similarly useful (in practice I would use it since I do not expect complex in my apps anyway). 4. Furthermore, providing "basenumber" would let user-coded classes "flag" in a simple and direct way "I'm emulating numbers". This might well be useful _to Python itself_... Right now, I'm stuck for an answer to the bug that a user-coded class which exposes __mul__ but not __rmul__ happens to support its instances being multiplied by an integer on the right -- quite surprising to users! The problem is that this behavior is apparently expected, though not documented, when the user-coded class is trying to simulate a _sequence_ rather than a number. So, I can't just take the peculiar "accidental commutativity with integers only" away. IF a user class could flag itself as "numeroid" by inheriting basenumber, THEN the "accidental commutativity" COULD be easily removed at least for such classes. 5. 
in fact, now that we fill in type descriptor slots based on user-coded classes' special methods, I suspect this isn't the only such issue. While "flagging" (inheriting one of the abstract basetypes) would be entirely optional for user-coded classes, it would at least provide a way to _explicitly disambiguate_ what it is that the user-coded class IS trying to emulate, if the user wants to. 6. of course, for that to be any use, the various basetypes should not be "ambiguously" multiply inheritable from. Right now, it isn't so...: >>> class x(basestring, int): pass ... >>> isinstance(x(), int) True >>> isinstance(x(), basestring) True ...does anybody see any problem if, in 2.4, we take away the ability to multiply inherit from basestring AND also from another builtin type which does not in turn inherit from basestring...? I have the impression that right now this is working "sort of accidentally", rather than by design. 7. one might of course think of other perhaps-useful abstract basetypes, such as e.g. basesequence or basemapping -- right now the new forthcoming built-in 'reverse' is trying to avoid "accidentally working" on mappings by featuretesting for (e.g.) has_key, but if the user could optionally subclass either of these abstract basetypes (but not both at once, see [6]:-), that might ease reverse's task in some cases. Why, such abstract basetypes might even make operator.isMappingType useful again -- right now, of course: >>> operator.isMappingType([]) True and therefore there isn't much point in that function:-). But I think that points 1-6 may be enough to discuss for the moment (and I brace myself for the flames of the antitypetesters -- why, if I hadn't matured this idea myself I might well be one of the flamers:-) so I have no concrete proposals sub [7] -- yet. ...just a sec... Ok, ready -- fire away! 
Alex From python at rcn.com Sun Nov 2 17:52:26 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 2 17:53:28 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <200311022319.42725.aleaxit@yahoo.com> Message-ID: <002601c3a193$fcad8300$e841fea9@oemcomputer> > 1. Shouldn't class UserString.UserString inherit from basestring? The functionality of UserString has been subsumed by inheriting from str. So, its main purpose now is to keep old code working which means that it is probably not wise to suddenly convert it from a classic class to a new-style class. > 3. And perhaps baseinteger (and float and complex) should all subclass yet > another basetype, say "basenumber"? Why not? I admit that right now > I have no use cases where I _do_ want to accept complex numbers as > well as int, long, float, and gmpy thingies (so, maybe there should be > a > more specific "basereal" keeping complex out...?), but apart from this > detail such an abstract basetype would be similarly useful (in > practice > I would use it since I do not expect complex in my apps anyway). At one time, I also requested an abstract numeric inheritance hierarchy with real=union(int,float,long) and numbers=union(real,complex). However, much time has passed and the need has never risen again. > ...does anybody see any problem if, in 2.4, we take away the ability to > multiply inherit from basestring AND also from another builtin type which > does not in turn inherit from basestring. I would rather leave this open than introduce code to prevent it. My sense is that blocking it would introduce complexity in coding, documentation, understanding, and debugging while offering near zero payoff. > right now the new > forthcoming built-in 'reverse' is trying to avoid "accidentally > working" > on mappings by featuretesting for (e.g.) 
has_key, but if the user > could optionally subclass either of these abstract basetypes (but not > both at once, see [6]:-), that might ease reverse's task in some > cases. In the C code, the actual test is for PySequence_Check() which seems to do a good job of finding non-mapping objects implementing __getitem__. > Why, such abstract basetypes might even make operator.isMappingType > useful again -- right now, of course: > >>> operator.isMappingType([]) > True > and therefore there isn't much point in that function:-). In the meantime, I would like to remove that function from the operator module. It is broken. > > ...just a sec... > > > Ok, ready -- fire away! So, 1.5.2 wasn't good enough for you. Perhaps *this* change will be to your liking. Fry type checking dog, fry! Raymond From arigo at tunes.org Sun Nov 2 18:35:16 2003 From: arigo at tunes.org (Armin Rigo) Date: Sun Nov 2 18:39:24 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: References: <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> Message-ID: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Hello Martin, On Sun, Nov 02, 2003 at 08:05:55PM +0100, Martin v. L?wis wrote: > > > More than that in the good cases. Something I forgot was that you'd > > > probably have to knock variable length types on the head. > > > > Why? > > Assuming "to knock on the head" means "to put an end to": > > If you put all objects of the same type into a pool, you really want > all objects to have the same side, inside a pool. With that > assumption, garbage objects can be reallocated without causing > fragmentation. 
If objects in a pool have different sizes, it is not > possible to have an efficient reallocation strategy. "Not easy" would have been more appropriate. It is still basically what malloc() does. One way would be to use Python's current memory allocator, by adapting it to sort objects into pools not only according to size but also according to type. What seems to me like a good solution would be to use one relatively large "arena" per type and Python's memory allocator to subdivide each arena. If each arena starts at a pointer address which is properly aligned, then *(p&MASK) gives you the type of any object, and possibly even without much cache-miss overhead because there are not so many arenas in total (probably only 1-2 per type in common cases, and arenas can be large). A bientot, Armin. From martin at v.loewis.de Sun Nov 2 19:18:53 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Nov 2 19:19:02 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> References: <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Message-ID: <3FA59EED.1020900@v.loewis.de> Armin Rigo wrote: >>If you put all objects of the same type into a pool, you really want >>all objects to have the same side, inside a pool. With that >>assumption, garbage objects can be reallocated without causing >>fragmentation. If objects in a pool have different sizes, it is not >>possible to have an efficient reallocation strategy. > > > "Not easy" would have been more appropriate. It is still basically what > malloc() does. That's why I said "efficient". 
What malloc basically does is not efficient. It gets worse if, at reallocation time, you are not only bound by size, but also by type. E.g. if you have deallocated a tuple of 10 elements, and then reallocate a tuple of 6, the wasted space can only hold a tuple of 1 element, nothing else. > One way would be to use Python's current memory allocator, by adapting it to > sort objects into pools not only according to size but also according to type. > What seems to me like a good solution would be to use one relatively large > "arena" per type and Python's memory allocator to subdivide each arena. If > each arena starts at a pointer address which is properly aligned, then > *(p&MASK) gives you the type of any object, and possibly even without much > cache-miss overhead because there are not so many arenas in total (probably > only 1-2 per type in common cases, and arenas can be large). So where do you put strings with 100,000 elements (characters)? Or any other object that exceeds an arena in size? Regards, Martin From bmr at austin.rr.com Sun Nov 2 21:27:14 2003 From: bmr at austin.rr.com (Brian Rzycki) Date: Sun Nov 2 21:27:18 2003 Subject: [Python-Dev] new language ideas Message-ID: <3C7C23AC-0DA5-11D8-8CA1-00039376D608@austin.rr.com> Hi all, I've been tinkering with a bit of a pet project on and off for some time now. I'm basically trying to adapt the python style/syntax for a language that is a bit better suited as a classical systems programming language. To be able to program with Python at the high level and something very pythonesque at the lower level is very appealing. :) Well, when thinking about this, I've come up with a few ideas I think might benefit Python as well. Please forgive me if these are repeats, I've never seen anything related to this in the PEPs or on the list. I'm just tossing these out for Python's benefit... /me dons asbestos long-johns... Multiline comments -------------------------- #BEGIN ... #END Everything in between is ignored. 
It would be very useful when debugging decent sized blocks of code. I know certain editors can auto-comment blocks, but it can be difficult to un-auto-comment said block. The same smart editors could colorize the block accordingly, minimizing readability issues. __doc__ variable ------------------------ docstrings are really a special case of programmer documentation. It'd be a lot nicer if there were some way to isolate certain portions of information in the docstring. Most contain a short description as well as the expected things (I won't say types on this list) ;). docstrings could be aliased through the dictionary __doc__. The exact semantics are a bit fuzzy right now, but I wanted to toss out the idea for public scrutiny. Here's an example: def f(x): "does nothing, really." return(x) In this case, __doc__.desc would equal the docstring. This would allow for backward compatibility and allow for extension. Think author, webpage, and version at the global scope and pre/post conditions, dynamically created information about a function/class. bit access of integers ---------------------------- Like strings, we can use [] to index into python integers. It'd be a nice way to set/read individual bits of a given integer. For example: x = 5 x[0] = 0 print x (prints 4) The details of how to index (I was assuming big-endian in this example) are open to discussion. This would make bit-banging in python be even easier than C (not to mention easier to read). This assumes we want Python to be good at bit-banging. ;) alternative base notation --------------------------------- Python inherited C's notation for numbers of non-decimal bases. I propose another with simpler syntax: number_base. An example: x = 24b_16 y = 1001_2 z = 96zz_36 The range for this notation would be 2 to 36 for the base. This allows for the entire alphabet plus numbers to be used as numerical placeholders. I'd be happy if _2, _8, _16 were the only ones implemented because those are the most commonly used. 
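The proposed NUMBER_BASE literals can be emulated today with int()'s explicit-base argument; a tiny helper (the name parse_based is made up for illustration, not a real or proposed API):

```python
def parse_based(literal):
    """Parse a NUMBER_BASE literal such as '24b_16' or '1001_2'.

    The notation is the proposal's, not real Python syntax; int()
    with an explicit base (2..36) does the actual work.
    """
    digits, base = literal.rsplit("_", 1)
    return int(digits, int(base))

print(parse_based("24b_16"))   # 587, i.e. 0x24b
print(parse_based("1001_2"))   # 9
print(parse_based("96zz_36"))  # a base-36 value
```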
It would be nice to treat it almost as if it were a call to a radix() function. I think the notation has a nice look to it and I think makes it easy to read. So that's it for now. Let me know what you think. -Brian Rzycki From tim.one at comcast.net Sun Nov 2 21:48:42 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Nov 2 21:48:47 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <3FA59EED.1020900@v.loewis.de> Message-ID: [Martin v. L?wis, on schemes to segregate object memory by type, with the type pointer shared at a calculated address] > ... > So where do you put strings with 100,000 elements (characters)? Or any > other object that exceeds an arena in size? You allocate enough extra memory so that there's room to stick a type pointer at the calculated address; or, IOW, it becomes a one-object pool but of unusually large size. Some bytes may be lost at the start of the allocated region to allow planting a type pointer at a pool-aligned address; but by assumption the object is "very large", so the wastage can be small in percentage terms. That said, the current pymalloc is relentless about speeding alloc/free of exactly-the-same-size small blocks -- there's not much code that could be reused in a type-segregated scheme (the debug pymalloc wrapper is a different story -- it can wrap any malloc/free). From DavidA at ActiveState.com Sun Nov 2 22:36:46 2003 From: DavidA at ActiveState.com (David Ascher) Date: Sun Nov 2 22:30:46 2003 Subject: [Python-Dev] OT: programming language creator or serial killer? Message-ID: <3FA5CD4E.3020805@ActiveState.com> A fun online quiz IMO (flash): http://www.malevole.com/mv/misc/killerquiz/ I got 3/10 =) From python at rcn.com Sun Nov 2 22:50:58 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 2 22:52:01 2003 Subject: [Python-Dev] OT: programming language creator or serial killer? 
In-Reply-To: <3FA5CD4E.3020805@ActiveState.com>
Message-ID: <000301c3a1bd$b14e5ea0$e841fea9@oemcomputer>

[David Ascher]
> A fun online quiz IMO (flash):
>
> http://www.malevole.com/mv/misc/killerquiz/
>
> I got 3/10 =)

That was a great link. I got 8/10.

Raymond

From pje at telecommunity.com Sun Nov 2 22:54:32 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun Nov 2 22:53:33 2003
Subject: [Python-Dev] new language ideas
In-Reply-To: <3C7C23AC-0DA5-11D8-8CA1-00039376D608@austin.rr.com>
Message-ID: <5.1.0.14.0.20031102224600.021c2ec0@mail.telecommunity.com>

At 08:27 PM 11/2/03 -0600, Brian Rzycki wrote:
>Multiline comments
>--------------------------
>#BEGIN
>..
>#END
>
>Everything in between is ignored. It would be very useful when debugging
>decent sized blocks of code. I know certain editors can auto-comment
>blocks, but it can be difficult to un-auto-comment said block. The same
>smart editors could colorize the block accordingly, minimizing readability
>issues.

Just triple quote. I usually use """ for actual strings in my programs,
and if I need to comment out a block I use '''.

>bit access of integers
>----------------------------
>Like strings, we can use [] to index into python integers. It'd be a nice
>way to set/read individual bits of a given integer. For example:
>
>x = 5
>x[0] = 0
>print x
>(prints 4)
>
>The details of how to index (I was assuming big-endian in this example)
>are open to discussion. This would make bit-banging in python be even
>easier than C (not to mention easier to read). This assumes we want
>Python to be good at bit-banging. ;)

Integers are immutable. What you want is a bit array; you could write one
of your own in Python easily enough, or C if you need higher performance.
Or maybe you could supply a patch for the Python 'array' module to support
a bit type.

>alternative base notation
>---------------------------------
>Python inherited C's notation for numbers of non-decimal bases.
I propose >another with simpler syntax: number_base. An example: > >x = 24b_16 >y = 1001_2 >z = 96zz_36 > >The range for this notation would be 2 to 36 for the base. This allows >for the entire alphabet plus numbers to be used as numerical >placeholders. I'd be happy if _2, _8, _16 were the only ones implemented >because those are the most commonly used. Python already implements 8 and 16, using 0 and 0x prefixes. Presumably, you're therefore requesting an 0b or some such. Note that you can already do this like so: >>> print int("100100",2) 36 However, if I were using bit strings a lot, I'd probably convert them to integers or longs in hex form, just to keep the program more compact. From fincher.8 at osu.edu Mon Nov 3 00:34:13 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Sun Nov 2 23:35:57 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mad7h72sr.fsf@starship.python.net> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> Message-ID: <200311030034.13193.fincher.8@osu.edu> On Friday 31 October 2003 12:08 pm, Michael Hudson wrote: > More than that in the good cases. Something I forgot was that you'd > probably have to knock variable length types on the head. That's something I've always wondered about -- what exactly is a "variable length type" and why are they special? From what I gather, they're types (long, str, and tuple are the main ones I know of) whose struct is actually of variable size -- rather than contain a pointer to a variable-size thing, they contain the variable-size thing themselves. What do we gain from them? (if there's some documentation I overlooked, feel free to point me to it.) 
Thanks, Jeremy From martin at v.loewis.de Sun Nov 2 23:56:27 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Nov 2 23:56:44 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <200311030034.13193.fincher.8@osu.edu> References: <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <200311030034.13193.fincher.8@osu.edu> Message-ID: <3FA5DFFB.8030004@v.loewis.de> Jeremy Fincher wrote: > That's something I've always wondered about -- what exactly is a > "variable length type" and why are they special? From what I gather, > they're types (long, str, and tuple are the main ones I know of) whose > struct is actually of variable size -- rather than contain a pointer > to a variable-size thing, they contain the variable-size thing > themselves. Correct. Examples include strings and tuples, but not lists and dictionaries. > What do we gain from them? Speed, by saving an extra allocation upon creation; also some speed by saving an indirection upon access. It only works if the number of items in the object is not going to change over the lifetime of the object - in particular, for immutable objects. There is actually an exception to this rule: If you own the only reference to the object, you can afford to change its size (available for strings only). Regards, Martin From aleaxit at yahoo.com Mon Nov 3 02:54:52 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 02:54:59 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <002601c3a193$fcad8300$e841fea9@oemcomputer> References: <002601c3a193$fcad8300$e841fea9@oemcomputer> Message-ID: <200311030854.52823.aleaxit@yahoo.com> On Sunday 02 November 2003 11:52 pm, Raymond Hettinger wrote: > > 1. Shouldn't class UserString.UserString inherit from basestring? 
> > The functionality of UserString has been subsumed by inheriting from > str. So, its main purpose now is to keep old code working which means > that it is probably not wise to suddenly convert it from a classic class > to a new-style class. OK, I guess. The implementation doesn't offer all that much extra convenience when compared to inheriting str, anyway -- no "factoring out" a la DictMixin, for example. Presumably there's little demand. > At one time, I also requested an abstract numeric inheritance hierarchy > with real=union(int,float,long) and numbers=union(real,complex). > However, much time has passed and the need has never risen again. I guess I just play too much with numbers...;-). > > multiply inherit from basestring AND also from another builtin type > which > > does not in turn inherit from basestring. > > I would rather leave this open than introduce code to prevent it. My > sense is that blocking it would introduce complexity in coding, > documentation, understanding, and debugging while offering near zero > payoff. The payoff would be just in avoiding confusion. I don't see what complexity there could be in making each base* abstracttype incompatible with the others -- guess I'm missing something...? > In the C code, the actual test is for PySequence_Check() which seems to > do a good job of finding non-mapping objects implementing __getitem__. Unless I'm mistaken, that's exactly operator.isSequenceType(), and: >>> import operator, UserDict >>> operator.isSequenceType(UserDict.UserDict()) True ...wouldn't it be NICE to let the user help code needing to disambiguate sequences from mappings by inheriting basesequence or basemapping...? > operator.isMappingType ... > In the meantime, I would like to remove that function from the operator > module. It is broken. Yes, but isn't isSequenceType pretty iffy too...? 
Alex

From python at rcn.com Mon Nov 3 03:16:06 2003
From: python at rcn.com (Raymond Hettinger)
Date: Mon Nov 3 03:18:00 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
In-Reply-To: <200311030854.52823.aleaxit@yahoo.com>
Message-ID: <000101c3a1e2$d9f811a0$e841fea9@oemcomputer>

[Alex]
> > > multiply inherit from basestring AND also from another builtin type
> > which
> > > does not in turn inherit from basestring.

[Raymond]
> > I would rather leave this open than introduce code to prevent it. My
> > sense is that blocking it would introduce complexity in coding,
> > documentation, understanding, and debugging while offering near zero
> > payoff.

[Alex]
> The payoff would be just in avoiding confusion. I don't see what
> complexity there could be in making each base* abstracttype
> incompatible with the others -- guess I'm missing something...?

More rules to remember: Thing X doesn't work with thing Y, but W, which is
like X, never got taken care of.

More docs to read and write: You would document that the combination is
illegal and explain why, right?

More code to implement the check for prohibited combinations.

Payoff: only when someone multiply inherits from an abstract builtin type
and another builtin type. Does anyone other than you, me, Armin, and Tim
even use multiple inheritance? This basically never comes up unless we're
spending an evening creating toy problems just to push the features to the
limits.

Put another way: Is this a real-world problem for anyone outside python
blackbelts who already know better? Answer: Probably not.

[Raymond]
> > operator.isMappingType
> ...
> > In the meantime, I would like to remove that function from the operator
> > module. It is broken.

[Alex]
> Yes, but isn't isSequenceType pretty iffy too...?

Nope.
>>> import operator
>>> map(operator.isSequenceType, [(), [], 'ab', u'ab', {}, 1])
[True, True, True, True, False, False]
>>> map(operator.isMappingType, [(), [], 'ab', u'ab', {}, 1])
[True, True, True, True, True, False]

The first is 100% correct. The second has four false positives.

For user-defined classes implementing __getitem__, neither function can
distinguish between a mapping and a sequence. This is the best they can do.

Raymond Hettinger

From aleaxit at yahoo.com Mon Nov 3 03:40:02 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 3 03:40:12 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
In-Reply-To: <000101c3a1e2$d9f811a0$e841fea9@oemcomputer>
References: <000101c3a1e2$d9f811a0$e841fea9@oemcomputer>
Message-ID: <200311030940.02592.aleaxit@yahoo.com>

On Monday 03 November 2003 09:16 am, Raymond Hettinger wrote:
...
> type and another builtin type. Does anyone other than you, me, Armin,
> and Tim even use multiple inheritance? This basically never comes up

I think you're not very acquainted with people coming from C++ or Eiffel...

> > Yes, but isn't isSequenceType pretty iffy too...?
>
> Nope.
>
> >>> import operator
> >>> map(operator.isSequenceType, [(), [], 'ab', u'ab', {}, 1])
>
> [True, True, True, True, False, False]
>
> >>> map(operator.isMappingType, [(), [], 'ab', u'ab', {}, 1])
>
> [True, True, True, True, True, False]
>
> The first is 100% correct.
> The second has four false positives.

Right: isSequenceType works on built-ins, isMappingType doesn't.

> For user-defined classes implementing __getitem__, neither function can
> distinguish between a mapping and a sequence. This is the best they can
> do.

OK -- so, if we had basesequence and basemapping, the user COULD help make
the distinction totally reliable (if multiply inheriting from both was
allowed, the user could also make a totally unusable muddle of course:-).
Alex

From anthony at interlink.com.au Mon Nov 3 03:48:18 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Nov 3 03:52:06 2003
Subject: [Python-Dev] bsddb test case deadlocks fixed
In-Reply-To: <20031102092517.GB5805@zot.electricrain.com>
Message-ID: <200311030848.hA38mItM008890@localhost.localdomain>

From what I understand, these fixes aren't just fixes to the test suite,
but also fixes for real problems with the bsddb code itself. In that case,
should they be added to the 2.3 branch? I'd be a solid +1 on this for 2.3.3.

Anyone else?

Anthony

--
Anthony Baxter
It's never too late to have a happy childhood.

From aleaxit at yahoo.com Mon Nov 3 03:54:24 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 3 03:54:30 2003
Subject: [Python-Dev] bsddb test case deadlocks fixed
In-Reply-To: <200311030848.hA38mItM008890@localhost.localdomain>
References: <200311030848.hA38mItM008890@localhost.localdomain>
Message-ID: <200311030954.24191.aleaxit@yahoo.com>

On Monday 03 November 2003 09:48 am, Anthony Baxter wrote:
> From what I understand, these fixes aren't just fixes to the test suite,
> but also fixes for real problems with the bsddb code itself. In that case,
> should they be added to the 2.3 branch? I'd be a solid +1 on this for 2.3.3.
>
> Anyone else?

Anything that makes bsddb less flaky on 2.3.* gets a big hearty
enthusiastic +1 from me too.
Alex From mwh at python.net Mon Nov 3 06:35:05 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 06:35:08 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> (Armin Rigo's message of "Sun, 2 Nov 2003 23:35:16 +0000") References: <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Message-ID: <2mu15l65xy.fsf@starship.python.net> Armin Rigo writes: > Hello Martin, > > On Sun, Nov 02, 2003 at 08:05:55PM +0100, Martin v. L?wis wrote: >> > > More than that in the good cases. Something I forgot was that you'd >> > > probably have to knock variable length types on the head. >> > >> > Why? >> >> Assuming "to knock on the head" means "to put an end to": >> >> If you put all objects of the same type into a pool, you really want >> all objects to have the same side, inside a pool. With that >> assumption, garbage objects can be reallocated without causing >> fragmentation. If objects in a pool have different sizes, it is not >> possible to have an efficient reallocation strategy. > > "Not easy" would have been more appropriate. It is still basically what > malloc() does. Well, yeah, but as Tim said pymalloc gets its wins from assuming that each allocation is the same size. You could combine my idea with some other allocation scheme, certainly, but given the relative paucity of variable length types and the reduction in allocator overhead using something like pymalloc gives us, I think it might just be easier to not do them any more. 
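The address arithmetic behind *(p&MASK) is tiny, and easy to sketch
numerically (the pool size and addresses below are made up for
illustration):

```python
POOL_SIZE = 4096             # illustrative power-of-two pool size
MASK = ~(POOL_SIZE - 1)      # clears the offset-within-pool bits

def pool_base(addr):
    # Every object in a pool shares the type pointer stored in the
    # pool header, which sits at the pool's base address.
    return addr & MASK

# two hypothetical objects allocated from the same (aligned) pool
a = 0x7f3a2000 + 64
b = 0x7f3a2000 + 512
assert pool_base(a) == pool_base(b) == 0x7f3a2000
```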
Of course, I don't see myself having any time to play with this idea any time soon, and it's probably not really beefy enough to get a masters thesis from, so maybe we'll never know. > One way would be to use Python's current memory allocator, by > adapting it to sort objects into pools not only according to size > but also according to type. That's pretty much what I was suggesting. > What seems to me like a good solution would be to use one relatively > large "arena" per type and Python's memory allocator to subdivide > each arena. If each arena starts at a pointer address which is > properly aligned, then *(p&MASK) gives you the type of any object, > and possibly even without much cache-miss overhead because there are > not so many arenas in total (probably only 1-2 per type in common > cases, and arenas can be large). Hmm, maybe. I'm not going to make guesses about that one :-) Cheers, mwh -- ... Windows proponents tell you that it will solve things that your Unix system people keep telling you are hard. The Unix people are right: they are hard, and Windows does not solve them, ... -- Tim Bradshaw, comp.lang.lisp From mwh at python.net Mon Nov 3 07:14:31 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 07:14:35 2003 Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes In-Reply-To: <200311022319.42725.aleaxit@yahoo.com> (Alex Martelli's message of "Sun, 2 Nov 2003 23:19:42 +0100") References: <200311022319.42725.aleaxit@yahoo.com> Message-ID: <2m8ymx6448.fsf@starship.python.net> Alex Martelli writes: > 1. Shouldn't class UserString.UserString inherit from basestring? After all, > basestring exists specifically in order to encourage typetests of the form > isinstance(x, basestring) -- wouldn't it be better if such tests could > also catch "user-tweaked strings" derived from UserString ... ? > > 2. 
If we do want to encourage such typetest idioms, it might be a good idea
> to provide some other such abstract basetypes for the purpose.

I'd really rather not. I think this is a slippery slope I want to stay
right at the top of. Doing different things depending on which protocol a
function argument happens to implement is icky, even if it's sometimes
extremely convenient. I don't think we should make it easier.

> 4. Furthermore, providing "basenumber" would let user-coded classes "flag"
> in a simple and direct way "I'm emulating numbers". This might well be
> useful _to Python itself_...
> Right now, I'm stuck for an answer to the bug that a user-coded class
> which exposes __mul__ but not __rmul__ happens to support its instances
> being multiplied by an integer on the right -- quite surprising to users!
> The problem is that this behavior is apparently expected, though not
> documented, when the user-coded class is trying to simulate a _sequence_
> rather than a number. So, I can't just take the peculiar "accidental
> commutativity with integers only" away.
> IF a user class could flag itself as "numeroid" by inheriting basenumber,
> THEN the "accidental commutativity" COULD be easily removed at least
> for such classes.

This is just a bug, albeit a subtle and hard-to-fix one. And, as a paid-up
member of the anti-operator-overloading-bigot camp, I'll just say: a) if
your user-coded class is so unlike a number as to not be multipliable by
an int, why are you overloading '*'? and b) if Python had different
operators for sequence repetition and multiplying numbers, the relevant
bug would be much easier to fix...

> 7. one might of course think of other perhaps-useful abstract basetypes,
> such as e.g. basesequence or basemapping -- right now the new
> forthcoming built-in 'reverse' is trying to avoid "accidentally working"
> on mappings by featuretesting for (e.g.)
has_key, but if the user
> could optionally subclass either of these abstract basetypes (but not
> both at once, see [6]:-), that might ease reverse's task in some cases.

Well, I (unsurprisingly, given the above) think this problem again comes
from using the same notation for two different things (mappings and
sequences). Or looking at it another way, it comes from ancient misdesigns
in the C API that it's now essentially impossible to fix (that sq_item is
an intargfunc, roughly). I don't think we should try to cover up these
misfeatures with another.

Cheers,
mwh

--
The meaning of "brunch" is as yet undefined. -- Simon Booth, ucam.chat

From aleaxit at yahoo.com Mon Nov 3 07:47:10 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 3 07:47:22 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
Message-ID: <200311031347.10995.aleaxit@yahoo.com>

I made a few bugfix check-ins to the 2.3 maintenance branch this weekend
and Michael Hudson commented that he thinks that so doing is a bad idea,
that bug fixes should filter from the 2.4 trunk to the 2.3 branch and not
the other way around. Is this indeed the policy (have I missed some
guidelines about it)?

I guess for this round of fixes I will find the time to forward-port them
to the 2.4 trunk (in AMPLE time for a 2.4 release -- as 2.3.3 is going to
come well before 2.4 releases, the other way 'round wouldn't be quite so
sure:-), but what about the future? Should fixes applicable to both 2.3.*
and 2.4 be made [a] always to both trunk and branch, [b] always to the
trunk but to the branch only once one comes around to that, [c] always to
the branch but to the trunk only once one comes around to that, ...?

Oh, incidentally, if it matters -- most were docs issues, including as
"docs" also some changes to comments that previously were misleading or
ambiguous.
I guess that my problem is that I think of 2.3.* fixes as things that will
be useful to "the general Python-using public" pretty soon, with 2.4 far
off in the future, so that it appears to me that trying to make 2.3.* as
well fixed as possible has higher priority. But if that conflicts with
policy, I will of course change anyway.

Thanks,

Alex

From mcherm at mcherm.com Mon Nov 3 08:35:57 2003
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon Nov 3 08:36:07 2003
Subject: [Python-Dev] new language ideas
Message-ID: <1067866557.3fa659bd77a4d@mcherm.com>

Brian Rzycki writes:
> Multiline comments

Already got it. Triple quoting.

> __doc__ variable

Just making __doc__ a dictionary instead of a string doesn't achieve
anything *unless* there is a fairly standard set of expected keys in this
dictionary. (This is documentation, so the list of standard keys doesn't
have to be universal, but without a common set of keys you can expect to
encounter, the only thing you can really do is to print out the entire
contents of the dictionary, and if all you can do is print it you might as
well just be using a string.) You write:

> Think author,
> webpage, and version at the global scope and pre/post conditions,
> dynamically created information about a function/class.

which is an interesting-sounding list, but if I saw a PEP which proposed
making __doc__ a dictionary which *didn't* specify just what the "common"
keys would be and what they would contain, then I'd be -1 on it. And if it
*did* specify, I imagine there would be far more controversy than you
expect.

> bit access of integers
> ----------------------------
> Like strings, we can use [] to index into python integers.

Hmm... very interesting, actually. But on reflection, I think we're better
off leaving integers as *numbers* and having a *separate* type for
bitmasks. This separate type could even be written in Python (I doubt the
speed of a C implementation would be worthwhile...
the real advantage of the type would be ease of use, not performance).
Clearly it would have a convert-to-integer feature (perhaps one which
would let you specify whether you wanted signed or unsigned, and what
width, and what endian-ness, etc.).

> alternative base notation
> ---------------------------------
> Python inherited C's notation for numbers of non-decimal bases. I
> propose another with simpler syntax: number_base.

Definite -1 from me. Several reasons. Here's a number in hex: b4a0_16

Oh wait... sorry, that's not a number, that's an identifier.

Another reason is that it's just not something that is done all that
frequently. Another reason is that we already have TWO syntaxes for doing
numbers in different bases: There's the 0x prefix for hex and the 0 prefix
for octal (but if I had my way we'd dump that... who uses octal?). And
there's the int('number', base) syntax, which has just a few more
characters than your solution and is IMHO more readable.

Even if I'm shooting most of these down, don't give up... you're certainly
injecting a little creative thought into the process. Sometimes that stirs
up really exciting ideas.

-- Michael Chermside

From mcherm at mcherm.com Mon Nov 3 08:55:12 2003
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon Nov 3 08:55:25 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
Message-ID: <1067867712.3fa65e4084e79@mcherm.com>

Alex muses on basestring:
> 2. If we do want to encourage such typetest idioms, it might be a good idea
> to provide some other such abstract basetypes for the purpose.
[...]
> If there was an abstract basetype, say "baseinteger", from which int and
> long derived,

Great idea... I think there should be a single type from which all
built-in integer-like types inherit, and which user-designed types can
inherit if they want to behave like integers. I think that type should be
called "int".
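That opt-in already works today by subclassing int directly; a quick
sketch (the Celsius class is made up purely for illustration):

```python
class Celsius(int):
    # An int subclass: isinstance(x, int) checks accept it, and
    # arithmetic falls back to plain int behaviour.
    def __str__(self):
        return "%d degC" % int(self)

t = Celsius(21)
assert isinstance(t, int)
assert t + 4 == 25
print(t)   # prints: 21 degC
```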
Once the int/long distinction is completely gone, this will be quite
clean; the only confusion now is that the int/long distinction isn't yet
completely hidden.

> 4. Furthermore, providing "basenumber" would let user-coded classes "flag"
> in a simple and direct way "I'm emulating numbers".

Okay, that sounds like it might be useful, at least to those people who
work with weird varieties of numbers. But I can't think how. Normally, I
figure that if you overload addition, multiplication, subtraction, and
perhaps a few other such operators, then you're trying to emulate numbers
(that or you're abusing operator overloading, and I have no real sympathy
for you). What use cases do you have for "basenumber" (I don't mean
examples of classes that would inherit from basenumber, I mean examples
where that inheritance would make a difference)?

> IF a user class could flag itself as "numeroid" by inheriting basenumber,
> THEN the "accidental commutativity" COULD be easily removed at least
> for such classes.

Okay, that's one use case. Any others? 'cause I'm coming up blank.

> ...does anybody see any problem if, in 2.4, we take away the ability to
> multiply inherit from basestring AND also from another builtin type which
> does not in turn inherit from basestring...?

I do! I personally wouldn't try to create the class "perlnum" which
inherits from basestring and also basenumber and which tries to magically
know which is desired and convert back and forth on demand. But I'm sure
*someone* out there is just dying to write such a class. Why prevent them?
Not that I'd ever USE such a monstrosity, but I just don't see the
ADVANTAGE in providing the programmer with a straitjacket by typechecking
them (at the language level) to prevent uses outside of those envisioned
by the language implementers. It sounds decidedly non-pythonic to me.
-- Michael Chermside From mwh at python.net Mon Nov 3 08:58:05 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 08:58:10 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> (Alex Martelli's message of "Mon, 3 Nov 2003 13:47:10 +0100") References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: <2mznfd4kr6.fsf@starship.python.net> Alex Martelli writes: > I made a few bugfix check-ins to the 2.3 maintenance branch this > weekend and Michael Hudson commented that he thinks that so doing is > a bad idea, that bug fixes should filter from the 2.4 trunk to the > 2.3 branch and not the other way around. Is this indeed the policy > (have I missed some guidelines about it)? Well, it's more practice than policy. I guess the (my...) thinking was that the trunk gets more testing, so it's a proving ground for fixes. It also depends on who's going to be release monkey for the next point release. The branch is to a certain extent "theirs" and they should get to decide how things work. I'm not sure who's got the hat at the moment (Anthony?). > I guess for this round of fixes I will find the time to forward-port > them to the 2.4 trunk (in AMPLE time for a 2.4 release -- as 2.3.3 > is going to come well before 2.4 releases, the other way 'round > wouldn't be quite so sure:-), but what about the future? Should > fixes applicable to both 2.3.* and 2.4 be made [a] always to both > trunk and branch, [b] always to the trunk but to the branch only > once one comes around to that, [c] always to the branch but to the > trunk only once one comes around to that, ...? My order of preference were I to be 2.3.3 monkey would be [a], then [b]. > Oh, incidentally, if it matters -- most were docs issues, including > as "docs" also some changes to comments that previously were > misleading or ambiguous. 
> > I guess that my problem is that I think of 2.3.* fixes as things > that will be useful to "the general Python-using public" pretty > soon, with 2.4 far off in the future, so that it appears to me that > trying to make 2.3.* as well fixed as possible has higher priority. > But if that conflicts with policy, I will of course change anyway. Maybe a decision could be made now and the conclusions written down somewhere? My habits are to do all work in the trunk checkout and then backport, but I could adapt if the decision went the other way. Sometimes it's not clear whether a fix is applicable to the branch, for one thing. Cheers, mwh -- Well, yes. I don't think I'd put something like "penchant for anal play" and "able to wield a buttplug" in a CV unless it was relevant to the gig being applied for... -- Matt McLeod, alt.sysadmin.recovery From aahz at pythoncraft.com Mon Nov 3 09:01:23 2003 From: aahz at pythoncraft.com (Aahz) Date: Mon Nov 3 09:01:26 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: <20031103140123.GA14146@panix.com> On Mon, Nov 03, 2003, Alex Martelli wrote: > > I made a few bugfix check-ins to the 2.3 maintenance branch this > weekend and Michael Hudson commented that he thinks that so doing is a > bad idea, that bug fixes should filter from the 2.4 trunk to the 2.3 > branch and not the other way around. Is this indeed the policy (have > I missed some guidelines about it)? PEP 6: As individual patches get contributed to the feature release fork, each patch contributor is requested to consider whether the patch is a bug fix suitable for inclusion in a patch release. If the patch is considered suitable, the patch contributor will mail the SourceForge patch (bug fix?) number to the maintainers' mailing list. 
That seems clear enough to me, though it could probably stand some updating for using appropriate vocabulary and matching current practice. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From cpr at emsoftware.com Mon Nov 3 09:44:27 2003 From: cpr at emsoftware.com (Chris Ryland) Date: Mon Nov 3 09:45:05 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python Message-ID: <397F8C9A-0E0C-11D8-8358-000393DC534A@emsoftware.com> Michael Hudson wrote: > Remove the ob_type field from all PyObjects. Make pymalloc mandatory, > make it use type specific pools and store a pointer to the type object > at the start of each pool. > > So instead of > p->ob_type > it's > *(p&MASK) > > I think having each type in its own pools would also let you lose the > gc_next & gc_prev fields. > > Combined with a non-refcount GC, you could hammer sizeof(PyIntObject) > down to sizeof(long)! Yes, this is a variant of an implementation technique used in early Lisp and Lisp-like language systems with types (e.g., Harvard's EL-1) back in the early 70's (at least--that's when I first encountered it). In those systems, you'd use the "page #" (higher-order bits) of a pointer to reference a type table. Good idea, but perhaps less effective these days where memory isn't quite so dear. (Back then, a large system was a PDP-10 with 256K 36-bit words, or around 1MB.) Cheers! --Chris Ryland / Em Software, Inc. 
/ www.emsoftware.com From mwh at python.net Mon Nov 3 09:52:28 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 09:52:33 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <397F8C9A-0E0C-11D8-8358-000393DC534A@emsoftware.com> (Chris Ryland's message of "Mon, 3 Nov 2003 09:44:27 -0500") References: <397F8C9A-0E0C-11D8-8358-000393DC534A@emsoftware.com> Message-ID: <2mvfq14i8j.fsf@starship.python.net> Chris Ryland writes: > Michael Hudson wrote: >> Remove the ob_type field from all PyObjects. Make pymalloc mandatory, >> make it use type specific pools and store a pointer to the type object >> at the start of each pool. >> >> So instead of >> p->ob_type >> it's >> *(p&MASK) >> >> I think having each type in its own pools would also let you lose the >> gc_next & gc_prev fields. >> >> Combined with a non-refcount GC, you could hammer sizeof(PyIntObject) >> down to sizeof(long)! > > Yes, this is a variant of an implementation technique used in early > Lisp and Lisp-like language systems with types (e.g., Harvard's EL-1) > back in the early 70's (at least--that's when I first encountered > it). In those systems, you'd use the "page #" (higher-order bits) of a > pointer to reference a type table. Heh, that's interesting to know. Nothing new under the sun & all that. > Good idea, but perhaps less effective these days where memory isn't > quite so dear. (Back then, a large system was a PDP-10 with 256K > 36-bit words, or around 1MB.) Cache memory is still expensive: if we can get more PyObjects into each cache line, we still win (at least, that's what I was thinking). Also, for say small tuples, the overhead of gc fields, refcount and type pointer is really frightening. Yes, memory is cheap, but using 3 or so times as much as we need to is still excessive. Cheers, mwh -- If trees could scream, would we be so cavalier about cutting them down? We might, if they screamed all the time, for no good reason. 
-- Jack Handey

From arigo at tunes.org Mon Nov 3 09:58:34 2003
From: arigo at tunes.org (Armin Rigo)
Date: Mon Nov 3 10:02:29 2003
Subject: [Python-Dev] Looking for master thesis ideas involving Python
In-Reply-To: <3FA59EED.1020900@v.loewis.de>
References: <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <3FA59EED.1020900@v.loewis.de>
Message-ID: <20031103145834.GA22719@vicky.ecs.soton.ac.uk>

Hello Martin,

On Mon, Nov 03, 2003 at 01:18:53AM +0100, "Martin v. Löwis" wrote:
> >"Not easy" would have been more appropriate. It is still basically what
> >malloc() does.
>
> That's why I said "efficient". What malloc basically does is not
> efficient. It gets worse if, at reallocation time, you are not only
> bound by size, but also by type. E.g. if you have deallocated a tuple of
> 10 elements, and then reallocate a tuple of 6, the wasted space can only
> hold a tuple of 1 element, nothing else.

That's why we have a custom allocator in Python, to minimize this kind of
impact by subdividing arenas into pools of objects grouped by size. I
admit that adding the type constraint adds burden to the allocator,
though.

> So where do you put strings with 100,000 elements (characters)? Or any
> other object that exceeds an arena in size?

These ones are not a problem, because objects and arenas can be larger
than the MASK. You get to the start of the arena by masking bits away
from the address of the *beginning* of the object. An arena can be of any
size as long as all the objects it contains start in the first MASK
bytes. For a very large object, the arena would contain only this object,
which would then start at the beginning of the arena.
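Concretely, the lookup could be modelled in Python along these lines (a toy sketch with invented sizes and names; the real thing would of course live in C inside pymalloc):

```python
ARENA_SIZE = 1 << 16          # invented arena size/alignment
ARENA_MASK = ARENA_SIZE - 1

arenas = {}                   # arena base address -> type of its objects

def new_arena(base, typename):
    # Arenas must start on an ARENA_SIZE boundary for masking to work.
    assert base & ARENA_MASK == 0
    arenas[base] = typename
    return base

def alloc(base, offset):
    # An object may be any size, but it must *start* within the first
    # ARENA_SIZE bytes of its arena.
    assert 0 < offset < ARENA_SIZE
    return base + offset      # the object's "address"

def type_of(addr):
    # The replacement for p->ob_type: mask the low bits off the
    # object's address and read the type stored at the arena's start.
    return arenas[addr & ~ARENA_MASK]
```

In this model a 100,000-character string is indeed no problem: it simply gets an arena of its own, with the object starting at the arena's base.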
I'm more concerned about medium-sized objects, e.g. the ones whose size
is 51% of MASK. At the moment I don't see a good solution for these.

A bientot,

Armin.

From anthony at interlink.com.au Mon Nov 3 10:01:56 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Nov 3 10:05:40 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: <2mznfd4kr6.fsf@starship.python.net>
Message-ID: <200311031501.hA3F1uH0016389@localhost.localdomain>

>>> Michael Hudson wrote
> Well, it's more practice than policy. I guess the (my...) thinking
> was that the trunk gets more testing, so it's a proving ground for
> fixes.
>
> It also depends on who's going to be release monkey for the next point
> release. The branch is to a certain extent "theirs" and they should
> get to decide how things work. I'm not sure who's got the hat at the
> moment (Anthony?).

Unless someone desperately wants it, I'm happy to keep on doing it. What
I'd prefer:

- Apply to trunk first (assuming, of course, that the patch isn't
  something that's only needed on the branch - at this point in time, I
  can't see that happening, as release23-maint and the trunk haven't
  diverged far enough yet)
- Mark (in checkin message) if the patch is a bugfix candidate
- If you're comfortable that the patch is a non-controversial bugfix,
  then commit it to the branch as well, AFTER you have run the unittests
  on the branch to make sure it still works

What makes for a controversial vs non-controversial patch? There's a
couple of things I think are important to bear in mind:

- Functionality changes are controversial. Unless there's been a
  discussion and agreement (or BDFL fiat <wink>) on python-dev, it
  shouldn't go in.
- Major changes just near a release are going to be controversial, as it
  makes the life of the release-monkey-of-the-moment more painful.
At the end of the day, if you're not sure your patch should go to the branch, then mark it so in the checkin message, and someone (me, mwh, someone else willing to look into it) can make a judgment call. On the other hand, no-one's going to jump up and down screaming if you do check something in that probably shouldn't have gone in - we can always just revert it if necessary. I reserve the right to jump up and down if someone checks something in when I'm in the middle of a release and the branch is frozen, though . Also, if you're checking something into the branch, please try and make it obvious that the change is a backport or whatever. Something like Backport of is good. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From jeremy at alum.mit.edu Mon Nov 3 10:36:33 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon Nov 3 10:39:32 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: <1067873793.19568.27.camel@localhost.localdomain> On Mon, 2003-11-03 at 07:47, Alex Martelli wrote: > I made a few bugfix check-ins to the 2.3 maintenance branch this weekend and > Michael Hudson commented that he thinks that so doing is a bad idea, that bug > fixes should filter from the 2.4 trunk to the 2.3 branch and not the other way > around. Is this indeed the policy (have I missed some guidelines about it)? It is customary to fix things on the trunk first, then backport to branches where it is needed. People who maintain branches often watch the trunk to look for things that need to be backported. As far as I know, no one watches the branches to look for things to port to the trunk. It may get lost if it's only on a branch. The best thing to do is your option [a]: Fix it in both places at once. Then there's nothing to be forgotten when time for a release rolls around. 
Jeremy From skip at pobox.com Mon Nov 3 10:42:31 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Nov 3 10:42:38 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/bsddb __init__.py, 1.11, 1.12 In-Reply-To: References: Message-ID: <16294.30567.537151.106168@montanaro.dyndns.org> greg> import UserDict greg> class _iter_mixin(UserDict.DictMixin): greg> def __iter__(self): greg> try: ... Should _iter_mixin inherit from dict, or is there a backward compatibility issue? Skip From arigo at tunes.org Mon Nov 3 10:55:36 2003 From: arigo at tunes.org (Armin Rigo) Date: Mon Nov 3 10:59:29 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mu15l65xy.fsf@starship.python.net> References: <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <2mu15l65xy.fsf@starship.python.net> Message-ID: <20031103155536.GA29074@vicky.ecs.soton.ac.uk> Hello Michael, On Mon, Nov 03, 2003 at 11:35:05AM +0000, Michael Hudson wrote: > > "Not easy" would have been more appropriate. It is still basically what > > malloc() does. > > Well, yeah, but as Tim said pymalloc gets its wins from assuming that > each allocation is the same size. You could combine my idea with some > other allocation scheme, certainly, but given the relative paucity of > variable length types and the reduction in allocator overhead using > something like pymalloc gives us, I think it might just be easier to > not do them any more. Of course, I don't see myself having any time > to play with this idea any time soon, and it's probably not really > beefy enough to get a masters thesis from, so maybe we'll never know. Ok. 
I expect it to be much easier to experiment with with PyPy anyway.

Armin

From mwh at python.net Mon Nov 3 11:00:58 2003
From: mwh at python.net (Michael Hudson)
Date: Mon Nov 3 11:01:02 2003
Subject: [Python-Dev] Looking for master thesis ideas involving Python
In-Reply-To: <20031103155536.GA29074@vicky.ecs.soton.ac.uk> (Armin Rigo's message of "Mon, 3 Nov 2003 15:55:36 +0000")
References: <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <2mu15l65xy.fsf@starship.python.net> <20031103155536.GA29074@vicky.ecs.soton.ac.uk>
Message-ID: <2mr80p4f2d.fsf@starship.python.net>

Armin Rigo writes:

> Hello Michael,
>
> On Mon, Nov 03, 2003 at 11:35:05AM +0000, Michael Hudson wrote:
>> > "Not easy" would have been more appropriate. It is still basically what
>> > malloc() does.
>>
>> Well, yeah, but as Tim said pymalloc gets its wins from assuming that
>> each allocation is the same size. You could combine my idea with some
>> other allocation scheme, certainly, but given the relative paucity of
>> variable length types and the reduction in allocator overhead using
>> something like pymalloc gives us, I think it might just be easier to
>> not do them any more. Of course, I don't see myself having any time
>> to play with this idea any time soon, and it's probably not really
>> beefy enough to get a masters thesis from, so maybe we'll never know.
>
> Ok. I expect it to be much easier to experiment with with PyPy anyway.

This had occurred to me too :-)

Cheers,
mwh

-- Never meddle in the affairs of NT. It is slow to boot and quick to crash.
-- Stephen Harris -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From aleaxit at yahoo.com Mon Nov 3 11:02:54 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 11:03:04 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <1067867712.3fa65e4084e79@mcherm.com> References: <1067867712.3fa65e4084e79@mcherm.com> Message-ID: <200311031702.54774.aleaxit@yahoo.com> On Monday 03 November 2003 02:55 pm, Michael Chermside wrote: > Alex muses on basestring: > > 2. If we do want to encourage such typetest idioms, it might be a good > > idea to provide some other such abstract basetypes for the purpose. > > [...] > > > If there was an abstract basetype, say "baseinteger", from which int > > and long derived, > > Great idea... I think there should be single type from which all built-in > integer-like types inherit, and which user-designed types can inherit > if they want to behave like integers. I think that type should be called > "int". Once the int/long distinction is completely gone, this will be Unfortunately, unless int is made an abstract type, that doesn't help at all to "type-flag" user-coded types (be they C-coded or Python-coded): they want to tell "whoever it may concern" that they're intended to be usable as integers, but not uselessly carry around an instance of int for the purpose (and need to contort their own layout, if C-coded, for that). Abstract basetypes such as basestring are useful only to "flag" types as (intending to conform to) some concept: they don't carry implementation. Specifically, basestring has no other use except supporting isinstance (and, I guess, issubclass in some cases:-). Concrete types such as int carry more baggage (and provide more uses). I'm not sure whether it makes sense to have basestring in Python, but I assume it must -- it's a recent addition, not "legacy", so why would it have been accepted if it made no sense? 
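By analogy, a hypothetical basenumber would be nothing but such a flag. In pure Python terms (illustrative only; no such type exists, and the names here are invented):

```python
class basenumber(object):
    # Hypothetical abstract basetype: carries no implementation and no
    # layout; it exists only so isinstance() can ask "is this meant to
    # be a number?"
    __slots__ = ()

class Fraction(basenumber):
    # A user-coded type "flagging" itself as a number by inheritance,
    # at no cost to its own layout or behaviour.
    def __init__(self, num, den):
        self.num = num
        self.den = den

def accepts_a_number(x):
    # The simple typetest a library could then perform, instead of
    # probing with x + 0 and catching TypeError.
    return isinstance(x, (int, float, basenumber))
```

The base class does nothing at all; its whole value is giving the isinstance test something to ask.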
So, a user-coded type can flag itself as stringlike, if it wishes,
without carrying any baggage due to that. Why is intlike so drastically
different?

> quite clean, the only confusion now is that the int/long distinction isn't
> yet completely hidden.
>
> > 4. Furthermore, providing "basenumber" would let user-coded classes
> > "flag" in a simple and direct way "I'm emulating numbers".
>
> Okay, that sounds like it might be useful, at least to those people who
> work with weird varieties of numbers. But I can't think how. Normally,

By allowing a simple test for "is X supposed to be a number", just like
isinstance(X, basestring) allows an equally simple test for "is X
supposed to be a string". For example, such tests as imaplib.py's

    isinstance(date_time, (int, float))

(I'm not sure why long is omitted here) would simplify to

    isinstance(date_time, basenumber)

There aren't many such checks in the standard library, because overall it
doesn't do much with numbers (while it does work a lot with strings).
But, the categories of use cases aren't very different: either one is
asserting that X is-a [something], a la "assert isinstance(X,...", or one
is checking whether X is-a [something] (i.e. X is allowed to be either a
"something", or not, and there is different behavior in either case).

> I figure that if you overload addition, multiplication, subtraction, and
> perhaps a few other such operators, then you're trying to emulate numbers
> (that or you're abusing operator overloading, and I have no real sympathy

All these operators are defined, in various branches of maths, for things
that are very different from "a number". Surely you're not claiming that
Numeric is "abusing operator overloading" by allowing users to code a+b,
a*b, a-b etc where a and b are multi-dimensional arrays? The ability to
use such notation, which is fully natural in the application areas those
users come from, is important to many users.

> for you).
What use cases do you have for "basenumber" (I don't mean
> examples of classes that would inherit from basenumber, I mean examples
> where that inheritance would make a difference)?

Let me offer just a couple of use cases, one per kind. For example,

    def __mul__(self, other):
        if isinstance(other, self.KnownNumberTypes):
            return self.__class__([ x*other for x in self.items ])
        else:
            # etc etc, various other multiplication cases

right now, that (class, actually) attribute KnownNumberTypes starts out
"knowing" about int, long, float, gmpy.mpz, etc, and may require user
customization (e.g. by subclassing) if any other "kind of (scalar)
number" needs to be supported; besides, the isinstance check must walk
linearly down the tuple of known number types each time. (I originally
had quite a different test structure:

    try: other + 0
    except TypeError:    # other is not a number
        # various other multiplication cases
    else:                # other is a number, so...
        return self.__class__([ x*other for x in self.items ])

but the performance for typical benchmarks improved with the isinstance
test, so, reluctantly, that's what I changed to). If an abstract basetype
'basenumber' caught many useful cases, I'd put it right at the start of
the KnownNumberTypes tuple, omit all subclasses thereof from it, get
better performance, AND be able to document very simply what the user
must do to ensure his own custom type is known to me as "a number".

That's a case where I need to accept both numbers and non-numbers and do
different things. As for "checking it's a number" I find it quite OK to
do it by trying X+0 and letting the exception, if any, propagate -- just
as "checking if it's a string" could proceed by doing X+''. But maybe I'm
just old-fashioned in this acceptance -- particularly if one thinks of
C-coded extensions, checking for a basetype might be far handier.
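For the record, the first sketch above fleshes out to something like this (Vector and the tuple's contents are invented for illustration; under 2.3 the tuple would also list long, gmpy.mpz, and friends):

```python
class Vector:
    # A hypothetical basenumber would collapse this tuple to a single
    # entry, and user numeric types could join it just by inheriting.
    KnownNumberTypes = (int, float)

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, other):
        if isinstance(other, self.KnownNumberTypes):
            # scalar case: multiply elementwise
            return self.__class__([x * other for x in self.items])
        # etc etc, various other multiplication cases
        return NotImplemented
```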
E.g., in Python/bltinmodule.c, function builtin_sum uses C-coded
typechecking to single out strings as an error case:

    /* reject string values for 'start' parameter */
    if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
        PyErr_SetString(PyExc_TypeError,
            "sum() can't sum strings [use ''.join(seq) instea

[etc]. Now, what builtin_sum really "wants" to do is to accept numbers,
only -- it's _documented_ as being meant for "numbers": it uses +, NOT
+=, so its performance on sequences, matrix and array-ish things, etc, is
not going to be good. But -- it can't easily _test_ whether something "is
a number". If we had a PyBaseNumber_Type to use here, it would be smooth,
easy, and fast to check for it.

> > IF a user class could flag itself as "numeroid" by inheriting
> > basenumber, THEN the "accidental commutativity" COULD be easily removed
> > at least for such classes.
>
> Okay, that's one use case. Any others? 'cause I'm coming up blank.

I see a few other cases in the standard library which want to treat
"numbers" in some specific way different from other types (often
forgetting longs:-), e.g. Lib/plat-mac/plistlib.py has one. In gmpy, I
would often like some operations to be able to accept "a number", perhaps
by letting it try to transform itself into a float as a worst case (so
complex numbers would fail there), but I definitely do NOT want to accept
non-number objects which "happen to be able to return a value from
float(x)", such as strings. In all such cases of wanting to check if
something "is a number", an abstract basetype might be handy, smooth,
fast.

> > ...does anybody see any problem if, in 2.4, we take away the ability to
> > multiply inherit from basestring AND also from another builtin type which
> > does not in turn inherit from basestring...?
>
> I do! I personally wouldn't try to create the class "perlnum" which
> inherits from basestring and also basenumber and which tries to magically
> know which is desired and convert back and forth on demand.
But I'm > sure *someone* out there is just dying to write such a class. Why > prevent them? Not that I'd every USE such a monstrocity, but just don't > see the ADVANTAGE in providing the programmer with a straightjacket by > typechecking them (at the language level) to prevent uses outside of > those envisioned by the language implementers. It sounds decidedly > non-pythonic to me. How would it be different from saying that if something is a mapping it cannot also be a sequence (and vice versa) and trying to distinguish between the two cases (and, currently, failing for user-coded types because there IS no way to reliably flag them one way or another)? The purpose of the hypothetical abstract basetypes is to let the user optionally flag types in an unambiguous way. Types that aren't flagged would presumably keep muddling through like today, for backwards compatibility. But allowing the use of multiple basetypes only seems mean to introduce ambiguity again and it seems to me that it would have no added value, while providing (at least) a warning for it would help prevent user mistakes. Alex From aleaxit at yahoo.com Mon Nov 3 11:38:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 11:38:36 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031501.hA3F1uH0016389@localhost.localdomain> References: <200311031501.hA3F1uH0016389@localhost.localdomain> Message-ID: <200311031738.23373.aleaxit@yahoo.com> On Monday 03 November 2003 04:01 pm, Anthony Baxter wrote: > >>> Michael Hudson wrote > > > > Well, it's more practice than policy. I guess the (my...) thinking > > was that the trunk gets more testing, so it's a proving ground for > > fixes. > > > > It also depends on who's going to be release monkey for the next point > > release. The branch is to a certain extent "theirs" and they should > > get to decide how things work. I'm not sure who's got the hat at the > > moment (Anthony?). 
> > Unless someone desperately wants it, I'm happy to keep on doing it. What And a *big THANKS!* for this -- from us all, I'm sure. > I'd prefer: > > - Apply to trunk first (assuming, of course, that the patch isn't > something that's only needed on the branch - at this point in time, I can't > see that happening, as release23-maint and the trunk haven't diverged far > enough yet) No, but there may be some cases. E.g., one of the doc fix I proposed (but didn't commit) is to the reference manual, documenting that list comprehensions currently (2.3) "leak" control variables, but code should not rely on that since it will be fixed in the future. That doc fix would not make much sense in 2.4, assuming the leakage will be fixed then, as it is currently predicted it will be. > - Mark (in checkin message) if the patch is a bugfix candidate > - If you're comfortable that the patch is a non-controversial bugfix, > then commit it to the branch as well, AFTER you have run the unittests on > the branch to make sure it still works) [nod] yes -- makes a lot of sense. > What makes for a controversial vs non-controversial patch? There's a couple > of things I think are important to bear in mind: > > - Functionality changes are controversial. Unless there's been a > discussion and agreement (or BDFL fiat ) on python-dev, it shouldn't Surely the BDFL could afford a better car than _that_?!-) > go in. - Major changes just near a release are going to be controversial, > as it makes the life of the release-monkey-of-the-moment more painful. Good point. > At the end of the day, if you're not sure your patch should go to the > branch, then mark it so in the checkin message, and someone (me, mwh, > someone else willing to look into it) can make a judgment call. OK. > On the other hand, no-one's going to jump up and down screaming if you do > check something in that probably shouldn't have gone in - we can always > just revert it if necessary. 
I reserve the right to jump up and down if > someone checks something in when I'm in the middle of a release and the > branch is frozen, though . Makes sense. > Also, if you're checking something into the branch, please try and make it > obvious that the change is a backport or whatever. Something like > Backport of > is good. Unfortunately I didn't do that for my check-ins this weekend ('cause they weren't backports...:-) but sure, I will try and clarify that in the future. As soon as I can make time, I'll "forward-port" to the 2.4 trunk the fixes I had made only to the 2.3 maintenance branch. Alex From pje at telecommunity.com Mon Nov 3 11:38:52 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Nov 3 11:39:03 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <2mu15l65xy.fsf@starship.python.net> References: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> Message-ID: <5.1.1.6.0.20031103113450.03470270@telecommunity.com> At 11:35 AM 11/3/03 +0000, Michael Hudson wrote: >Armin Rigo writes: > > > What seems to me like a good solution would be to use one relatively > > large "arena" per type and Python's memory allocator to subdivide > > each arena. If each arena starts at a pointer address which is > > properly aligned, then *(p&MASK) gives you the type of any object, > > and possibly even without much cache-miss overhead because there are > > not so many arenas in total (probably only 1-2 per type in common > > cases, and arenas can be large). > >Hmm, maybe. 
I'm not going to make guesses about that one :-) You guys do realize that this scheme would make it impossible to change an object's type, right? Unless of course you have some way to "search and replace" all references to an object. And if you were to say, "well, we'll only use this trick for non-heap types", my question would be, how's the code doing *(p&MASK) going to know how *not* to do that? If heap types have a different layout, how can you inherit from a builtin type in pure Python? And so on. From pje at telecommunity.com Mon Nov 3 11:44:12 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Nov 3 11:44:19 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <200311031702.54774.aleaxit@yahoo.com> References: <1067867712.3fa65e4084e79@mcherm.com> <1067867712.3fa65e4084e79@mcherm.com> Message-ID: <5.1.1.6.0.20031103114125.024e7b10@telecommunity.com> At 05:02 PM 11/3/03 +0100, Alex Martelli wrote: >Let me offer just a couple of use cases, one per kind. For example, > >def __mul__(self, other): > if isinstance(other, self.KnownNumberTypes): > return self.__class__([ x*other for x in self.items ]) > else: > # etc etc, various other multiplication cases > >right now, that (class, actually) attribute KnownNumberTypes starts out >"knowing" about int, long, float, gmpy.mpz, etc, and may require user >customization (e.g by subclassing) if any other "kind of (scalar) number" >needs to be supported; besides, the isinstance check must walk linearly >down the tuple of known number types each time. (I originally had >quite a different test structure: > try: other + 0 > except TypeError: # other is not a number > # various other multiplication cases > else: > # other is a number, so... > return self.__class__([ x*other for x in self.items ]) >but the performance for typical benchmarks improved with the isinstance >test, so, reluctantly, that's what I changed to). 
If an abstract basetype >'basenumber' caught many useful cases, I'd put it right at the start of >the KnownNumberTypes tuple, omit all subclasses thereof from it, get >better performance, AND be able to document very simply what the user >must do to ensure his own custom type is known to me as "a number". This is the sort of thing that just begs for open generic functions with multiple dispatch, though. Even object adaptation doesn't easily generalize to operations better expressed as f(x,y) than x.f(y). From sjoerd at acm.org Mon Nov 3 11:50:25 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Mon Nov 3 11:50:40 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031738.23373.aleaxit@yahoo.com> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> Message-ID: <3FA68751.3060306@acm.org> Alex Martelli wrote: > On Monday 03 November 2003 04:01 pm, Anthony Baxter wrote: >> - Functionality changes are controversial. Unless there's been a >>discussion and agreement (or BDFL fiat ) on python-dev, it shouldn't > > > Surely the BDFL could afford a better car than _that_?!-) It *is* the car he used to drive in Amsterdam... -- Sjoerd Mullender From skip at pobox.com Mon Nov 3 12:12:23 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Nov 3 12:12:38 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031738.23373.aleaxit@yahoo.com> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> Message-ID: <16294.35959.394535.409224@montanaro.dyndns.org> Alex> No, but there may be some cases. E.g., one of the doc fix I Alex> proposed (but didn't commit) is to the reference manual, Alex> documenting that list comprehensions currently (2.3) "leak" Alex> control variables, but code should not rely on that since it will Alex> be fixed in the future. 
That doc fix would not make much sense in
Alex> 2.4, assuming the leakage will be fixed then, as it is currently
Alex> predicted it will be.

Sure, but the documentation should reflect the current implementation.
It's the job of the people who change the list comprehension
implementation to also correct the documentation to be in sync with their
changes to the code.

Skip

From guido at python.org Mon Nov 3 12:43:06 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 3 12:43:18 2003
Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes
In-Reply-To: Your message of "Sun, 02 Nov 2003 23:19:42 +0100." <200311022319.42725.aleaxit@yahoo.com>
References: <200311022319.42725.aleaxit@yahoo.com>
Message-ID: <200311031743.hA3Hh6O24217@12-236-54-216.client.attbi.com>

> 1. Shouldn't class UserString.UserString inherit from basestring?
> After all, basestring exists specifically in order to encourage
> typetests of the form isinstance(x, basestring) -- wouldn't it be
> better if such tests could also catch "user-tweaked strings"
> derived from UserString ... ?

I wish I had time for this thread today, but it doesn't look like it. I
just wish to express that we shouldn't lightly mess with this. I added
basestr specifically to support some code that was interested in testing
whether something was one of the *builtin* string types (or a subclass
thereof). But I don't recall details and won't be able to dig them up
today.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mcherm at mcherm.com Mon Nov 3 13:51:23 2003
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon Nov 3 13:51:29 2003
Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes
Message-ID: <1067885483.3fa6a3ab94efd@mcherm.com>

I (Michael Chermside) wrote:
> Great idea... I think there should be a single type from which all built-in
> integer-like types inherit, and which user-designed types can inherit
> if they want to behave like integers. I think that type should be called
> "int".

Alex replies:
> Unfortunately, unless int is made an abstract type, that doesn't help at
> all to "type-flag" user-coded types (be they C-coded or Python-coded):
> they want to tell "whoever it may concern" that they're intended to be
> usable as integers, but not uselessly carry around an instance of int for
> the purpose (and need to contort their own layout, if C-coded, for that).

Valid point. Of course, we've reduced the use cases to those which want
to emulate integers and ALSO don't want the layout of ints. It seems like
a small number of situations, but there ARE some, and it IS a valid
point.

> Abstract basetypes such as basestring are useful only to "flag" types as
> (intending to conform to) some concept: they don't carry implementation.

Well, yes, but Python strives very hard to not NEED to know what type an
object is before operating on it. As long as it supports the operations
that are used, it's "good enough". It's an ideal, not a universal rule,
and there are plenty of small exceptions, but to introduce a system of
basetypes seems inappropriate.

On the other hand, string and unicode need a common base class because
they are a special case. Really, there are two things going on... the
need to process arbitrary collections of bytes, and the need to process
arbitrary collections of characters. The whole thing is thrown into
confusion because "string" is used for storing characters, particularly
when the characters are expected to be ascii. This is for historical
reasons, performance reasons, out of ignorance, because "string" is
easier to type and u"" is more annoying... lots of reasons both good and
bad. But since lots of string objects contain character data just like
unicode objects, we need a type label for dealing with "character data",
and that can't be either "unicode" or "string". I don't see any such
issue in numbers (although the int/long flaw is somewhat similar, but
that's being healed).
> Surely you're not claiming that > Numeric is "abusing operator overloading" by allowing users to code > a+b, a*b, a-b etc where a and b are multi-dimensional arrays? The > ability to use such notation, which is fully natural in the application areas > those users come from, is important to many users. Um... no, I didn't mean to claim that. When I wrote it, I was thinking "okay, you'd only use these operations (sensibly) on something which had an algebra... ie, a number." But that was wrong... matrices have an algebra, but they're NOT numbers. I wrote: > What use cases do you have for "basenumber" (I don't mean > examples of classes that would inherit from basenumber, I mean examples > where that inheritance would make a difference)? Alex responded with actual examples, and I'll have to take the time to read them properly before I can respond meaningfully. (But THANKS for giving specific examples... it always helps me reason about abstract ideas (like "are baseclasses wise for numbers") when I have a few concrete examples to check myself against as I go.) Let this be a warning to me... be careful of getting in an argument with Alex, since he'll swamp me with far more well-reasoned arguments and examples than I have time to _read_, much less respond to. -- Michael Chermside From raymond.hettinger at verizon.net Mon Nov 3 14:00:59 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Mon Nov 3 14:01:43 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <002601c3a23c$e6a1b280$e841fea9@oemcomputer> The pep has been through several rounds of public comment on comp.lang.python. As a result, the proposal has evolved away from several methods called iter_backwards() and into a simple builtin function called reversed(). Other simplifications emerged as well. The improved pep is at: www.python.org/sf/pep-0322.html Thanks to many posts by Alex, the only issue of significance is avoiding having a new builtin. 
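For reference, the proposed builtin amounts to roughly this pure-Python sketch (illustrative only; argument checking and the actual C implementation are omitted):

```python
def reversed_sketch(seq):
    # Yield the items of a sequence from last to first, lazily and
    # without copying the sequence -- the behaviour the PEP asks for.
    for i in range(len(seq) - 1, -1, -1):
        yield seq[i]
```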
My strong feeling is that the essential simplicity and utility of the function would be lost if it got tucked away in some other namespace. The flipside is our common desire to keep the builtin namespace as compact as possible. So, I would like to solicit your thoughts and judgments on whether the PEP merits a new builtin. The proposal and remaining issue are both so simply stated that it was difficult to keep the newsgroup discussion focused. The posts immediately veered towards developing exotic ways to attach the function to other namespaces. Instead of repeating that discussion, hopefully we can just decide whether to accept the pep. Thank you, Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031103/52b9b197/attachment.html From wtrenker at shaw.ca Mon Nov 3 07:10:10 2003 From: wtrenker at shaw.ca (William Trenker) Date: Mon Nov 3 14:14:22 2003 Subject: [Python-Dev] new language ideas In-Reply-To: <1067866557.3fa659bd77a4d@mcherm.com> References: <1067866557.3fa659bd77a4d@mcherm.com> Message-ID: <20031103121010.2bbad5e9.wtrenker@shaw.ca> Michael Chermside wrote: > Just making __doc__ a dictionary instead of a string doesn't achieve > anything *unless* there is a fairly standard set of expected keys > in this dictionary. Here's a couple of possibilities: - the Dublin Core (DC), or some sub-set. DC been quite widely accepted (eg: Zope). - keys to support version control and CVS integration. (I'm not a CVS expert so this might be off the wall.) Something like this might be integrated nicely with docutils and other automation tools. Regards, Bill From mwh at python.net Mon Nov 3 14:59:40 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 3 14:59:46 2003 Subject: [Python-Dev] Looking for master thesis ideas involving Python In-Reply-To: <5.1.1.6.0.20031103113450.03470270@telecommunity.com> (Phillip J. 
Eby's message of "Mon, 03 Nov 2003 11:38:52 -0500") References: <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <3FA1C6CD.6050201@ocf.berkeley.edu> <3FA0A210.10605@ocf.berkeley.edu> <2mhe1rj7n8.fsf@starship.python.net> <3FA1C6CD.6050201@ocf.berkeley.edu> <5.1.0.14.0.20031031084822.01e5a020@mail.telecommunity.com> <5.1.1.6.0.20031031111429.03110880@telecommunity.com> <2mad7h72sr.fsf@starship.python.net> <20031101124628.GA26463@vicky.ecs.soton.ac.uk> <20031102233516.GA22361@vicky.ecs.soton.ac.uk> <5.1.1.6.0.20031103113450.03470270@telecommunity.com> Message-ID: <2mism1440j.fsf@starship.python.net> "Phillip J. Eby" writes: > At 11:35 AM 11/3/03 +0000, Michael Hudson wrote: >>Armin Rigo writes: >> >> > What seems to me like a good solution would be to use one relatively >> > large "arena" per type and Python's memory allocator to subdivide >> > each arena. If each arena starts at a pointer address which is >> > properly aligned, then *(p&MASK) gives you the type of any object, >> > and possibly even without much cache-miss overhead because there are >> > not so many arenas in total (probably only 1-2 per type in common >> > cases, and arenas can be large). >> >>Hmm, maybe. I'm not going to make guesses about that one :-) > > You guys do realize that this scheme would make it impossible to > change an object's type, right? Unless of course you have some way to > "search and replace" all references to an object. I'd got this far... > And if you were to say, "well, we'll only use this trick for non-heap > types", my question would be, how's the code doing *(p&MASK) going to > know how *not* to do that? If heap types have a different layout, how > can you inherit from a builtin type in pure Python? And so on. ... but somehow this point had escaped me. Well, you could do something like having a cookie[1] at the start of heap type pools that says "the pointer to the type object is actually at *(p-4)" but that's pretty sick (and puts branches in every type access). Darn. 
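The arena scheme being discussed can be modeled in a few lines of Python, purely as a toy (the C version would do pointer arithmetic; ARENA_SIZE here is an invented stand-in for whatever alignment the allocator guarantees): objects of one type live in one aligned arena, so masking an object's address yields a key identifying its type — the C-level *(p & MASK) lookup becomes a dict access.

```python
ARENA_SIZE = 1 << 20          # assumed arena granularity (hypothetical)
MASK = ~(ARENA_SIZE - 1)      # clears the low bits of an address

arena_of_type = {}            # arena base address -> type name

def place(addr, type_name):
    """Pretend an object of type_name was allocated at addr."""
    arena_of_type[addr & MASK] = type_name

def type_of(addr):
    """Recover the type from the address alone, with no per-object header."""
    return arena_of_type[addr & MASK]
```

This also makes Phillip's objection concrete: once the type is a function of the address, you cannot change an object's type in place without moving the object.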
Oh well, I suspected my idea had to have some large problem, it just took longer than I expected for someone to spot it :-) Cheers, mwh [1] e.g. NULL... -- Presumably pronging in the wrong place zogs it. -- Aldabra Stoddart, ucam.chat From aahz at pythoncraft.com Mon Nov 3 15:15:57 2003 From: aahz at pythoncraft.com (Aahz) Date: Mon Nov 3 15:16:04 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <002601c3a23c$e6a1b280$e841fea9@oemcomputer> References: <002601c3a23c$e6a1b280$e841fea9@oemcomputer> Message-ID: <20031103201557.GB2397@panix.com> On Mon, Nov 03, 2003, Raymond Hettinger wrote: > > The pep has been through several rounds of public comment on > comp.lang.python. As a result, the proposal has evolved away from > several methods called iter_backwards() and into a simple builtin > function called reversed(). Other simplifications emerged as well. The > improved pep is at: > > www.python.org/sf/pep-0322.html > > Thanks to many posts by Alex, the only issue of significance is avoiding > having a new builtin. My strong feeling is that the essential simplicity > and utility of the function would be lost if it got tucked away in some > other namespace. The flipside is our common desire to keep the builtin > namespace as compact as possible. I'm -1 until the PEP includes this issue, then my vote changes to -0. (I.e., I generally agree with Alex about the builtin issue, but not strongly enough to actively oppose this PEP as long as it's properly documented.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From python at rcn.com Mon Nov 3 16:20:53 2003 From: python at rcn.com (Raymond Hettinger) Date: Mon Nov 3 16:21:03 2003 Subject: FW: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <001101c3a250$5d35dbc0$e841fea9@oemcomputer> > > The pep has been through several rounds of public comment on > > comp.lang.python.
As a result, the proposal has evolved away from > > several methods called iter_backwards() and into a simple builtin > > function called reversed(). Other simplifications emerged as well. The > > improved pep is at: > > > > www.python.org/sf/pep-0322.html [Aahz] > Oops? That URL don't work. Drat! http://www.python.org/peps/pep-0322.html Raymond Hettinger From python at rcn.com Mon Nov 3 16:33:49 2003 From: python at rcn.com (Raymond Hettinger) Date: Mon Nov 3 16:33:55 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <20031103201557.GB2397@panix.com> Message-ID: <001201c3a252$2b7965a0$e841fea9@oemcomputer> [Aahz] > I'm -1 until the PEP includes this issue, then my vote changes to -0. > > (I.e., I generally agree with Alex about the builtin issue, but not > strongly enough to actively oppose this PEP as long as it's properly > documented.) Okay, added a section to document the chief issue. BTW, Alex said he was +1 on the idea, but only +0 on it being a builtin. Raymond Hettinger From aleaxit at yahoo.com Mon Nov 3 17:24:14 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 17:37:30 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <3FA68751.3060306@acm.org> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> <3FA68751.3060306@acm.org> Message-ID: <200311032324.14570.aleaxit@yahoo.com> On Monday 03 November 2003 17:50, Sjoerd Mullender wrote: > Alex Martelli wrote: > > On Monday 03 November 2003 04:01 pm, Anthony Baxter wrote: > >> -
Functionality changes are controversial. Unless there's been a > >>discussion and agreement (or BDFL fiat ) on python-dev, it > >> shouldn't > > > > Surely the BDFL could afford a better car than _that_?!-) > > It *is* the car he used to drive in Amsterdam... Ah, NOW I finally understand the occasional acrimony...! I will point out that although I _am_ Italian, and did use to work for mech CAD giant think3, our CAD programs were NOT used by FIAT (by Pininfarina, yes, but then their industrial designs are widely used by firms all over the world) -- I drive a Honda car, and back when I was a biker I drove a Honda bike (and my NON-motor bike is an Atala:-). Better...?-) Alex From aleaxit at yahoo.com Mon Nov 3 17:37:16 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 3 17:37:34 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <001201c3a252$2b7965a0$e841fea9@oemcomputer> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> Message-ID: <200311032337.16147.aleaxit@yahoo.com> On Monday 03 November 2003 22:33, Raymond Hettinger wrote: > [Aahz] > > > I'm -1 until the PEP includes this issue, then my vote changes to -0. > > > > (I.e., I generally agree with Alex about the builtin issue, but not > > strongly enough to actively oppose this PEP as long as it's properly > > documented.) > > Okay, added a section to document the chief issue. > BTW, Alex said he was +1 on the idea, but only +0 on it being a builtin. Uh, did I? OK maybe I did. But what about "revrange" (which I'd LOVE to incarnate as an iterator-returning irange with an optional reverse= argument) -- was that knocked out of contention? I claimed that just revrange would be too specialized BUT irange would be JUST RIGHT... Alex From martin at v.loewis.de Mon Nov 3 17:43:22 2003 From: martin at v.loewis.de (Martin v. 
Löwis) Date: Mon Nov 3 17:43:37 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031347.10995.aleaxit@yahoo.com> References: <200311031347.10995.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > I made a few bugfix check-ins to the 2.3 maintenance branch this > weekend and Michael Hudson commented that he thinks that so doing is > a bad idea, that bug fixes should filter from the 2.4 trunk to the > 2.3 branch and not the other way around. Is this indeed the policy > (have I missed some guidelines about it)? At least that's the policy I was following, indicating backports with "backported to 2.3" in the checkin message. > I guess for this round of fixes I will find the time to forward-port them to > the 2.4 trunk (in AMPLE time for a 2.4 release -- as 2.3.3 is going to come > well before 2.4 releases, the other way 'round wouldn't be quite so sure:-), > but what about the future? Should fixes applicable to both 2.3.* and 2.4 > be made [a] always to both trunk and branch, I prefer to do them on both the trunk and the branch simultaneously. Having them on the branch simplifies the life of the release manager, and having them on the trunk gives them at least some testing. I had to back out both patches occasionally, but this is not a big problem unless a release of the branch is imminent. > Oh, incidentally, if it matters -- most were docs issues, including > as "docs" also some changes to comments that previously were > misleading or ambiguous. It does matter. For doc changes, any kind of improvement is acceptable (IMO), as there is no risk of breaking existing applications. > I guess that my problem is that I think of 2.3.* fixes as things > that will be useful to "the general Python-using public" pretty > soon, with 2.4 far off in the future, so that it appears to me that
> But if that conflicts with policy, I will of course change anyway. If you don't forward-port your changes, nobody will. So you satisfy the general public now, with a view of taking corrections away from them in the future. Regards, Martin From martin at v.loewis.de Mon Nov 3 17:47:12 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Mon Nov 3 17:48:13 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <200311031738.23373.aleaxit@yahoo.com> References: <200311031501.hA3F1uH0016389@localhost.localdomain> <200311031738.23373.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > No, but there may be some cases. E.g., one of the doc fixes I proposed (but > didn't commit) is to the reference manual, documenting that list > comprehensions currently (2.3) "leak" control variables, but code should not > rely on that since it will be fixed in the future. That doc fix would not > make much sense in 2.4, assuming the leakage will be fixed then, as > it is currently predicted it will be. Don't trust predictions. If the patch is formally correct now, apply it now. If you then find it is not needed a week from now, back it out. Alternatively, put the patch on SF, wait for a week, and then apply it to branch only. Regards, Martin From guido at python.org Mon Nov 3 18:26:32 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 3 18:26:39 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Mon, 03 Nov 2003 23:37:16 +0100." <200311032337.16147.aleaxit@yahoo.com> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> <200311032337.16147.aleaxit@yahoo.com> Message-ID: <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> > > BTW, Alex said he was +1 on the idea, but only +0 on it being a builtin.
But what about "revrange" (which I'd LOVE > to incarnate as an iterator-returning irange with an optional reverse= > argument) -- was that knocked out of contention? I claimed that just > revrange would be too specialized BUT irange would be JUST RIGHT... This surprised me a bit too. The majority of Raymond's examples in the PEP (when I last saw it a week ago) were reverse numeric ranges, usually of the form revrange(n) -- which we currently have to spell as range(n-1, -1, -1) (I think :-) and which the new proposal would turn into reversed(range(n)). According to Raymond, a built-in that would do just that only drew (a small number of) negative responses in the newsgroup. Such a thing would face zero opposition if it was part of itertools: itertools.revrange([start, ] stop[, step]) makes total sense to me... --Guido van Rossum (home page: http://www.python.org/~guido/) From tdelaney at avaya.com Mon Nov 3 19:02:14 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Mon Nov 3 19:02:23 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global.avaya.com> > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > BTW, when we do come around to PEP 318, I would suggest the 'as' > clause on a class statement as the best way to specify a metaclass. I just realised what has been bugging me about the idea of def foop() as staticmethod: and it applies equally well to class Newstyle as type: Basically, it completely changes the semantics associated with 'as' in Python - which are to give something a different name (technically, to rebind the object to a different name). OTOH, the first case above means 'create this (function) object, call this decorator, and bind the name to the new object'. So instead of taking an existing object (with an existing name) and rebinding it to a new name, it is creating an object, doing something to it and binding it to a name. 
A definite deviation from the current 'as' semantics, but understandable. However, the second case above is doing something completely different. It is creating a new object (a class) and binding it to a name. As a side effect, it is changing the metaclass of the object. The 'as' in this case has nothing whatsoever to do with binding the object name, but a name in the object's namespace. I suppose you could make the argument that the metaclass has to act as a decorator (like in the function def above) and set the __metaclass__ attribute, but that would mean that existing metaclasses couldn't work. It would also mean you were defining the semantics at an implementation level. I'm worried that I'm being too picky here, because I *like* the way the above reads. I'm just worried about overloading 'as' with too many essentially unrelated meanings. Tim Delaney From pje at telecommunity.com Mon Nov 3 20:09:55 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Nov 3 20:09:00 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global .avaya.com> Message-ID: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> At 11:02 AM 11/4/03 +1100, Delaney, Timothy C (Timothy) wrote: > > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > > > BTW, when we do come around to PEP 318, I would suggest the 'as' > > clause on a class statement as the best way to specify a metaclass. > >I just realised what has been bugging me about the idea of > > def foop() as staticmethod: > >and it applies equally well to > > class Newstyle as type: > >Basically, it completely changes the semantics associated with 'as' in >Python - which are to give something a different name (technically, to >rebind the object to a different name). > >OTOH, the first case above means 'create this (function) object, call this >decorator, and bind the name to the new object'. 
So instead of taking an >existing object (with an existing name) and rebinding it to a new name, it >is creating an object, doing something to it and binding it to a name. A >definite deviation from the current 'as' semantics, but understandable. > >However, the second case above is doing something completely different. It >is creating a new object (a class) and binding it to a name. As a side >effect, it is changing the metaclass of the object. The 'as' in this case >has nothing whatsoever to do with binding the object name, but a name in >the object's namespace. > >I suppose you could make the argument that the metaclass has to act as a >decorator (like in the function def above) and set the __metaclass__ >attribute, but that would mean that existing metaclasses couldn't work. It >would also mean you were defining the semantics at an implementation level. > >I'm worried that I'm being too picky here, because I *like* the way the >above reads. I'm just worried about overloading 'as' with too many >essentially unrelated meanings. Well, there's always 'is'... def foop() is staticmethod: class Newstyle is type: Interestingly, this usage is rather similar to Eiffel, which IIRC introduces code suites with 'is', although I think without the modifier. I'm not all that enthused about the metaclass usage, mainly because there's already an okay syntax (__metaclass__) for it. I'd rather that class decorators (if added) were decorators in the same way as function decorators. Why? Because I think that correct, combinable class decorators are probably easier for most people to write than correct, combinable metaclasses, and they are more easily combined than metaclasses are. From greg at electricrain.com Mon Nov 3 20:23:10 2003 From: greg at electricrain.com (Gregory P. 
Smith) Date: Mon Nov 3 20:23:17 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311030954.24191.aleaxit@yahoo.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> Message-ID: <20031104012310.GC17328@zot.electricrain.com> On Mon, Nov 03, 2003 at 09:54:24AM +0100, Alex Martelli wrote: > On Monday 03 November 2003 09:48 am, Anthony Baxter wrote: > > From what I understand, these fixes aren't just fixes to the test suite, > > but also to fix real problems with the bsddb code itself. In that case, > > should it be added to the 23 branch? I'd be a solid +1 on this for 2.3.3. > > > > Anyone else? > > Anything that makes bsddb less flaky on 2.3.* gets a big hearty enthusiastic > +1 from me too. > > Alex There are no deadlock problems in the current 2.3.2 bsddb module as it does not have thread support enabled (meaning it is likely to crash if someone uses it from multiple threads at once). The recent changes to bsddb have been to enable thread support and fix some singlethreaded deadlocks that thread support introduced due to the BerkeleyDB's internal locking. There is still the potential for multithreaded bsddb compatibility interface use to deadlock. This bug tracks the issue: http://sourceforge.net/tracker/?func=detail&aid=834461&group_id=5470&atid=105470 Net effect on release23-branch if we did this today: + multithreaded bsddb use now allowed (instead of crashes or corruption) - multithreaded bsddb use could deadlock depending on how it is used. (anything that creates a cursor internally including many of the inherited DictMixin dictionary methods could cause it) From jeremy at alum.mit.edu Tue Nov 4 01:10:12 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue Nov 4 01:13:16 2003 Subject: [Python-Dev] XXX undetected error (why=3) Message-ID: <1067926212.19568.47.camel@localhost.localdomain> I've been seeing these problems sporadically over the last several months.
Fred and I tracked one of them down to a bug in pyexpat.c. I noticed several more running the Zope 3 test suite today. I'd like to change the code for the check to call Py_FatalError() instead of printing a message to stderr. The check is only enabled during a debug build. I'd be much happier debugging this from a core dump than trying to figure out what happened to cause the message to be printed. Any objections? Jeremy From aleaxit at yahoo.com Tue Nov 4 03:12:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 03:12:30 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <20031104012310.GC17328@zot.electricrain.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> <20031104012310.GC17328@zot.electricrain.com> Message-ID: <200311040912.23213.aleaxit@yahoo.com> On Tuesday 04 November 2003 02:23 am, Gregory P. Smith wrote: ... > There are no deadlock problems in the current 2.3.2 bsddb module as > it does not have thread support enabled (meaning is likely to crash if > someone uses it from multiple threads at once). Ah! Shows you how much I understood of your patch -- I hadn't grasped this! > Net effect on release23-branch if we did this today: > > + multithreaded bsddb use now allowed (instead of crashes or corruption) Generally, extending functionality (as opposed to: fixing bugs or clarifying docs) is not a goal for 2.3.* -- but I don't know if the fact that bsddb isn't thread-safe in 2.3 counts as "a bug", or rather as functionality deliberately kept limited, to avoid e.g such bugs as the one you've just removed, and other possibilities you mention: > - multithreaded bsddb use could deadlock depending on how it is used. I think that just having the 2.3.* docs explicitly mention the lack of thread-safety might then perhaps be better than backporting the changes. 
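Alex's suggestion to document the lack of thread-safety implies the usual user-side workaround: serialize your own access with a lock. A generic sketch of that pattern (the class name is hypothetical and this is not the bsddb API, just the shape of the workaround):

```python
import threading

class SerializedMapping:
    """Wrap a non-thread-safe mapping (such as the 2.3 bsddb objects)
    so that every access happens under a single lock."""
    def __init__(self, db):
        self._db = db
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._db[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._db[key] = value

    def keys(self):
        with self._lock:
            return list(self._db)
```

One coarse lock trades concurrency for safety, which is exactly the trade-off the deadlock discussion above is about.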
Alex From aleaxit at yahoo.com Tue Nov 4 03:24:13 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 03:24:18 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> References: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> Message-ID: <200311040924.13894.aleaxit@yahoo.com> On Tuesday 04 November 2003 02:09 am, Phillip J. Eby wrote: ... > I'm not all that enthused about the metaclass usage, mainly because there's > already an okay syntax (__metaclass__) for it. I'd rather that class Hmmm -- why is: class Foo: __metaclass__ = MetaFoo ... "ok", compared to e.g.: class Foo is MetaFoo: ... while, again for example, def foo(): ... foo = staticmethod(foo) is presumably deemed "not ok" compared to e.g.: def foo() is staticmethod: ... ??? Both cases of current syntax do the job (perhaps not elegantly but they do) and in both cases a new syntax would increase elegance. > decorators (if added) were decorators in the same way as function > decorators. Why? Because I think that correct, combinable class > decorators are probably easier for most people to write than correct, > combinable metaclasses, and they are more easily combined than metaclasses > are. Combinable metaclasses may not be trivial to write, but with multiple inheritance it will often be feasible (except, presumably, when implied layout or __new__ have conflicting requirements). Of course, not having use cases of either custom metaclasses or class decorators in production use, the discussion does risk being a bit abstract. Did you have any specific use case in mind? 
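The two mechanisms Alex and Phillip are weighing can be put side by side in runnable form. This is a sketch in modern syntax (in 2.3 the metaclass would be spelled via __metaclass__; the 'tagged' attribute is an invented example of a decoration, not anything from the thread):

```python
class MetaFoo(type):
    """Metaclass route: intercepts class creation itself."""
    def __new__(mcls, name, bases, ns):
        ns['tagged'] = True
        return super().__new__(mcls, name, bases, ns)

class Foo(metaclass=MetaFoo):
    pass

def tag(cls):
    """Class-decorator route: receives an already-built class."""
    cls.tagged = True
    return cls

# One plausible reading of the proposed "class Bar ... as tag" syntax:
Bar = tag(type('Bar', (), {}))
```

The decorator only sees the finished class, which is why Phillip argues decorators are easier to write and combine; the metaclass, by contrast, also becomes type(Foo) and so affects layout and inheritance.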
Alex From aleaxit at yahoo.com Tue Nov 4 03:56:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 03:56:29 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5C1F@au3010avexu1.global.avaya.com> Message-ID: <200311040956.23759.aleaxit@yahoo.com> On Tuesday 04 November 2003 01:02 am, Delaney, Timothy C (Timothy) wrote: > > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > > > BTW, when we do come around to PEP 318, I would suggest the 'as' > > clause on a class statement as the best way to specify a metaclass. > > I just realised what has been bugging me about the idea of > > def foop() as staticmethod: > > and it applies equally well to > > class Newstyle as type: > > Basically, it completely changes the semantics associated with 'as' in > Python - which are to give something a different name (technically, to > rebind the object to a different name). Yes, that's what the 'as' clause means in from and import statements, of course. > OTOH, the first case above means 'create this (function) object, call this > decorator, and bind the name to the new object'. So instead of taking an > existing object (with an existing name) and rebinding it to a new name, it > is creating an object, doing something to it and binding it to a name. A > definite deviation from the current 'as' semantics, but understandable. I'm not sure I follow. "import X as y" means basically y = __import__('X') (give or take a little:-). 'def foo() as staticmethod:' would mean instead foo = staticmethod(new.function(<code>, globals(), 'foo')) so what comes after the 'as' is a name to bind in the existing case, it's a callable to call in the new proposed syntax.
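Alex's desugaring can be made runnable today; types.FunctionType is the modern equivalent of the old new.function, and the function body here is an invented example (this illustrates the mechanics only, not proposed semantics):

```python
import types

# Build the function object explicitly, then pass it through the
# decorator named after 'as' -- the desugaring from the message.
module_code = compile("def foo():\n    return 'decorated'", "<sketch>", "exec")
foo_code = next(c for c in module_code.co_consts
                if isinstance(c, types.CodeType))
foo = staticmethod(types.FunctionType(foo_code, globals(), 'foo'))
```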
There is a binding in each case, and in each case something is called to obtain the object to bind; I think the distinction between new and existing object is spurious -- __import__ can perfectly well be creating a new object -- but the real distinction is that the name to bind is given after 'as' in the existing case, it's NOT so given in the new proposed one. > However, the second case above is doing something completely different. It Not at all -- it does: Newstyle = type('Newstyle', (), <dict>) where <dict> is built from the body of the 'class' statement, just like, above, <code> is built from the body of the 'def' statement. I find this rather close to the 'as staticmethod' case: that one calls staticmethod (the callable after the 'as') and binds the result to the name before the 'as', this one calls type (the callable after the 'as') and binds the result to the name before the 'as'. > is creating a new object (a class) and binding it to a name. As a side > effect, it is changing the metaclass of the object. The 'as' in this case "changing"? From what? It's _establishing_ the type of the name it's binding, just as (e.g.) staticmethod(...) is. I.e., stripping the syntax we have in today's Python:

>>> xx = type('xx', (), {'ba':23})
>>> type(xx)
<type 'type'>
>>> xx = staticmethod(lambda ba: 23)
>>> type(xx)
<type 'staticmethod'>

...so where's the "completely different" or the "changing" in one case and not the other...? > has nothing whatsoever to do with binding the object name, but a name in > the object's namespace. It has everything to do with determining the type of the object, just like e.g. staticmethod would. > I suppose you could make the argument that the metaclass has to act as a > decorator (like in the function def above) and set the __metaclass__ > attribute, but that would mean that existing metaclasses couldn't work. It > would also mean you were defining the semantics at an implementation level. I'm sure I've lost you completely here, sorry.

>>> class xx(object): pass
...
>>> xx.__metaclass__
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: type object 'xx' has no attribute '__metaclass__'

why would a class created this way have to set '__metaclass__', again? A metaclass is the class object's type, and it's called to create the class object. If I do "xx = type('xx', (), {})" I get exactly the same result as with the above "class xx" statement -- no more, no less. "class" just gives me neat syntax to determine the 3 arguments with which the metaclass is called -- a string that's the classname, a tuple of bases, and a dictionary. That "__metaclass__ attribute" is just an optional hack which Python can use to determine _which_ metaclass to call (in alternative to others, even today) for a certain 'class' statement. > I'm worried that I'm being too picky here, because I *like* the way the > above reads. I'm just worried about overloading 'as' with too many > essentially unrelated meanings. I accept that in both 'def foo() as X' and 'class foo as X' the X in "as X" is very different from its role in 'import foo as X' -- in the import statement, X is just a name to which to bind an object, while in the def and class statements X would be a callable to call in order to get the object -- and the name to bind would be the one right after the def or class keywords instead. So maybe we should do as Phillip Eby suggests and use 'is' instead - that's slightly stretched too, because after "def foo() is staticmethod:" it would NOT be the case that 'foo is staticmethod' holds, but, rather, that isinstance(foo, staticmethod) [so we're saying "IS-A", not really "IS"]. But the def and class statements cases are SO close -- in both what comes after the 'is' (or 'as') is a callable anyway.
The debate is then just, should said callable be called with an already prepared (function or class) object, just to decorate it; or should it rather be called with the elementary "bricks" needed to build the object, so it can build it properly. Incidentally, it seems to me that it might not be a problem to overload e.g. staticmethod so it can be called with multiple arguments (same as new.function) and internally calls new.function itself, should there be any need for that (not that I can see any use case right now, just musing...). Alex From aleaxit at yahoo.com Tue Nov 4 04:04:38 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 04:04:48 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> <200311032337.16147.aleaxit@yahoo.com> <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> Message-ID: <200311041004.38285.aleaxit@yahoo.com> On Tuesday 04 November 2003 12:26 am, Guido van Rossum wrote: > > > BTW, Alex said he was +1 on the idea, but only +0 on it being a > > > builtin. > > > > Uh, did I? OK maybe I did. But what about "revrange" (which I'd LOVE > > to incarnate as an iterator-returning irange with an optional reverse= > > argument) -- was that knocked out of contention? I claimed that just > > revrange would be too specialized BUT irange would be JUST RIGHT... > > This surprised me a bit too. The majority of Raymond's examples in > the PEP (when I last saw it a week ago) were reverse numeric ranges, > usually of the form revrange(n) -- which we currently have to spell as > range(n-1, -1, -1) (I think :-) and which the new proposal would turn > into reversed(range(n)). According to Raymond, a built-in that would > do just that only drew (a small number of) negative responses in the > newsgroup. 
> > Such a thing would face zero opposition if it was part of itertools: > itertools.revrange([start, ] stop[, step]) makes total sense to me... And what about irange with an optional reverse= argument? I did have (and write about on c.l.py) a case where I currently code:

    if godown:
        iseq = xrange(len(sq)-1, start-1, -1)
    else:
        iseq = xrange(start, len(sq), 1)
    for index in iseq:
        ...

and would be just delighted to be able to code, instead,

    for index in irange(start, len(sq), reverse=godown):
        ...

Even when the need to reverse can more easily be hardwired in the source (a more common case), would

    for index in irange(start, stop, reverse=True):

be really so much worse than

    for index in revrange(start, stop):

...? Alex From aleaxit at yahoo.com Tue Nov 4 05:02:22 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 05:02:30 2003 Subject: [Python-Dev] reflections on basestring -- and other abstractbasetypes In-Reply-To: <1067885483.3fa6a3ab94efd@mcherm.com> References: <1067885483.3fa6a3ab94efd@mcherm.com> Message-ID: <200311041102.22873.aleaxit@yahoo.com> On Monday 03 November 2003 07:51 pm, Michael Chermside wrote: > I (Michael Chermside) wrote: > > Great idea... I think there should be a single type from which all built-in > > integer-like types inherit, and which user-designed types can inherit > > if they want to behave like integers. I think that type should be called > > "int". > > Alex replies: > > Unfortunately, unless int is made an abstract type, that doesn't help at > > all to "type-flag" user-coded types (be they C-coded or Python-coded): > > they want to tell "whoever it may concern" that they're intended to be > > usable as integers, but not uselessly carry around an instance of int for > > the purpose (and need to contort their own layout, if C-coded, for that). > > Valid point. Of course, we've reduced the use cases to those which want > to emulate integers and ALSO don't want the layout of ints.
It seems like > a small number of situations, but there ARE some, and it IS a valid point. If some code is happy with extending an existing concrete type there is of course no problem -- it just goes and does it. Sorry, I was taking that for granted. But, e.g., gmpy.mpz wants to keep "the integer" in the form that makes the underlying GMP library happy, and any similar wrapper over a library supplying some special implementations of integers (there are quite a few besides GMP) would be similar in this way. > > Abstract basetypes such as basestring are useful only to "flag" types as > > (intending to conform to) some concept: they don't carry implementation. > Well, yes, but Python strives very hard to not NEED to know what type > an object is before operating on it. As long as it supports the operations > that are used, it's "good enough". It's an ideal, not a universal rule, > and there are plenty of small exceptions, but to introduce a system > of basetypes seems inappropriate. It's a wonderful idea, and I generally crusade against typechecking, but I think there are enough "small exceptions" that some basestring-like abstract basetypes may be warranted (not necessarily "a system", mind you). Typechecking against an abstract type is quite different and less of a problem than doing so against a concrete type, btw -- exactly because it's not a big problem for a user-coded type to "flag" itself by inheriting from the abstract basetype in question, if need be... it doesn't carry the baggage that inheriting from a concrete type does. > On the other hand, string and unicode need a common base class because > they are a special case. Really, there are two things going on... the They're special to Python itself and its standard library because there is a lot more string-processing and processing of text going on there than any other kind.
I.e., the usefulness of basestring is more obvious because Python itself and the standard library are "keen" users of strings of all kinds;-). > both good and bad. But since lots of string objects contain character > data just like unicode objects, we need a type label for dealing > with "character data", and that can't be either "unicode" or "string". > > I don't see any such issue in numbers (although the int/long flaw > is somewhat similar, but that's being healed). But int/long, and float, have enough similarities AND differences too. Adding a Decimal or a Rational type (I hope both will eventually occur) will IMHO show that even more clearly. > > Surely you're not claiming that > > Numeric is "abusing operator overloading" by allowing users to code > > a+b, a*b, a-b etc where a and b are multi-dimensional arrays? The > > ability to use such notation, which is fully natural in the application > > areas those users come from, is important to many users. > Um... no, I didn't mean to claim that. When I wrote it, I was thinking > "okay, you'd only use these operations (sensibly) on something which > had an algebra... ie, a number." But that was wrong... matrices have > an algebra, but they're NOT numbers. Yes, we totally agree on this. > I wrote: > > What use cases do you have for "basenumber" (I don't mean > > examples of classes that would inherit from basenumber, I mean examples > > where that inheritance would make a difference)? > Alex responded with actual examples, and I'll have to take the time > to read them properly before I can respond meaningfully. (But THANKS > for giving specific examples... it always helps me reason about > abstract ideas (like "are baseclasses wise for numbers") when I have > a few concrete examples to check myself against as I go.) I hope I chose the examples well then...;-) > Let this be a warning to me...
be careful of getting in an argument > with Alex, since he'll swamp me with far more well-reasoned arguments > and examples than I have time to _read_, much less respond to. indeed...;-). Alex From aleaxit at yahoo.com Tue Nov 4 05:06:19 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 05:06:24 2003 Subject: [Python-Dev] reflections on basestring -- and other abstract basetypes In-Reply-To: <200311031743.hA3Hh6O24217@12-236-54-216.client.attbi.com> References: <200311022319.42725.aleaxit@yahoo.com> <200311031743.hA3Hh6O24217@12-236-54-216.client.attbi.com> Message-ID: <200311041106.19798.aleaxit@yahoo.com> On Monday 03 November 2003 06:43 pm, Guido van Rossum wrote: > > 1. Shouldn't class UserString.UserString inherit from basestring? > > After all, basestring exists specifically in order to encourage > > typetests of the form isinstance(x, basestring) -- wouldn't it be > > better if such tests could also catch "user-tweaked strings" > > derived from UserString ... ? > > I wish I had time for this thread today, but it doesn't look like it. > I just wish to express that we shouldn't lightly mess with this. I Aye aye cap'n -- we'll just be squabbling and NOT messing until your say-so, anyway;-). > added basestr specifically to support some code that was interested in > testing whether something was one of the *builtin* string types (or a > subclass thereof). But I don't recall details and won't be able to > dig them up today. basestring usage has become rather widespread today, anyway; the specific reason it was introduced is interesting to know, but looking at how it's used e.g. in the std lib is probably more meaningful. Of course, we always look at string-ish things with more interest because we use SO many of them, of all kinds, in Python itself and its stdlib. But -- numbers may be very important too, to some subset of Python's users... _and_ in a secondary sense to Python itself in some cases. 
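The abstract "basenumber" idea Alex is arguing for did eventually materialize, in a different form, as the numbers ABCs of PEP 3141. A sketch of the "flagging without inheriting a concrete layout" point, with MPZ a hypothetical stand-in for a wrapper type like gmpy.mpz:

```python
import numbers

class MPZ:
    """Hypothetical wrapper keeping its value in a library-friendly form."""
    def __init__(self, value):
        self._v = int(value)
    def __int__(self):
        return self._v

# Flag the type as a number for isinstance() checks, without forcing it
# to carry an int instance around or contort its C-level layout:
numbers.Number.register(MPZ)

assert isinstance(MPZ(7), numbers.Number)
assert isinstance(3, numbers.Number)
assert not isinstance("3", numbers.Number)
```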
Alex From barry at python.org Tue Nov 4 07:44:12 2003 From: barry at python.org (Barry Warsaw) Date: Tue Nov 4 07:44:19 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311040912.23213.aleaxit@yahoo.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> <20031104012310.GC17328@zot.electricrain.com> <200311040912.23213.aleaxit@yahoo.com> Message-ID: <1067949852.26825.3.camel@anthem> On Tue, 2003-11-04 at 03:12, Alex Martelli wrote: > Generally, extending functionality (as opposed to: fixing bugs or clarifying > docs) is not a goal for 2.3.* -- but I don't know if the fact that bsddb > isn't thread-safe in 2.3 counts as "a bug", or rather as functionality > deliberately kept limited, to avoid e.g such bugs as the one you've just > removed, and other possibilities you mention: > > > - multithreaded bsddb use could deadlock depending on how it is used. > > I think that just having the 2.3.* docs explicitly mention the lack of > thread-safety might then perhaps be better than backporting the changes. It's just the DB-API that's not thread-safe. The full blown BerkeleyDB API (a.k.a. bsddb3) should be fine. It sure is tempting to claim that the lack of DB-API thread-safety for BerkeleyDB is a bug and should be fixed for 2.3.*, but I think Greg should make the final determination. If it isn't, then yes, the docs need to clearly state that's the case. 
-Barry From barry at python.org Tue Nov 4 07:46:40 2003 From: barry at python.org (Barry Warsaw) Date: Tue Nov 4 07:46:45 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <200311040924.13894.aleaxit@yahoo.com> References: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> <200311040924.13894.aleaxit@yahoo.com> Message-ID: <1067949999.26825.6.camel@anthem> On Tue, 2003-11-04 at 03:24, Alex Martelli wrote: > class Foo is MetaFoo: > def foo() is staticmethod: My preference would be for metaclass specification to use "is" and for method decoration to use "as". They seem like different specializations that should have a different pronunciation. -Barry From anthony at interlink.com.au Tue Nov 4 07:55:15 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 07:59:02 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <1067949852.26825.3.camel@anthem> Message-ID: <200311041255.hA4CtF1O007177@localhost.localdomain> >>> Barry Warsaw wrote > It's just the DB-API that's not thread-safe. The full blown BerkeleyDB > API (a.k.a. bsddb3) should be fine. > > It sure is tempting to claim that the lack of DB-API thread-safety for > BerkeleyDB is a bug and should be fixed for 2.3.*, but I think Greg > should make the final determination. If it isn't, then yes, the docs > need to clearly state that's the case. At the very least, the test suite should pass on the 23 branch. It currently hangs or crashes on many/most platforms I've tried it on. If this is because the test suite is doing multi-threaded things and that Just Won't Work, then the test suite should be fixed. Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
From barry at python.org Tue Nov 4 08:34:43 2003 From: barry at python.org (Barry Warsaw) Date: Tue Nov 4 08:34:55 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311041255.hA4CtF1O007177@localhost.localdomain> References: <200311041255.hA4CtF1O007177@localhost.localdomain> Message-ID: <1067952882.26825.37.camel@anthem> On Tue, 2003-11-04 at 07:55, Anthony Baxter wrote: > At the very least, the test suite should pass on the 23 branch. It currently > hangs or crashes on many/most platforms I've tried it on. If this is because > the test suite is doing multi-threaded things and that Just Won't Work, then > the test suite should be fixed. Not for me. Works fine on RH9, except for a crash in test_re. -Barry ====================================================================== FAIL: test_bug_418626 (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 409, in test_bug_418626 self.assertRaises(RuntimeError, re.search, '(a|b)*?c', 10000*'ab'+'cd') File "/home/barry/projects/python23/Lib/unittest.py", line 295, in failUnlessRaises raise self.failureException, excName AssertionError: RuntimeError ====================================================================== FAIL: test_stack_overflow (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 418, in test_stack_overflow self.assertRaises(RuntimeError, re.match, '(x)*', 50000*'x') File "/home/barry/projects/python23/Lib/unittest.py", line 295, in failUnlessRaises raise self.failureException, excName AssertionError: RuntimeError ---------------------------------------------------------------------- From aleaxit at yahoo.com Tue Nov 4 08:38:32 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 4 08:38:38 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: 
<1067952882.26825.37.camel@anthem> References: <200311041255.hA4CtF1O007177@localhost.localdomain> <1067952882.26825.37.camel@anthem> Message-ID: <200311041438.32802.aleaxit@yahoo.com> On Tuesday 04 November 2003 02:34 pm, Barry Warsaw wrote: > On Tue, 2003-11-04 at 07:55, Anthony Baxter wrote: > > At the very least, the test suite should pass on the 23 branch. It > > currently hangs or crashes on many/most platforms I've tried it on. If > > this is because the test suite is doing multi-threaded things and that > > Just Won't Work, then the test suite should be fixed. > > Not for me. Works fine on RH9, except for a crash in test_re. Doesn't look like a crash but rather a failure I already discussed: > ====================================================================== > FAIL: test_bug_418626 (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_re.py", line 409, in test_bug_418626 > self.assertRaises(RuntimeError, re.search, '(a|b)*?c', 10000*'ab'+'cd') > File "/home/barry/projects/python23/Lib/unittest.py", line 295, in > failUnlessRaises raise self.failureException, excName > AssertionError: RuntimeError The bug that this is testing for has gone away: the re engine doesn't stack overflow on this any more. The tests have been updated in 2.4 but not on the 2.3 branch. I mentioned that and asked whether I should just update the 2.3 tests, but apparently the concept is that this should rather be done by whoever fixed the bug, instead (or during the backport phase to prepare 2.3.3). Same, apparently, for the other test-failure you mention. 
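The mismatch Alex describes is easy to illustrate: the two patterns from the stale tests used to exhaust the C stack in the old recursive engine, which is exactly why the updated tests no longer expect RuntimeError. With the non-recursive engine they simply succeed:

```python
import re

# Formerly hit the recursion limit and raised RuntimeError; now just match:
m1 = re.match('(x)*', 50000 * 'x')
m2 = re.search('(a|b)*?c', 10000 * 'ab' + 'cd')

assert m1 is not None and m1.group(1) == 'x'
assert m2 is not None and m2.group(0).endswith('c')
```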
Alex From mwh at python.net Tue Nov 4 08:38:54 2003 From: mwh at python.net (Michael Hudson) Date: Tue Nov 4 08:38:57 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <1067952882.26825.37.camel@anthem> (Barry Warsaw's message of "Tue, 04 Nov 2003 08:34:43 -0500") References: <200311041255.hA4CtF1O007177@localhost.localdomain> <1067952882.26825.37.camel@anthem> Message-ID: <2mwuag1cep.fsf@starship.python.net> Barry Warsaw writes: > On Tue, 2003-11-04 at 07:55, Anthony Baxter wrote: > >> At the very least, the test suite should pass on the 23 branch. It currently >> hangs or crashes on many/most platforms I've tried it on. If this is because >> the test suite is doing multi-threaded things and that Just Won't Work, then >> the test suite should be fixed. > > Not for me. Works fine on RH9, except for a crash in test_re. That's because some naughty person backported the _sre recursion removal but not the test suite to match. Oi! Cheers, mwh -- I have gathered a posie of other men's flowers, and nothing but the thread that binds them is my own. -- Montaigne From anthony at interlink.com.au Tue Nov 4 08:43:17 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 08:47:07 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <200311041438.32802.aleaxit@yahoo.com> Message-ID: <200311041343.hA4DhHqm008168@localhost.localdomain> >>> Alex Martelli wrote > The bug that this is testing for has gone away: the re engine doesn't > stack overflow on this any more. The tests have been updated in 2.4 > but not on the 2.3 branch. I mentioned that and asked whether I should > just update the 2.3 tests, but apparently the concept is that this should > rather be done by whoever fixed the bug, instead (or during the backport > phase to prepare 2.3.3). Hm. I must have mis-spoken. If you see a bugfix that should go on the branch but hasn't, please feel completely free to do the backport. 
I have a mail folder with -checkins messages that need to be checked for backportage, but I only get to this periodically (and not at all in the last couple of weeks, alas). I do plan to clear this out sometime this week... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Tue Nov 4 09:01:59 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 09:05:46 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch Message-ID: <200311041402.hA4E20dR015943@localhost.localdomain> I'm seeing a couple of warnings that I don't remember seeing at the time of the 2.3.2 release. Given what they are, it's possible that it's just a random thing (whether the id is < 0 or not). test_minidom /home/anthony/src/py/23maint/Lib/xml/dom/minidom.py:797: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up return "" % (self.tagName, id(self)) test_repr /home/anthony/src/py/23maint/Lib/test/test_repr.py:91: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up eq(r(i3), (""%id(i3))) Anyone want to suggest an appropriate fix, or fix them? Otherwise I'll put it on the to-do list. Anthony From anthony at interlink.com.au Tue Nov 4 09:15:18 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Nov 4 09:19:19 2003 Subject: [Python-Dev] bsddb test case deadlocks fixed In-Reply-To: <2mwuag1cep.fsf@starship.python.net> Message-ID: <200311041415.hA4EFIwe016470@localhost.localdomain> >>> Michael Hudson wrote > That's because some naughty person backported the _sre recursion > removal but not the test suite to match. Oi! Fixed. The test_re_groupref_exists is still disabled on 2.3 branch, because it still fails on 2.3 Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
From Paul.Moore at atosorigin.com Tue Nov 4 09:41:11 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Tue Nov 4 09:41:57 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <16E1010E4581B049ABC51D4975CEDB8802C099A4@UKDCX001.uk.int.atosorigin.com> From: Guido van Rossum [mailto:guido@python.org] > Such a thing would face zero opposition if it was part of itertools: > itertools.revrange([start, ] stop[, step]) makes total sense to me... I also like Alex's suggestion of itertools.irange([start,] stop[, step][,reverse=False]) I'd rather this than a revrange - that feels over-specialised, whereas an irange with a reverse keyword parameter seems natural. I'd still support this addition to itertools whether or not the reversed() builtin was implemented (although with irange, reversed() loses a lot of its use cases...) Paul. From mwh at python.net Tue Nov 4 10:08:57 2003 From: mwh at python.net (Michael Hudson) Date: Tue Nov 4 10:09:05 2003 Subject: [Python-Dev] [gmane.comp.sysutils.autotools.announce] Autoconf 2.58 released Message-ID: <2msml4188m.fsf@starship.python.net> We want to be using this asap to get rid of the aclocal hacks, right? I suppose waiting a *few* days for a brown-paper-bag-release situation would be prudent. Cheers, mwh -------------- next part -------------- An embedded message was scrubbed... From: Akim Demaille Subject: Autoconf 2.58 released Date: Tue, 04 Nov 2003 15:57:52 +0100 Size: 6974 Url: http://mail.python.org/pipermail/python-dev/attachments/20031104/e5c3cc5d/attachment.mht -------------- next part -------------- -- This is the fixed point problem again; since all some implementors do is implement the compiler and libraries for compiler writing, the language becomes good at writing compilers and not much else! 
-- Brian Rogoff, comp.lang.functional From guido at python.org Tue Nov 4 10:34:58 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 10:35:06 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 14:41:11 GMT." <16E1010E4581B049ABC51D4975CEDB8802C099A4@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB8802C099A4@UKDCX001.uk.int.atosorigin.com> Message-ID: <200311041534.hA4FYwa26001@12-236-54-216.client.attbi.com> > From: Guido van Rossum [mailto:guido@python.org] > > Such a thing would face zero opposition if it was part of itertools: > > itertools.revrange([start, ] stop[, step]) makes total sense to me... [Paul Moore] > I also like Alex's suggestion of > itertools.irange([start,] stop[, step][,reverse=False]) > > I'd rather this than a revrange - that feels over-specialised, whereas > an irange with a reverse keyword parameter seems natural. Hm, I don't know why it feels that way for you. It would be more verbose and I expect this option will always be a *constant*. (One of my rules-of-thumb for API design is that if you have a Boolean option whose value is expected to be always a constant, you've really defined two methods and API-wise you're better off with two separate methods. Although there are exceptions.) > I'd still support this addition to itertools whether or not the > reversed() builtin was implemented (although with irange, reversed() > loses a lot of its use cases...) Exactly: I am proposing this *because* it takes care of most of the use cases for reversed(), and reversed() doesn't need to be a builtin then. (If we can live with importing i[rev]range() from itertools, we can certainly live with importing the more powerful but less frequently needed reversed() from somewhere.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 4 10:53:08 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 10:53:14 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 10:04:38 +0100." <200311041004.38285.aleaxit@yahoo.com> References: <001201c3a252$2b7965a0$e841fea9@oemcomputer> <200311032337.16147.aleaxit@yahoo.com> <200311032326.hA3NQW124882@12-236-54-216.client.attbi.com> <200311041004.38285.aleaxit@yahoo.com> Message-ID: <200311041553.hA4Fr8226136@12-236-54-216.client.attbi.com> > And what about irange with an optional reverse= argument? I did have > (and write about on c.l.py) a case where I currently code: > > if godown: > iseq = xrange(len(sq)-1, start-1, -1) > else: > iseq = xrange(start, len(sq), 1) > for index in iseq: > ... > > and would be just delighted to be able to code, instead, > > for index in irange(start, len(sq), reverse=godown): > ... > > Even when the need to reverse can more easily be hardwired in > the source (a more common case), would > > for index in irange(start, stop, reverse=True): > > be really so much worse than > > for index in revrange(start, stop): > > ...? Darn. At your recommendation I tried reading my inbox in reverse today (how appropriate... :-) and I missed this use case when I said I'd rather have two functions. Oh well. I do think that the savings in typing from having a reverse= keyword for that one use case are easily outnumbered by the extra typing for the much more common use case that has reverse=True. But really, I could live with either one, so Raymond can decide based upon the evidence, and as I said, either way having this in itertools is an argument against making reversed() a builtin. 
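Alex's proposed signature was never added to itertools, but its semantics are easy to sketch; irange here is a hypothetical function, not a real itertools member:

```python
def irange(start, stop, step=1, reverse=False):
    # Hypothetical sketch of the proposal: an iterator-returning range
    # with an optional reverse= argument.
    r = range(start, stop, step)
    return reversed(r) if reverse else iter(r)

sq = list('abcdefg')
start, godown = 2, True
# Replaces the xrange(len(sq)-1, start-1, -1) / xrange(start, len(sq), 1) pair:
assert list(irange(start, len(sq), reverse=godown)) == [6, 5, 4, 3, 2]
assert list(irange(start, len(sq), reverse=False)) == [2, 3, 4, 5, 6]
```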
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 4 10:55:32 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 10:55:52 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: Your message of "Tue, 04 Nov 2003 09:24:13 +0100." <200311040924.13894.aleaxit@yahoo.com> References: <5.1.0.14.0.20031103200302.01e595c0@mail.telecommunity.com> <200311040924.13894.aleaxit@yahoo.com> Message-ID: <200311041555.hA4FtWg26154@12-236-54-216.client.attbi.com> > Hmmm -- why is: > > class Foo: > __metaclass__ = MetaFoo > ... > > "ok", compared to e.g.: > > class Foo is MetaFoo: > ... > > while, again for example, > > def foo(): > ... > foo = staticmethod(foo) > > is presumably deemed "not ok" compared to e.g.: > > def foo() is staticmethod: > ... > > ??? > > Both cases of current syntax do the job (perhaps not elegantly but > they do) and in both cases a new syntax would increase elegance. Perhaps (I haven't really thought this through) because you can place the __metaclass__ thing right at the top of the class definition, while the staticmethod thing must necessarily come after the entire method definition. Also I expect that __metaclass__ usage is rather more rare than static or class methods are. And often one introduces a metaclass by inheriting from a base class whose sole (or main) purpose is to change the metaclass -- just like inheriting from object. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue Nov 4 11:30:11 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 4 11:30:23 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) Message-ID: <5.1.1.6.0.20031104112105.031b80e0@telecommunity.com> At 09:24 AM 11/4/03 +0100, Alex Martelli wrote: >On Tuesday 04 November 2003 02:09 am, Phillip J. Eby wrote: > ... 
> > I'm not all that enthused about the metaclass usage, mainly because there's
> > already an okay syntax (__metaclass__) for it. I'd rather that class
>
>Hmmm -- why is:
>
>class Foo:
>    __metaclass__ = MetaFoo
>    ...
>
>"ok", compared to e.g.:
>
>class Foo is MetaFoo:
>    ...
>
>while, again for example,
>
>    def foo():
>        ...
>    foo = staticmethod(foo)
>
>is presumably deemed "not ok" compared to e.g.:
>
>    def foo() is staticmethod:
>        ...
>
>???

Isn't it obvious from the above? Note the positioning of the '...' in all but the third example you've shown. :)

> > decorators (if added) were decorators in the same way as function
> > decorators. Why? Because I think that correct, combinable class
> > decorators are probably easier for most people to write than correct,
> > combinable metaclasses, and they are more easily combined than metaclasses
> > are.
>
>Combinable metaclasses may not be trivial to write, but with multiple
>inheritance it will often be feasible (except, presumably, when implied
>layout or __new__ have conflicting requirements).

I guess my point is that it's harder to *learn* how to write a co-operative metaclass, than it is to simply *write* a co-operative decorator. A metaclass must explicitly invoke its collaborators, but a decorator is just a simple function; the chaining is external. Now, certainly you and I both know how to write our metaclasses co-operatively, but I believe both of us have also been told (repeatedly) that we're not typical Python programmers. :)

> Of course, not having use
>cases of either custom metaclasses or class decorators in production use, the
>discussion does risk being a bit abstract. Did you have any specific use case
>in mind?

PyProtocols has an API call that "wants" to be a class decorator, to declare interface information about the class, e.g:

class MyClass is protocols.instancesProvide(IFoo):
    ....
But, since there are no such things as class decorators, it actually uses a sys._getframe() hack to replace the metaclass and simulate decoratorness. Steve Alexander originally proposed the idea as an implementation technique for interface declarations in Zope 3, and I worked up the actual implementation, that's now shared by PyProtocols and Zope 3. So the above is actually rendered now as: class MyClass: protocols.advise(instancesProvide=[IFoo]) (Note that any explicit __metaclass__ declaration has to come *before* the advise() call.) The principal limitation of this technique is that writing co-operative decorators of this sort is just as difficult as writing co-operative metaclasses. So, PyProtocols and Zope 3 include a library function, 'addClassAdvisor(decorator_callable)' which adds a decorator function (in a PEP 218-style execution order) to those that will be called on the resulting class. IOW, we created a decorator mechanism for classes that is almost identical to the PEP 218 mechanism for functions, to make it easy to call functions on a created class, using declarations that occur near the class statement. This was specifically to make it easier to do simple decorator-like things, without writing metaclasses, and thus not interfering with user-supplied metaclasses. Note, by the way, that since you can only have one explicit metaclass, and Python does not automatically generate new metaclasses, users must explicitly mix metaclasses in order to use them. That's all well and good for gurus such as ourselves, but if you're creating a framework that wants to play nicely with other frameworks, and is for non-guru users, then metaclasses are right out unless they're the *only* way to achieve the desired effect. For supplying framework metadata, decorators are an adequate mechanism that's simpler to implement, and are therefore preferable. 
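Python later grew exactly this feature as class decorator syntax (PEP 3129), which makes the metadata-annotation pattern Phillip describes straightforward. A sketch with hypothetical names -- instances_provide and IFoo stand in for the PyProtocols API and are not its real spelling:

```python
IFoo = 'IFoo'   # hypothetical interface marker

def instances_provide(*interfaces):
    """Return a class decorator that records interface metadata."""
    def decorate(cls):
        cls.__provides__ = interfaces   # annotate; no metaclass involved
        return cls
    return decorate

@instances_provide(IFoo)
class MyClass:
    pass

assert MyClass.__provides__ == (IFoo,)
assert type(MyClass) is type   # the (default) metaclass is untouched
```

Because the decorator never replaces the metaclass, several such decorators can be stacked without any of the metaclass-mixing problems described above.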
Since the 'addClassAdvisor()' mechanism has been available, I've used it for other framework metadata annotations, such as security restrictions, and to perform miscellaneous other "postprocessing" operations on classes. Now, in the time since Steve Alexander first proposed the idea, I've actually grown to like the in-body declaration style for classes, and it's possible that PEP 218-style declaration for classes would be more unwieldy. So I'm only +0 on having a class decorator syntax at all. But I do think that if there *is* a class decorator syntax, its semantics should exactly match function decorator syntax, and am therefore -1 on it being metaclass syntax. In my experience, non-guru usage of metaclasses is usually by inheriting the metaclass from a framework base class, and this is the "right way to do it" because the user shouldn't need to know about metaclasses unless they are mixing them. (And if Python mixed them for you, there'd be no need for non-gurus to know about metaclasses at all.) From pje at telecommunity.com Tue Nov 4 11:35:49 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 4 11:35:57 2003 Subject: PEP 318 (was Re: [Python-Dev] Re: Guido's Magic Code was: inlinesort option) In-Reply-To: <5.1.1.6.0.20031104112105.031b80e0@telecommunity.com> Message-ID: <5.1.1.6.0.20031104113520.031bb290@telecommunity.com> At 11:30 AM 11/4/03 -0500, Phillip J. Eby wrote: >'addClassAdvisor(decorator_callable)' which adds a decorator function (in >a PEP 218-style execution order) to those that will be called on the >resulting class. > >IOW, we created a decorator mechanism for classes that is almost identical >to the PEP 218 mechanism for functions, to make it easy to call functions >on a created class, using declarations that occur near the class >statement. This was specifically to make it easier to do simple >decorator-like things, without writing metaclasses, and thus not >interfering with user-supplied metaclasses. Oops. 
I meant PEP 318, obviously. From tim.one at comcast.net Tue Nov 4 13:58:51 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Nov 4 13:58:56 2003 Subject: [Python-Dev] XXX undetected error (why=3) In-Reply-To: <1067926212.19568.47.camel@localhost.localdomain> Message-ID: [Jeremy Hylton] > ... > I'd like to change the code for the check to call Py_FatalError() > instead of printing a message to stderr. The check is only enabled > during a debug build. I'd be much happier debugging this from a core > dump than trying to figure out what happened to cause the message to > be printed. > > Any objections? +1. Having catastrophic errors fly by on stderr isn't a good idea even without the (strong) debuggability argument. From python at rcn.com Tue Nov 4 14:50:14 2003 From: python at rcn.com (Raymond Hettinger) Date: Tue Nov 4 14:50:29 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311041553.hA4Fr8226136@12-236-54-216.client.attbi.com> Message-ID: <002801c3a30c$def8fae0$6017c797@oemcomputer> > I do think that the savings in typing from having a reverse= keyword > for that one use case are easily outnumbered by the extra typing for > the much more common use case that has reverse=True. > > But really, I could live with either one, so Raymond can decide based > upon the evidence, and as I said, either way having this in itertools > is an argument against making reversed() a builtin. Candidate itertools are expected to accept general iterables as inputs and to work well with each other. This function accepts only sequences as inputs and cannot handle outputs from other itertools. IOW, it doesn't belong in the toolset. As proposed, the reversed() function is much more general than a backwards xrange. Handling any sequence is a nice plus and should not be tossed away. I would like reversed() to be usable anywhere someone is tempted to write seq[::-1]. reversed() is a fundamental looping construct. 
Tucking it away in another module is not in harmony with having it readily accessible for everyday work. Having dotted access to the function makes its use less attractive. My original proposal was to have methods attached to a few sequence types. I was deluged with mail pushing toward a more universal builtin function and that's what is on the table now. There have been many notes of support, but their voices have been partially drowned by naming discussions and some weird ideas on places to put it. I do not support putting it in another namespace, turning it into a keyword argument, or making it into yet another version of xrange. What's out there now is simple and direct. Everyone, please accept it as is.

Raymond Hettinger

From guido at python.org Tue Nov 4 15:31:02 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Nov 4 15:31:36 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: Your message of "Tue, 04 Nov 2003 14:50:14 EST." <002801c3a30c$def8fae0$6017c797@oemcomputer>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer>
Message-ID: <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>

> > I do think that the savings in typing from having a reverse= keyword
> > for that one use case are easily outnumbered by the extra typing for
> > the much more common use case that has reverse=True.
> >
> > But really, I could live with either one, so Raymond can decide based
> > upon the evidence, and as I said, either way having this in itertools
> > is an argument against making reversed() a builtin.
>
> Candidate itertools are expected to accept general iterables as inputs
> and to work well with each other. This function accepts only sequences
> as inputs and cannot handle outputs from other itertools. IOW, it
> doesn't belong in the toolset.

Ah, you misunderstood.
I was only arguing for irange(..., reverse=True) or irevrange(...); since irange() is already in itertools, there can clearly be no objection to adding the reverse option somehow. But since (a) at least 60% of the examples are satisfied with something like irevrange(), and (b) having irevrange() in itertools is acceptable, my (c) conclusion is that reversed() doesn't need to be a builtin either. I didn't say it had to go into itertools!

> As proposed, the reversed() function is much more general than a
> backwards xrange. Handling any sequence is a nice plus and should not
> be tossed away. I would like reversed() to be usable anywhere someone
> is tempted to write seq[::-1].

Sure. But is this needed often enough to deserve adding a builtin? If you can prove it would be used as frequently as sum() you'd have a point.

> reversed() is a fundamental looping construct. Tucking it away in
> another module is not in harmony with having it readily accessible for
> everyday work. Having dotted access to the function makes its use less
> attractive.

The same can be said for several functions in itertools...

> My original proposal was to have methods attached to a few sequence
> types. I was deluged with mail pushing toward a more universal builtin
> function and that's what is on the table now. There have been many
> notes of support but their voices have been partially drowned by naming
> discussions and some weird ideas on places to put it.
>
> I do not support putting it in another namespace, turning it into a
> keyword argument, or making it into yet another version of xrange.
> What's out there now is simple and direct. Everyone, please accept it
> as is.

Sorry, I have to push back on that. We still need to contain the growth of the language, and that includes the set of builtins and (to a lesser extent) the standard library. You have to show that this is truly important enough to add to the builtins.
Maybe you can propose to take away an existing builtin to make room *first*.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jack at performancedrivers.com Tue Nov 4 15:33:06 2003
From: jack at performancedrivers.com (Jack Diederich)
Date: Tue Nov 4 15:39:42 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: <1067873793.19568.27.camel@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Nov 03, 2003 at 10:36:33AM -0500
References: <200311031347.10995.aleaxit@yahoo.com> <1067873793.19568.27.camel@localhost.localdomain>
Message-ID: <20031104153306.E22751@localhost.localdomain>

On Mon, Nov 03, 2003 at 10:36:33AM -0500, Jeremy Hylton wrote:
> On Mon, 2003-11-03 at 07:47, Alex Martelli wrote:
> > I made a few bugfix check-ins to the 2.3 maintenance branch this weekend and
> > Michael Hudson commented that he thinks that so doing is a bad idea, that bug
> > fixes should filter from the 2.4 trunk to the 2.3 branch and not the other way
> > around. Is this indeed the policy (have I missed some guidelines about it)?
>
> It is customary to fix things on the trunk first, then backport to
> branches where it is needed. People who maintain branches often watch
> the trunk to look for things that need to be backported. As far as I
> know, no one watches the branches to look for things to port to the
> trunk. It may get lost if it's only on a branch.
>
> The best thing to do is your option [a]: Fix it in both places at once.
> Then there's nothing to be forgotten when time for a release rolls
> around.

If we aren't using CVS tagging features, it just falls under personal preference. If we are, it is easier to import all the changes from the branch to the trunk, tag it as 'import_to_trunk_N' and then next time something changes just look at the diff between the 'import_to_trunk_N' tag to now, mark as 'import_to_trunk_N+1', rinse and repeat.
Doing it w/ tags has the benefit that you can do a one-liner that says 'try to import any changes from the branch.'

-jackdied

From aleaxit at yahoo.com Tue Nov 4 15:50:32 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 15:50:38 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: <200311041402.hA4E20dR015943@localhost.localdomain>
References: <200311041402.hA4E20dR015943@localhost.localdomain>
Message-ID: <200311042150.32358.aleaxit@yahoo.com>

On Tuesday 04 November 2003 03:01 pm, Anthony Baxter wrote:
> I'm seeing a couple of warnings that I don't remember seeing at
> the time of the 2.3.2 release. Given what they are, it's possible
> that it's just a random thing (whether the id is < 0 or not).
>
> test_minidom
> /home/anthony/src/py/23maint/Lib/xml/dom/minidom.py:797: FutureWarning:
> %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and
> up return "<DOM Element: %s at %#x>" % (self.tagName, id(self))
>
> test_repr
> /home/anthony/src/py/23maint/Lib/test/test_repr.py:91: FutureWarning:
> %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and
> up eq(r(i3), ("<ClassWithFailingRepr instance at %x>"%id(i3)))
>
> Anyone want to suggest an appropriate fix, or fix them? Otherwise I'll
> put it on the to-do list.

Not sure if it's "appropriate", but what other tests appear to be doing is to explicitly mark warnings (& specifically this one) as ignored:

regrtest.py:# I see no other way to suppress these warnings;
regrtest.py:warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning,
regrtest.py: warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning,
test_builtin.py:warnings.filterwarnings("ignore", "hex../oct.. of negative int",
test_builtin.py: FutureWarning, __name__)
test_compile.py: warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning)
test_compile.py: warnings.filterwarnings("ignore", "hex.* of negative int", FutureWarning)
test_hexoct.py:warnings.filterwarnings("ignore", "hex/oct constants", FutureWarning,

Alex

From pf_moore at yahoo.co.uk Tue Nov 4 16:00:40 2003
From: pf_moore at yahoo.co.uk (Paul Moore)
Date: Tue Nov 4 16:02:11 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: 

Guido van Rossum writes:
>> Candidate itertools are expected to accept general iterables as inputs
>> and to work well with each other. This function accepts only sequences
>> as inputs and cannot handle outputs from other itertools. IOW, it
>> doesn't belong in the toolset.
>
> Ah, you misunderstood. I was only arguing for irange(...,
> reverse=True) or irevrange(...); since irange() is already in
> itertools, there can clearly be no objection to adding the reverse
> option somehow.

Actually, irange() is not in itertools at the moment. Raymond could argue that irange() isn't a suitable candidate for itertools, but given the existence of count() and repeat(), I suspect that isn't a particularly convincing argument. Arguing that irange() is too similar to range() and xrange() is closer, but I'd say that irange is the *right* way to do it. [x]range should be relegated to backward-compatibility tools, much like the file xreadlines() method and the xreadlines module.

Raymond - are you dead set against an irange() function in itertools? Assume for now that it's a simple version without a reverse argument.

> But since (a) at least 60% of the examples are satisfied with
> something like irevrange(), and (b) having irevrange() in itertool
> is acceptable, my (c) conclusion is that reversed() doesn't need to
> be a builtin either.
I didn't say it had to go into itertools! Raymond seems very protective of the concept of reversed() as a builtin. I'm not saying that's wrong, but I *personally* haven't seen enough evidence yet to be convinced either way. The i{rev}range() issues seem to be getting caught up in this. My view: 1. I think a "plain" irange() would be useful to add into itertools. In the (very) long term, it could replace [x]range, but that's less of an issue to me. 2. A way of getting a reversed {i,x}range() has some clear use cases. This seems useful to add (although here, I'm going on evidence of others' code - in my code I tend to loop over containers much more often than over ranges of numbers). 3. A general reversed() function seems theoretically useful, but the concrete use cases seem fairly thin on the ground. I'm broadly in favour, because I (possibly like Raymond) have a bias for clean, general solutions. But I can see that "practicality beats purity" may hold here. My proposals: 1. Add a plain irange() to itertools. 2. IF the general reversed() is deemed too theoretical, add EITHER a reverse argument to irange, or an irevrange to itertools. Both feel to me a little iffy, but that's my generality bias again. 3. IF the general reversed() is accepted (builtin or not) leave the irange function in its simple form. > Sorry, I have to push back on that. We still need to contain the > growth of the language, and that includes the set of builtins and (to > a lesser extent) the standard library. You have to show that this is > truly important enough to add to the builtins. Maybe you can propose > to take away an existing builtin to make room *first*. xrange (in favour of itertools.irange())? :-) [Personally, I'm still not 100% sure I see Raymond's strong reluctance to have reversed() in itertools, but as both are his babies, and he clearly has a very definite vision for both, I don't feel that I want to argue this one with him]. Paul. 
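Paul's plain irange() proposal is easy to picture; nothing of the sort ever landed in itertools, so the following is purely a hypothetical sketch of the generator he is describing, mirroring the range()/xrange() signature:

```python
def irange(start, stop=None, step=1):
    # Hypothetical itertools-style irange(): yields the same values
    # range()/xrange() would produce, but lazily, one at a time.
    if stop is None:
        start, stop = 0, start
    i = start
    while (step > 0 and i < stop) or (step < 0 and i > stop):
        yield i
        i += step

print(list(irange(5)))         # [0, 1, 2, 3, 4]
print(list(irange(5, 0, -1)))  # [5, 4, 3, 2, 1]
```

Being an iterator rather than a sequence, such an irange() is one-shot -- exactly the sequence-vs-iterator distinction that runs through the rest of this thread.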
-- This signature intentionally left blank

From neal at metaslash.com Tue Nov 4 16:11:19 2003
From: neal at metaslash.com (Neal Norwitz)
Date: Tue Nov 4 16:11:27 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <20031104211119.GS7212@epoch.metaslash.com>

On Tue, Nov 04, 2003 at 12:31:02PM -0800, Guido van Rossum wrote:
>
> We still need to contain the growth of the language, and that
> includes the set of builtins and (to a lesser extent) the standard
> library. Maybe you can propose to take away an existing builtin to
> make room *first*.

Oh boy! You opened a can of worms. :-) I won't suggest adding any builtins (including reverse), but I will suggest (re)moving quite a few. This is a suggestion towards the future. I realize nothing should be removed in 2.4. Currently, we have

    >>> len(filter(lambda x: x[0].islower(), dir(__builtins__)))
    66

Below are the builtins I'd like to see removed. I've given a short cryptic comment for many. Several can be removed because they become redundant (e.g., long, open, raw_input, xrange).

    apply, buffer (or replace implementation with something useful),
    coerce, intern, long (int/long unification),
    open (same as file) (maybe pending deprecation in 2.4?),
    raw_input (become input),
    reduce (assuming another mechanism will exist) (deprecate 2.4?),
    reload (some other mechanism related to import?),
    slice (or maybe move to a module),
    xrange (unify with range)

For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. Pending deprecation for: open, reduce, and maybe slice. I don't know how to deal with input/raw_input. While it seems goofy, perhaps something like this:

    2.4 deprecate input
    2.5 make input == raw_input, pending deprecation for raw_input
    2.6 deprecate raw_input
    2.7 remove raw_input

Or just wait for 3.0.
:-)

Math related builtins: abs, complex, divmod, pow, round, sum
Perhaps some of these could be moved to a module

Move these to sys?: hash, id

Formatting utilities (move some/all to some module): chr, hex, oct, ord, repr, unichr
Not sure about these; chr, oct and unichr seem to be the least used in my code.

For any builtin that's moved, make a pending deprecation when used as a builtin for 2.4 and full deprecation for 2.5. For anything that is likely to be (re)moved in 3.0, perhaps we should at least use pending deprecations now. Even if we don't know what will happen, at least people start getting an idea of the direction for the future.

Doing-my-best-to-shrink-the-language, :-)

Neal

From aleaxit at yahoo.com Tue Nov 4 16:47:45 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 16:47:52 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: 
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <200311042247.45386.aleaxit@yahoo.com>

On Tuesday 04 November 2003 10:00 pm, Paul Moore wrote:
...
> Arguing that irange() is too similar to range() and xrange() is
> closer, but I'd say that irange is the *right* way to do it. [x]range

Agreed. The reverse= optional argument would be delightful gravy for me, but I could do without and not suffer too badly IF reversed was available: where I now have

    if godown:
        iseq = xrange(len(sq)-1, start-1, -1)
    else:
        iseq = xrange(start, len(sq), 1)
    for index in iseq:
        ...

and dream of compacting it all the way to:

    for index in irange(start, len(sq), reverse=godown):
        ...

I could do something like:

    iseq = irange(start, len(sq))
    if godown:
        iseq = reversed(iseq)
    for index in iseq:
        ...

And after all I have found only that one use case where I need to loop either forward or backward in my Python code.
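The compacted spelling Alex dreams of can be sketched directly. The reverse= argument is hypothetical (no such option ever existed on any range variant), and range() stands in here for the Python-2-only xrange():

```python
def irange(start, stop, reverse=False):
    # Hypothetical helper collapsing the if/else pair above: iterate
    # the indices [start, stop) forward, or the same indices backward
    # when reverse is true.
    if reverse:
        return iter(range(stop - 1, start - 1, -1))
    return iter(range(start, stop))

sq = ['a', 'b', 'c', 'd']
print([sq[i] for i in irange(1, len(sq))])                # ['b', 'c', 'd']
print([sq[i] for i in irange(1, len(sq), reverse=True)])  # ['d', 'c', 'b']
```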
I expected more because I remember how horribly constraining Pascal's strong separation between iteration forwards and backwards (for i := a to b do ... vs for i := b downto a do ...) felt compared to being able to just code "DO 10 I = IST, ITO, IDELTA" in Fortran -- lo that many years ago. I guess I write different kinds of programs these days.

> should be relegated to backward-compatibility tools, much like the
> file xreadlines() method and the xreadlines module.

Seconded.

> Raymond - are you dead set against an irange() function in itertools?
> Assume for now that it's a simple version without a reverse argument.

...and that we ALSO get your cherished reversed built-in -- there is most emphatically no mutual incompatibility between them...

> Raymond seems very protective of the concept of reversed() as a
> builtin. I'm not saying that's wrong, but I *personally* haven't seen
> enough evidence yet to be convinced either way. The i{rev}range()

I'm slowly coming to accept it -- it's sure way more appropriate as a built-in than many that currently crowd the builtins namespace.

> issues seem to be getting caught up in this.
>
> My view:
>
> 1. I think a "plain" irange() would be useful to add into itertools.

Yes!

> In the (very) long term, it could replace [x]range, but that's less
> of an issue to me.

It's probably more important to me, I guess.

> 2. A way of getting a reversed {i,x}range() has some clear use cases.
> This seems useful to add (although here, I'm going on evidence of
> others' code - in my code I tend to loop over containers much more
> often than over ranges of numbers).

Me too -- by now I've replaced basically all the old

    for i in xrange(len(seq)):
        value = seq[i]
        ...

into shiny new

    for i, value in enumerate(seq):
        ...

Admittedly some reverse iterations are like that -- and being able to code

    for i in revrange(len(seq)):
        value = seq[i]
        ...

while better than (eek) "for i in xrange(len(seq)-1, -1, -1):", still is NOT quite as smooth as

    for value in reversed(seq):
        ...

or reversed(enumerate(seq)) if the index IS needed.

BTW, I do have spots where:

    seq.reverse()
    try:
        # ...region of code where seq is used reversed...
    finally:
        seq.reverse()   # put it back rightside-up again

(including one where I had forgotten the try/finally -- as I found out while looking for these cases...). Of course this only works because seq is a list and it might have all sorts of downsides (e.g. if this was multithreaded code, which it isn't, it might interfere with other uses of seq; if seq was a global this function couldn't be recursive any more; ...). All in all I think those would benefit from reversed(seq) even if it has to be called in more spots within the "region of code".

> 3. A general reversed() function seems theoretically useful, but the
> concrete use cases seem fairly thin on the ground. I'm broadly in
> favour, because I (possibly like Raymond) have a bias for clean,
> general solutions. But I can see that "practicality beats purity"
> may hold here.

Funny, I originally felt queasy (about it being a built-in only) for "purity" about the overcrowded builtins namespace. I'm seeing enough use cases (even if irange DID grow a wonderful reverse= optional arg...) that practicality is gradually winning me over. I.e., practicality beats purity is what is winning me over, while to you it suggests dampening your "broadly in favour"... we both mention iterating over sequences more than over indices, but to me that's a suggestion that reversed has a place, while you don't seem to think that follows...

> My proposals:
>
> 1. Add a plain irange() to itertools.

Yes!!!

> 2. IF the general reversed() is deemed too theoretical, add EITHER a
> reverse argument to irange, or an irevrange to itertools. Both feel
> to me a little iffy, but that's my generality bias again.
> 3.
IF the general reversed() is accepted (builtin or not) leave the > irange function in its simple form. Sigh, OK, I guess. If I had to choose, reversed + irange plain only, or no reversed + irange w/optional argument, I guess I would grudgingly choose the former (having shifted my opinion). But I'd really like BOTH reversed (lets me iterate on sequence rather than on indices, often) AND irange with optional reversed= ... no irevrange please... > > Sorry, I have to push back on that. We still need to contain the > > growth of the language, and that includes the set of builtins and (to > > a lesser extent) the standard library. You have to show that this is > > truly important enough to add to the builtins. Maybe you can propose > > to take away an existing builtin to make room *first*. > > xrange (in favour of itertools.irange())? :-) Seconded. Neal Norwitz' "little list" has plenty more useful suggestions, though I wouldn't accept it as entirely sound. > [Personally, I'm still not 100% sure I see Raymond's strong reluctance > to have reversed() in itertools, but as both are his babies, and he Actually, I do: itertools shouldn't be limited to accepting sequences, they should accept iterator arguments. > clearly has a very definite vision for both, I don't feel that I want > to argue this one with him]. You have a point -- Raymond definitely HAS an overall vision on iterators &c and he's deserved lots of listening-to even though we can't quite see some specific point. Alex From martin at v.loewis.de Tue Nov 4 16:51:49 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Tue Nov 4 16:52:20 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch In-Reply-To: <200311041402.hA4E20dR015943@localhost.localdomain> References: <200311041402.hA4E20dR015943@localhost.localdomain> Message-ID: Anthony Baxter writes: > I'm seeing a couple of warnings that I don't remember seeing at > the time of the 2.3.2 release. 
Given what they are, it's possible
> that it's just a random thing (whether the id is < 0 or not).

What system is this on? I find it surprising that the id is < 0: on a 32-bit machine, this should only happen if you allocate more than 2GB.

> Anyone want to suggest an appropriate fix, or fix them? Otherwise I'll
> put it on the to-do list.

I'd reformulate them as

    "%x" % (id(o) & 0xffffffffL)

Of course, you have to replace 0xffffffffL with (unsigned)-1 of the system (i.e. 2l*sys.maxint+1). I wonder whether creating a function sys.unsigned(id(o)) would be appropriate, which returns its argument for positive numbers, and PyLong_FromUnsignedLong((unsigned)arg) otherwise.

Regards, Martin

From martin at v.loewis.de Tue Nov 4 16:57:10 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Tue Nov 4 16:57:24 2003
Subject: [Python-Dev] Autoconf 2.58 released
In-Reply-To: <2msml4188m.fsf@starship.python.net>
References: <2msml4188m.fsf@starship.python.net>
Message-ID: 

Michael Hudson writes:
> We want to be using this asap to get rid of the aclocal hacks, right?

Sounds good to me. If you volunteer, please feel free to update AC_PREREQ when you consider it appropriate. We need to consider whether to bump the autoconf version used on the 2.3 branch, or whether developers would be required to use autoconf 2.5x releases.

Regards, Martin

From tdelaney at avaya.com Tue Nov 4 16:57:43 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Tue Nov 4 16:57:50 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5DBE@au3010avexu1.global.avaya.com>

> From: Alex Martelli [mailto:aleaxit@yahoo.com]
>
> or reversed(enumerate(seq)) if the index IS needed.

Hmm - wouldn't this give an iterator that returned two values - an iterable for the seq, and an iterable for the indexes of seq?
I would think this would need to be: reversed(*enumerate(seq)) with the presumption being that reversed would reverse each parameter and return them in lockstep. Tim Delaney From jeremy at zope.com Tue Nov 4 16:59:19 2003 From: jeremy at zope.com (Jeremy Hylton) Date: Tue Nov 4 17:03:00 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <20031104153306.E22751@localhost.localdomain> References: <200311031347.10995.aleaxit@yahoo.com> <1067873793.19568.27.camel@localhost.localdomain> <20031104153306.E22751@localhost.localdomain> Message-ID: <1067983159.19568.64.camel@localhost.localdomain> On Tue, 2003-11-04 at 15:33, Jack Diederich wrote: > On Mon, Nov 03, 2003 at 10:36:33AM -0500, Jeremy Hylton wrote: > > On Mon, 2003-11-03 at 07:47, Alex Martelli wrote: > > > I made a few bugfix check-ins to the 2.3 maintenance branch this weekend and > > > Michael Hudson commented that he thinks that so doing is a bad idea, that bug > > > fixes should filter from the 2.4 trunk to the 2.3 branch and not the other way > > > around. Is this indeed the policy (have I missed some guidelines about it)? > > > > It is customary to fix things on the trunk first, then backport to > > branches where it is needed. People who maintain branches often watch > > the trunk to look for things that need to be backported. As far as I > > know, no one watches the branches to look for things to port to the > > trunk. It may get lost if it's only on a branch. > > > > The best thing to do is your option [a]: Fix it in both places at once. > > Then there's nothing to be forgotten when time for a release rolls > > around. > > > > If we aren't using CVS tagging features, it just falls under personal > preference. I think there's more than personal preference involved. We ought to be consistent in how we apply patches to avoid missing things. 
> If we are, it is easier to import all the changes from
> the branch to the trunk, tag it as 'import_to_trunk_N' and then
> next time something changes just look at the diff between the
> 'import_to_trunk_N' tag to now, mark as 'import_to_trunk_N+1', rinse
> and repeat. Doing it w/ tags has the benefit that you can do
> a one-liner that says 'try to import any changes from the branch.'

The branch has bug fixes and changes that don't necessarily show up on the trunk. For example, a bug that exists in code that was removed or completely rewritten on the trunk. It also doesn't address the stability issue: A maintenance branch gets less testing, and committers should be cautious about changes. Committing on the trunk first gives you a chance to test out the changes there and get feedback.

Jeremy

From aleaxit at yahoo.com Tue Nov 4 17:04:00 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 17:04:05 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <200311042304.00006.aleaxit@yahoo.com>

On Tuesday 04 November 2003 09:31 pm, Guido van Rossum wrote:
...
> option somehow. But since (a) at least 60% of the examples are
> satisfied with something like irevrange(), and (b) having irevrange()

I'm not sure it's as high as that, depending on how strictly one wants to define "satisfied". Say I want to apply some change (say, call f() on each item) to the prefix of a list of numbers, stopping at the first zero I meet. In old Python:

    for i in xrange(len(listofnum)):
        value = listofnum[i]
        if not value: break
        listofnum[i] = f(value)

but today I'd rather code this:

    for i, value in enumerate(listofnum):
        if not value: break
        listofnum[i] = f(value)

more concise and neat.
So, what if I want to do it to the _suffix_, the tail, of the list, stopping at the first zero I meet going backwards? W/o irevrange, eek:

    for i in xrange(-1, -len(listofnum)-1, -1):
        # usual 3-line body

or equivalently

    for i in xrange(len(listofnum)-1, -1, -1):
        # usual 3-line body

but irevrange would only fix the for clause itself:

    for i in irevrange(len(listofnum)):
        # usual 3-line body

the body remains stuck at the old-python 3-liner. reversed does better:

    for i, value in reversed(enumerate(listofnum)):
        if not value: break
        listofnum[i] = f(value)

i.e. it lets me use the "modern" Python idiom. If you consider this case "satisfied" by irevrange then maybe 60% is roughly right. But it seems to me that only reversed "satisfies" it fully.

> > be tossed away. I would like reversed() to be usable anywhere someone
> > is tempted to write seq[::-1].
>
> Sure. But is this needed often enough to deserve adding a builtin?

I used to think it didn't, but the more I looked at code with this in mind, the more I'm convincing myself otherwise.

> If you can prove it would be used as frequently as sum() you'd have a
> point.

No, not as frequently as sum, but then this applies to many other builtins.

> > reversed() is a fundamental looping construct. Tucking it away in
> another module is not in harmony with having it readily accessible for
> everyday work. Having dotted access to the function makes its use less
> attractive.
> The same can be said for several functions in itertools...

True, but adding ONE builtin is not like adding half a dozen.

> > What's out there now is simple and direct. Everyone, please accept it
> as is.
>
> Sorry, I have to push back on that. We still need to contain the
> growth of the language, and that includes the set of builtins and (to
> a lesser extent) the standard library. You have to show that this is
> truly important enough to add to the builtins. Maybe you can propose
> to take away an existing builtin to make room *first*.
I don't know if Raymond has responded to this specific request, but I've seen other responses and I entirely concur that LOTS of existing built-ins -- such as apply, coerce, filter, input, intern, oct, round -- could usefully be deprecated/removed/moved elsewhere (e.g. to a new "legacy.py" module of short one-liners for apply, filter, ... -- to math for round, oct ... -- legacy could also 'from math import' the latter names, so that "from legacy import *" would make old modules keep working...).

Alex

From aleaxit at yahoo.com Tue Nov 4 17:09:24 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Tue Nov 4 17:09:29 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5DBE@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5DBE@au3010avexu1.global.avaya.com>
Message-ID: <200311042309.24836.aleaxit@yahoo.com>

On Tuesday 04 November 2003 10:57 pm, Delaney, Timothy C (Timothy) wrote:
> > From: Alex Martelli [mailto:aleaxit@yahoo.com]
> >
> > or reversed(enumerate(seq)) if the index IS needed.
>
> Hmm - wouldn't this give an iterator that returned two values - an iterable
> for the seq, and an iterable for the indexes of seq?

I must be missing something. enumerate(x) is an iterator with len(x) values, each a pair; why would reversing it somehow "transpose" it...?

> I would think this would need to be:
>
> reversed(*enumerate(seq))
>
> with the presumption being that reversed would reverse each parameter and
> return them in lockstep.

I'm not sure if reversed should take several parameters, but if it did this would be like calling:

    reversed( (0, x[0]), (1,x[1]), (2,x[2]) )

If it "reversed each parameter and returned them in lockstep" then I'd have x first and (0,1,2) second, no?
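Alex's reading is the right one: enumerate(seq) is a single iterator of (index, value) pairs, not a pair of iterables. The remaining catch -- easy to check in any Python that has reversed(), i.e. 2.4 and later -- is that reversed() refuses a bare iterator, so the pairs must be buffered into a real sequence first:

```python
seq = ['a', 'b', 'c']

# enumerate() returns a one-shot iterator: it has no __len__ or
# __getitem__, so reversed() cannot walk it backwards.
try:
    reversed(enumerate(seq))
    reversible = True
except TypeError:
    reversible = False
print(reversible)  # False

# Buffering the pairs into a list first works fine:
pairs = list(enumerate(seq))
print(list(reversed(pairs)))  # [(2, 'c'), (1, 'b'), (0, 'a')]
```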
Alex

From guido at python.org Tue Nov 4 18:27:26 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Nov 4 18:27:38 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Tue, 04 Nov 2003 21:00:40 GMT."
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com>
Message-ID: <200311042327.hA4NRQD27067@12-236-54-216.client.attbi.com>

> Arguing that irange() is too similar to range() and xrange() is
> closer, but I'd say that irange is the *right* way to do it. [x]range
> should be relegated to backward-compatibility tools, much like the
> file xreadlines() method and the xreadlines module.

Hm. There's a usage pattern that seems easy with [x]range() but not so easy with irange():

    R = xrange(...)
    for x in R: ...
    for y in R: ...

IMO, being able to say "for x in R" rather than having to remember the arguments to irange() and having to say "for x in irange(a, b, c)" is a big and useful advantage. IOW [x]range() returns a *sequence* which is more powerful than an iterator, because it can be iterated more than once. Now, the same could be accomplished with copyable iterators, but it is still more work:

    I = irange(...)
    R1, R2 = tee(I)
    for x in R1: ...
    for x in R2: ...

> Raymond - are you dead set against an irange() function in itertools?
> Assume for now that it's a simple version without a reverse argument.
>
> > But since (a) at least 60% of the examples are satisfied with
> > something like irevrange(), and (b) having irevrange() in itertool
> > is acceptable, my (c) conclusion is that reversed() doesn't need to
> > be a builtin either. I didn't say it had to go into itertools!
>
> Raymond seems very protective of the concept of reversed() as a
> builtin. I'm not saying that's wrong, but I *personally* haven't seen
> enough evidence yet to be convinced either way. The i{rev}range()
> issues seem to be getting caught up in this.

Right.

> My view:
>
> 1.
I think a "plain" irange() would be useful to add into itertools. > In the (very) long term, it could replace [x]range, but that's less > of an issue to me. > 2. A way of getting a reversed {i,x}range() has some clear use cases. > This seems useful to add (although here, I'm going on evidence of > others' code - in my code I tend to loop over containers much more > often than over ranges of numbers). > 3. A general reversed() function seems theoretically useful, but the > concrete use cases seem fairly thin on the ground. I'm broadly in > favour, because I (possibly like Raymond) have a bias for clean, > general solutions. But I can see that "practicality beats purity" > may hold here. > > My proposals: > > 1. Add a plain irange() to itertools. > 2. IF the general reversed() is deemed too theoretical, add EITHER a > reverse argument to irange, or an irevrange to itertools. Both feel > to me a little iffy, but that's my generality bias again. > 3. IF the general reversed() is accepted (builtin or not) leave the > irange function in its simple form. Hm. reversed(irange(...)) can't work, so you'd have to have both. > > Sorry, I have to push back on that. We still need to contain the > > growth of the language, and that includes the set of builtins and (to > > a lesser extent) the standard library. You have to show that this is > > truly important enough to add to the builtins. Maybe you can propose > > to take away an existing builtin to make room *first*. > > xrange (in favour of itertools.irange())? :-) > > [Personally, I'm still not 100% sure I see Raymond's strong reluctance > to have reversed() in itertools, but as both are his babies, and he > clearly has a very definite vision for both, I don't feel that I want > to argue this one with him]. That part I understand. reversed() is a function of a sequence (something with __len__ and __getitem__ methods), not of an iterator, and as such it doesn't belong in itertools. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 4 18:29:06 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 18:29:35 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 22:47:45 +0100." <200311042247.45386.aleaxit@yahoo.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com> <200311042247.45386.aleaxit@yahoo.com> Message-ID: <200311042329.hA4NT7727092@12-236-54-216.client.attbi.com> > iseq = irange(start, len(sq)) > if godown: iseq = reversed(iseq) But this wouldn't work, would it? irange() is an iterator, but reversed() only works for sequences (it refuses to secretly buffer the whole thing). --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Tue Nov 4 18:37:22 2003 From: python at rcn.com (Raymond Hettinger) Date: Tue Nov 4 18:37:30 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311042329.hA4NT7727092@12-236-54-216.client.attbi.com> Message-ID: <001701c3a32c$98d9b980$0aba2c81@oemcomputer> > > iseq = irange(start, len(sq)) > > if godown: iseq = reversed(iseq) > > But this wouldn't work, would it? irange() is an iterator, but > reversed() only works for sequences (it refuses to secretly buffer the > whole thing). It works fine with xrange though. Raymond Hettinger From guido at python.org Tue Nov 4 18:44:27 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 18:44:34 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 23:04:00 +0100." <200311042304.00006.aleaxit@yahoo.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042031.hA4KV2B26709@12-236-54-216.client.attbi.com> <200311042304.00006.aleaxit@yahoo.com> Message-ID: <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> > > option somehow. 
But since (a) at least 60% of the examples are > > satisfied with something like irevrange(), and (b) having irevrange() > > I'm not sure it's as high as that, depending on how strictly one wants > to define "satisfied". There are 6 bullets in PEP 322's "real world use cases" section. The first one is not helped by reversed(). Of the remaining 5, three are simple numeric ranges (heapq.heapify(), platform.dist_try_harder() and random.shuffle()). That's exactly 60%. :-) > for i, value in reversed(enumerate(listofnum)): Sorry, this doesn't work. enumerate() returns an iterator, reversed() requires a sequence. > > If you can prove it would be used as frequently as sum() you'd have a > > point. > > No, not as frequently as sum, but then this applies to many other > builtins. Well, they are already there, and we're considering removing some. I'd like to set the bar for *new* builtins fairly high. (You all know the joke how Aspirin would never have been approved by the FDA as an over-the-counter drug if it was invented today.) --Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8 at osu.edu Tue Nov 4 19:57:33 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Tue Nov 4 18:59:42 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> Message-ID: <200311041957.33530.fincher.8@osu.edu> On Tuesday 04 November 2003 06:44 pm, Guido van Rossum wrote: > > for i, value in reversed(enumerate(listofnum)): > > Sorry, this doesn't work. enumerate() returns an iterator, reversed() > requires a sequence. I believe the assumption is that enumerate (as well as the proposed irange) would grow an __reversed__ method to handle just that usage.
Jeremy From guido at python.org Tue Nov 4 19:07:40 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 19:07:51 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Tue, 04 Nov 2003 19:57:33 EST." <200311041957.33530.fincher.8@osu.edu> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> <200311041957.33530.fincher.8@osu.edu> Message-ID: <200311050007.hA507eo27246@12-236-54-216.client.attbi.com> > > > for i, value in reversed(enumerate(listofnum)): > > > > Sorry, this doesn't work. enumerate() returns an iterator, reversed() > > requires a sequence. > > I believe the assumption is that enumerate (as well as the proposed irange) > would grow an __reversed__ method to handle just that usage. Ah, so it is. Then the PEP's abstract is wrong: """ This proposal is to add a builtin function to support reverse iteration over sequences. """ Also, the PEP should enumerate (:-) which built-in types should be modified in this way, to give an impression of the enormity (or not) of the task. --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis at bluewin.ch Tue Nov 4 19:47:33 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Tue Nov 4 19:44:57 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311050007.hA507eo27246@12-236-54-216.client.attbi.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> <200311041957.33530.fincher.8@osu.edu> Message-ID: <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> At 16:07 04.11.2003 -0800, Guido van Rossum wrote: > > > > for i, value in reversed(enumerate(listofnum)): > > > > > > Sorry, this doesn't work. enumerate() returns an iterator, reversed() > > > requires a sequence. 
> > > > I believe the assumption is that enumerate (as well as the proposed > irange) > > would grow an __reversed__ method to handle just that usage. > >Ah, so it is. Then the PEP's abstract is wrong: > >""" >This proposal is to add a builtin function to support reverse >iteration over sequences. >""" > >Also, the PEP should enumerate (:-) which built-in types should be >modified in this way, to give an impression of the enormity (or not) >of the task. what is not clear to me is that the PEP is explicit about reversed() refusing general iterables and in particular infinite iterators, but then the combination reversed enumerate.__reversed__ would accept them or not? Will enumerate implement __reversed__ in terms of keeping the enumerate argument around instead of just an iterator derived from it and reproducing then the reversed behavior: limits checks and implementation strategy on the original argument if/when __reversed__ is called? so for x in reversed(enumerate(itertools.count())): pass would throw an exception instead of not terminating, OTHERWISE with the strategy of consuming the iterator if x is a finite iterator but without __len__ then reversed(x) would not work but reversed(enumerate(x)) would. Further enumerate.__iter__ does not enable re-iteration, simply it does not return a fresh iterator but what about enumerate.__reversed__ ? regards. From python at rcn.com Tue Nov 4 19:49:06 2003 From: python at rcn.com (Raymond Hettinger) Date: Tue Nov 4 19:49:27 2003 Subject: FW: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <000e01c3a336$9dc068e0$d0ac2c81@oemcomputer> > > I believe the assumption is that enumerate (as well as the proposed > irange) > > would grow an __reversed__ method to handle just that usage. Unfortunately, that idea didn't work out. The enumerate object does not hold the original iterable; instead, it only has the result of iter(iterable). Without having the iterable, I don't see a way for it to call iterable.__reversed__.
The essential problem is that at creation time, the enumerate object does not know that it is going to be called by reversed(). No other sequence object has to have a __reversed__ method. Like its cousin, __iter__, some objects may get a performance boost from a custom iterator but none of them have to have it. Raymond Hettinger From tdelaney at avaya.com Tue Nov 4 20:08:59 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Tue Nov 4 20:09:06 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF5E75@au3010avexu1.global.avaya.com> > From: Alex Martelli [mailto:aleaxit@yahoo.com] > > I must be missing something. enumerate(x) is an iterator with len(x) > values, each a pair; why would reversing it somehow "transpose" it...? No - you're not. Brain fart on my part :( Tim Delaney From guido at python.org Tue Nov 4 20:12:39 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 4 20:12:48 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 01:47:33 +0100." <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <200311042304.00006.aleaxit@yahoo.com> <200311042344.hA4NiRQ27142@12-236-54-216.client.attbi.com> <200311041957.33530.fincher.8@osu.edu> <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> Message-ID: <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> [Guido] > >Ah, so it is. Then the PEP's abstract is wrong: > > > >""" > >This proposal is to add a builtin function to support reverse > >iteration over sequences. > >""" > > > >Also, the PEP should enumerate (:-) which built-in types should be > >modified in this way, to give an impression of the enormity (or not) > >of the task.
[Samuele] > what is not clear to me is that the PEP is explicit about reversed() > refusing general iterables and in particular infinite iterators, but then > the combination reversed enumerate.__reversed__ would accept them or not? > Will enumerate implement __reversed__ in terms of keeping the enumerate > argument around instead of just an iterator derived from it and reproducing > then the reversed behavior: limits checks and implementation strategy on > the original argument if/when __reversed__ is called? > > so > > for x in reversed(enumerate(itertools.count())): > pass > > would throw an exception instead of not terminating, OTHERWISE with > the strategy of consuming the iterator if x is a finite iterator but > without __len__ then > > reversed(x) would not work but > > reversed(enumerate(x)) would. > > Further enumerate.__iter__ does not enable re-iteration, simply it > does not return a fresh iterator but what about > enumerate.__reversed__ ? In private mail Raymond withdrew the suggestion that enumerate() implement __reversed__; I think Raymond won't mind if I quote him here: [Raymond] > Unfortunately, that idea didn't work out. The enumerate object does not > hold the original iterable; instead, it only has the result of > iter(iterable). Without having the iterable, I don't see a way for it > to call iterable.__reversed__. The essential problem is that at creation > time, the enumerate object does not know that it is going to be called by > reversed(). > > No other sequence object has to have a __reversed__ method. Like its > cousin, __iter__, some objects may get a performance boost from a custom > iterator but none of them have to have it. So we're back to square one: reversed(enumerate(X)) won't work, even if reversed(X) works. I'm not sure I even like the idea of reversed() looking for a __reversed__ method at all. I like the original intention best: reversed() is for reverse iteration over *sequences*.
(See the first paragraph of the section "Rejected Alternatives" in the PEP.) Anyway, as Raymond predicted, the discussion is being distracted by side issues. I personally like the idea of having a variant of xrange() that generates a numerical sequence backwards better. Or perhaps we should just get used to recognizing that [x]range(n-1, -1, -1) iterates over range(n) backwards... --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at electricrain.com Tue Nov 4 20:28:51 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Tue Nov 4 20:28:56 2003 Subject: [Python-Dev] simple bsddb interface thread support 2.3.x vs 2.4 In-Reply-To: <1067949852.26825.3.camel@anthem> References: <200311030848.hA38mItM008890@localhost.localdomain> <200311030954.24191.aleaxit@yahoo.com> <20031104012310.GC17328@zot.electricrain.com> <200311040912.23213.aleaxit@yahoo.com> <1067949852.26825.3.camel@anthem> Message-ID: <20031105012851.GE17328@zot.electricrain.com> On Tue, Nov 04, 2003 at 07:44:12AM -0500, Barry Warsaw wrote: > On Tue, 2003-11-04 at 03:12, Alex Martelli wrote: > > > Generally, extending functionality (as opposed to: fixing bugs or clarifying > > docs) is not a goal for 2.3.* -- but I don't know if the fact that bsddb > > isn't thread-safe in 2.3 counts as "a bug", or rather as functionality > > deliberately kept limited, to avoid e.g such bugs as the one you've just > > removed, and other possibilities you mention: > > > > > - multithreaded bsddb use could deadlock depending on how it is used. > > > > I think that just having the 2.3.* docs explicitly mention the lack of > > thread-safety might then perhaps be better than backporting the changes. > > It's just the DB-API that's not thread-safe. The full blown BerkeleyDB > API (a.k.a. bsddb3) should be fine. > > It sure is tempting to claim that the lack of DB-API thread-safety for > BerkeleyDB is a bug and should be fixed for 2.3.*, but I think Greg > should make the final determination.
If it isn't, then yes, the docs need to clearly state that's the case. This was brought up before 2.3.2 was released. The docs already state this in a nice and obvious warning: http://www.python.org/doc/2.3.2/lib/module-bsddb.html My vote is to leave bsddb in 2.3.2 as it is and not try to port the thread support over from 2.4cvs. It is not ready. The bsddb module has never supported multithreaded use in any past version of python. If the simple bsddb/__init__.py interface can support it for 2.4 that's great. It should always be recommended that people use the full bsddb.db when threads are involved. If simple bsddb still has non-trivial-to-describe multithreaded deadlock issues by the time a 2.4 release draws near I'll suggest pulling it out. (before then i need to write a test case to prove that it does actually have these problems) -g From greg at electricrain.com Wed Nov 5 00:51:05 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Wed Nov 5 00:51:09 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/bsddb __init__.py, 1.11, 1.12 In-Reply-To: <16294.30567.537151.106168@montanaro.dyndns.org> References: <16294.30567.537151.106168@montanaro.dyndns.org> Message-ID: <20031105055105.GG17328@zot.electricrain.com> On Mon, Nov 03, 2003 at 09:42:31AM -0600, Skip Montanaro wrote: > > greg> import UserDict > greg> class _iter_mixin(UserDict.DictMixin): > greg> def __iter__(self): > greg> try: > ... > > Should _iter_mixin inherit from dict, or is there a backward compatibility > issue? Simply changing UserDict.DictMixin to dict doesn't work. In order to act like a dictionary it depends on DictMixin's multi-level implementation of all of the dict methods using the lower-level primitives.
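Greg's point, that the mixin derives the entire dict API once a handful of primitives exist, can be illustrated with collections.abc.MutableMapping, the modern descendant of 2.3's UserDict.DictMixin (the class below is a sketch for illustration, not code from the thread):

```python
from collections.abc import MutableMapping

class TinyStore(MutableMapping):
    """Only five primitives are written out; the mixin derives
    get(), items(), update(), __contains__, pop() and friends."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Subclassing plain dict instead would bypass exactly this derivation, which is why the substitution Skip asks about doesn't work.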
From aleaxit at yahoo.com Wed Nov 5 02:41:46 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 02:41:55 2003 Subject: [Python-Dev] simple bsddb interface thread support 2.3.x vs 2.4 In-Reply-To: <20031105012851.GE17328@zot.electricrain.com> References: <200311030848.hA38mItM008890@localhost.localdomain> <1067949852.26825.3.camel@anthem> <20031105012851.GE17328@zot.electricrain.com> Message-ID: <200311050841.46399.aleaxit@yahoo.com> On Wednesday 05 November 2003 02:28 am, Gregory P. Smith wrote: ... > This was brought up before 2.3.2 was released. The docs already state > this in a nice and obvious warning: > > http://www.python.org/doc/2.3.2/lib/module-bsddb.html You are entirely right: indeed, it's documented with *exemplary* clarity. > My vote is to leave bsddb in 2.3.2 as it is and not try to port the > thread support over from 2.4cvs. It is not ready. Absolutely. The fully-documented limitation of 2.3.*'s bsddb interface wrt multi-threading should be left alone even if we felt somewhat certain about a new implementation: enhancing functionality at the risk of introducing bugs is _not_ what the maintenance branch is about. Knowing that the new implementation isn't fully mature just reinforces this. > The bsddb module has never supported multithreaded use in any past version > of python. If the simple bsddb/__init__.py interface can support it > for 2.4 that's great. It should always be recommended that people use > the full bsddb.db when threads are involved. OK. This sounds very wise to me. > If simple bsddb still has non-trivial-to-describe multithreaded deadlock > issues by the time a 2.4 release draws near I'll suggest pulling it out. > (before then i need to write a test case to prove that it does actually > have these problems) Again, very advisable!
Alex From Paul.Moore at atosorigin.com Wed Nov 5 06:24:50 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Nov 5 06:25:37 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> From: Guido van Rossum [mailto:guido@python.org] >> 1. Add a plain irange() to itertools. >> 2. IF the general reversed() is deemed too theoretical, add EITHER a >> reverse argument to irange, or an irevrange to itertools. Both feel >> to me a little iffy, but that's my generality bias again. >> 3. IF the general reversed() is accepted (builtin or not) leave the >> irange function in its simple form. > Hm. reversed(irange(...)) can't work, so you'd have to have both. Raymond is proposing (in the PEP) a custom reverse via a __reversed__ special method. I'm assuming that irange() [and enumerate(), and possibly others] would need such a method, in order to cover just this case. From my POV, having reversed(enumerate()) work is essential. Also for irange() if that is accepted. I've not looked through the other itertools, but a trawl through those to ensure any that need it have custom reverse methods would also be sensible. Paul. From anthony at interlink.com.au Wed Nov 5 06:23:16 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Nov 5 06:26:21 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch In-Reply-To: Message-ID: <200311051123.hA5BNGGc009525@localhost.localdomain> >>> Martin v. Löwis wrote > What system is this on? I find it surprising that the id is < 0: on a > 32-bit machine, this should only happen if you allocate more than 2GB. Redhat 10 beta3 (Fedora). I'm not entirely sure why it's generating these. Using current CVS python (although it also complains when building a 2.3.2 on this platform, but a 2.3.2 compiled on RH9 is fine).
Python 2.3.2+ (#1, Nov 5 2003, 00:54:02)
[GCC 3.3.1 20030930 (Red Hat Linux 3.3.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> id('sdkjhfdkfhsdkjfhsdkjfhdskf')
-1083363920

It seems most things have a very large id now:

>>> class a: pass
...
>>>
>>> print a()
<__main__.a instance at 0xbf6dad4c>
>>> print a()
<__main__.a instance at 0xbf6dac2c>
>>> print a()
<__main__.a instance at 0xbf6dad0c>
>>> print a()
<__main__.a instance at 0xbf6dabcc>

I wonder if it's some sort of "randomly jumble around the address space to prevent stack-smashing" thing? I seem to recall something about Position Independent Execution in the release notes. This version of RH will be the one released in the next week or so. > I'd reformulate them as > > "%x" % (id(o) & 0xffffffffL) > Of course, you have to replace 0xffffffffL with (unsigned)-1 of the > system (i.e. 2l*sys.maxint+1). Hm. "%x" % (id(o) & 2L*sys.maxint+1) is considerably less obvious than "%x"%id(o) > I wonder whether creating a function > sys.unsigned(id(o)) > would be appropriate, which returns its argument for positive > numbers, and PyLong_FromUnsignedLong((unsigned)arg) otherwise. Possibly. I'm going to have to make the above patch to the 23 branch in any case - warnings from the standard test suite are bad. Would a different % format code be another option? Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Wed Nov 5 06:28:17 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Nov 5 06:31:13 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch Message-ID: <200311051128.hA5BSHaG009610@localhost.localdomain> >>> Anthony Baxter wrote > Hm. "%x" % (id(o) & 2L*sys.maxint+1) > is considerably less obvious than "%x"%id(o) The best I can come up with at this moment using the 'struct' module is ''.join(['%02x'%ord(x) for x in struct.pack('>i', id(o))]), which is also pretty grotesque.
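Both spellings being traded here (0xffffffffL and 2L*sys.maxint+1) build the same all-ones mask; written out as a small helper, the idea looks like this (a sketch of the masking approach under discussion, not the patch that went in):

```python
def unsigned_hex(n, bits=32):
    # Reinterpret a possibly-negative id() as its unsigned
    # two's-complement value by masking to the platform word size.
    mask = (1 << bits) - 1   # 0xffffffff when bits == 32
    return "%x" % (n & mask)
```

For the id above, unsigned_hex(-1083363920) yields the address bits ("bf6d2db0") rather than a negative "%x" operand, which is what triggers the warnings the thread is about.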
Thinking about it further, the better fix might be to replace the test code that looks for an exact match with a regex-based match instead... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From python at rcn.com Wed Nov 5 06:52:11 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 06:52:19 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311042247.45386.aleaxit@yahoo.com> Message-ID: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> [Alex] > I'm slowly coming to accept it -- it's sure way more appropriate as a > built-in than many that currently crowd the builtins namespace. [Paul Moore] > > 3. A general reversed() function seems theoretically useful, but the > > concrete use cases seem fairly thin on the ground. I'm broadly in > > favour, because I (possibly like Raymond) have a bias for clean, > > general solutions. But I can see that "practicality beats purity" > > may hold here. [Alex] > Funny, I originally felt queasy (about it being a built-in only) for > "purity" > about the overcrowded builtins namespace. I'm seeing enough use > cases (even if irange DID grow a wonderful reverse= optional arg...) > that practicality is gradually winning me over. I.e., practicality beats > purity is what is winning me over, while to you it suggests dampening > your "broadly in favour"... we both mention iterating over sequences > more than over indices, but to me that's a suggestion that reversed > has a place . . . > You have a point -- Raymond definitely HAS an overall vision on > iterators &c and he's deserved lots of listening-to even though we > can't quite see some specific point. It appears that Alex has been won over to supporting reversed() as a builtin. Among the comp.lang.python crowd, nearly everyone supported some form of the PEP (with varying preferences on the name or where to put it). 
The community participation rate was high with about 120 posts across four threads contributing to hammering out the current version of the pep. Is there anything else that needs to be done in the way of research, voting, or cheerleading for the pep to be accepted? Raymond From python at rcn.com Wed Nov 5 07:29:17 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 07:29:25 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> Message-ID: <001e01c3a398$6e57a520$e841fea9@oemcomputer> > Raymond is proposing (in the PEP) a custom reverse via a __reversed__ > special method. That was requested by a number of contributors on comp.lang.python. Its purpose is to allow users to add reverse iteration support to objects that otherwise only offer forward iteration but not sequence access. The custom reversed method is not an essential part of the proposal. It's just a hook for someone who might need it. > I'm assuming that irange() [and enumerate(), and possibly > others] would need such a method, in order to cover just this case. Not really. When you go to write the code, it becomes clear that it doesn't apply to enumerate or the other itertools. The issue is that the iterator object holds only the result of iter(iterable) and is in no position to re-probe the underlying iterable to see if it supports reverse iteration. The iterator object has no way of knowing in advance that it is going to be called by reversed(). So, I'm not proposing to add __reversed__ to any existing python objects. It may make sense for xrange, but that is an efficiency issue not an API issue (xrange already works with reversed() without adding a custom method).
Raymond From fincher.8 at osu.edu Wed Nov 5 08:33:37 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Wed Nov 5 07:35:46 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> Message-ID: <200311050833.37529.fincher.8@osu.edu> On Tuesday 04 November 2003 08:12 pm, Guido van Rossum wrote: > In private mail Raymond withdrew the suggestion that enumerate() > implement __reversed__; I think Raymond won't mind if I quote him here: > > [Raymond] > > > Unfortunately, that idea didn't work out. The enumerate object does not > > hold the original iterable; instead, it only has the result of > > iter(iterable). Without having the iterable, I don't see a way for it > > to call iterable.__reversed__. The essential problem is that at creation > > time, the enumerate object does not know that it is going to be called by > > reversed(). I had always assumed that enumerate.__reversed__ would attempt to call a reversed iterator on the sequence. Since enumerate is only sensibly used on sequences (which are guaranteed to provide a reverse iterator) it could never fail in sensible cases (unless there's some usage of enumerate on non-sequences that I'm missing). > I'm not sure I even like the idea of reversed() looking for a > __reversed__ method at all. I like the original intention best: > reversed() is for reverse iteration over *sequences*. (See the first > paragraph of the section "Rejected Alternatives" in the PEP.) I think the search for the __reversed__ method is the meat of the proposal; I can define for myself a simple two-line generator that iterates in reverse over sequences.
What I need the language to define for me is a protocol for iterating over objects in reverse and for providing users of my own classes with the ability to iterate over them in reverse in a standard way. If this proposal could be satisfied by the simple definition:

def reversed(seq):
    for i in xrange(len(seq)-1, -1, -1):
        yield seq[i]

I wouldn't be for it. The reason I'm +1 is because I want a standard protocol for iterating in reverse over objects. Jeremy From python at rcn.com Wed Nov 5 08:03:31 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 08:03:40 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <20031104211119.GS7212@epoch.metaslash.com> Message-ID: <002301c3a39d$36d00020$e841fea9@oemcomputer> [Neal Norwitz] > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. +1 Raymond From mwh at python.net Wed Nov 5 08:48:53 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 5 08:48:58 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <002301c3a39d$36d00020$e841fea9@oemcomputer> (Raymond Hettinger's message of "Wed, 5 Nov 2003 08:03:31 -0500") References: <002301c3a39d$36d00020$e841fea9@oemcomputer> Message-ID: <2m7k2f0vui.fsf@starship.python.net> "Raymond Hettinger" writes: > [Neal Norwitz] >> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > +1 I think apply is probably widely enough used that this is too strong. It could be a right royal pain in the arse if you wanted to have code that still ran in 1.5.2. I realize that this poses other problems, but I don't feel we should be going out of our way to make it harder. not-a-fan-of-churn-ly y'rs mwh -- (Unfortunately, while you get Tom Baker saying "then we were attacked by monsters", he doesn't flash and make "neeeeooww-sploot" noises.)
-- Gareth Marlow, ucam.chat, from Owen Dunn's review of the year From Paul.Moore at atosorigin.com Wed Nov 5 08:50:47 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Nov 5 08:51:33 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060D24@UKDCX001.uk.int.atosorigin.com> From: Jeremy Fincher [mailto:fincher.8@osu.edu] > If this proposal could be satisfied by the simple definition: > > def reversed(seq): > for i in xrange(len(seq)-1, -1, -1): > yield seq[i] > > I wouldn't be for it. The reason I'm +1 is because I want > a standard protocol for iterating in reverse over objects. The more I think about it, the less I see the need for reversed(). But I'm having a really difficult time articulating why. I don't see enough use cases for something which just reverses sequences, as above. I tend to loop over concrete sequences less and less these days, using iterators, generators, enumerate, itertools etc, far more. The simple reversed() above doesn't help at all there. OK, reversed([x]range) is useful, but as soon as an iterator-based irange existed, I'd use that for "forward" loops, and be most upset that reversed(irange) didn't work... Whenever I try to play with writing a reversed() which is more general than the code above, I get stuck because *something* needs reversing, but it's virtually never a sequence! So far, I've needed to reverse: itertools.count() itertools.izip() enumerate() But this is all fairly incestuous - all I'm proving is that *if* you need reversed() on something other than a sequence, you can't do it without help from something (the object itself, or something else). But the cases *I* care about have been pre-existing Python objects, which Raymond is not proposing to extend in that way! (I can see that having the __reversed__ protocol may help with user-defined objects, I just don't have such a need myself).
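For a user-defined class, the protocol Paul alludes to would look roughly like this (an illustrative sketch of the __reversed__ hook PEP 322 proposes; later Pythons do consult the method, so the sketch runs today):

```python
class Countdown:
    """Forward iteration yields 0..n-1; the __reversed__ hook hands
    reversed() a reverse iterator without __len__/__getitem__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

    def __reversed__(self):
        return iter(range(self.n - 1, -1, -1))
```

With the hook in place, list(reversed(Countdown(3))) gives [2, 1, 0] even though the object is not a sequence.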
I'm tending to come down in favour of just having a simple "generate numbers in reverse" function (whether that is irange(..., reverse=True) or irevrange, or something else). Like Guido, I think that covers most real cases. Especially in combination with itertools - reversed(seq) <===> imap(seq.__getitem__, irevrange(len(seq))) Hmm, that reads better with irevrange. Looks like Guido's judgement is right again... Actually, itertools.count() looks very much like it's relevant here. It has a start argument (defaulting to 0) but no stop or step. Maybe we should be extending this, rather than inventing a new itertool. count(start=0, end=, step=1, reverse=False) This adds a *lot* of generality to count. Or how about itertools.count() as above, and itertools.countdown() as a reversed version? OK. I think I've changed to -0 on PEP 322, and +1 on having irange and irevrange (or an extended count and countdown) in itertools. Paul. From aleaxit at yahoo.com Wed Nov 5 09:45:11 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 09:45:20 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <2m7k2f0vui.fsf@starship.python.net> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> Message-ID: <200311051545.11246.aleaxit@yahoo.com> On Wednesday 05 November 2003 02:48 pm, Michael Hudson wrote: > "Raymond Hettinger" writes: > > [Neal Norwitz] > > > >> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > > > +1 > > I think apply is probably widely enough used that this is too strong. > > It could be a right royal pain in the arse if you wanted to have code > that still ran in 1.5.2. I realize that this poses other problems, > but I don't feel we should be going out of our way to make it harder. Removing _any_ built-in that was around in 1.5.2 will pose similar problems. 
How hard can it be, in Python source that needs to run on both 1.5.2 and 2.5, to, e.g.:

    try: import legacy_25x_152
    except ImportError: pass

where the "legacy module" would inject apply (etc) in builtins? (In 2.4, you'd "just" need to turn off deprecation warnings, which in such a stretched case as 1.5-to-2.4 you're surely doing anyway...).

Guido has specifically asked for built-ins that could be deprecated.

It doesn't seem to me that asking for deprecation warnings to be turned off, or a "legacy module" to be conditionally imported, is "going out of our way to make it harder" to have code running all the way from 1.5 to 2.5 -- if such a feat currently requires 99 units of effort it MAY move all the way to 100 this way, but I doubt the relative augmentation of effort is even as high as that.

Alex

From pje at telecommunity.com Wed Nov 5 09:54:43 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Nov 5 09:53:40 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <001e01c3a398$6e57a520$e841fea9@oemcomputer>
References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com>
Message-ID: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>

At 07:29 AM 11/5/03 -0500, Raymond Hettinger wrote:
>Not really. When you go to write the code, it becomes clear that it
>doesn't apply to enumerate or the other itertools. The issue is that
>the iterator object holds only the result of iter(iterable) and is in no
>position to re-probe the underlying iterable to see if it supports
>reverse iteration. The iterator object has no way of knowing in advance
>that it is going to be called by reversed().

Why not change enumerate() to return an iterable, rather than an iterator? Then its __reversed__ method could attempt to delegate to the underlying iterable. Is it likely that anyone relies on enumerate() being an iterator, rather than an iterable?
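Alex's conditional-import idea can be sketched concretely. This is purely illustrative: the module name legacy_25x_152 is Alex's hypothetical, and the sketch uses the modern builtins spelling for convenience (code really targeting 1.5.2 would of course use the __builtin__ module instead):

```python
# Sketch of a hypothetical "legacy" compatibility module, along the lines
# Alex suggests: re-inject deprecated builtins so old call sites keep working.
# NOTE: names and the builtins spelling are illustrative, not from the thread.
import builtins


def _apply(func, args=(), kwargs=None):
    # apply(f, args, kwargs) was defined as f(*args, **kwargs)
    return func(*args, **(kwargs or {}))


# Only install the shim if the builtin is actually missing/removed.
if not hasattr(builtins, "apply"):
    builtins.apply = _apply

# Old-style call sites now work unchanged:
print(apply(pow, (2, 10)))  # -> 1024
```

A program supporting both old and new interpreters would then wrap the import in the try/except shown above, so the shim is simply skipped where the module is absent.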
From guido at python.org Wed Nov 5 09:58:32 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 09:58:48 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: Your message of "Wed, 05 Nov 2003 22:23:16 +1100." <200311051123.hA5BNGGc009525@localhost.localdomain>
References: <200311051123.hA5BNGGc009525@localhost.localdomain>
Message-ID: <200311051458.hA5EwWc29153@12-236-54-216.client.attbi.com>

> > I'd reformulate them as
> >
> >     "%x" % (id(o) & 0xffffffffL)
> > Of course, you have to replace 0xffffffffL with (unsigned)-1 of the
> > system (i.e. 2l*sys.maxint+1).
>
> Hm.  "%x" % (id(o) & 2L*sys.maxint+1)
>
> is considerably less obvious than "%x"%id(o)
>
> > I wonder whether creating a function
> >     sys.unsigned(id(o))
> > would be appropriate, which returns its arguments for positive
> > numbers, and PyLong_FromUnsignedLong((unsigned)arg) otherwise.
>
> Possibly. I'm going to have to make the above patch to the 23 branch
> in any case - warnings from the standard test suite are bad. Would a
> different % format code be another option?

This warning will go away in 2.4 again, where %x with a negative int will return a hex number with a minus sign. So I'd be against introducing a new format code. I've forgotten in what code you found this, but the sys.maxint solution sounds like your best bet.

In 2.4 we can also make id() return a long when the int value would be negative; I don't want to do that in 2.3 since changing the return type and value of a builtin in a minor release seems a compatibility liability -- but in 2.4 the difference between int and long will be wiped out even more than it already is, so it should be fine there.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mwh at python.net Wed Nov 5 10:02:07 2003
From: mwh at python.net (Michael Hudson)
Date: Wed Nov 5 10:02:12 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <200311051545.11246.aleaxit@yahoo.com> (Alex Martelli's message of "Wed, 5 Nov 2003 15:45:11 +0100")
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> <200311051545.11246.aleaxit@yahoo.com>
Message-ID: <2mvfpyzwnk.fsf@starship.python.net>

Alex Martelli writes:

> On Wednesday 05 November 2003 02:48 pm, Michael Hudson wrote:
>> "Raymond Hettinger" writes:
>> > [Neal Norwitz]
>> >
>> >> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern.
>> >
>> > +1
>>
>> I think apply is probably widely enough used that this is too strong.
>>
>> It could be a right royal pain in the arse if you wanted to have code
>> that still ran in 1.5.2. I realize that this poses other problems,
>> but I don't feel we should be going out of our way to make it harder.
>
> Removing _any_ built-in that was around in 1.5.2 will pose similar
> problems.

Well, yeah, but I contend doing it to, say, coerce would cause less grief than apply.

> How hard can it be, in Python source that needs to run on both 1.5.2
> and 2.5, to, e.g.:
>
>     try: import legacy_25x_152
>     except ImportError: pass
>
> where the "legacy module" would inject apply (etc) in builtins? (In
> 2.4, you'd "just" need to turn off deprecation warnings, which in
> such a stretched case as 1.5-to-2.4 you're surely doing anyway...).

Yeah, I guess for apply that is no great stretch.

> Guido has specifically asked for built-ins that could be deprecated.

I know, but maybe I think he shouldn't have :-)

-----

There's always going to be a tension between wanting to keep backwards compatibility and making the Python of tomorrow as perfect as possible. To me, leaving the builtins a little bit cluttered just isn't that painful.
And perhaps talking about people trying to keep code running on 1.5.2 and 2.4 wasn't a good example; I have more sympathy for people who are trying to upgrade the Python they use. Each little obstacle means that they are that little bit more likely to just throw their hands up in the air and keep on using 1.5.2 or 2.1 -- and that would be a Bad Thing. Cheers, mwh -- That one is easily explained away as massively intricate conspiracy, though. -- Chris Klein, alt.sysadmin.recovery From guido at python.org Wed Nov 5 10:02:55 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 10:03:12 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 06:52:11 EST." <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> Message-ID: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> > Among the comp.lang.python crowd, nearly everyone supported some form of > the PEP (with varying preferences on the name or where to put it). The > community participation rate was high with about 120 posts across four > threads contributing to hammering out the current version of the pep. How many participants in those 120 posts? (I recall a thread where one individual posted 100 messages. :-) > Is there anything else that needs to be done in the way of research, > voting, or cheerleading for pep to be accepted? Yes. I'm getting cold feet about __reversed__. Some folks seem to think that reversed() can be made to work on many iterators by having the iterator supply __reversed__; I think this is asking for trouble (e.g. you already pointed out why it couldn't be done for enumerate()). I also still think that a reversed [x]range() would give us a bigger bang for the buck -- less bang, but also a lot less bucks. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis at bluewin.ch Wed Nov 5 10:06:18 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Nov 5 10:03:41 2003 Subject: [Python-Dev] PEP 322: Reverse Iteration In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060D24@UKDCX001.uk.int.a tosorigin.com> Message-ID: <5.2.1.1.0.20031105153551.028d4300@pop.bluewin.ch> At 13:50 05.11.2003 +0000, Moore, Paul wrote: >From: Jeremy Fincher [mailto:fincher.8@osu.edu] > > If this proposal could be satisfied by the simple definition: > > > > def reversed(seq): > > for i in xrange(len(seq)-1, -1, -1): > > yield seq[i] > > > > I wouldn't be for it. The reason I'm +1 is because I want > > a standard protocol for iterating in reverse over objects. > >The more I think about it, the less I see the need for reversed(). But I'm >having a really difficult time articulating why. > >I don't see enough use cases for something which just reverses sequences, >as above. I tend to loop over concrete sequences less and less these days, >using iterators, generators, enumerate, itertools etc, far more. The simple >reversed() above doesn't help at all there. OK, reversed([x]range) is useful, >but as soon as an iterator-based irange existed, I'd use that for "forward" >loops, and be most upset that reversed(irange) didn't work... > >Whenever I try to play with writing a reversed() which is more general than >the code above, I get stuck because *something* needs reversing, but it's >virtually never a sequence! > >So far, I've needed to reverse: > > itertools.count() > itertools.zip() > enumerate() > >But this is all fairly incestuous - all I'm proving is that *if* you need >reversed() on something other than a sequence, you can't do it without >help from something (the object itself, or something else). But the cases >*I* care about have been pre-existing Python objects, which Raymond is not >proposing to extend in that way! 
(I can see that having the __reversed__
>protocol may help with user-defined objects, I just don't have such a need
>myself).

1) the problem is that reversed wants to be simple and sweet, but general reverse iteration is not that simple.

2) itertools.count / izip and enumerate produce iterators forgetting the original iterable, so while nice

    reversed(count(9))
    reversed(enumerate([1,2,3]))

would require rather not straightforward mechanisms under the hood.

Either one writes and introduces revenumerate, revcount, revizip, OR one could make reversed also a functional, allowing not only for

    reversed(it)  # it implements __reversed__ or it's a sequence

but also

    reversed(count, 9)
    reversed(enumerate, [1,2,3])
    reversed(izip, [1,2], [1,3])

[ the implementation would use some table to register the impl of all those behaviors], with possible behaviors:

    def rev_count(n):
        while True:
            yield n
            n -= 1

    def rev_izip(*iterables):
        iterables = map(reversed, iterables)
        while True:
            result = [i.next() for i in iterables]
            yield tuple(result)

    def rev_enumerate(it):
        if hasattr(it, '__reversed__'):
            index = -1  # arbitrary but not totally meaningless :)
            for x in it.__reversed__():
                yield (index, x)
                index -= 1
        elif hasattr(it, 'keys'):
            raise ValueError("mappings do not support reverse iteration")
        else:
            i = len(it)
            while i > 0:
                i -= 1
                yield (i, it[i])

    rev_behavior = { enumerate: rev_enumerate, ... }

    def reversed(*args):
        if len(args) > 1:
            func = args[0]
            args = args[1:]
            rev_func = rev_behavior.get(func, None)
            if rev_func:
                for x in rev_func(*args):
                    yield x
            else:
                ... error
        else:
            ...

Whether this is for general consumption is another matter.

regards.

From guido at python.org Wed Nov 5 10:06:09 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 10:06:44 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: Your message of "Wed, 05 Nov 2003 08:03:31 EST."
<002301c3a39d$36d00020$e841fea9@oemcomputer> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> Message-ID: <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> > [Neal Norwitz] > > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > +1 Isn't apply() already deprecated? Otherwise +1. --Guido van Rossum (home page: http://www.python.org/~guido/) From FBatista at uniFON.com.ar Wed Nov 5 10:09:23 2003 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Wed Nov 5 10:10:22 2003 Subject: [Python-Dev] Deprecating obsolete builtins Message-ID: #- > [Neal Norwitz] #- > > For 2.4 I'd suggest we officially deprecate: apply, #- coerce, intern. #- > +1 . Facundo From pedronis at bluewin.ch Wed Nov 5 10:13:49 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Nov 5 10:11:09 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> Message-ID: <5.2.1.1.0.20031105161029.028dd278@pop.bluewin.ch> At 07:02 05.11.2003 -0800, Guido van Rossum wrote: > > Among the comp.lang.python crowd, nearly everyone supported some form of > > the PEP (with varying preferences on the name or where to put it). The > > community participation rate was high with about 120 posts across four > > threads contributing to hammering out the current version of the pep. > >How many participants in those 120 posts? (I recall a thread where >one individual posted 100 messages. :-) > > > Is there anything else that needs to be done in the way of research, > > voting, or cheerleading for pep to be accepted? > >Yes. I'm getting cold feet about __reversed__. Some folks seem to >think that reversed() can be made to work on many iterators by having >the iterator supply __reversed__; I think this is asking for trouble >(e.g. you already pointed out why it couldn't be done for >enumerate()). 
yes, but __reversed__ is meaningful for iterables, not iterators.

I had the impression that reversed(.) is related to iter(.) for reverse iteration, and __reversed__ would correspond to __iter__ also for that, but this is meaningful for iterables that are not already iterators. For iterators __iter__ is typically the identity, while __reversed__ is not really applicable, which probably means that reverse iteration is more complicated than forward iteration.

regards.

From guido at python.org Wed Nov 5 10:23:09 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 10:23:18 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: Your message of "Wed, 05 Nov 2003 15:45:11 +0100." <200311051545.11246.aleaxit@yahoo.com>
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> <200311051545.11246.aleaxit@yahoo.com>
Message-ID: <200311051523.hA5FN9r29272@12-236-54-216.client.attbi.com>

> Removing _any_ built-in that was around in 1.5.2 will pose similar
> problems.

Only proportional to the likelihood that it was used in 1.5.2, which is proportional to how useful it is. intern(): extremely unlikely (nobody knows what it's for); coerce(): rather unlikely (too advanced); apply(): very likely.

> How hard can it be, in Python source that needs to run
> on both 1.5.2 and 2.5, to, e.g.:
>
>     try: import legacy_25x_152
>     except ImportError: pass
>
> where the "legacy module" would inject apply (etc) in builtins? (In
> 2.4, you'd "just" need to turn off deprecation warnings, which in
> such a stretched case as 1.5-to-2.4 you're surely doing anyway...).

The problem (and real cost, for some!) is that people who write code that should work for 1.5.2 and later end up having to do more maintenance on it for each new Python version they support. Maybe we should just be resigned to having a bunch of unwanted builtins until 3.0 comes along (where I'm okay with all bets being off).
> Guido has specifically asked for built-ins that could be deprecated. > > It doesn't seem to me that asking for deprecation warnings to be > turned off, or a "legacy module" to be conditionally imported, is > "going out of our way to make it harder" to have code running all > the way from 1.5 to 2.5 -- if such a feat currently requires 99 units > of effort it MAY move all the way to 100 this way, but I doubt the > relative augmentation of effort is even as high as that. (a) It's always better to be able to use a common subset than to have to resort to version checking or version-specific hacks. (We've all learned this in the context of platform independence; I think the same applies to version independence.) (b) Since 2.4 and 2.5 don't yet exist (2.4 is at best a moving target), someone wanting to use a cross-version subset *now* has to settle for targeting and testing with 1.5.2 through 2.3. Forcing these folks to do a new release for 2.4 or 2.5 is not increasing their work from 99 to 100 units, it's increasing the work they have to do in the future from 0 to 1 (on an arbitrary scale :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Nov 5 10:28:01 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 10:28:07 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 09:54:43 EST." <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> Message-ID: <200311051528.hA5FS1l29291@12-236-54-216.client.attbi.com> > Why not change enumerate() to return an iterable, rather than an > iterator? Then its __reversed__ method could attempt to delegate to > the underlying iterable. Is it likely that anyone relies on > enumerate() being an iterator, rather than an iterable? 
I find it rather elegant to use enumerate() on a file to generate line numbers and lines together (adding 1 to the index to produce a more conventional line number). What's more elegant than

    for i, line in enumerate(f):
        print i+1, line,

to print a file with line numbers??? I've used this in throwaway code at least, and would hate to lose it.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Wed Nov 5 10:33:24 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 10:33:32 2003
Subject: [Python-Dev] PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 08:33:37 EST." <200311050833.37529.fincher.8@osu.edu>
References: <002801c3a30c$def8fae0$6017c797@oemcomputer> <5.2.1.1.0.20031105013141.02804e38@pop.bluewin.ch> <200311050112.hA51Cd327356@12-236-54-216.client.attbi.com> <200311050833.37529.fincher.8@osu.edu>
Message-ID: <200311051533.hA5FXPX29334@12-236-54-216.client.attbi.com>

> I think the search for the __reversed__ method is the meat of the
> proposal; I can define for myself a simple two-line generator that
> iterates in reverse over sequences. What I need the language to
> define for me is a protocol for iterating over objects in reverse
> and for providing users of my own classes with the ability to
> iterate over them in reverse in a standard way.
>
> If this proposal could be satisfied by the simple definition:
>
>     def reversed(seq):
>         for i in xrange(len(seq)-1, -1, -1):
>             yield seq[i]
>
> I wouldn't be for it. The reason I'm +1 is because I want a
> standard protocol for iterating in reverse over objects.

I would be *against* such a protocol. It would end up complicating almost everything that defines __iter__, for a very questionable pay-off (reverse iteration isn't that common except for some special cases).

The PEP got as far as it is by focusing on simplicity and sequences. It is rapidly losing its innocence.
:-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From python at rcn.com Wed Nov 5 10:55:10 2003
From: python at rcn.com (Raymond Hettinger)
Date: Wed Nov 5 10:55:26 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com>
Message-ID: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer>

> > Among the comp.lang.python crowd, nearly everyone supported some form of
> > the PEP (with varying preferences on the name or where to put it). The
> > community participation rate was high with about 120 posts across four
> > threads contributing to hammering out the current version of the pep.
>
> How many participants in those 120 posts? (I recall a thread where
> one individual posted 100 messages. :-)

There were 31 participants.

{'Mel Wilson': 1, 'Tom Anderson': 1, 'Dave Benjamin': 2, 'Stephen Horne': 14,
 'David Abrahams': 12, 'David Mertz': 1, 'Ron Adam': 1, 'Terry Reedy': 4,
 'Sean Ross': 6, 'Bengt Richter': 1, 'Andrew Dalke': 3, 'Michele Simionato': 2,
 'Bernhard Herzog': 1, 'Raymond Hettinger': 15, 'David C': 1, 'Paul Moore': 3,
 'Dang Griffith': 1, 'Roy Smith': 1, 'Alex Martelli': 14, 'Patrick Maupin': 1,
 'Jeremy Fincher': 3, 'Steve Holden': 1, 'Robert Brewer': 1, 'Chad Netzer': 1,
 'Werner Schiendl': 6, 'Peter Otten': 3, 'David Eppstein': 3,
 'Fredrik Lundh': 1, 'Lulu of': 2, 'Michael Hudson': 1, 'John Roth': 4}

> > Is there anything else that needs to be done in the way of research,
> > voting, or cheerleading for pep to be accepted?
>
> Yes. I'm getting cold feet about __reversed__.

What can I do to warm those feet? I spent a month making this proposal as perfect as possible, gathering support for it, trying each proposed modification, and enduring what feels like hazing. Still, there is a little bit of energy left if that's what it takes to put the ball over the goal line.

Getting this far hasn't been easy.
Python people are quick to express negativity on just about anything and they take great pleasure in exploring every weird variant they can think of.

> I also still think that a reversed [x]range() would give us a bigger
> bang for the buck

I'm not willing to go that route:

* Several posters gave negative feedback on that option.

* It doesn't address the ugly and inefficient s[::-1] approach which I really do not want to become *the* idiom.

* Providing yet another variant of xrange() is a step backwards IMO.

* It is not an extensible protocol like the reversed() / __reversed__ pair.

* Except for the simple case of revrange(n), the multiple argument forms are not a simplification (IMO) and are still difficult to visually verify (try the example from random.shuffle).

* A unique benefit to python is the ability to loop over containers without using indices. The current proposal supports that idea. The revrange() approach doesn't.

Raymond Hettinger

From aleaxit at yahoo.com Wed Nov 5 10:56:21 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 5 10:56:30 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com>
References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com>
Message-ID: <200311051656.21014.aleaxit@yahoo.com>

On Wednesday 05 November 2003 04:02 pm, Guido van Rossum wrote:
> > Among the comp.lang.python crowd, nearly everyone supported some form of
> > the PEP (with varying preferences on the name or where to put it). The
> > community participation rate was high with about 120 posts across four
> > threads contributing to hammering out the current version of the pep.
>
> How many participants in those 120 posts? (I recall a thread where
> one individual posted 100 messages.
:-)

I count 25 separate contributors to threads about PEP 322 (but I only see 75 posts there, and three threads, so I must be missing some of those that Raymond is counting -- or perhaps, not unlikely, they've expired off my newsserver).

> > Is there anything else that needs to be done in the way of research,
> > voting, or cheerleading for pep to be accepted?
>
> Yes. I'm getting cold feet about __reversed__. Some folks seem to
> think that reversed() can be made to work on many iterators by having
> the iterator supply __reversed__; I think this is asking for trouble
> (e.g. you already pointed out why it couldn't be done for
> enumerate()).

I still think it could be, if enumerate kept a reference to its argument, but that's a detail -- I trust your instinct about such design issues (or I wouldn't be using Python...:-). So: let's keep it simple and have reversed be _exactly_ equivalent to (net of performance, hypothetical anomalous "pseudosequences" doing weird things, & exact error kinds/msgs):

    def reversed(sequence):
        for x in xrange(len(sequence)-1, -1, -1):
            yield sequence[x]

no __reversed__, no complications, "no nuttin'". Putting that in the current 2.4 pre-alpha will let us start getting some experience with it and see if in the future we want to add refinements (always easier to add than to remove...:-) -- either to reversed or to other iterator-returning calls (e.g. reverse= optional arguments just like in the sort method of lists).

Alex

From aleaxit at yahoo.com Wed Nov 5 11:09:14 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 5 11:09:26 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
Message-ID: <200311051709.14373.aleaxit@yahoo.com>

On Wednesday 05 November 2003 03:54 pm, Phillip J. Eby wrote:
   ...
> >reverse iteration. The iterator object has no way of knowing in advance
> >that it is going to be called by reversed().
>
> Why not change enumerate() to return an iterable, rather than an
> iterator? Then its __reversed__ method could attempt to delegate to the
> underlying iterable. Is it likely that anyone relies on enumerate() being
> an iterator, rather than an iterable?

I do rely on the _argument_ of enumerate being allowed to be just any iterator, yes -- e.g. in such idioms as:

    for i, x in enumerate(xs):
        if isgoodenough(x): return x
        elif istoohigh(i): raise GettingBoredError, i

Yes, I _could_ recode that as:

    i = 0
    for x in xs:
        if isgoodenough(x): return x
        i += 1
        if istoohigh(i): raise GettingBoredError, i

but, I don't _wanna_...:-). enumerate is just too slick! Of course, it would be fine for reverse(enumerate(x)) to fail for unsuitable values of x -- that's a separate issue.

But actually it would not be a tragedy if I couldn't reverse(enumerate -- e.g. where I'd LIKE to code:

    for i, x in reverse(enumerate(xs)):
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

I _might_ reasonably code:

    for i, x in enumerate(reverse(xs)):
        if isbad(x): raise BadXError, x
        xs[-1-i] = transform(x)

that -1-i may not be the prettiest sight in the world, but I think this STILL beats the alternative of:

    for i in reversed_range(len(xs)):
        x = xs[i]
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

not to mention today's

    for i in xrange(-1, -len(xs)-1, -1):
        x = xs[i]
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

or:

    for i in xrange(len(xs)-1, -1, -1):
        x = xs[i]
        if isbad(x): raise BadXError, x
        xs[i] = transform(x)

Alex

From pedronis at bluewin.ch Wed Nov 5 11:34:29 2003
From: pedronis at bluewin.ch (Samuele Pedroni)
Date: Wed Nov 5 11:33:42 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051709.14373.aleaxit@yahoo.com>
References: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
	<16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com>
	<5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com>
Message-ID: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch>

At 17:09 05.11.2003 +0100, Alex Martelli wrote:
>On Wednesday 05 November 2003 03:54 pm, Phillip J. Eby wrote:
> ...
> > >reverse iteration. The iterator object has no way of knowing in advance
> > >that it is going to be called by reversed().
> >
> > Why not change enumerate() to return an iterable, rather than an
> > iterator? Then its __reversed__ method could attempt to delegate to the
> > underlying iterable. Is it likely that anyone relies on enumerate() being
> > an iterator, rather than an iterable?

I think he was wondering whether people rely on

    enumerate([1,2]).next
    i = enumerate([1,2])
    i is iter(i)

working, vs. needing iter(enumerate([1,2])).next

I think he was proposing to implement enumerate as

    class enumerate(object):
        def __init__(self, iterable):
            self.iterable = iterable

        def __iter__(self):
            i = 0
            for x in self.iterable:
                yield i, x
                i += 1

        def __reversed__(self):
            rev = reversed(self.iterable)
            try:
                i = len(self.iterable)-1
            except (TypeError, AttributeError):
                i = -1
            for x in rev:
                yield i, x
                i -= 1

From marktrussell at btopenworld.com Wed Nov 5 11:48:31 2003
From: marktrussell at btopenworld.com (Mark Russell)
Date: Wed Nov 5 11:48:23 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051656.21014.aleaxit@yahoo.com>
References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> <200311051656.21014.aleaxit@yahoo.com>
Message-ID: <1068050911.954.8.camel@localhost>

On Wed, 2003-11-05 at 15:56, Alex Martelli wrote:
> def reversed(sequence):
>     for x in xrange(len(sequence)-1, -1, -1):
>         yield sequence[x]
>
> no __reversed__, no complications, "no nuttin'".
If I was adding this as a library routine, I'd do: def reversed(sequence): try: seqlen = len(sequence) except TypeError: sequence = list(sequence) seqlen = len(sequence) for x in xrange(seqlen-1, -1, -1): yield sequence[x] OK, inefficient for iterators on long sequences, but it works with enumerate() etc and needs no changes to existing types. Mark From pje at telecommunity.com Wed Nov 5 12:02:09 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Nov 5 12:02:35 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051528.hA5FS1l29291@12-236-54-216.client.attbi.com> References: <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20031105115935.03299bc0@telecommunity.com> At 07:28 AM 11/5/03 -0800, Guido van Rossum wrote: > > Why not change enumerate() to return an iterable, rather than an > > iterator? Then its __reversed__ method could attempt to delegate to > > the underlying iterable. Is it likely that anyone relies on > > enumerate() being an iterator, rather than an iterable? > >I find it rather elegant to use enumerate() on a file to generate line >numbers and lines together (adding 1 to the index to produce a more >conventional line number). What's more elegant than > > for i, line in enumerate(f): > print i+1, line, > >to print a file with line numbers??? I've used this in throwaway >code at least, and would hate to lose it. I thought 'for x in y' always called 'iter(y)', in which case the above still works. It's only this: ef = enumerate(f) while 1: try: i,line = ef.next() print i+1, line, except StopIteration: break That would break. From pje at telecommunity.com Wed Nov 5 12:06:13 2003 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed Nov 5 12:06:34 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> References: <200311051709.14373.aleaxit@yahoo.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <16E1010E4581B049ABC51D4975CEDB8803060D23@UKDCX001.uk.int.atosorigin.com> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20031105120524.03298560@telecommunity.com> At 05:34 PM 11/5/03 +0100, Samuele Pedroni wrote: >At 17:09 05.11.2003 +0100, Alex Martelli wrote: >>On Wednesday 05 November 2003 03:54 pm, Phillip J. Eby wrote: >> ... >> > >reverse iteration. The iterator object has no way of knowing in advance >> > >that it is going to be called by reversed(). >> > >> > Why not change enumerate() to return an iterable, rather than an >> > iterator? Then its __reversed__ method could attempt to delegate to the >> > underlying iterable. Is it likely that anyone relies on enumerate() being >> > an iterator, rather than an iterable? > >I think he was wondering whether people rely on > > >enumerate([1,2]).next >i = enumerate([1,2]) >i is iter(i) > >working , vs. needing iter(enumerate([1,2]).next Yes, precisely. >I think he was proposing to implement enumerate as > >class enumerate(object): > def __init__(self,iterable): > self.iterable = iterable > > def __iter__(self): > i = 0 > for x in self.iterable: > yield i,x > i += 1 > > def __reversed__(self): > rev = reversed(self.iterable) > try: > i = len(self.iterable)-1 > except (TypeError,AttributeError): > i = -1 > for x in rev: > yield i,x > i -= 1 Yes, except I hadn't thought it out in quite that much detail. Thanks for the clarification. 
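For readers following along today: Samuele's sketch runs essentially unchanged in modern Python, where reversed() and the __reversed__ hook did eventually land. A minimal, self-contained version of the idea — the class name `ReversibleEnumerate` is invented here to avoid shadowing the builtin; this is a sketch of the proposal under discussion, not the stdlib enumerate:

```python
class ReversibleEnumerate:
    """Sketch of the enumerate-as-iterable idea from the thread.

    Hypothetical illustration, not the stdlib enumerate: it delegates
    reverse iteration to the underlying iterable via __reversed__.
    """
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        i = 0
        for x in self.iterable:
            yield i, x
            i += 1

    def __reversed__(self):
        # Original indices are only recoverable when the underlying
        # iterable has a length; fall back to -1 markers otherwise,
        # exactly as in the sketch above.
        try:
            i = len(self.iterable) - 1
        except TypeError:
            i = -1
        for x in reversed(self.iterable):
            yield i, x
            i -= 1

pairs = list(reversed(ReversibleEnumerate(['a', 'b', 'c'])))
# pairs == [(2, 'c'), (1, 'b'), (0, 'a')]
```

Note that the builtin reversed() finds the __reversed__ method automatically, which is the whole point of the protocol being debated.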
From aleaxit at yahoo.com Wed Nov 5 13:14:52 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 13:15:05 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> References: <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> Message-ID: <200311051914.52326.aleaxit@yahoo.com> On Wednesday 05 November 2003 05:34 pm, Samuele Pedroni wrote: ... > I think he was wondering whether people rely on > > enumerate([1,2]).next > i = enumerate([1,2]) > i is iter(i) > > working , vs. needing iter(enumerate([1,2]).next > > I think he was proposing to implement enumerate as > > class enumerate(object): > def __init__(self,iterable): > self.iterable = iterable > > def __iter__(self): > i = 0 > for x in self.iterable: > yield i,x > i += 1 > > def __reversed__(self): > rev = reversed(self.iterable) > try: > i = len(self.iterable)-1 > except (TypeError,AttributeError): > i = -1 > for x in rev: > yield i,x > i -= 1 Ah, I see -- thanks! Well, in theory you COULD add a 'next' method too: def next(self): self.iterable = iter(self.iterable) try: self.index += 1 except AttributeError: self.index = 0 return self.index, self.iterable.next() (or some reasonable optimization thereof:-) -- now __reversed__ would stop working after any .next call, but that would still be OK for all use cases I can think of. Alex From guido at python.org Wed Nov 5 13:33:56 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 13:34:05 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 16:56:21 +0100." 
<200311051656.21014.aleaxit@yahoo.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> <200311051656.21014.aleaxit@yahoo.com> Message-ID: <200311051833.hA5IXuN29576@12-236-54-216.client.attbi.com> > So: let's keep it simple and have reversed > be _exactly_ equivalent to (net of performance, hypothetical anomalous > "pseudosequences" doing weird things, & exact error kinds/msgs): > > def reversed(sequence): > for x in xrange(len(sequence)-1, -1, -1): yield sequence[x] > > no __reversed__, no complications, "no nuttin'". > > Putting that in the current 2.4 pre-alpha will let us start getting some > experience with it and see if in the future we want to add refinements > (always easier to add than to remove...:-) -- either to reverse or to > other iterator-returning calls (e.g. reverse= optional arguments just > like in the sort method of lists). I'd be for that, *if* we also allow as a possible outcome that reversed() simply doesn't find any use and we take it out before releasing 2.4b1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Nov 5 13:43:16 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 13:44:07 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 10:55:10 EST." <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> References: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> Message-ID: <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com> > > Yes. I'm getting cold feet about __reversed__. > > What can I do to warm those feet? > > I spent a month making this proposal as perfect as possible, gathering > support for it, trying each proposed modification, and enduring what > feels like hazing. Still, there is a little bit of energy left if that > what it takes to put the ball over the goal line. > > Getting this far hasn't been easy. 
Python people are quick to express negativity on just about anything and
they take great pleasure in exploring every weird variant they can think
of.

I'm okay with adding reversed() as a builtin that works for sequences
only but I'm not okay with adding the __reversed__ protocol.

For me, the main advantage of reversed() is that it expresses better
what I mean when I'm going over a list (or other concrete sequence)
backwards.  The __reversed__ protocol muddles the issue by inviting to
try to make reversed() work for some iterators; I don't see the use
case (or if I do see it, I see it as much less important than the
previous one).

> > I also still think that a reversed [x]range() would give us a bigger
> > bang for the buck
>
> I'm not willing to go that route:
> * Several posters gave negative feedback on that option.
> * It doesn't address the ugly and inefficient s[::-1] approach which I
> really do not want to become *the* idiom.
> * Providing yet another variant of xrange() is a step backwards IMO.
> * It is not an extensible protocol like the reversed() / __reversed__
> pair.
> * Except for the simple case of revrange(n), the multiple argument forms
> are not a simplification (IMO) and are still difficult to visually
> verify (try the example from random.shuffle).
> * A unique benefit to python is the ability to loop over containers
> without using indices.  The current proposal supports that idea.  The
> revrange() approach doesn't.

Points well taken.

About your last bullet, I wonder if one of the issues is that when doing
a forward loop over a container, we don't really care that much about
the order as long as we get all items (witness the popularity of looping
over dicts).  But when doing a reverse loop, we clearly *do* care about
the order.  So forward and reverse iteration are not symmetric.  This
may explain why 3 out of 5 examples you found *need* the index.
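A concrete note on the s[::-1] bullet: the slice spelling materializes a full reversed copy up front, while the proposed reversed() walks the sequence lazily through __len__/__getitem__. A sketch (`reversed_seq` is a stand-in name for Alex's generator, written with modern range so it runs today):

```python
def reversed_seq(sequence):
    # The sequence-protocol-only semantics discussed in this thread:
    # index from the last item down to the first, yielding lazily.
    for x in range(len(sequence) - 1, -1, -1):
        yield sequence[x]

s = [0, 1, 2, 3, 4]
copy_rev = s[::-1]                  # eager: builds the whole copy now
lazy_rev = list(reversed_seq(s))    # lazy: one item per next() call
assert copy_rev == lazy_rev == [4, 3, 2, 1, 0]
```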
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Nov 5 14:43:43 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 5 14:44:20 2003 Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch In-Reply-To: <200311051128.hA5BSHaG009610@localhost.localdomain> References: <200311051128.hA5BSHaG009610@localhost.localdomain> Message-ID: Anthony Baxter writes: > >>> Anthony Baxter wrote > > Hm. "%x" % (id(o) & 2L*sys.maxint+1) > > is considerably less obvious that "%x"%id(o) > > The best I can come up with at this moment using the 'struct' module is > ''.join(['%02x'%ord(x) for x in struct.pack('>i', id(o))]), which is also > pretty grotesque. In what sense is this better - in particular if you would write mine as MAX_UINT = 2L*sys.maxint+1 ... "%x" % (id(o) & MAX_UINT) > Thinking about it further, the better fix might be to replace the test > code that looks for an exact match with a regex-based match instead... It's not just in test code, AFAIR - also in minidom __repr__ (or some such). Regards, Martin From martin at v.loewis.de Wed Nov 5 14:47:04 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 5 14:47:26 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> Message-ID: Guido van Rossum writes: > > [Neal Norwitz] > > > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern. > > > > +1 > > Isn't apply() already deprecated? Otherwise +1. Not with a deprecation warning. Regards, Martin From fdrake at acm.org Wed Nov 5 14:50:38 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Wed Nov 5 14:50:48 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> Message-ID: <16297.21646.645041.827176@grendel.zope.com> Martin v. L?wis writes: > Not with a deprecation warning. But it does generate a PendingDeprecationWarning. Given the long history of apply(), that's about as strong a change as can be made just now, and much stronger than some would like. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From guido at python.org Wed Nov 5 14:57:02 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 14:57:09 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: Your message of "05 Nov 2003 20:47:04 +0100." References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311051506.hA5F69u29213@12-236-54-216.client.attbi.com> Message-ID: <200311051957.hA5Jv2B29760@12-236-54-216.client.attbi.com> > > Isn't apply() already deprecated? Otherwise +1. > > Not with a deprecation warning. Ah, it's coming back. It's a silent deprecation, because there are too many uses still. Probably the same will hold for another release or two. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Wed Nov 5 15:34:55 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 15:35:05 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051833.hA5IXuN29576@12-236-54-216.client.attbi.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051656.21014.aleaxit@yahoo.com> <200311051833.hA5IXuN29576@12-236-54-216.client.attbi.com> Message-ID: <200311052134.56018.aleaxit@yahoo.com> On Wednesday 05 November 2003 19:33, Guido van Rossum wrote: > > So: let's keep it simple and have reversed > > be _exactly_ equivalent to (net of performance, hypothetical anomalous ... 
> I'd be for that, *if* we also allow as a possible outcome that > reversed() simply doesn't find any use and we take it out before > releasing 2.4b1. Sure, why not? Determining the exact set of features is what pre-beta releases are for, in a sense. Alex From python at rcn.com Wed Nov 5 15:54:22 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 15:54:39 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com> Message-ID: <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> [GvR] > I'm okay with adding reversed() as a builtin that works for sequences > only but I'm not okay with adding the __reversed__ protocol. > > For me, the main advantage of reversed() is that it expresses better > what I mean when I'm going over a list (or other concrete sequence) > backwards. The __reversed__ protocol muddles the issue by inviting to > try to make reversed() work for some iterators; I don't see the use > case (or if I do see it, I see it as much less important than the > previous one). I'm not married to the idea of __reversed__ but think it should probably be kept (if my intuition is off on this one, we can pull it out before the beta release). On the plus side: * Many of the original posters either specifically requested this or included some variation of it in their proposals. * There is a small group (including Jeremy Fincher) that consider a reversal protocol to be essential. * It is particularly useful for xrange() because it reduces the overhead to zero without touching the API. The implementation patch on SF shows that this can be done cleanly. Essentially, __reverse__ forwards the call to __iter__ with the arguments rearranged for reverse order. * It leaves open the possibility that someone could add __reverse__ to file objects, enabling them loop in reverse (helpful in reviewing log files for example). 
* There is a small group that passionately wants reverse() to work with enumerate() and Alex appears to be close to figuring out how to overcome the implementation challenges. * The iter/__iter__ pair neatly parallels reversed/__reversed__. * It is pythonic to put hooks in for just about everything. Sooner or later, someone needs the hook. For everyone else, it's invisible. On the minus side: * I think you got cold feet when some poster presented a wacky or misguided use for it. There's no avoiding that; even Alex's dirt simple __copy__ protocol can be turned into an atrocity by someone so inclined. > About your last bullet, I wonder if one of the > issues is that when doing a forward loop over a container, we don't > really care that much about the order as long as we get all items > (witness the popularity of looping over dicts). But when doing a > reverse loop, we clearly *do* care about the order. So forward and > reverse iteration are not symmetric. This may explains why 3 out of 5 > examples you found *need* the index. Incisive analysis. are-your-feet-feeling-warmer-now-ly yours, Raymond Hettinger From fincher.8 at osu.edu Wed Nov 5 17:00:31 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Wed Nov 5 16:02:09 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051656.21014.aleaxit@yahoo.com> References: <001c01c3a393$3f87f2e0$e841fea9@oemcomputer> <200311051502.hA5F2tw29174@12-236-54-216.client.attbi.com> <200311051656.21014.aleaxit@yahoo.com> Message-ID: <200311051700.31096.fincher.8@osu.edu> On Wednesday 05 November 2003 10:56 am, Alex Martelli wrote: > def reversed(sequence): > for x in xrange(len(sequence)-1, -1, -1): yield sequence[x] > > no __reversed__, no complications, "no nuttin'". 
> > Putting that in the current 2.4 pre-alpha will let us start getting some > experience with it and see if in the future we want to add refinements > (always easier to add than to remove...:-) -- either to reverse or to > other iterator-returning calls (e.g. reverse= optional arguments just > like in the sort method of lists). It seems like a perfect candidate for that "tools" hierarchy you proposed before. As a builtin, I'd be surprised if it saw significant use. Jeremy From jeremy at alum.mit.edu Wed Nov 5 16:06:40 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Nov 5 16:09:50 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <200311051523.hA5FN9r29272@12-236-54-216.client.attbi.com> References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <2m7k2f0vui.fsf@starship.python.net> <200311051545.11246.aleaxit@yahoo.com> <200311051523.hA5FN9r29272@12-236-54-216.client.attbi.com> Message-ID: <1068066399.26328.23.camel@localhost.localdomain> On Wed, 2003-11-05 at 10:23, Guido van Rossum wrote: > > Removing _any_ built-in that was around in 1.5.2 will pose similar > > problems. > > Only proportional to the likelihood that it was used in 1.5.2, which > is proportional to how useful it is. intern(): extremely unlikely > (nobody knows what it's for); coerce(): rather unlikely (too > advanced); apply(): very likely. The solution is to get people to stop using 1.5.2. I don't entirely understand why so many people write new code that needs to work with it. 
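For readers who haven't met the builtins being deprecated: apply(f, args, kwargs) is exactly what the extended-call syntax f(*args, **kwargs) replaced. A sketch of the equivalence — `apply_equiv` and `greet` are illustrative names, and apply() itself is absent from modern Python:

```python
def apply_equiv(func, args=(), kwargs=None):
    # What the deprecated apply() builtin did, expressed with the
    # f(*args, **kwargs) call syntax that superseded it.
    if kwargs is None:
        kwargs = {}
    return func(*args, **kwargs)

def greet(name, punct='!'):
    return 'hello ' + name + punct

assert apply_equiv(greet, ('world',)) == 'hello world!'
assert apply_equiv(greet, ('world',), {'punct': '?'}) == 'hello world?'
```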
Jeremy From python at rcn.com Wed Nov 5 16:22:35 2003 From: python at rcn.com (Raymond Hettinger) Date: Wed Nov 5 16:22:51 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> Message-ID: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> > >The __reversed__ protocol muddles the issue by inviting to > > try to make reversed() work for some iterators The invitation is to add efficient reverse iteration support to regular objects and user defined classes, not for iterators. Though I won't be suprised if someone tries, the only iterator that has a chance with this is enumerate, but that is not what the hook is for. Raymond From pedronis at bluewin.ch Wed Nov 5 16:44:59 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Nov 5 16:43:22 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311051914.52326.aleaxit@yahoo.com> References: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> Message-ID: <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> At 19:14 05.11.2003 +0100, Alex Martelli wrote: >On Wednesday 05 November 2003 05:34 pm, Samuele Pedroni wrote: > ... > > I think he was wondering whether people rely on > > > > enumerate([1,2]).next > > i = enumerate([1,2]) > > i is iter(i) > > > > working , vs. needing iter(enumerate([1,2]).next > > > > I think he was proposing to implement enumerate as > > > > class enumerate(object): > > def __init__(self,iterable): > > self.iterable = iterable > > > > def __iter__(self): > > i = 0 > > for x in self.iterable: > > yield i,x > > i += 1 > > > > def __reversed__(self): > > rev = reversed(self.iterable) > > try: > > i = len(self.iterable)-1 > > except (TypeError,AttributeError): > > i = -1 > > for x in rev: > > yield i,x > > i -= 1 > >Ah, I see -- thanks! 
Well, in theory you COULD add a 'next' method too:
>
> def next(self):
>     self.iterable = iter(self.iterable)
>     try: self.index += 1
>     except AttributeError: self.index = 0
>     return self.index, self.iterable.next()
>
>(or some reasonable optimization thereof:-) -- now __reversed__ would stop
>working after any .next call, but that would still be OK for all use cases I
>can think of.

well, you would also get an iterator hybrid that violates:

"""
Iterator objects also need to implement this method [__iter__]; they are
required to return themselves.
"""
http://www.python.org/doc/2.3.2/ref/sequence-types.html#l2h-234

what one could do is:

class enumerate(object):
    def __init__(self,iterable):
        self.iterable = iterable
        self.forward = None
        self.index = 0

    def __iter__(self):
        return self

    def next(self):
        if not self.forward:
            self.forward = iter(self.iterable)
        i = self.index
        self.index += 1
        return i, self.forward.next()

    def __reversed__(self):
        if self.forward:
            raise Exception,...
        rev = reversed(self.iterable)
        try:
            i = len(self.iterable)-1
        except (TypeError,AttributeError):
            i = -1
        for x in rev:
            yield i,x
            i -= 1

but it is still a hybrid, setting a bad precedent of trying too hard to
attach __reversed__ to an iterator; making enumerate just an iterable is
not backward compatible but is a bit saner, although it does not feel
that natural either.

regards.

From guido at python.org Wed Nov 5 17:21:12 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov 5 17:22:18 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 16:22:35 EST."
<004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> Message-ID: <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> > > >The __reversed__ protocol muddles the issue by inviting to > > > try to make reversed() work for some iterators > > The invitation is to add efficient reverse iteration support to regular > objects and user defined classes, not for iterators. Though I won't be > suprised if someone tries, the only iterator that has a chance with this > is enumerate, but that is not what the hook is for. Yeah, but there was widespread misunderstanding here (for a while even you and Alex were convinced that it was possible for enumerate). Several functions in itertools could easily be made to support __reversed__ *if* their argument supports it (or even if not in one case): chain(*iterables) -- you can define reversed(chain(*iterables)) as follows: for it in reversed(iterables): for element in reversed(it): yield element cycle(iterable) -- this one is infinite but reversed(cycle(x)) could be defined as cycle(reversed(x)). ifilter(pred, it) -- again, it's easy to define reversed(ifilter(P, X)) as ifilter(P, reversed(X)). Ditto for ifilterfalse. imap() -- this would not be so easy because the iterables might not be of equal length, so you can't map reversed(imap(F, X, Y)) to imap(F, reversed(X), reversed(Y)). But for a single sequence, again it could be done. islice() -- seems easy enough. starmap() -- simple, this is like imap() with a single argument. repeat() -- trivial! reversed(repeat(X[, N])) == repeat(X[, N]). dropwhile(), takewhile(), count() aren't amenable. So, unless you want to open this can of worms, I'd be for a version of reversed() that does *not* support __reversed__, making it perfectly clear it only applies to real sequences. 
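Guido's chain() delegation is easy to check concretely. A sketch in modern Python — itertools.chain never grew a __reversed__, so `reversed_chain` here is a standalone generator implementing the rule he gives, and it assumes every argument is itself reversible:

```python
def reversed_chain(*iterables):
    # reversed(chain(*iterables)) as sketched above: walk the tuple of
    # iterables backwards, and reverse each one in turn.
    for it in reversed(iterables):
        for element in reversed(it):
            yield element

result = list(reversed_chain([1, 2], [3, 4], [5]))
# result == [5, 4, 3, 2, 1]
```

The same pattern covers his ifilter and repeat cases; imap with multiple sequences fails for exactly the length-mismatch reason he notes.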
--Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Wed Nov 5 17:16:24 2003 From: DavidA at ActiveState.com (David Ascher) Date: Wed Nov 5 17:22:43 2003 Subject: [Python-Dev] closure semantics In-Reply-To: <200310220158.21389.aleaxit@yahoo.com> References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> Message-ID: <3FA976B8.9070806@ActiveState.com> Alex Martelli wrote: >So it can't be global, as it must stay a keyword for backwards compatibility >at least until 3.0. > Why? Removing keywords should be much simpler than adding them. I have no idea how hard it is to hack the parser to adjust, but I can't imagine how having 'global' no longer be a keyword as far as its concerned break b/w compatibility. What am I missing? From guido at python.org Wed Nov 5 17:27:43 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 17:27:51 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 22:44:59 +0100." <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> References: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.1.0.14.0.20031105095228.0330f040@mail.telecommunity.com> <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> Message-ID: <200311052227.hA5MRhn29980@12-236-54-216.client.attbi.com> > but is still an hybrid, setting a bad precedent of trying too hard to > attach __reversed__ to an iterator, making enumerate just an iterable is > not backward compatible but is a bit saner although it does not feel that > natural either. Exactly. All I've heard is that some folks asked for __reversed__. I haven't heard any convincing use cases; the PEP doesn't have any. The only motivation in the PEP is this: """ Custom Reverse Objects may optionally provide a __reversed__ method that returns a custom reverse iterator. 
This allows reverse() to be applied to objects that do not have __getitem__() and __len__() but still have some useful way of providing reverse iteration. """ To me, this just *begs* for attempts to add __reversed__ to all sorts of things (including iterators) that aren't sequences. If the real use case is to speed up performance, I'd like to see a discussion of the attainable speed gain, and I'd like to see the absence of __getitem__ / __len__ removed from the motivation. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Nov 5 17:29:32 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 17:29:39 2003 Subject: [Python-Dev] closure semantics In-Reply-To: Your message of "Wed, 05 Nov 2003 14:16:24 PST." <3FA976B8.9070806@ActiveState.com> References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> <3FA976B8.9070806@ActiveState.com> Message-ID: <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com> > Alex Martelli wrote: > > >So it can't be global, as it must stay a keyword for backwards > >compatibility at least until 3.0. [David] > Why? Removing keywords should be much simpler than adding them. I > have no idea how hard it is to hack the parser to adjust, but I > can't imagine how having 'global' no longer be a keyword as far as > its concerned break b/w compatibility. > > What am I missing? I don't recall the context, but I think the real issue with removing 'global' is that there's too much code out there that uses the global syntax to remove the global statement before 3.0. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Wed Nov 5 17:34:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 17:34:32 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> References: <5.2.1.1.0.20031105171949.028f5cf8@pop.bluewin.ch> <5.2.1.1.0.20031105223140.028f3a40@pop.bluewin.ch> Message-ID: <200311052334.23547.aleaxit@yahoo.com> On Wednesday 05 November 2003 22:44, Samuele Pedroni wrote: ... > > > I think he was wondering whether people rely on > > > > > > enumerate([1,2]).next ...which is one thing... > > > i = enumerate([1,2]) > > > i is iter(i) ...which is another. > >Ah, I see -- thanks! Well, in theory you COULD add a 'next' method too: Note I specifically didn't say "make enumerate return an iterator" -- I said "add a 'next' method". It's a non-special name (be that right or wrong) and thus there is no prohibition against non-iterators having such a method. > well, you would also get an iterator hybrid that violates: No you wouldn't -- you would get a non-iterator type which exposes a method named 'next', and that violates no Python rule. > attach __reversed__ to an iterator, making enumerate just an iterable is > not backward compatible but is a bit saner although it does not feel that > natural either. If anybody relies on that "i is iter(i)" then, yes. I have never seen that relied upon. I _have_ seen quite a few cases of reliance on calls to a 'next' method to "throw the first item away" (no doubt a call to iter(...) first would be preferable, but I'm just mentioning what I've seen). I'm not sure supporting dubious "happens to work" existing usage is _desirable_ -- I'm just saying it's _possible_ (in some cases, such as this one) without necessarily violating anything. 
Personally, since I found out that enumerate(reversed(x)) works almost as well as reversed(enumerate(x)) [[or other hypotheticals -- such as enumerate(x, reverse=True) OR reversed(x, enumerate=True)]], and better than revrange(len(x)), for my use cases, I'm not particularly pro NOR con wrt __reversed__ -- its pluses (which Raymond summarizes quite well) and its minuses (Guido's worry about it promoting unwarranted complications, my vague unease at "yet another special-case protocol via a special-method when adaptation would handle it more uniformly") are finely balanced. I just hope that, either with or without __reversed__, reversed _does_ get in, at least, as Guido pointed out, tentatively (since features, if need be, may be withdrawn before the beta phase). Alex From guido at python.org Wed Nov 5 17:35:01 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 5 17:36:07 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: Your message of "Wed, 05 Nov 2003 15:54:22 EST." <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> References: <003701c3a3de$fdc7c1e0$e841fea9@oemcomputer> Message-ID: <200311052235.hA5MZ1F30026@12-236-54-216.client.attbi.com> > I'm not married to the idea of __reversed__ but think it should > probably be kept (if my intuition is off on this one, we can pull it > out before the beta release). Let's do it the other way around -- let's not add a complication until we have further proof it is needed. Remember YAGNI. :-) > On the plus side: > > * Many of the original posters either specifically requested this or > included some variation of it in their proposals. If any of them gave a good motivation or use case, those didn't make it into the PEP. > * There is a small group (including Jeremy Fincher) that consider a > reversal protocol to be essential. And I think that as a protocol it needs a separate PEP, because new protocols are much more involved than new builtins. 
> * It is particularly useful for xrange() because it reduces the overhead
> to zero without touching the API.  The implementation patch on SF shows
> that this can be done cleanly.  Essentially, __reverse__ forwards the
> call to __iter__ with the arguments rearranged for reverse order.

The implementation could special-case xrange() and lists and "optimize
the snot out of them" without the need for a general protocol.

> * It leaves open the possibility that someone could add __reverse__ to
> file objects, enabling them loop in reverse (helpful in reviewing log
> files for example).

That's exactly the danger.  Such a thing is much better coded as a
separate object rather than adding it to the base file object.

> * There is a small group that passionately wants reverse() to work with
> enumerate() and Alex appears to be close to figuring out how to overcome
> the implementation challenges.

Doubtful.

> * The iter/__iter__ pair neatly parallels reversed/__reversed__.

The parallel is a fallacy (see one of my previous posts about the
asymmetry).

> * It is pythonic to put hooks in for just about everything.  Sooner or
> later, someone needs the hook.  For everyone else, it's invisible.

But hook design is harder than builtin design.

> On the minus side:
>
> * I think you got cold feet when some poster presented a wacky or
> misguided use for it.  There's no avoiding that; even Alex's dirt simple
> __copy__ protocol can be turned into an atrocity by someone so inclined.

The __copy__ protocol is limited in practice to the expectations and
promises of the copy module.  The problem with __reversed__ is that
everyone thinks it means what *they* would like to see.

> are-your-feet-feeling-warmer-now-ly yours,

No, this is one of the coldest weeks since my move to Calif.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Wed Nov 5 18:02:29 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 5 18:02:36 2003 Subject: [Python-Dev] Re: PEP 322: Reverse Iteration In-Reply-To: <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> Message-ID: <200311060002.29814.aleaxit@yahoo.com> On Wednesday 05 November 2003 23:21, Guido van Rossum wrote: > > > >The __reversed__ protocol muddles the issue by inviting to > > > > try to make reversed() work for some iterators > > > > The invitation is to add efficient reverse iteration support to regular > > objects and user defined classes, not for iterators. Though I won't be > > suprised if someone tries, the only iterator that has a chance with > > this is enumerate, but that is not what the hook is for. > > Yeah, but there was widespread misunderstanding here (for a while even > you and Alex were convinced that it was possible for enumerate). It _is_ *possible*; it is not necessarily _opportune_ -- a different issue. Similarly, you point out below possibilities that may not be opportune. > So, unless you want to open this can of worms, I'd be for a version of > reversed() that does *not* support __reversed__, making it perfectly > clear it only applies to real sequences. Unless some _opportune_ (i.e., truly good:-) use case of "naturally reversible nonsequence" (doubly linked list...?-) arises (and the __reversed__ idea can inserted then -- just as it could be removed if reversed started out with it -- as long as we do it before the beta) reversed with or without __reversed__ seem anyway fine to me -- arguments being so finely balanced on both sides. 
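Alex's "works almost as well" is worth making precise: enumerate(reversed(x)) numbers items from zero in the reversed order, while the reversed(enumerate(x)) being debated would preserve the original indices. A sketch in modern Python, where both builtins exist:

```python
x = ['a', 'b', 'c']

# enumerate(reversed(x)): counting restarts at 0, so the indices are
# positions within the *reversed* order, not the original one.
renumbered = list(enumerate(reversed(x)))
# renumbered == [(0, 'c'), (1, 'b'), (2, 'a')]

# What reversed(enumerate(x)) was meant to give -- original indices,
# visited backwards -- requires knowing the length up front:
original_indices = [(i, x[i]) for i in range(len(x) - 1, -1, -1)]
# original_indices == [(2, 'c'), (1, 'b'), (0, 'a')]
```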
Alex

From guido at python.org  Wed Nov  5 18:08:42 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov  5 18:08:49 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Thu, 06 Nov 2003 00:02:29 +0100." <200311060002.29814.aleaxit@yahoo.com>
References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> <200311060002.29814.aleaxit@yahoo.com>
Message-ID: <200311052308.hA5N8gU30099@12-236-54-216.client.attbi.com>

> Unless some _opportune_ (i.e., truly good:-) use case of "naturally
> reversible nonsequence" (doubly linked list...?-) arises (and the
> __reversed__ idea can be inserted then -- just as it could be removed
> if reversed started out with it -- as long as we do it before the beta)
> reversed with or without __reversed__ seems anyway fine to me --
> arguments being so finely balanced on both sides.

It's more effort to add something later than to remove it (since
there's always *someone* who's already dependent on it), so I see the
argument about adding __reversed__ as far from balanced.  I see at most
a 5% chance that reversed() would be removed before 2.4b1.  If we add
__reversed__ now I doubt that we'll remove it (assuming reversed()
stays), but I still am unconvinced of the need (and I *am* convinced
of the danger).

So:

- I am +1 on adding reversed() provisionally
- I am -1 on adding __reversed__ at the same time

--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at cosc.canterbury.ac.nz  Wed Nov  5 18:26:12 2003
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed Nov  5 18:26:23 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <002301c3a39d$36d00020$e841fea9@oemcomputer>
Message-ID: <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz>

[Neal Norwitz]
> For 2.4 I'd suggest we officially deprecate: apply, coerce, intern.

In the case of intern, do you mean to move it into
a module, or remove it altogether?

If the latter, why?
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From bsder at allcaps.org  Wed Nov  5 19:26:12 2003
From: bsder at allcaps.org (Andrew P. Lentvorski, Jr.)
Date: Wed Nov  5 19:25:49 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com>
References: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com>
Message-ID: <20031105161052.W14642@mail.allcaps.org>

On Wed, 5 Nov 2003, Guido van Rossum wrote:
> I'm okay with adding reversed() as a builtin that works for sequences
> only but I'm not okay with adding the __reversed__ protocol.

But, doesn't this effectively take the PEP back to the original proposal
of a sequence method that it drifted away from?

With the restriction to sequences, reversed() is then likely to be
implemented as a thin wrapper around seq.somerevmethod() which could then
return either a new reversed sequence, an iterable, or an iterator
depending upon efficiency, implementation, thread-safety, etc.

Since reversed() is turning out not to be generally applicable anyway,
perhaps going back to the original idea of a sequence method would be a
good thing?

-a

From guido at python.org  Wed Nov  5 19:30:56 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Nov  5 19:31:02 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 16:26:12 PST."
	<20031105161052.W14642@mail.allcaps.org>
References: <003301c3a3b5$3161b1c0$e841fea9@oemcomputer> <200311051843.hA5IhGU29598@12-236-54-216.client.attbi.com> <20031105161052.W14642@mail.allcaps.org>
Message-ID: <200311060030.hA60UuP30210@12-236-54-216.client.attbi.com>

> > I'm okay with adding reversed() as a builtin that works for sequences
> > only but I'm not okay with adding the __reversed__ protocol.
>
> But, doesn't this effectively take the PEP back to the original proposal
> of a sequence method that it drifted away from?

No, because making it a sequence method would require every sequence
implementation to support it.  Making it a builtin makes it work for
all sequences (everything that supports __len__ and __getitem__ with
random access, really).

> With the restriction to sequences, reversed() is then likely to be
> implemented as a thin wrapper around seq.somerevmethod() which could then
> return either a new reversed sequence, an iterable, or an iterator
> depending upon efficiency, implementation, thread-safety, etc.

No.  reversed() should *never* return a new sequence; it should return
an iterator.

> Since reversed() is turning out not to be generally applicable anyway,
> perhaps going back to the original idea of a sequence method would be a
> good thing?

No.  The feedback on that was pretty uniformly negative.  The PEP is
95% about reversed() on sequences and only a tiny bit about
__reversed__, so little is lost.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From anthony at interlink.com.au  Wed Nov  5 22:15:46 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Nov  5 22:18:59 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: <200311051458.hA5EwWc29153@12-236-54-216.client.attbi.com>
Message-ID: <200311060315.hA63Fndh000543@localhost.localdomain>

>>> Guido van Rossum wrote
> This warning will go away in 2.4 again, where %x with a negative int
> will return a hex number with a minus sign.
So I'd be against
> introducing a new format code.  I've forgotten in what code you found
> this, but the sys.maxint solution sounds like your best bet.  In 2.4
> we can also make id() return a long when the int value would be
> negative; I don't want to do that in 2.3 since changing the return
> type and value of a builtin in a minor release seems a compatibility
> liability -- but in 2.4 the difference between int and long will be
> wiped out even more than it already is, so it should be fine there.

The code is basically something like this:

Python 2.3.2+ (#1, Nov  5 2003, 00:54:02)
[GCC 3.3.1 20030930 (Red Hat Linux 3.3.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class a: pass
...
>>> b=a()
>>> repr(b) == '<__main__.a instance at 0x%x>'%id(b)
__main__:1: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up
True
>>>

For now, I'll patch the 2.3 code in the test suite to make it not
complain.

If %x will return a negative hex number, then the internals of id()
must make sure that they return a positive number, or whatever does
the standard repr will need to change as well.  I'll log a bug on SF
for it.

Anthony

From python at rcn.com  Wed Nov  5 22:25:19 2003
From: python at rcn.com (Raymond Hettinger)
Date: Wed Nov  5 22:25:29 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311052235.hA5MZ1F30026@12-236-54-216.client.attbi.com>
Message-ID: <002501c3a415$9b06f9e0$e841fea9@oemcomputer>

> > * There is a small group (including Jeremy Fincher) that consider a
> > reversal protocol to be essential.
>
> And I think that as a protocol it needs a separate PEP, because new
> protocols are much more involved than new builtins.

Great idea.  The champions for __reverse__ can plead their case there.

> > * It is particularly useful for xrange() because it reduces the overhead
> > to zero without touching the API.
The implementation patch on SF shows
> > that this can be done cleanly.  Essentially, __reverse__ forwards the
> > call to __iter__ with the arguments rearranged for reverse order.
>
> The implementation could special-case xrange() and lists and "optimize
> the snot out of them" without the need for a general protocol.

Agreed!  I'll take __reversed__ out of the pep.

May I mark this one as accepted and move on?

Raymond Hettinger

From DavidA at ActiveState.com  Wed Nov  5 22:44:43 2003
From: DavidA at ActiveState.com (David Ascher)
Date: Wed Nov  5 22:36:02 2003
Subject: [Python-Dev] closure semantics
In-Reply-To: <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com>
References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> <3FA976B8.9070806@ActiveState.com> <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com>
Message-ID: <3FA9C3AB.808@ActiveState.com>

Guido van Rossum wrote:

[Alex]
>>>So it can't be global, as it must stay a keyword for backwards
>>>compatibility at least until 3.0.

[David]
>>Why?  Removing keywords should be much simpler than adding them.  I
>>have no idea how hard it is to hack the parser to adjust, but I
>>can't imagine how having 'global' no longer be a keyword as far as
>>it's concerned breaks b/w compatibility.
>>
>>What am I missing?

[GvR]
> I don't recall the context, but I think the real issue with removing
> 'global' is that there's too much code out there that uses the global
> syntax to remove the global statement before 3.0.

I would never have suggested that.  Just that we can evolve the parser
to retain the old usage

    global a,b,c

while allowing a new usage

    global.a = value

by removing 'global' from the list of reserved words and doing "fancy
stuff" in the parser.  Note that I very much don't know the details
of the "fancy stuff".
--david

From neal at metaslash.com  Wed Nov  5 22:58:37 2003
From: neal at metaslash.com (Neal Norwitz)
Date: Wed Nov  5 22:58:47 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz>
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz>
Message-ID: <20031106035837.GB7212@epoch.metaslash.com>

On Thu, Nov 06, 2003 at 12:26:12PM +1300, Greg Ewing wrote:
> [Neal Norwitz]
> > For 2.4 I'd suggest we officially deprecate: apply, coerce, intern.
>
> In the case of intern, do you mean to move it into
> a module, or remove it altogether?
>
> If the latter, why?

For the most part, I meant to remove them (including intern)
altogether in the long run.  In 2.4, I only meant to officially
deprecate them with a warning.  intern() doesn't seem particularly
useful or commonly used.  At least moving it to sys or some other
module is an improvement IMO.

My primary goal in pushing to deprecate these older features is to
make the language smaller.  A secondary goal is to reduce the code
base, thus easing maintenance and testing.  If a feature is not
useful, in the long run, I think it should be removed.  I agree
there's pain involved.  But there's also pain in keeping it.  Part of
that pain is that its use gets propagated.  Perhaps people that teach
Python and write books can speak to this better than I.

This idea leads to Jeremy's statement:

	The solution is to get people to stop using 1.5.2.  I don't
	entirely understand why so many people write new code that
	needs to work with it.

If we never deprecate/threaten to remove a feature, people will
continue to use it.  But that becomes a circular argument for why we
can't deprecate/remove it.  How long should we wait from the time a
feature is not needed until it is removed?
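[What "officially deprecate them with a warning" amounts to in practice
is a shim of roughly this shape -- a sketch, not the actual 2.4 patch:]

```python
import warnings

def apply(func, args=(), kwds={}):
    # deprecated spelling of func(*args, **kwds); warn but keep working
    warnings.warn("apply() is deprecated; use func(*args, **kwds) instead",
                  DeprecationWarning, stacklevel=2)
    return func(*args, **kwds)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = apply(pow, (2, 10))

print(result)                       # 1024
print(caught[0].category.__name__)  # DeprecationWarning
```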
Here are the documentation release dates from the doc web page
(http://python.org/doc/versions.html):

	2.3	29 Jul 2003
	2.2	21 Dec 2001
	2.1	15 Apr 2001
	2.0	16 Oct 2000
	1.5.2	30 Apr 1999

By the time 2.4 is released (likely mid-2004 at the earliest), apply()
will have been made redundant for about 4 years (since 2.0 was
released).  All we are talking about is adding a warning for 2.4.  I'm
not sure whether it is appropriate to remove apply() in 2.5 (delivered
in 2005-2006?).  But if we don't work towards cleaning up, it will
never get done.

I also have no problem adding a module for backwards compatibility
that adds apply(), etc to builtins.  In fact, I think this is a better
approach: if someone wants to "port" their code from 1.5.2 to 2.4,
they can achieve much of it by adding:

	import python1_5_2_compatibility

which does some magic.

I also think the reverse is true.  For new builtins, it would be nice
to provide a compatibility module that can be downloaded for older
versions.  That way I can use sum(), enumerate(), etc in 2.2 and
before.

Neal

From bac at OCF.Berkeley.EDU  Wed Nov  5 23:22:45 2003
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Wed Nov  5 23:22:54 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: <200311052308.hA5N8gU30099@12-236-54-216.client.attbi.com>
References: <004401c3a3e2$eecdb1a0$e841fea9@oemcomputer> <200311052221.hA5MLDX29943@12-236-54-216.client.attbi.com> <200311060002.29814.aleaxit@yahoo.com> <200311052308.hA5N8gU30099@12-236-54-216.client.attbi.com>
Message-ID: <3FA9CC95.6010809@ocf.berkeley.edu>

Guido van Rossum wrote:
>>Unless some _opportune_ (i.e., truly good:-) use case of "naturally
>>reversible nonsequence" (doubly linked list...?-) arises (and the
>>__reversed__ idea can be inserted then -- just as it could be removed
>>if reversed started out with it -- as long as we do it before the beta)
>>reversed with or without __reversed__ seems anyway fine to me --
>>arguments being so finely balanced on both sides.
>
>
> It's more effort to add something later than to remove it (since
> there's always *someone* who's already dependent on it), so I see the
> argument about adding __reversed__ as far from balanced.  I see at most
> a 5% chance that reversed() would be removed before 2.4b1.  If we add
> __reversed__ now I doubt that we'll remove it (assuming reversed()
> stays), but I still am unconvinced of the need (and I *am* convinced
> of the danger).
>
> So:
>
> - I am +1 on adding reversed() provisionally
> - I am -1 on adding __reversed__ at the same time
>

Been following this from afar (crazy week with homework; fun).  In case
anyone cares about my opinion:

+0 on reversed(): wouldn't hurt having it but I still don't see it as
critical enough to be a built-in

-1 on __reversed__: I like my iterator protocol **simple**.

OK, back to studying for my midterm.

-Brett

From tdelaney at avaya.com  Wed Nov  5 23:31:03 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Wed Nov  5 23:31:09 2003
Subject: [Python-Dev] Deprecating obsolete builtins
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF617A@au3010avexu1.global.avaya.com>

> From: Neal Norwitz [mailto:neal@metaslash.com]
>
> For the most part, I meant to remove them (including intern)
> altogether in the long run.  In 2.4, I only meant to officially
> deprecate them with a warning.  intern() doesn't seem particularly
> useful or commonly used.  At least moving it to sys or some other
> module is an improvement IMO.

One reason why intern() hasn't been commonly used is that it made
things immortal.  This is no longer the case - I'd like to see if the
use of intern() changes.

What I would prefer would be for intern() to be able to take any
hashable object - in particular, tuples.  It's not uncommon for me to
create lots of small tuples which end up having the same data in them
- interning could save quite a bit of memory.
Yes, I can fake it with my own interning function, but that then means
I have to deal with the immortality problems again.

So I'd actually advocate enhancing intern(), rather than removing it,
now that interned things are mortal.

Tim Delaney

From aahz at pythoncraft.com  Wed Nov  5 23:55:45 2003
From: aahz at pythoncraft.com (Aahz)
Date: Wed Nov  5 23:55:47 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF617A@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF617A@au3010avexu1.global.avaya.com>
Message-ID: <20031106045545.GA20099@panix.com>

On Thu, Nov 06, 2003, Delaney, Timothy C (Timothy) wrote:
> From: Neal Norwitz [mailto:neal@metaslash.com]
>>
>> For the most part, I meant to remove them (including intern)
>> altogether in the long run.  In 2.4, I only meant to officially
>> deprecate them with a warning.  intern() doesn't seem particularly
>> useful or commonly used.  At least moving it to sys or some other
>> module is an improvement IMO.
>
> One reason why intern() hasn't been commonly used is that it made
> things immortal.  This is no longer the case - I'd like to see if the
> use of intern() changes.
>
> What I would prefer would be for intern() to be able to take any
> hashable object - in particular, tuples.  It's not uncommon for me to
> create lots of small tuples which end up having the same data in them
> - interning could save quite a bit of memory.
>
> Yes, I can fake it with my own interning function, but that then means
> I have to deal with the immortality problems again.
>
> So I'd actually advocate enhancing intern(), rather than removing it,
> now that interned things are mortal.

Agreed.  But intern() should *not* be a builtin function.  It belongs
in sys.
--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From tdelaney at avaya.com  Thu Nov  6 00:13:39 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Thu Nov  6 00:13:45 2003
Subject: [Python-Dev] Deprecating obsolete builtins
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF619C@au3010avexu1.global.avaya.com>

> From: Aahz [mailto:aahz@pythoncraft.com]
>
> > So I'd actually advocate enhancing intern(), rather than
> > removing it, now that interned things are mortal.
>
> Agreed.  But intern() should *not* be a builtin function.  It
> belongs in sys.

Hmm - not so sure about sys, but I agree it could quite well be moved
out of builtins.

I don't feel it belongs in sys because it has nothing to do with the
environment that python is running in.  Instead it has to do with
object management.

Tim Delaney

From guido at python.org  Thu Nov  6 00:30:51 2003
From: guido at python.org (Guido van Rossum)
Date: Thu Nov  6 00:31:01 2003
Subject: [Python-Dev] closure semantics
In-Reply-To: Your message of "Wed, 05 Nov 2003 19:44:43 PST." <3FA9C3AB.808@ActiveState.com>
References: <200310220121.52789.aleaxit@yahoo.com> <200310212340.h9LNeYq25691@12-236-54-216.client.attbi.com> <200310220158.21389.aleaxit@yahoo.com> <3FA976B8.9070806@ActiveState.com> <200311052229.hA5MTWT30008@12-236-54-216.client.attbi.com> <3FA9C3AB.808@ActiveState.com>
Message-ID: <200311060530.hA65Ups30577@12-236-54-216.client.attbi.com>

> [Alex]
> >>>So it can't be global, as it must stay a keyword for backwards
> >>>compatibility at least until 3.0.
>
> [David]
> >>Why?  Removing keywords should be much simpler than adding them.  I
> >>have no idea how hard it is to hack the parser to adjust, but I
> >>can't imagine how having 'global' no longer be a keyword as far as
> >>it's concerned breaks b/w compatibility.
> >>
> >>What am I missing?
>
> [GvR]
> > I don't recall the context, but I think the real issue with removing
> > 'global' is that there's too much code out there that uses the global
> > syntax to remove the global statement before 3.0.
>
> [David]
> I would never have suggested that.  Just that we can evolve the parser
> to retain the old usage
>
>     global a,b,c
>
> while allowing a new usage
>
>     global.a = value
>
> by removing 'global' from the list of reserved words and doing "fancy
> stuff" in the parser.  Note that I very much don't know the details
> of the "fancy stuff".

Ah.  *If* we want to parse both it would be easier to keep global as a
keyword and do fancy stuff to recognize the second form...  But I
think somewhere in the mega-thread about this topic is hidden the
conclusion that there are better ways to do this.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Nov  6 00:33:48 2003
From: guido at python.org (Guido van Rossum)
Date: Thu Nov  6 00:33:59 2003
Subject: [Python-Dev] Re: PEP 322: Reverse Iteration
In-Reply-To: Your message of "Wed, 05 Nov 2003 22:25:19 EST." <002501c3a415$9b06f9e0$e841fea9@oemcomputer>
References: <002501c3a415$9b06f9e0$e841fea9@oemcomputer>
Message-ID: <200311060533.hA65XmF30630@12-236-54-216.client.attbi.com>

> Agreed!  I'll take __reversed__ out of the pep.
>
> May I mark this one as accepted and move on?

Yes.  Just mark it as "conditionally accepted" (meaning that if we find
it useless after all we can remove it before 2.4b1 -- you can make that
condition explicit).
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Nov  6 00:39:58 2003
From: barry at python.org (Barry Warsaw)
Date: Thu Nov  6 00:40:21 2003
Subject: [Python-Dev] Deprecating obsolete builtins
In-Reply-To: <20031106035837.GB7212@epoch.metaslash.com>
References: <002301c3a39d$36d00020$e841fea9@oemcomputer> <200311052326.hA5NQCk11560@oma.cosc.canterbury.ac.nz> <20031106035837.GB7212@epoch.metaslash.com>
Message-ID: <1068097197.13655.0.camel@anthem>

On Wed, 2003-11-05 at 22:58, Neal Norwitz wrote:
> I also have no problem adding a module for backwards compatibility
> that adds apply(), etc to builtins.  In fact, I think this is
> a better approach: if someone wants to "port" their code
> from 1.5.2 to 2.4, they can achieve much of it by adding:
>
>     import python1_5_2_compatibility

from __past__ import cruft

<1.6.1 wink>

-Barry

From guido at python.org  Thu Nov  6 00:41:17 2003
From: guido at python.org (Guido van Rossum)
Date: Thu Nov  6 00:41:34 2003
Subject: [Python-Dev] test warnings for "%x"%id(foo) on 2.3 branch
In-Reply-To: Your message of "Thu, 06 Nov 2003 14:15:46 +1100." <200311060315.hA63Fndh000543@localhost.localdomain>
References: <200311060315.hA63Fndh000543@localhost.localdomain>
Message-ID: <200311060541.hA65fHQ30649@12-236-54-216.client.attbi.com>

> If %x will return a negative hex number, then the internals of id()
> must make sure that they return a positive number, or whatever does
> the standard repr will need to change as well.  I'll log a bug on SF
> for it.

The standard repr is written in C and uses %p, which does a platform
specific thing, but typically produces an unsigned hex number of
appropriate length; apparently we've not been ported to platforms
where it does something else, otherwise the test would have failed
there too.  One can argue that the test is too constrained anyway --
why should we care about the specific hex number in the repr() of a
class?
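[The sys.maxint workaround mentioned earlier in this thread amounts to
masking the signed id() into one unsigned machine word; in modern terms
the helper could look like this -- a hypothetical sketch, since 2.3-era
code would build the mask from sys.maxint instead:]

```python
import struct

def unsigned_id(obj):
    # fold a possibly-negative id() into the unsigned range of one
    # machine word, mimicking what C's %p typically prints
    word_bits = struct.calcsize('P') * 8
    return id(obj) & ((1 << word_bits) - 1)

x = object()
print(unsigned_id(x) >= 0)       # True
print('0x%x' % unsigned_id(x))   # e.g. 0x7f3a2c1b90d0
```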
I'm not for adding %p to Python's string formats; it's too
implementation specific and I don't see a use for it other than
matching the built-in repr().

id() has always returned negative numbers on all platforms where
pointers happen to have the high bit set; apart from making this test
pass in the future (which is a pretty weak argument) I don't see a
problem with that, so I'm not in favor of changing it, even though it
would be easy enough to change PyLong_FromVoidPtr() to call
PyLong_FromLong[Long]().

--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at python.net  Thu Nov  6 05:31:11 2003
From: theller at python.net (Thomas Heller)
Date: Thu Nov  6 05:31:35 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libtraceback.tex, 1.17, 1.18
In-Reply-To: (nascheme@users.sourceforge.net's message of "Wed, 05 Nov 2003 15:03:31 -0800")
References:
Message-ID:

nascheme@users.sourceforge.net writes:

> Update of /cvsroot/python/python/dist/src/Doc/lib
> In directory sc8-pr-cvs1:/tmp/cvs-serv27582/Doc/lib
>
> Modified Files:
> 	libtraceback.tex
> Log Message:
> Add traceback.format_exc().
>
>
> Index: libtraceback.tex
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Doc/lib/libtraceback.tex,v
> retrieving revision 1.17
> retrieving revision 1.18
> diff -C2 -d -r1.17 -r1.18
> *** libtraceback.tex	30 Jan 2003 22:22:59 -0000	1.17
> --- libtraceback.tex	5 Nov 2003 23:02:58 -0000	1.18
> ***************
> *** 49,52 ****
> --- 49,57 ----
>   \end{funcdesc}
>
> + \begin{funcdesc}{format_exc}{\optional{limit\optional{, file}}}
> + This is like \code{print_exc(\var{limit})} but returns a string
> + instead of printing to a file.
> + \end{funcdesc}
> +

Shouldn't there be a 'new in Python 2.4' note here?  I don't remember
how this is spelled in LaTeX.
Thomas

From mwh at python.net  Thu Nov  6 07:09:40 2003
From: mwh at python.net (Michael Hudson)
Date: Thu Nov  6 07:09:47 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libtraceback.tex, 1.17, 1.18
In-Reply-To: (Thomas Heller's message of "Thu, 06 Nov 2003 11:31:11 +0100")
References:
Message-ID: <2moevpzojf.fsf@starship.python.net>

Thomas Heller writes:

> nascheme@users.sourceforge.net writes:
>> + \begin{funcdesc}{format_exc}{\optional{limit\optional{, file}}}
>> + This is like \code{print_exc(\var{limit})} but returns a string
>> + instead of printing to a file.
>> + \end{funcdesc}
>> +
>
> Shouldn't there be a 'new in Python 2.4' note here?  I don't remember
> how this is spelled in LaTeX.

\versionadded{2.4}

--
  I also fondly recall Paris because that's where I learned to
  debug Zetalisp while drunk.                      -- Olin Shivers

From tdelaney at avaya.com  Thu Nov  6 16:46:48 2003
From: tdelaney at avaya.com (Delaney, Timothy C (Timothy))
Date: Thu Nov  6 16:46:56 2003
Subject: [Python-Dev] Deprecating obsolete builtins
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com>

> From: Guido van Rossum [mailto:guido@python.org]
>
> > > > So I'd actually advocate enhancing intern(), rather than removing
> > > > it, now that interned things are mortal.
> > >
> > > Have you thought about how to implement that?  And have you calculated
> > > how much memory you would save?
> >
> > Not yet - musings at the end of the day.  As to how much memory - I
> > really don't think it can be calculated - it's so
> > application-dependent.
>
> Well obviously I meant for *your* app, because you're the one bringing
> this up (I'm highly skeptical of the idea if you hadn't guessed yet :-).

Moved back to python-dev because I've got some actual pseudocode in
here ... ;)

I'll follow it up further when I've got a solid use case.  I'm also
skeptical of the idea, but think it's worth some additional thought.
At the moment it's just gut feeling that if we're going to have it at
all, it seems that it would be useful for things other than strings.

As for implementation ... something like:

    _INTERN_DICT = WeakKeyValueDictionary()

    def unrestrained_intern (obj):

        # Singletons don't need to be interned
        if obj is None or obj is True or obj is False:
            return obj

        try:
            return intern(obj)
        except TypeError:
            return _INTERN_DICT.setdefault(obj, obj)

    a = (1, 2, 3)
    b = (1, 2) + (3,)

    assert unrestrained_intern(a) is unrestrained_intern(b)

Of course, this would require that we could create a weak reference to
hashable builtin types like tuple and int.  The dictionary holding the
objects would need to be weak on both key and value to ensure
mortality.

Anyway, there are a lot of flow-on effects there :( and it's very much
in a fledgling concept phase at the moment.

Tim Delaney

From raymond.hettinger at verizon.net  Fri Nov  7 02:33:54 2003
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Fri Nov  7 02:34:47 2003
Subject: [Python-Dev] Optional arguments for str.encode /.decode
Message-ID: <000901c3a501$8fb10800$1535c797@oemcomputer>

Idea for the day:  Let the str.encode/decode methods accept keyword
arguments to be forwarded to the underlying codec.
For example, zlib_codec.py can then express its encoding function as:

    def zlib_encode(input, errors='strict', **kwds):
        assert errors == 'strict'
        if 'level' in kwds:
            output = zlib.compress(input, kwds['level'])
        else:
            output = zlib.compress(input)
        return (output, len(input))

The user can then have access to zlib's optional compression level
argument:

>>> 'which witch has which witches wristwatch'.encode('zlib', level=9)
'x\x9c+\xcf\xc8L\xceP(\xcf,\x01\x92\x19\x89\xc5\n\xe5\x08~*\x90W\x94Y\\R \x9e\x08\xe4\x00\x005\xe5\x0fi'

This small extension to the protocol makes it possible to use codecs
for a wider variety of applications:

>>> msg = 'beware the ides of march'.encode('des', key=0x10ab03b78495d2)
>>> print msg.decode('des', key=0x10ab03b78495d2)
beware the ides of march

>>> template = '${name} was born in ${country}'
>>> print template.encode('pep292_codec', name='Guido', country='the Netherlands')
Guido was born in the Netherlands

A key advantage of extending the codec protocol is that new or
experimental services can easily be added or tried out without
expanding the API elsewhere.  For example, Barry's simpler string
substitutions can be implemented without adding a new string method to
cook the text.

Already, the existing protocol has provided consistent, uniform access
to a variety of services:

    text.encode('quotedprintable')
    text.encode('rot13')
    text.encode('palmos')

The proposed extension allows this benefit to apply to an even broader
range of services.

Raymond Hettinger

From Boris.Boutillier at arteris.net  Fri Nov  7 07:24:35 2003
From: Boris.Boutillier at arteris.net (Boris Boutillier)
Date: Fri Nov  7 07:24:45 2003
Subject: [Python-Dev] Code to prevent modification on builtins classes also abusively (IMHO) prevents modifications on extensions modules, some ideas on this.
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com>
Message-ID: <3FAB8F03.50601@arteris.net>

I looked into the archives and didn't see any debate on this question;
I hope I didn't miss something.

My point concerns limitations on extension modules due to checks aimed
at the builtins.  The main point is settable extension classes.

In the Python code there are some checks against TPFLAGS_HEAPTYPE;
extension modules shouldn't have this flag, so the normal
type->tp_setattro doesn't allow the user to set new attributes on your
extension classes.  There is a way around it: write a special
metaclass which redefines setattr.

In the extension module I'm writing (I'm porting some Python code to
Python-C for speed issues) the user can set attributes and slots on my
classes.  What I need is the complete type->tp_setattro behaviour,
without the check.  I didn't see a way to get this behaviour using
only the Python API (is re-readying the type a workaround?), so I
copied and pasted all the code needed to make update_slots work (ouch,
2500 lines).  This is now almost working: every kind of attribute can
be set except __setattr__ itself; the hackcheck prevents the user from
calling another __setattr__ from his new setattr.

Example of my extension class hierarchy:

    class A(object)
    class B(A)

In the extension there is a tp->setattro on B; if the user wants to
redefine it, he can't call the A __setattr__:

    def myBSetattr(self, k, v):
        super(B, self).__setattr__(k, v)
        ## Do here my special stuff

This won't work; the hackcheck will see some kind of hack here: "you
can't call the A.__setattr__ function from a B object" :).

First question: is there a known way around this?

Possible improvements: in the Python code there are checks in various
functions to see that you are not modifying builtin classes;
unfortunately this code also affects extension modules.
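[The restriction being described can be seen from pure Python: types
created by a class statement carry Py_TPFLAGS_HEAPTYPE and accept new
attributes, while builtin (and most extension) types do not:]

```python
class Heap(object):
    pass

Heap.extra = 1          # fine: Heap is a heap type
print(Heap.extra)       # 1

try:
    int.extra = 1       # rejected: int lacks Py_TPFLAGS_HEAPTYPE
    print("no check?")
except TypeError:
    print("TypeError, as described above")
```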
I think the HeapType flag is used abusively in several cases: mostly
in type_setattro, object_set_bases and object_set_classes, the checks
have nothing to do with the true HeapType definition as stated in the
comments in Include/object.h; it is used, I think, only because it is
the only flag that makes a difference between builtin and user
classes.  Unfortunately, with this flag extension classes fall into
the 'builtin' part.

A way to solve the problem without backward compatibility problems
would be to have a new TPFLAGS_SETABLE flag, defaulting to 0 for
builtin/extension classes and 1 for user Python classes.  This flag
would be checked in place of the HeapType one where relevant.

I'm ready to write the code for this if there are some positive votes;
I won't bother if everybody is against it.

Boris

From barry at python.org  Fri Nov  7 09:22:32 2003
From: barry at python.org (Barry Warsaw)
Date: Fri Nov  7 09:22:45 2003
Subject: [Python-Dev] Optional arguments for str.encode /.decode
In-Reply-To: <000901c3a501$8fb10800$1535c797@oemcomputer>
References: <000901c3a501$8fb10800$1535c797@oemcomputer>
Message-ID: <1068214951.15995.100.camel@anthem>

On Fri, 2003-11-07 at 02:33, Raymond Hettinger wrote:
> Idea for the day:  Let the str.encode/decode methods accept keyword
> arguments to be forwarded to the underlying codec.

Nice.

> Already, the existing protocol has provided consistent, uniform access
> to a variety of services:
>
>     text.encode('quotedprintable')
>     text.encode('rot13')
>     text.encode('palmos')
>
> The proposed extension allows this benefit to apply to an even broader
> range of services.

Which is all really cool.  The only thing that begins to bother me
about this is the use of strings as name lookup keys for finding
functions.  This seems generally unpythonic and error prone -- aside
from the documentation problem that the list of standard lookup keys
is buried in a non-obvious place.
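[The string-keyed registry in question is the codecs machinery; a typo
in the key fails only at runtime with a LookupError, which is the
error-proneness being pointed at:]

```python
import codecs

# a valid key resolves through the registry and works
print(codecs.encode('which witch', 'rot13'))   # juvpu jvgpu

# a misspelled key is caught by nothing until the call is made
try:
    codecs.lookup('quotedprintible')           # typo for 'quotedprintable'
except LookupError as exc:
    print(exc)                                 # unknown encoding: quotedprintible
```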
-Barry From Jack.Jansen at cwi.nl Fri Nov 7 09:37:41 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Fri Nov 7 09:37:36 2003 Subject: [Python-Dev] check-in policy, trunk vs maintenance branch In-Reply-To: <20031103140123.GA14146@panix.com> References: <200311031347.10995.aleaxit@yahoo.com> <20031103140123.GA14146@panix.com> Message-ID: On 3 Nov 2003, at 15:01, Aahz wrote: > On Mon, Nov 03, 2003, Alex Martelli wrote: >> >> I made a few bugfix check-ins to the 2.3 maintenance branch this >> weekend and Michael Hudson commented that he thinks that so doing is a >> bad idea, that bug fixes should filter from the 2.4 trunk to the 2.3 >> branch and not the other way around. Is this indeed the policy (have >> I missed some guidelines about it)? > > PEP 6: > > As individual patches get contributed to the feature release fork, > each patch contributor is requested to consider whether the patch > is > a bug fix suitable for inclusion in a patch release. If the patch > is > considered suitable, the patch contributor will mail the > SourceForge > patch (bug fix?) number to the maintainers' mailing list. Is it okay to apply fixes to the branch only when I know the relevant portions of the trunk will disappear before 2.4? I've done some fixes to the MacPython IDE that I did only on the release23-maint branch, because the plan is that the IDE will be replaced by something completely different soon... 
--
Jack Jansen        http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman

From aahz at pythoncraft.com Fri Nov 7 09:59:34 2003
From: aahz at pythoncraft.com (Aahz)
Date: Fri Nov 7 09:59:37 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: References: <200311031347.10995.aleaxit@yahoo.com> <20031103140123.GA14146@panix.com>
Message-ID: <20031107145934.GA10075@panix.com>

On Fri, Nov 07, 2003, Jack Jansen wrote:
>
> Is it okay to apply fixes to the branch only when I know the relevant
> portions of the trunk will disappear before 2.4?

I'd say not. That's the same reasoning Alex used, and I think that any exceptions made will only lead to trouble later. What happens if you get hit by a beer truck and the 2.4 changes don't get made?
--
Aahz (aahz@pythoncraft.com)  <*>  http://www.pythoncraft.com/
"It is easier to optimize correct code than to correct optimized code." --Bill Harlan

From aahz at pythoncraft.com Fri Nov 7 10:06:19 2003
From: aahz at pythoncraft.com (Aahz)
Date: Fri Nov 7 10:06:24 2003
Subject: [Python-Dev] Optional arguments for str.encode /.decode
In-Reply-To: <000901c3a501$8fb10800$1535c797@oemcomputer>
References: <000901c3a501$8fb10800$1535c797@oemcomputer>
Message-ID: <20031107150619.GB10075@panix.com>

On Fri, Nov 07, 2003, Raymond Hettinger wrote:
>
> For example, zlib_codec.py can then express its encoding function as:
>
>     def zlib_encode(input, errors='strict', **kwds):
>         assert errors == 'strict'
>         if 'level' in kwds:
>             output = zlib.compress(input, kwds['level'])
>         else:
>             output = zlib.compress(input)
>         return (output, len(input))
>
> The user can then have access to zlib's optional compression level
> argument:
>
>     >>> 'which witch has which witches wristwatch'.encode('zlib', level=9)

Change this to

    def zlib_encode(input, errors='strict', opts=None):
        if opts:
            if 'level' in opts:
                ...

    >>> 'which witch has which witches wristwatch'.encode('zlib', {'level': 9})

and I'm +1.
Otherwise I'm somewhere around -0; I agree with Barry about possible pollution. This change is a small inconvenience for greater decoupling. opts could be an instance instead, but I think a straight dict probably makes the most sense. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From barry at python.org Fri Nov 7 10:14:54 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 10:15:06 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <20031107150619.GB10075@panix.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> Message-ID: <1068218094.15995.125.camel@anthem> On Fri, 2003-11-07 at 10:06, Aahz wrote: > Change this to > > def zlib_encode(input,errors='strict', opts=None): > if opts: > if 'level' in opts: > ... > > >>> 'which witch has which witches wristwatch'.encode('zlib', {'level':9}) Actually, I like that less. It looks gross to me. Keyword arguments are a bit nicer, but do open the possibility for interference with future arguments to .encode() and .decode(). I'm probably +0 with the original and -0 with this style. > and I'm +1. Otherwise I'm somewhere around -0; I agree with Barry about > possible pollution. This change is a small inconvenience for greater > decoupling. opts could be an instance instead, but I think a straight > dict probably makes the most sense. Actually what I was complaining about probably is too late to "fix". It was the use of a string for the first argument to .encode() and .decode(). I dislike that for the same reason we don't do obj.__dict__['attribute'] on a regular basis. 
;) -Barry From aleaxit at yahoo.com Fri Nov 7 10:24:23 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 10:24:34 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <1068218094.15995.125.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> Message-ID: <200311071624.23409.aleaxit@yahoo.com> On Friday 07 November 2003 04:14 pm, Barry Warsaw wrote: ... > Actually what I was complaining about probably is too late to "fix". It We must keep supporting that approach, yes (alas), but maybe it's not too late to encourage another alternative style instead? E.g., have some object exposing attributes corresponding to those strings that do name codecs, so that while e.g. s.encode('zlib', level=9) would have to keep working, the officially encouraged style would be: s.encode(codec.zlib, level=9) or something of that ilk...? > was the use of a string for the first argument to .encode() and > .decode(). I dislike that for the same reason we don't do > obj.__dict__['attribute'] on a regular basis. ;) So my suggestion would take us back to obj.attribute style (as a preferred alternative to using 'attribute' overtly as a dict key)... Alex From barry at python.org Fri Nov 7 10:31:29 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 10:31:35 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <200311071624.23409.aleaxit@yahoo.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com> Message-ID: <1068219089.15995.128.camel@anthem> On Fri, 2003-11-07 at 10:24, Alex Martelli wrote: > We must keep supporting that approach, yes (alas), but maybe it's > not too late to encourage another alternative style instead? 
E.g., have > some object exposing attributes corresponding to those strings that > do name codecs, so that while e.g. > > s.encode('zlib', level=9) > > would have to keep working, the officially encouraged style would be: > > s.encode(codec.zlib, level=9) > > or something of that ilk...? If s.encode(codec.notacodec, level=9) throws an AttributeError, then +1. Add that to the original idea and +1 all around. -Barry From python at rcn.com Fri Nov 7 10:36:19 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 7 10:36:34 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <200311071624.23409.aleaxit@yahoo.com> Message-ID: <000d01c3a544$e4081540$bfb42c81@oemcomputer> [Barry] > > Actually what I was complaining about probably is too late to "fix". It [Alex] > We must keep supporting that approach, yes (alas), but maybe it's > not too late to encourage another alternative style instead? E.g., have > some object exposing attributes corresponding to those strings that > do name codecs, so that while e.g. > > s.encode('zlib', level=9) > > would have to keep working, the officially encouraged style would be: > > s.encode(codec.zlib, level=9) > > or something of that ilk...? +1, that is a great idea. Raymond From aleaxit at yahoo.com Fri Nov 7 10:49:27 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 10:49:36 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <1068219089.15995.128.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <200311071624.23409.aleaxit@yahoo.com> <1068219089.15995.128.camel@anthem> Message-ID: <200311071649.27884.aleaxit@yahoo.com> On Friday 07 November 2003 04:31 pm, Barry Warsaw wrote: > On Fri, 2003-11-07 at 10:24, Alex Martelli wrote: > > We must keep supporting that approach, yes (alas), but maybe it's > > not too late to encourage another alternative style instead? 
E.g., have
> > some object exposing attributes corresponding to those strings that
> > do name codecs, so that while e.g.
> >
> >     s.encode('zlib', level=9)
> >
> > would have to keep working, the officially encouraged style would be:
> >
> >     s.encode(codec.zlib, level=9)
> >
> > or something of that ilk...?
>
> If s.encode(codec.notacodec, level=9) throws an AttributeError, then
> +1. Add that to the original idea and +1 all around.

We should surely be able to arrange an object (codecs.codec ...? not sure where it should best live) that exposes as attributes those codecs that are registered, and raises AttributeError for attempts to access other names on it, it seems to me. Q&D worst case:

    class _Codec_Lookupper(object):
        def __getattr__(self, name):
            try:
                codecs.lookup(name)
            except LookupError:
                raise AttributeError
            else:
                return name

    codecs.codec = _Codec_Lookupper()

[which is something we could try out right now...] (but I suspect that we can do better, performance-wise, by returning the lookup's result as a non-string in case of success, saving .encode and .decode some duplicated work).

Alex

From anthony at interlink.com.au Fri Nov 7 11:01:51 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Nov 7 11:02:29 2003
Subject: [Python-Dev] check-in policy, trunk vs maintenance branch
In-Reply-To: Message-ID: <200311071601.hA7G1qD2030938@localhost.localdomain>

>>> Jack Jansen wrote
> Is it okay to apply fixes to the branch only when I know the relevant
> portions of the trunk will disappear before 2.4?
>
> I've done some fixes to the MacPython IDE that I did only on the
> release23-maint branch, because the plan is that the IDE will be
> replaced by something completely different soon...

I'd prefer to see them applied to the trunk as well, unless it's a significant amount of work to do so. Plans (and workloads) change, and big replacement/rewrites sometimes don't happen.
Going through changelogs (much) after the fact to try and find missed trunk->branch or branch->trunk patches is a nightmare. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From aleaxit at yahoo.com Fri Nov 7 11:08:02 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 11:08:08 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071649.27884.aleaxit@yahoo.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> Message-ID: <200311071708.02744.aleaxit@yahoo.com> From Barry's discussion of the problem of "magic strings" as arguments to .encode / .decode , I was reminded of a blog entry, http://www.brunningonline.net/simon/blog/archives/000803.html which mentions another case of "magic strings" that might perhaps be (optionally but suggestedly) changed into more-readable attributes (in this case, clearly attributes of the 'file' type): mode arguments to 'file' calls. Simon Brunning, the author of that blog entry, argues that myFile = file(filename, 'rb') (while of course we're going to keep accepting it forever) is not quite as readable and maintainable as, e.g.: myFile = file(filename, file.READ + file.BINARY) Just curious -- what are everybody's feelings about that idea? I'm about +0 on it, myself -- I doubt I'd remember to use it (too much C in my past...:-) but I see why others would prefer it. Another separate "attributes of types" issue raised by that same blog entry -- and that one does find me +1 -- is: isn't it time to make available as attributes of the str type object those few things that we still need to 'import string' for? E.g., the maketrans function (and maybe we could even give it a better name as long as we're making it a str.something?)... 
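For concreteness, maketrans builds a translation table that translate() then consumes; a minimal sketch of the pairing under discussion, written with the hypothetical str.maketrans spelling the proposal suggests (today the function lives in the string module as string.maketrans):

```python
# Hypothetical str.maketrans spelling of the existing
# string.maketrans/translate pairing (names per the proposal,
# not an actual API at the time of writing).
table = str.maketrans('abc', 'xyz')   # map a->x, b->y, c->z
result = 'cab'.translate(table)       # 'zxy'
```

The point being that the table-building helper is the last thing forcing an 'import string' into otherwise str-method-only code.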
Alex

From tim.hochberg at ieee.org Fri Nov 7 11:11:13 2003
From: tim.hochberg at ieee.org (Tim Hochberg)
Date: Fri Nov 7 11:11:20 2003
Subject: [Python-Dev] Re: Optional arguments for str.encode /.decode
In-Reply-To: <200311071624.23409.aleaxit@yahoo.com>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com>
Message-ID: <3FABC421.7020505@ieee.org>

Alex Martelli wrote:
> On Friday 07 November 2003 04:14 pm, Barry Warsaw wrote:
> ...
>
>>Actually what I was complaining about probably is too late to "fix". It
>
> We must keep supporting that approach, yes (alas), but maybe it's
> not too late to encourage another alternative style instead? E.g., have
> some object exposing attributes corresponding to those strings that
> do name codecs, so that while e.g.
>
>     s.encode('zlib', level=9)
>
> would have to keep working, the officially encouraged style would be:
>
>     s.encode(codec.zlib, level=9)
>
> or something of that ilk...?

FWIW, if keyword arg collisions are still a concern, it seems it should be possible to make the following work without too much trouble::

    s.encode(codec.zlib(level=9))

These codec objects could be simple classes that stash away their args and kwargs to pass on to the underlying encode::

    class CodecObj:
        def __init__(self, *args, **kwargs):
            self.name = self.__class__.__name__
            self.args = args
            self.kwargs = kwargs

    class zlib(CodecObj):
        pass

    # ....

In the encode method, the codec name, args and kwargs would be grabbed from the corresponding attributes of the CodecObj (unless the object was a string, in which case the old behaviour would be used). This would have the added advantage of pushing people to the new syntax. The downside is that::

    s.encode(codec.zlib)

wouldn't work. One would probably have to use the more verbose syntax::

    s.encode(codec.zlib())

-tim

>>was the use of a string for the first argument to .encode() and
>>.decode().
I dislike that for the same reason we don't do >>obj.__dict__['attribute'] on a regular basis. ;) > > > So my suggestion would take us back to obj.attribute style (as a > preferred alternative to using 'attribute' overtly as a dict key)... > > > Alex > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org > From guido at python.org Fri Nov 7 12:05:12 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 12:05:21 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 17:08:02 +0100." <200311071708.02744.aleaxit@yahoo.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> Message-ID: <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> > http://www.brunningonline.net/simon/blog/archives/000803.html > > which mentions another case of "magic strings" that might perhaps be > (optionally but suggestedly) changed into more-readable attributes (in > this case, clearly attributes of the 'file' type): mode arguments to 'file' > calls. Simon Brunning, the author of that blog entry, argues that > > myFile = file(filename, 'rb') > > (while of course we're going to keep accepting it forever) is not quite as > readable and maintainable as, e.g.: > > myFile = file(filename, file.READ + file.BINARY) > > Just curious -- what are everybody's feelings about that idea? I'm > about +0 on it, myself -- I doubt I'd remember to use it (too much C > in my past...:-) but I see why others would prefer it. Doesn't seem the right solution to me. If I were to design an API for this without reference to the C convention, I'd probably use keyword arguments. I outright disagree with Brunning's idea for the struct module. 
More verbose isn't always more readable or easier to remember. > Another separate "attributes of types" issue raised by that same > blog entry -- and that one does find me +1 -- is: isn't it time to > make available as attributes of the str type object those few things > that we still need to 'import string' for? E.g., the maketrans > function (and maybe we could even give it a better name as long as > we're making it a str.something?)... Yes, that would be good. Is there anything besides maketrans() in the string module worth saving? (IMO letters and digits etc. are not -- you can use s.isletter() etc. for that.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 7 12:16:57 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 12:17:16 2003 Subject: [Python-Dev] Code to prevent modification on builtins classes also abusively (IMHO) prevents modifications on extensions modules, some ideas on this. In-Reply-To: Your message of "Fri, 07 Nov 2003 13:24:35 +0100." <3FAB8F03.50601@arteris.net> References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF6280@au3010avexu1.global.avaya.com> <3FAB8F03.50601@arteris.net> Message-ID: <200311071716.hA7HGv502563@12-236-54-216.client.attbi.com> > I look into the archives and didn't see any debate on the question, hope > I didn't miss something. > > My point concerns limitations on extensions module due to checks aiming > the builtins. > The main point is settable extension classes. > In Python code there is some checks against TPFLAGS_HEAPTYPE, extension > modules should'nt have this flag, so the normal type->tp_setattro doesnt > allow the user to > set new attributes on your extension classes. There is a way around, > write a special MetaClass which redefine setattr. Or you can create a Python subclass that doesn't add any features but inherits from your extension class -- the user can set attributes on the Python class to their heart's content and everything will work as needed. 
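A minimal sketch of that first suggestion, with a plain Python class standing in for the real extension type (the actual module isn't shown in this thread, so 'ExtType' and the attribute names here are purely illustrative):

```python
class ExtType(object):
    """Stand-in for the C extension type."""

class SettableExtType(ExtType):
    """Trivial Python subclass: it is a heap type, so users can set
    attributes on the class and on its instances freely."""

# Users customize the Python subclass, not the extension type itself.
SettableExtType.greeting = 'hello'
obj = SettableExtType()
obj.color = 'red'
```

Instances are still fully usable wherever the extension type is expected, since isinstance(obj, ExtType) holds.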
> In the extension module I'm writing (I'm porting some Python code to
> Python-C for speed reasons) the user can set attributes and slots on my
> classes. What I need is the complete type->tp_setattro behaviour,
> without the check. I didn't see a way to get this behaviour using only
> the Python API (is re-readying the type a workaround?), so I
> copy-pasted all the code needed to make update_slots work (ouch, 2500
> lines).

A much simpler approach would be to have a metaclass whose tp_setattro clears the HEAPTYPE flag, calls type->tp_setattro, and then restores the HEAPTYPE flag. Yes, that might be considered cheating, but so is copying 2500 lines of code. :-)

> This is now almost working: every kind of attribute can be set except
> __setattr__, because the hackcheck prevents the user from calling
> another __setattr__ from his new setattr. Example of my extension
> class hierarchy:
>
>     class A(object)
>     class B(A)
>
> In the extension there is a tp_setattro on B; if the user wants to
> redefine it, he can't call the A __setattr__:
>
>     def myBSetattr(self, k, v):
>         super(B, self).__setattr__(k, v)
>         ## Do my special stuff here
>
> This won't work: the hackcheck will see some kind of hack here, "you
> can't call the A.__setattr__ function from a B object" :).

I don't understand this -- do any of my suggestions above handle it?

> First question: is there a known way around this?
>
> Possible improvements:
>
> In the Python code there are checks inside functions to verify that you
> are not modifying builtin classes; unfortunately this code also affects
> extension modules. I think the HEAPTYPE flag is abusively used in
> different cases -- mostly in type_setattro, object_set_bases and
> object_set_class the checks have nothing to do with the true definition
> of a heap type as stated in the comments in Include/object.h. It is
> used, I think, only because it is the only flag that distinguishes
> builtin classes from user classes.
> Unfortunately, with this flag, extension classes fall into the
> 'builtin' camp.
>
> A way to solve the problem without backward compatibility problems
> would be a new TPFLAGS_SETABLE flag, defaulting to 0 for
> builtin/extension classes and 1 for user (Python) classes. This flag
> would be checked in place of the HEAPTYPE one where relevant.
>
> I'm ready to write the code for this if there are some positive votes;
> I won't bother if everybody is against it.

This seems to be a reasonable suggestion; however, I want you to consider what happens if you are using multiple interpreters. When you set a function attribute on a builtin or extension type, the function references the environment of the interpreter where it was defined, but it is visible from all interpreters. This is likely not what you want, and that's why the HEAPTYPE flag exists.

I would strongly advise using my first suggestion above (derive a class in Python) rather than mess with HEAPTYPE.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org Fri Nov 7 12:17:05 2003
From: barry at python.org (Barry Warsaw)
Date: Fri Nov 7 12:17:20 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <200311071708.02744.aleaxit@yahoo.com>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com>
Message-ID: <1068225424.15995.146.camel@anthem>

On Fri, 2003-11-07 at 11:08, Alex Martelli wrote:
> From Barry's discussion of the problem of "magic strings" as arguments to
> .encode / .decode, I was reminded of a blog entry,
>
> http://www.brunningonline.net/simon/blog/archives/000803.html
>
> which mentions another case of "magic strings" that might perhaps be
> (optionally but suggestedly) changed into more-readable attributes (in
> this case, clearly attributes of the 'file' type): mode arguments to 'file'
> calls.
> Simon Brunning, the author of that blog entry, argues that
>
>     myFile = file(filename, 'rb')
>
> (while of course we're going to keep accepting it forever) is not quite as
> readable and maintainable as, e.g.:
>
>     myFile = file(filename, file.READ + file.BINARY)
>
> Just curious -- what are everybody's feelings about that idea? I'm
> about +0 on it, myself -- I doubt I'd remember to use it (too much C
> in my past...:-) but I see why others would prefer it.

I'm with you: too much muscle memory to probably use it. But I still think it's a good idea, with one caveat. A problem with constants like this, especially if they're mapped to integers, is that printing them is unhelpful:

    >>> from socket import *
    >>> print AF_UNIX
    1
    >>> from errno import *
    >>> print EEXIST
    17

If your memory is as bad as mine, how many times have /you/ typed errno.errorcode[17]? :) I would love it if what happened really was something like:

    >>> from socket import *
    >>> print AF_UNIX
    socket.AF_UNIX
    >>> from errno import *
    >>> print EEXIST
    errno.EEXIST

Now, I have an enum metaclass, originally ripped from Jeremy, but with a few nice additions and modifications of my own, which would get us closer to this. It allows you to define an enum like:

    >>> class Family(enum.Enum):
    ...     AF_UNIX = 1
    ...     AF_INET = 2
    ...     # ...
    ...
    >>> Family.AF_UNIX
    EnumInstance(Family, AF_UNIX, 1)
    >>> Family.AF_UNIX == 1
    True
    >>> Family.AF_UNIX == 3
    False
    >>> [x for x in Family]
    [EnumInstance(Family, AF_UNIX, 1), EnumInstance(Family, AF_INET, 2)]
    >>> Family[1]
    EnumInstance(Family, AF_INET, 2)

The last might be a tad surprising, but makes sense if you think about it. :) Class Enum has a metaclass of EnumMetaclass, where all the fun magic is. EnumInstances are subclasses of int and it would be easy to make their __str__() be the nicer output format.

Anyway, if these type attribute constants like file.READ were something like EnumInstances, then I think it would make writing and debugging stuff like this much nicer.
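The metaclass itself isn't shown in the message; a rough sketch of the kind of thing described (in modern class syntax, with hypothetical and simplified internals -- the actual implementation differs, e.g. the repr here is the short 'Family.AF_UNIX' form directly) might look like:

```python
class EnumInstance(int):
    """An int that remembers its enum class and attribute name."""
    def __repr__(self):
        return '%s.%s' % (self._cls_name, self._attr_name)
    __str__ = __repr__

class EnumMetaclass(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        cls._members = []
        for attr, value in namespace.items():
            if isinstance(value, int) and not attr.startswith('_'):
                member = EnumInstance(value)
                member._cls_name = name
                member._attr_name = attr
                setattr(cls, attr, member)  # replace the plain int
                cls._members.append(member)
        return cls

    def __iter__(cls):
        return iter(cls._members)

    def __getitem__(cls, i):
        # positional indexing -- hence the "tad surprising" Family[1]
        return cls._members[i]

class Enum(metaclass=EnumMetaclass):
    pass

class Family(Enum):
    AF_UNIX = 1
    AF_INET = 2
```

Because EnumInstance subclasses int, the members still compare equal to the raw values, while printing them gives the readable dotted name.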
> Another separate "attributes of types" issue raised by that same blog > entry -- and that one does find me +1 -- is: isn't it time to make available > as attributes of the str type object those few things that we still need > to 'import string' for? E.g., the maketrans function (and maybe we could > even give it a better name as long as we're making it a str.something?)... +1-ly y'rs, -Barry From barry at python.org Fri Nov 7 12:19:12 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 12:19:24 2003 Subject: [Python-Dev] Re: Optional arguments for str.encode /.decode In-Reply-To: <3FABC421.7020505@ieee.org> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com> <3FABC421.7020505@ieee.org> Message-ID: <1068225551.15995.149.camel@anthem> On Fri, 2003-11-07 at 11:11, Tim Hochberg wrote: > The downside is that:: > > s.encode(codec.zlib) > > wouldn't work. One would probably have to use the more verbose syntax:: > > s.encode(codec.zlib()) Maybe not. s.encode() can magically zero-arg instantiate the class. We're starting to put a lot of smarts into .encode() and .decode() but I think it's worth it. Nice idea. -Barry From barry at python.org Fri Nov 7 12:26:43 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 12:26:53 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> Message-ID: <1068226002.15995.153.camel@anthem> On Fri, 2003-11-07 at 12:05, Guido van Rossum wrote: > Yes, that would be good. Is there anything besides maketrans() in the > string module worth saving? (IMO letters and digits etc. are not -- > you can use s.isletter() etc. 
for that.) I'm not following, are you saying we don't need string.ascii_letters and friends any more? -Barry From aleaxit at yahoo.com Fri Nov 7 12:30:55 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 12:31:03 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068226002.15995.153.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> Message-ID: <200311071830.55764.aleaxit@yahoo.com> On Friday 07 November 2003 06:26 pm, Barry Warsaw wrote: > On Fri, 2003-11-07 at 12:05, Guido van Rossum wrote: > > Yes, that would be good. Is there anything besides maketrans() in the > > string module worth saving? (IMO letters and digits etc. are not -- > > you can use s.isletter() etc. for that.) > > I'm not following, are you saying we don't need string.ascii_letters and > friends any more? I think we do, but I'd rather access them as str.ascii_letters myself. Or maybe we could use just letters, lowercase and uppercase as attribute names, implying the ascii_ -- people needing nonasciis might then still need to "import string", which in itself might be OK, but... that might be a bit too confusing overall. Anyway, I do have code that e.g. does "for c in string.ascii_lowercase: ...", and that is not as handily done with just the .islowercase method... Alex From guido at python.org Fri Nov 7 12:35:26 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 12:35:38 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 12:26:43 EST." 
<1068226002.15995.153.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> Message-ID: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> > > Yes, that would be good. Is there anything besides maketrans() in the > > string module worth saving? (IMO letters and digits etc. are not -- > > you can use s.isletter() etc. for that.) > > I'm not following, are you saying we don't need string.ascii_letters and > friends any more? Hm, I'd forgotten about ascii_letters. It would make a beautiful class attribute of str. I *do* think that we don't need string.letters -- the only use for it I've seen is checking if a character is in that string, and c.isletter() is faster. But if someone has a use case for it that isn't argued away, I'd be okay with seeing it reincarnated as a class attribute of str too. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Fri Nov 7 12:37:27 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 12:37:32 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> Message-ID: <200311071837.27292.aleaxit@yahoo.com> On Friday 07 November 2003 06:05 pm, Guido van Rossum wrote: ... > Doesn't seem the right solution to me. If I were to design an API > for this without reference to the C convention, I'd probably use > keyword arguments. Interesting! Something like f = file('foo', writable=True) ... ? > I outright disagree with Brunning's idea for the struct module. More > verbose isn't always more readable or easier to remember. 
Heh, yes, I didn't even quote that one, being -1 on it myself :-)

> > Another separate "attributes of types" issue raised by that same
> > blog entry -- and that one does find me +1 -- is: isn't it time to
> > make available as attributes of the str type object those few things
> > that we still need to 'import string' for? E.g., the maketrans
> > function (and maybe we could even give it a better name as long as
> > we're making it a str.something?)...
>
> Yes, that would be good. Is there anything besides maketrans() in the
> string module worth saving? (IMO letters and digits etc. are not --
> you can use s.isletter() etc. for that.)

Hmmm, I do have loops such as 'for c in string.ascii_lowercase: ...'; e.g. in a letter-counting example:

    for c in string.ascii_lowercase:
        print '%s: %8d' % (c, counts.get(c, 0))

Using counts.keys(), sorted, wouldn't be the same, as the 0's would not stand out. Admittedly coding 'abc...xyz' explicitly ain't gonna kill me, but...

Alex

From fdrake at acm.org Fri Nov 7 12:40:31 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Nov 7 12:40:42 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <1068226002.15995.153.camel@anthem>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem>
Message-ID: <16299.55567.884685.216681@grendel.zope.com>

On Fri, 2003-11-07 at 12:05, Guido van Rossum wrote:
> Yes, that would be good. Is there anything besides maketrans() in the
> string module worth saving? (IMO letters and digits etc. are not --
> you can use s.isletter() etc. for that.)

Yikes! Are you assuming those are only used for "in" tests???

Barry Warsaw writes:
> I'm not following, are you saying we don't need string.ascii_letters and
> friends any more?

We definitely need these still.
I don't see any reason to remove them, and they're definitely still used. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From skip at pobox.com Fri Nov 7 13:15:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 7 13:15:41 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068225424.15995.146.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> Message-ID: <16299.57663.781598.114168@montanaro.dyndns.org> Barry> I would love it if what happened really was something like:

    >>> from socket import *
    >>> print AF_UNIX
    socket.AF_UNIX
    >>> from errno import *
    >>> print EEXIST
    errno.EEXIST

http://manatee.mojam.com/~skip/python/ConstantMap.py No metaclass wizardry needed. i-didn't-even-know-i-owned-a-time-machine-ly y'rs, Skip From python at rcn.com Fri Nov 7 13:25:58 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 7 13:26:11 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <002701c3a55c$97088c80$bfb42c81@oemcomputer> > Hm, I'd forgotten about ascii_letters. It would make a beautiful > class attribute of str. The problem with ascii_letters is that it is not constant. Depending on the startup, it can optionally replace the usual definition with that provided by strop.lowercase. > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. But if someone has a use case for it that > isn't argued away, I'd be okay with seeing it reincarnated as a class > attribute of str too. I had C coded a patch for a whole group of str.isSomething tests. The only thing that held it up was my not finding time to figure out how to do exactly the same thing for Unicode objects.
Maybe someone can pick up the patch: www.python.org/sf/562501 Raymond From barry at python.org Fri Nov 7 13:44:23 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 13:44:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <1068230662.15995.159.camel@anthem> On Fri, 2003-11-07 at 12:35, Guido van Rossum wrote: > Hm, I'd forgotten about ascii_letters. It would make a beautiful > class attribute of str. > > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. Ah gotcha. I'd definitely want to retain ascii_letters, probably ascii_lowercase and ascii_uppercase, digits, hexdigits, octdigits, punctuation, printable, and whitespace. I'm not sure about the locale specific constants, but maybe we do something like:

    str.ascii.letters
    str.ascii.lowercase
    str.locale.letters
    str.locale.lowercase

I'd definitely want to make these all read-only, e.g. removing the undefined warnings for string.lowercase.
-Barry From barry at python.org Fri Nov 7 13:50:34 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 13:50:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16299.57663.781598.114168@montanaro.dyndns.org> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> <16299.57663.781598.114168@montanaro.dyndns.org> Message-ID: <1068231034.15995.162.camel@anthem> On Fri, 2003-11-07 at 13:15, Skip Montanaro wrote: > Barry> I would love it if what happened really was something like: > > >>> from socket import * > >>> print AF_UNIX > socket.AF_UNIX > >>> from errno import * > >>> print EEXIST > errno.EEXIST > > http://manatee.mojam.com/~skip/python/ConstantMap.py > > No metaclass wizardry needed. > > i-didn't-even-know-i-owned-a-time-machine-ly y'rs, Oh boo. Metaclasses are so much fun though! :) But the enum stuff does have some other advantages. I'll try to clean the code up (read: document it :) and post it somewhere. -Barry From guido at python.org Fri Nov 7 13:59:31 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 14:00:39 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 13:25:58 EST." <002701c3a55c$97088c80$bfb42c81@oemcomputer> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> Message-ID: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> > > Hm, I'd forgotten about ascii_letters. It would make a beautiful > > class attribute of str. > > The problem with ascii_letters is that it is not constant. Depending on > the startup, it can optionally replace the usual definition with that > provided by strop.lowercase. Haven't you got that backwards? I thought ascii_letters was really a constant, but letters was modified by setlocale(). 
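For reference, the constant half of this is easy to check; a minimal sketch in modern (Python 3) syntax, where only the ascii_* constants survive:

```python
import string

# string.ascii_letters is a fixed, locale-independent constant: exactly
# the 26 ASCII lowercase letters followed by the 26 ASCII uppercase ones.
assert string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
assert len(string.ascii_letters) == 52
assert string.ascii_lowercase == "abcdefghijklmnopqrstuvwxyz"

# string.letters (Python 2 only) was the locale-dependent one: it was
# rebuilt when locale.setlocale() ran, so its contents could change at
# runtime, while ascii_letters never does.
print(string.ascii_letters[:6])  # -> abcdef
```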
> > I *do* think that we don't need string.letters -- the only use for it > > I've seen is checking if a character is in that string, and > > c.isletter() is faster. But if someone has a use case for it that > > isn't argued away, I'd be okay with seeing it reincarnated as a class > > attribute of str too. > > I had C coded a patch for a whole group of str.isSomething tests. The > only thing that held it up was my not finding time to figure out how to > exactly the same thing for Unicode objects. Maybe someone can pick-up > the patch: > > www.python.org/sf/562501 I don't have time to investigate the patch; is the existing set of isXXX() methods not enough? This seems a separate issue though. Anyway, I've been nearly convinced that the various constants should be part of the str class. But should corresponding constants be added to the Unicode class??? Some would be very large. If not, I'm less convinced that they belong on the str class. Also, perhaps the locale-dependent variables should perhaps be moved into the locale module? That would avoid the Unicode question above, because the locale module doesn't apply to Unicode. --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Fri Nov 7 14:04:51 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 7 14:05:04 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068230662.15995.159.camel@anthem> Message-ID: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> > Ah gotcha. I'd definitely want to retain ascii_letters, probably > ascii_lowercase and ascii_uppercase, digits, hexdigits, octdigits, > punctuation, printable, and whitespace Other than possibly upper and lower, the rest should be skipped and left for tests like isdigit(). The tests are faster than the usual linear search style of: if char in str.letters. Raymond From fdrake at acm.org Fri Nov 7 14:05:14 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri Nov 7 14:05:35 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> Message-ID: <16299.60650.800354.930018@grendel.zope.com> Guido van Rossum writes: > Anyway, I've been nearly convinced that the various constants should > be part of the str class. But should corresponding constants be added > to the Unicode class??? Some would be very large. If not, I'm less > convinced that they belong on the str class. I'm happy for them to stay where they are. > Also, perhaps the locale-dependent variables should perhaps be moved > into the locale module? That would avoid the Unicode question above, > because the locale module doesn't apply to Unicode. +1 -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From walter at livinglogic.de Fri Nov 7 14:10:18 2003 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Fri Nov 7 14:10:24 2003 Subject: [Python-Dev] Re: Optional arguments for str.encode /.decode In-Reply-To: <1068225551.15995.149.camel@anthem> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <20031107150619.GB10075@panix.com> <1068218094.15995.125.camel@anthem> <200311071624.23409.aleaxit@yahoo.com> <3FABC421.7020505@ieee.org> <1068225551.15995.149.camel@anthem> Message-ID: <3FABEE1A.1050000@livinglogic.de> Barry Warsaw wrote: > On Fri, 2003-11-07 at 11:11, Tim Hochberg wrote: > > >>The downside is that:: >> >> s.encode(codec.zlib) >> >>wouldn't work. One would probably have to use the more verbose syntax:: >> >> s.encode(codec.zlib()) > > > Maybe not. s.encode() can magically zero-arg instantiate the class. > We're starting to put a lot of smarts into .encode() and .decode() but I > think it's worth it. Nice idea. Would this mean any changes to the C API? 
And if we're going to enhance the C API, so that

    PyObject *PyUnicode_Encode(
        const Py_UNICODE *s,
        int size,
        const char *encoding,
        const char *errors
    );

becomes

    PyObject *PyUnicode_Encode(
        const Py_UNICODE *s,
        int size,
        PyObject *encoding,
        const char *errors
    );

would it make sense to enhance the PEP 293 error callback machinery to allow

    PyObject *PyUnicode_Encode(
        const Py_UNICODE *s,
        int size,
        PyObject *encoding,
        PyObject *errors
    );

so that the callback function can be passed directly to the codec without any need for registering/lookup? Bye, Walter Dörwald From martin at v.loewis.de Fri Nov 7 14:12:35 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 14:13:52 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: <000901c3a501$8fb10800$1535c797@oemcomputer> References: <000901c3a501$8fb10800$1535c797@oemcomputer> Message-ID: "Raymond Hettinger" writes: > Idea for the day: Let the str.encode/decode methods accept keyword > arguments to be forwarded to the underlying codec. -1. The non-Unicode usage of .encode should not have been there in the first place, IMO, so I dislike any extensions to it. Regards, Martin From martin at v.loewis.de Fri Nov 7 14:15:30 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 14:16:34 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <002701c3a55c$97088c80$bfb42c81@oemcomputer> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> Message-ID: "Raymond Hettinger" writes: > > Hm, I'd forgotten about ascii_letters. It would make a beautiful > > class attribute of str. > > The problem with ascii_letters is that it is not constant. Depending on > the startup, it can optionally replace the usual definition with that > provided by strop.lowercase. Can you give an example?
Regards, Martin From skip at pobox.com Fri Nov 7 14:47:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 7 14:47:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> Message-ID: <16299.63183.923295.432422@montanaro.dyndns.org> Raymond> Other than possibly upper and lower, the rest should be skipped Raymond> and left for tests like isdigit(). The tests are faster than Raymond> the usual linear search style of: if char in str.letters. A couple people have claimed that the .is*() string methods are faster than testing a character against a string. I'm sure that's true in some cases, but it seems not to be true for string.ascii_letters. Here are several timeit.py runs, ordered from slowest to fastest. Both situations have a pair of runs, one with a positive test and one with a negative test. Using char in someset: % timeit.py -s 'import string, sets; pset = sets.Set(string.ascii_letters)' "'.' in pset" 100000 loops, best of 3: 4.68 usec per loop % timeit.py -s 'import string, sets; pset = sets.Set(string.ascii_letters)' "'z' in pset" 100000 loops, best of 3: 4.58 usec per loop Using char.isalpha() or char.islower(): % timeit.py -s 'import string' "'z'.islower()" 1000000 loops, best of 3: 0.93 usec per loop % timeit.py -s 'import string' "'.'.islower()" 1000000 loops, best of 3: 0.928 usec per loop % timeit.py -s 'import string' "'z'.isalpha()" 1000000 loops, best of 3: 0.893 usec per loop % timeit.py -s 'import string' "'.'.isalpha()" 1000000 loops, best of 3: 0.96 usec per loop Using char in somestring: % timeit.py -s 'import string; pset = string.ascii_letters' "'z' in pset" 1000000 loops, best of 3: 0.617 usec per loop % timeit.py -s 'import string; pset = string.ascii_letters' "'.' 
in pset" 1000000 loops, best of 3: 0.747 usec per loop Using char in somedict: % timeit.py -s 'import string; pset = dict(zip(string.ascii_letters,string.ascii_letters))' "'.' in pset" 1000000 loops, best of 3: 0.502 usec per loop % timeit.py -s 'import string; pset = dict(zip(string.ascii_letters,string.ascii_letters))' "'z' in pset" 1000000 loops, best of 3: 0.509 usec per loop The only clear loser is the 'char in set' case, no doubt due to its current Python implementation, however testing a character for membership in a short string seems to be faster than using the .is*() methods to me. Skip From theller at python.net Fri Nov 7 15:16:42 2003 From: theller at python.net (Thomas Heller) Date: Fri Nov 7 15:17:04 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> (Guido van Rossum's message of "Fri, 07 Nov 2003 09:35:26 -0800") References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: Guido van Rossum writes: > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. But if someone has a use case for it that > isn't argued away, I'd be okay with seeing it reincarnated as a class > attribute of str too. But there are probably more useful combinations like string.letters + string.digits + "_" than there should be isxxx() tests. Thomas From bac at OCF.Berkeley.EDU Fri Nov 7 15:49:06 2003 From: bac at OCF.Berkeley.EDU (Brett C.) 
Date: Fri Nov 7 15:49:17 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> Message-ID: <3FAC0542.30803@ocf.berkeley.edu> Guido van Rossum wrote: > Anyway, I've been nearly convinced that the various constants should > be part of the str class. But should corresponding constants be added > to the Unicode class??? Some would be very large. If not, I'm less > convinced that they belong on the str class. > > Also, perhaps the locale-dependent variables should perhaps be moved > into the locale module? That would avoid the Unicode question above, > because the locale module doesn't apply to Unicode. > How about a strtools module? I was thinking that constants like ascii_letters could go there along with an implementation of join() that took arguments in an obvious way (or at least the way everyone seems to request it). Barry's string replacement function could also go there (the one using $; wasn't it agreed that interpolation was the wrong term to use or something?). This would prevent polluting the str type too much plus remove any hindrance that there necessarily be a mirror value for Unicode since the docs can explicitly state it only works for str in those cases. -Brett From aleaxit at yahoo.com Fri Nov 7 15:58:36 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 15:58:48 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16299.63183.923295.432422@montanaro.dyndns.org> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <16299.63183.923295.432422@montanaro.dyndns.org> Message-ID: <200311072158.36054.aleaxit@yahoo.com> On Friday 07 November 2003 20:47, Skip Montanaro wrote: ... 
> The only clear loser is the 'char in set' case, no doubt due to its > current Python implementation, however testing a character for membership > in a short string seems to be faster than using the .is*() methods to me. Very interesting! To me, this suggests fixing this performance bug -- there is no reason that I can see why the .is* methods should be _slower_. Would a performance bugfix (no implementation change, just a speedup) be OK for 2.3.3, I hope? That would motivate me to work on it soonest... Alex From fdrake at acm.org Fri Nov 7 16:03:20 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 16:03:33 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072158.36054.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <16299.63183.923295.432422@montanaro.dyndns.org> <200311072158.36054.aleaxit@yahoo.com> Message-ID: <16300.2200.332569.861030@grendel.zope.com> Alex Martelli writes: > Very interesting! To me, this suggests fixing this performance bug -- there > is no reason that I can see why the .is* methods should be _slower_. Would > a performance bugfix (no implementation change, just a speedup) be OK for > 2.3.3, I hope? That would motivate me to work on it soonest... People keep hinting that these methods should be faster, but I see no reason to think they would be. Think about it: using the method requires the creation of a bound method object. No matter how fast PyMalloc is, that's still a fair bit of work. -Fred -- Fred L. Drake, Jr.
PythonLabs at Zope Corporation From aleaxit at yahoo.com Fri Nov 7 16:04:15 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 16:04:24 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> Message-ID: <200311072204.15841.aleaxit@yahoo.com> On Friday 07 November 2003 20:04, Raymond Hettinger wrote: > > Ah gotcha. I'd definitely want to retain ascii_letters, probably > > ascii_lowercase and ascii_uppercase, digits, hexdigits, octdigits, > > punctuation, printable, and whitespace > > Other than possibly upper and lower, the rest should be skipped and left > for tests like isdigit(). The tests are faster than the usual linear > search style of: if char in str.letters. I guess the tests should be faster, yes, but I would still want _iterables_ for ascii_* and digits. One issue with allowing "if char in string.letters:" is that these days this will not raise if the alleged 'char' is more than one character -- it will give True for (e.g.) 'ab', False for (e.g.) 'foobar', since it tests _substrings_. So, maybe, str.letters and friends should be iterables which also implement a __contains__ method that raises some error with helpful information about using .iswhatever() instead -- that's assuming we want people NOT to test with "if char in str.letters:". If we DO want people to test that way, then I think str.letters should _still_ have __contains__, but specifically one to optimize speed in this case (if supported it should be just as fast as the .is... method -- which as Skip reminds us may in turn need optimization...). Alex From martin at v.loewis.de Fri Nov 7 16:07:30 2003 From: martin at v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 16:07:35 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: Thomas Heller writes: > But there are probably more useful combinations like > > string.letters + string.digits + "_" I think the typical application of this should use regular expressions instead. Regards, Martin From fdrake at acm.org Fri Nov 7 16:08:41 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 16:09:05 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <3FAC0542.30803@ocf.berkeley.edu> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> Message-ID: <16300.2521.199201.364251@grendel.zope.com> Brett C. writes: > How about a strtools module? I was thinking that constants like > ascii_letters could go there along with an implementation of join() that > took arguments in an obvious way (or at least the way everyone seems to > request it). Not sure I like the increasing array of module name suffixes. There's the classic "foolib", then we added "footools" and "fooutils" (think "mimetools" and "distutils"). Not trying to create an issue here, just generally dismayed. > Barry's string replacement function could also go there > (the one using $; wasn't it agreed that interpolation was the wrong term > to use or something?). We're calling it substitution. People know what that means, and don't get it confused with interpolation. 
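For concreteness, the $-substitution being named here can be sketched in a few lines; the helper name `substitute` and the exact rules ($name, ${name}, $$ for a literal dollar) are illustrative assumptions loosely following Barry's PEP 292 proposal, not any shipped API:

```python
import re

def substitute(template, mapping):
    """Minimal sketch of $-style substitution: $name or ${name} is
    replaced from mapping, and $$ produces a literal dollar sign."""
    pattern = re.compile(r"\$(?:\$|\{(\w+)\}|(\w+))")

    def repl(match):
        name = match.group(1) or match.group(2)
        if name is None:            # the pattern matched '$$'
            return "$"
        return str(mapping[name])   # raises KeyError on a missing name

    return pattern.sub(repl, template)

print(substitute("Hello, $name! You owe $$${amount}.",
                 {"name": "Barry", "amount": 5}))
# -> Hello, Barry! You owe $5.
```

Unlike %-interpolation, a stray $ fails loudly or predictably instead of silently misformatting, which is much of the argument for the syntax.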
> This would prevent polluting the str type too much plus remove any > hindrance that there necessarily be a mirror value for Unicode since the > docs can explicitly state it only works for str in those cases. Or it could just work polymorphically. ;-) I don't see any need for everything to be defined by the classes. Types. Oh, whatever those things are! -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From aleaxit at yahoo.com Fri Nov 7 16:11:35 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 16:11:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> Message-ID: <200311072211.35895.aleaxit@yahoo.com> On Friday 07 November 2003 19:59, Guido van Rossum wrote: ... > Anyway, I've been nearly convinced that the various constants should > be part of the str class. But should corresponding constants be added > to the Unicode class??? Some would be very large. If not, I'm less > convinced that they belong on the str class. I think the str.XXX constants should be iterables with a __contains__ method (the latter either to forbid the 'if char in str.XXX:' test if we dislike it, or to optimize it if we like it). The corresponding unicode.XXX constants could also be iterables -- not necessarily large ones if we don't want them to be: each of them could just step a counter through all unicode characters and just return the ones that satisfy some appropriate .iswhatever test. > Also, perhaps the locale-dependent variables should perhaps be moved > into the locale module? That would avoid the Unicode question above, > because the locale module doesn't apply to Unicode.
+1 -- I think the more "localized" the effects of module locale are, the happier we shall all be; the "global side effect" of locale.setlocale having effects on other modules (string, time, os, and gettext) has always left me a little bit doubtful (I've used it at times, but wished I could avoid using it...). Alex From martin at v.loewis.de Fri Nov 7 16:16:32 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Fri Nov 7 16:16:54 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072158.36054.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <16299.63183.923295.432422@montanaro.dyndns.org> <200311072158.36054.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > Very interesting! To me, this suggests fixing this performance bug > -- there is no reason that I can see why the .is* methods should be > _slower_. Would a performance bugfix (no implementation change, > just a speedup) be OK for 2.3.3, I hope? Yes, but I doubt you can do much about it. I also fail to see how it is relevant to ascii_letters. .islower is locale-aware, so it is your C library which does the bulk of the work. Regards, Martin From skip at pobox.com Fri Nov 7 16:41:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 7 16:41:28 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.2521.199201.364251@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <16300.2521.199201.364251@grendel.zope.com> Message-ID: <16300.4475.778642.314514@montanaro.dyndns.org> >> How about a strtools module? Fred> Not sure I like the increasing array of module name suffixes. Fred> There's the classic "foolib", then we added "footools" and Fred> "fooutils" (think "mimetools" and "distutils"). Not to mention which, we have a perfectly good module name already: string.
Skip From fdrake at acm.org Fri Nov 7 16:44:21 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 16:44:31 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.4475.778642.314514@montanaro.dyndns.org> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <16300.2521.199201.364251@grendel.zope.com> <16300.4475.778642.314514@montanaro.dyndns.org> Message-ID: <16300.4661.135812.868335@grendel.zope.com> Skip Montanaro writes: > Not to mention which, we have a perfectly good module name already: string. +1 for calling it "string"! It has the nice advantage of backward compatibility for those names as well. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From barry at python.org Fri Nov 7 16:57:31 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 7 16:57:39 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <3FAC0542.30803@ocf.berkeley.edu> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> Message-ID: <1068242250.15995.186.camel@anthem> On Fri, 2003-11-07 at 15:49, Brett C. wrote: > How about a strtools module? I don't see much point. If we wanted to keep things in a module, the string module already exists and seems the most logical place for stringy things. > Barry's string replacement function could also go there > (the one using $; wasn't it agreed that interpolation was the wrong term > to use or something?). I've taken to calling it string substitutions. -Barry From fincher.8 at osu.edu Fri Nov 7 17:58:03 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Fri Nov 7 16:59:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <000901c3a501$8fb10800$1535c797@oemcomputer> Message-ID: <200311071758.03374.fincher.8@osu.edu> On Friday 07 November 2003 04:07 pm, Martin v. 
Löwis wrote: > Thomas Heller writes: > > But there are probably more useful combinations like > > > > string.letters + string.digits + "_" > > I think the typical application of this should use regular expressions > instead. A typical application, sure. But not all applications -- what if the string is being built, for instance, to pass as the optional "delete" argument of str.translate? Jeremy From aahz at pythoncraft.com Fri Nov 7 17:09:01 2003 From: aahz at pythoncraft.com (Aahz) Date: Fri Nov 7 17:09:08 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <20031107220901.GA20961@panix.com> On Fri, Nov 07, 2003, Martin v. Löwis wrote: > Thomas Heller writes: >> >> But there are probably more useful combinations like >> >> string.letters + string.digits + "_" > > I think the typical application of this should use regular expressions > instead. Ick: 'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.' --Jamie Zawinski, comp.emacs.xemacs, 8/1997 -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From aleaxit at yahoo.com Fri Nov 7 17:25:29 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 17:25:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.2200.332569.861030@grendel.zope.com> References: <1068230662.15995.159.camel@anthem> <200311072158.36054.aleaxit@yahoo.com> <16300.2200.332569.861030@grendel.zope.com> Message-ID: <200311072325.29330.aleaxit@yahoo.com> On Friday 07 November 2003 22:03, Fred L. Drake, Jr.
wrote: > Alex Martelli writes: > > Very interesting! To me, this suggests fixing this performance bug -- > > there is no reason that I can see why the .is* methiods should be > > _slower_. Would a performance bugfix (no implementation change, just > > a speedup) be OK for 2.3.3, I hope? That would motivate me to work on > > it soonest... > > People keep hinting that these methods should be faster, but I see no > reason to think they would be. Think about it: using the method > requires the creation of a bound method object. No matter how fast > PyMalloc is, that's still a fair bit of work. Good point! So, a first little trick to accelerate this might be to use getsets (unfortunately this gives a marginally Python-level-observable alteration for e.g. "print 'x'.isdigit.__name__", so perhaps it's only suitable for 2.4, not 2.3.3, alas... I dunno...). I tried a little experiment adding a new test .isabit() that says if a string is entirely made up of '0' and '1': static PyGetSetDef string_getsets[] = { {"isabit", (getter)string_isabit, 0, 0}, {0} }; ... 
string_getsets, /* tp_getset */ where: static PyObject * _return_true = 0; static PyObject * _return_false = 0; static PyObject * _true_returner(PyObject* ignore_self) { Py_RETURN_TRUE; } static PyObject * _false_returner(PyObject* ignore_self) { Py_RETURN_FALSE; } static PyMethodDef _str_bool_returners[] = { {"_str_return_false", (PyCFunction)_false_returner, METH_NOARGS}, {"_str_return_true", (PyCFunction)_true_returner, METH_NOARGS}, {0} }; static PyObject * string_isabit(PyStringObject *s) { char* p = PyString_AS_STRING(s); int len = PyString_GET_SIZE(s); int i; for(i=0; i References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> Message-ID: <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> > Guido van Rossum writes: > > Anyway, I've been nearly convinced that the various constants should > > be part of the str class. But should corresponding constants be added > > to the Unicode class??? Some would be very large. If not, I'm less > > convinced that they belong on the str class. [Fred] > I'm happy for them to stay where they are. ??? They're in the string module, which has got to go. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 7 17:42:33 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 17:42:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 21:16:42 +0100."
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <200311072242.hA7MgXS03159@12-236-54-216.client.attbi.com> > But there are probably more useful combinations like > > string.letters + string.digits + "_" > > than there should be isxxx() tests. We don't need to invent anything for that. You can use a regular expression with \w. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Fri Nov 7 17:44:46 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 17:44:56 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <1068230662.15995.159.camel@anthem> <200311072158.36054.aleaxit@yahoo.com> Message-ID: <200311072344.46848.aleaxit@yahoo.com> On Friday 07 November 2003 22:16, Martin v. Löwis wrote: > Alex Martelli writes: > > Very interesting! To me, this suggests fixing this performance bug > > -- there is no reason that I can see why the .is* methods should be > > _slower_. Would a performance bugfix (no implementation change, > > just a speedup) be OK for 2.3.3, I hope? > > Yes, but I doubt you can do much about it. I also fail to see how it is I dunno -- it seems that (on a toy case where an 'in' test takes 0.25 usec and an .isdigit takes 0.52 to 0.55) we can shave the time to 0.39, about in-between, by avoiding the generation of a bound method. Now of course saving 25% or so isn't huge, but maybe it's still worth it...? > relevant to ascii_letters. .islower is locale-aware, so it is your C > library which does the bulk of the work. Ah -- interesting point!
So, for example:

f = xx.islower
print f()
# insert locale change here
print f()

should be able to print two distinct values for appropriate values of xx and locale changes, right? Hmmm -- if supporting this usage is crucial then indeed we can't avoid generating a bound method (for .islower and other locale-aware .is* methods), because the "return a function" approach is basically evaluating the function at attribute-access time... if locale changes between the attribute-access time and the moment of the call, then the result may not be as desired. Funny, among the deleterious effects of locale-changing's "subterraneous global effects" I had not considered this one -- it breaks nice conditions we might otherwise have counted on thanks to strings' immutability and the parameterless nature of the .is...() methods. Oh well, I guess the trick is not worth pursuing just for the sake of .isdigit and .isspace, then, if "locale change between access and call" must be supported. Pity, because despite the C library's amount of work, the overhead of the bound-method generation is not trivial, as Fred mentioned. So, if the fast idiom is _inevitably_ "if xx in ...:" (thanks in part to the fact that we _don't_ have to support locale changes in the middle of things in this case), then perhaps we should stop touting xx.is...() as superior, and see about offering the best possible support for the 'in' case -- where my remarks about "accidental successes" of, e.g., "if xx in ...digits...:" when xx=="23" but not when xx=="43" stand. We can't break "if xx in string.digits:" (maybe somebody's relying on the test succeeding when xx is a sequence of adjacent increasing digits?) but we can surely choose, if we wish, to define the semantics of "if xx in str.digits:" in a (IMHO) more helpful-against-errors way....
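Alex's "accidental successes" point is easy to check for oneself: `in` on strings tests *substrings*, so a multi-character value can silently pass a digits test whenever its characters happen to be adjacent in the constant (a quick illustration, still true in current Python):

```python
import string

# str.__contains__ tests substrings, not single-character membership,
# so a two-character input can "pass" a digits check by accident.
assert "2" in string.digits          # the intended single-character use
assert "23" in string.digits         # accidental success: "23" is a substring
assert "43" not in string.digits     # "43" is not a substring, so it "fails"

# The method form answers the per-character question instead:
assert "23".isdigit() and "43".isdigit()
```

Either result for a multi-character input is arguably wrong for a "membership" test, which is exactly the ambiguity being debated here.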
Alex From guido at python.org Fri Nov 7 17:47:37 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 17:47:48 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 12:49:06 PST." <3FAC0542.30803@ocf.berkeley.edu> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> Message-ID: <200311072247.hA7Mlb503198@12-236-54-216.client.attbi.com> > How about a strtools module? I was thinking that constants like > ascii_letters could go there along with an implementation of join() that > took arguments in an obvious way (or at least the way everyone seems to > request it). Barry's string replacement function could also go there > (the one using $; wasn't it agreed that interpolation was the wrong term > to use or something?). > > This would prevent polluting the str type too much plus remove any > hindrance that there necessarily be a mirror value for Unicode since the > docs can explicitly state it only works for str in those cases. Do we have an indication that the str type is getting polluted too much? Apart from the locale-specific things and maketrans, what else wouldn't work for Unicode that's currently under consideration? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 7 17:49:31 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 17:49:38 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 22:04:15 +0100." <200311072204.15841.aleaxit@yahoo.com> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <200311072204.15841.aleaxit@yahoo.com> Message-ID: <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> > I guess the tests should be faster, yes, but I would still want _iterables_ > for ascii_* and digits. Why? It's not like you're going to save much space by not creating a string of 52 bytes. 
> One issue with allowing "if char in string.letters:" is that these > days this will not raise if the alleged 'char' is more than one > character -- it will give True for (e.g.) 'ab', False for (e.g.) > 'foobar', since it tests _substrings_. Right. > So, maybe, str.letters and friends should be iterables which also > implement a __contains__ method that raises some error with helpful > information about using .iswhatever() instead -- that's assuming we > want people NOT to test with "if char in str.letters:". If we DO > want people to test that way, then I think str.letters should > _still_ have __contains__, but specifically one to optimize speed in > this case (if supported it should be just as fast as the > .is... method -- which as Skip reminds us may in turn need > optimization...). Hm. The iterable idea seems overblown for something as simple as this. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Nov 7 17:58:33 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 17:58:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> Message-ID: <16300.9113.720680.750981@grendel.zope.com> Guido van Rossum writes: > They're in the string module, Right. > which has got to go. I don't think this has ever been justified. What's wrong with the string module for things like ascii_letters? What has to go is the collection of functions that were replaced by string methods. -Fred -- Fred L. Drake, Jr.
PythonLabs at Zope Corporation From guido at python.org Fri Nov 7 18:02:57 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 18:03:04 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 17:58:33 EST." <16300.9113.720680.750981@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> Message-ID: <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> > I don't think this has ever been justified. What's wrong with the > string module for things like ascii_letters? What has to go is the > collection of functions that were replaced by string methods. In the end it would be a module containing 4 constants and one function. I'd rather consolidate all that elsewhere. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Nov 7 18:10:51 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Nov 7 18:11:01 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> Message-ID: <16300.9851.671401.447992@grendel.zope.com> Guido van Rossum writes: > In the end it would be a module containing 4 constants and one > function. I'd rather consolidate all that elsewhere. Frankly, that doesn't bother me, especially given that they've always been in the string module. 
But I count more than 4 constants that should be kept:

    ascii_letters
    ascii_lowercase
    ascii_uppercase
    digits
    hexdigits
    octdigits
    whitespace

All of these could reasonably live on both str and unicode if that's not considered pollution. But if they live in a module, there's no reason not to keep string around for that purpose. (I don't object to making them class attributes; I object to creating a new module for them.) -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From guido at python.org Fri Nov 7 18:17:17 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 18:17:24 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 18:10:51 EST." <16300.9851.671401.447992@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> Message-ID: <200311072317.hA7NHHF03334@12-236-54-216.client.attbi.com> > > In the end it would be a module containing 4 constants and one > > function. I'd rather consolidate all that elsewhere. > > Frankly, that doesn't bother me, especially given that they've always > been in the string module. But I count more than 4 constants that > should be kept: > > ascii_letters > ascii_lowercase > ascii_uppercase > digits > hexdigits > octdigits > whitespace > > All of these could reasonably live on both str and unicode if that's > not considered pollution. But if they live in a module, there's no > reason not to keep string around for that purpose. > > (I don't object to making them class attributes; I object to creating > a new module for them.) Ah, we agree about this then.
I do think that keeping the string module around without all the functions it historically contained would be a mistake, confusing folks. This error is pretty clear:

>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>>

But this one is much more mystifying:

>>> import string
>>> print string.join(["a", "b"], ".")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'join'
>>>

--Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Fri Nov 7 18:25:31 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Nov 7 18:26:06 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 08:08:02 PST." <200311071708.02744.aleaxit@yahoo.com> Message-ID: <03Nov7.152531pst."58611"@synergy1.parc.xerox.com> > myFile = file(filename, 'rb') > > (while of course we're going to keep accepting it forever) is not quite as > > readable and maintainable as, e.g.: > > myFile = file(filename, file.READ + file.BINARY) Actually, the default should be BINARY, however it works. I think it's insane that 'r' works on Unix but breaks on Windows when reading a JPEG file. Bill From tim.one at comcast.net Fri Nov 7 18:26:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 7 18:26:49 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> Message-ID: [Fred] >> I don't think this has ever been justified. What's wrong with the >> string module for things like ascii_letters? What has to go is the >> collection of functions that were replaced by string methods. [Guido] > In the end it would be a module containing 4 constants and one > function. I'd rather consolidate all that elsewhere. Cool -- let's make a new stringhelpers module, then <wink>.
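The file.READ + file.BINARY idea Alex and Bill discuss above never landed in Python; a rough sketch of what such symbolic mode constants might have looked like (all names here are hypothetical, invented for illustration):

```python
class FileMode:
    # Hypothetical symbolic constants standing in for file.READ etc.;
    # they simply map to the familiar mode-string letters.
    READ, WRITE, APPEND, BINARY = "r", "w", "a", "b"

def make_mode(*flags):
    """Combine symbolic flags into a mode string, e.g. (READ, BINARY) -> 'rb'."""
    # Put BINARY last so the result is valid regardless of argument order.
    base = "".join(f for f in flags if f != FileMode.BINARY)
    if FileMode.BINARY in flags:
        base += FileMode.BINARY
    return base or FileMode.READ    # default to read, matching open()
```

With such helpers, Bill's complaint would translate to making `FileMode.BINARY` the default rather than an opt-in flag.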
From exarkun at intarweb.us Fri Nov 7 18:50:13 2003 From: exarkun at intarweb.us (Jp Calderone) Date: Fri Nov 7 18:51:05 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <200311071705.hA7H5Ck02534@12-236-54-216.client.attbi.com> <1068226002.15995.153.camel@anthem> <200311071735.hA7HZQ902692@12-236-54-216.client.attbi.com> Message-ID: <20031107235013.GA30537@intarweb.us> On Fri, Nov 07, 2003 at 09:35:26AM -0800, Guido van Rossum wrote: > > > Yes, that would be good. Is there anything besides maketrans() in the > > > string module worth saving? (IMO letters and digits etc. are not -- > > > you can use s.isletter() etc. for that.) > > > > I'm not following, are you saying we don't need string.ascii_letters and > > friends any more? > > Hm, I'd forgotten about ascii_letters. It would make a beautiful > class attribute of str. > > I *do* think that we don't need string.letters -- the only use for it > I've seen is checking if a character is in that string, and > c.isletter() is faster. But if someone has a use case for it that > isn't argued away, I'd be okay with seeing it reincarnated as a class > attribute of str too. > How about this use case? def genPassword(pickFrom=string.letters+string.digits, n=8): return ''.join([random.choice(pickFrom) for i in range(n)]) Jp -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://mail.python.org/pipermail/python-dev/attachments/20031107/19059b4b/attachment-0001.bin From janssen at parc.com Fri Nov 7 18:58:17 2003 From: janssen at parc.com (Bill Janssen) Date: Fri Nov 7 19:00:02 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Fri, 07 Nov 2003 14:42:33 PST." <200311072242.hA7MgXS03159@12-236-54-216.client.attbi.com> Message-ID: <03Nov7.155826pst."58611"@synergy1.parc.xerox.com> > > But there are probably more useful combinations like > > > > string.letters + string.digits + "_" > > > > than there should be isxxx() tests. > > We don't need to invent anything for that. You can use a regular > expression with \w. > > --Guido van Rossum (home page: http://www.python.org/~guido/) That's replacing the "clear" with the "arcane" (or perhaps the "fairly incomprehensible"). Is that really a good ultimate direction for Python? Bill From aleaxit at yahoo.com Fri Nov 7 19:02:04 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Fri Nov 7 19:02:11 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <200311072204.15841.aleaxit@yahoo.com> <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> Message-ID: <200311080102.04546.aleaxit@yahoo.com> On Friday 07 November 2003 23:49, Guido van Rossum wrote: > > I guess the tests should be faster, yes, but I would still want > > _iterables_ for ascii_* and digits. > > Why? It's not like you're going to save much space by not creating a > string of 52 bytes. Strings are iterables. 
What I'm saying is that I don't necessarily need them to be strings, if having iterables that aren't strings (perhaps a string subclass redefining just __contains__) would help with: > > One issue with allowing "if char in string.letters:" is that these > > days this will not raise if the alleged 'char' is more than one > > character -- it will give True for (e.g.) 'ab', False for (e.g.) > > 'foobar', since it tests _substrings_. > > Right. > > So, maybe, str.letters and friends should be iterables which also > > implement a __contains__ method that raises some error with helpful > > information about using .iswhatever() instead -- that's assuming we > > want people NOT to test with "if char in str.letters:". If we DO > > want people to test that way, then I think str.letters should > > _still_ have __contains__, but specifically one to optimize speed in > > this case (if supported it should be just as fast as the > > .is... method -- which as Skip reminds us may in turn need > > optimization...). > > Hm. The iterable idea seems overblown for something as simple as > this. Is presenting this as "a subtype of str that overrides __contains__ appropriately" more acceptable? Alex From guido at python.org Fri Nov 7 19:10:15 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 7 19:10:28 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Sat, 08 Nov 2003 01:02:04 +0100." <200311080102.04546.aleaxit@yahoo.com> References: <002f01c3a562$06131dc0$bfb42c81@oemcomputer> <200311072204.15841.aleaxit@yahoo.com> <200311072249.hA7MnVx03222@12-236-54-216.client.attbi.com> <200311080102.04546.aleaxit@yahoo.com> Message-ID: <200311080010.hA80AFe03432@12-236-54-216.client.attbi.com> > > > I guess the tests should be faster, yes, but I would still want > > > _iterables_ for ascii_* and digits. > > > > Why? It's not like you're going to save much space by not creating a > > string of 52 bytes. > > Strings are iterables. 
What I'm saying is that I don't necessarily need > them to be strings, if having iterables that aren't strings (perhaps a > string subclass redefining just __contains__) would help with: An example given earlier: string.letters + string.digits + "_" indicates that we want them to be concrete strings. > > > One issue with allowing "if char in string.letters:" is that these > > > days this will not raise if the alleged 'char' is more than one > > > character -- it will give True for (e.g.) 'ab', False for (e.g.) > > > 'foobar', since it tests _substrings_. > > > > Right. > > > > > So, maybe, str.letters and friends should be iterables which also > > > implement a __contains__ method that raises some error with helpful > > > information about using .iswhatever() instead -- that's assuming we > > > want people NOT to test with "if char in str.letters:". If we DO > > > want people to test that way, then I think str.letters should > > > _still_ have __contains__, but specifically one to optimize speed in > > > this case (if supported it should be just as fast as the > > > .is... method -- which as Skip reminds us may in turn need > > > optimization...). > > > > Hm. The iterable idea seems overblown for something as simple as > > this. > > Is presenting this as "a subtype of str that overrides __contains__ > appropriately" more acceptable? No, I think it's being too clever. 
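Alex's "subtype of str that overrides __contains__" idea, which Guido rejects here as too clever, would have looked roughly like this (a hypothetical sketch, never implemented):

```python
class CharSet(str):
    """A str whose 'in' test accepts only a single character."""

    def __contains__(self, item):
        # Reject multi-character operands instead of doing a substring test.
        if not isinstance(item, str) or len(item) != 1:
            raise TypeError("'in' on a CharSet requires a single character, "
                            "got %r" % (item,))
        return str.__contains__(self, item)

digits = CharSet("0123456789")
```

Because CharSet is still a str, expressions like `digits + "_"` keep working, while `"23" in digits` raises instead of accidentally succeeding.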
--Guido van Rossum (home page: http://www.python.org/~guido/) From pinard at iro.umontreal.ca Fri Nov 7 18:11:45 2003 From: pinard at iro.umontreal.ca (François Pinard) Date: Fri Nov 7 19:52:55 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.4475.778642.314514@montanaro.dyndns.org> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <16300.2521.199201.364251@grendel.zope.com> <16300.4475.778642.314514@montanaro.dyndns.org> Message-ID: <20031107231145.GA6625@titan.progiciels-bpi.ca> [Skip Montanaro] > >> How about a strtools module? > Fred> Not sure I like the increasing array of module name suffixes. > Fred> There's the classic "foolib", then we added "footools" and > Fred> "fooutils" (think "mimetools" and "distutils"). > Not to mention which, we have a perfectly good module name already: string. When the `string' module was more or less aimed at deprecation (at least in practice), a good while ago, this was good news to me, because this module was preventing me, as a programmer, from using `string' as a variable name. Currently in Python, `string' as a module is not as ubiquitously needed as it once was in 1.5.2 times, and this is good news. Let it go and vanish if this is doable, but avoid making `string' any stronger. I would much prefer that library modules (past and future) never be named after likely user variable names. -- François Pinard http://www.iro.umontreal.ca/~pinard From bac at OCF.Berkeley.EDU Fri Nov 7 20:29:41 2003 From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Fri Nov 7 20:29:48 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072247.hA7Mlb503198@12-236-54-216.client.attbi.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <3FAC0542.30803@ocf.berkeley.edu> <200311072247.hA7Mlb503198@12-236-54-216.client.attbi.com> Message-ID: <3FAC4705.3080900@ocf.berkeley.edu> Guido van Rossum wrote: >>How about a strtools module? I was thinking that constants like >>ascii_letters could go there along with an implementation of join() that >>took arguments in an obvious way (or at least the way everyone seems to >>request it). Barry's string replacement function could also go there >>(the one using $; wasn't it agreed that interpolation was the wrong term >>to use or something?). >> >>This would prevent polluting the str type too much plus remove any >>hindrance that there necessarily be a mirror value for Unicode since the >>docs can explicitly state it only works for str in those cases. > > > Do we have an indication that the str type is getting polluted too > much? As of right now? Not really, but this might lead down that road (probably being overly cautious on this). I do agree with Fred in that I would be just as happy to have them in a module. Might be a bias I have developed about keeping *everything* in a class/type or instance (I blame Java =). I really don't mind if they get added to the type; moving them to another module just seemed like a cleaner solution to me. I am basically:

+0 for making the constants a class variable (really more like +.5, but rounding screws that up)
-1 for leaving the string module (I agree with Francois' argument about the name, plus we have said it is going to be deprecated for so long I would like to see it through)
+1 for moving them to another module that can have generic string-helping functions

-Brett From martin at v.loewis.de Sat Nov 8 04:53:19 2003 From: martin at v.loewis.de (Martin v.
Löwis) Date: Sat Nov 8 04:53:36 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311072344.46848.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <200311072158.36054.aleaxit@yahoo.com> <200311072344.46848.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > I dunno -- it seems that (on a toy case where an 'in' test takes 0.25 usec > and an .isdigit takes 0.52 to 0.55) we can shave the time to 0.39, about > in-between, by avoiding the generation of a bound-method. Now of course > saving 25% or so isn't huge, but maybe it's still worth it...? If you can avoid creating bound methods in the general case, that would be a good thing. Even avoiding them for strings only would be valuable, although I would then ask that you extend your strategy to lists. > should be able to print two distinct values for appropriate values of > xx and locale changes, right? Correct. > Hmmm -- if supporting this usage is crucial > then indeed we can't avoid generating a bound method (for .islower and > other locale-aware .is* methods), because the "return a function" approach > is basically evaluating the function at attribute-access time... if locale > changes between the attribute-access time and the moment of the call, > then the result may not be as desired. It's not crucial, but it would be an incompatible change to change it. However, this is irrelevant with respect to bound methods. The locale-awareness is in the code of the function, so if you manage to invoke that at the point of the call (instead of caching its result), then it would still be compatible.
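Martin's caveat about lists can be demonstrated directly: a bound method on a mutable object must keep a live reference to the object and see later mutations, while for an immutable str the answer to a parameterless predicate is already fixed at attribute-access time (which is what makes the getter trick possible at all):

```python
somelist = [3, 1, 2]
xxx = somelist.count      # bound method: holds a live reference to somelist
somelist.append(1)        # alter somelist at will
assert xxx(1) == 2        # the call must see the mutation

# By contrast, nothing can change what an immutable string's isdigit
# will answer once the attribute has been fetched:
s = "101"
f = s.isdigit
assert f() is True
```

So any scheme that evaluates the answer at attribute-access time is safe only for immutable receivers and locale-independent predicates.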
Regards, Martin From raymond.hettinger at verizon.net Sat Nov 8 07:22:58 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Nov 8 07:23:13 2003 Subject: [Python-Dev] operator.isMappingType Message-ID: <001101c3a5f3$0c3f0b00$66b52c81@oemcomputer>

>>> import operator
>>> map(operator.isMappingType, [(), [], '', u'', {}])
[True, True, True, True, True]

We did not resolve this when it came up before. Would there be any objections to my removing operator.isMappingType()? Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031108/dfc567ff/attachment.html From martin at v.loewis.de Sat Nov 8 08:09:09 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Sat Nov 8 08:09:16 2003 Subject: [Python-Dev] SourceForge CVS services improved Message-ID: <200311081309.hA8D99ix005156@mira.informatik.hu-berlin.de> In case you haven't read this announcement: ( 2003-11-04 09:51:53 - Project CVS Service ) Cutover of pserver-based CVS service and ViewCVS access to repositories to the new CVS infrastructure has been completed. Synchronization of data from the primary CVS server to the new CVS infrastructure now occurs every 5 hours (formerly once per day). Performance of pserver-based CVS access and ViewCVS access has been significantly improved; connection shedding (formerly used to cap the total number of simultaneous CVS connections) has been disabled. So anonymous users of the Python CVS should not see rejected connections anymore, and should see files only "slightly" behind. SF has completed the installation of new CVS server hardware, so developers should see an improved performance, compared to several months ago.
Regards, Martin From python at rcn.com Sat Nov 8 09:29:28 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 8 09:29:44 2003 Subject: [Python-Dev] Optional arguments for str.encode /.decode In-Reply-To: Message-ID: <002601c3a604$b7b55460$66b52c81@oemcomputer> > "Raymond Hettinger" writes: > > > Idea for the day: Let the str.encode/decode methods accept keyword > > arguments to be forwarded to the underlying codec. > > -1. The non-Unicode usage of .encode should not have been there in the > first place, IMO, so I dislike any extensions to it. I understand a desire to keep it pure. Would it be useful to add a separate method to support non-Unicode access? This style of access has some wonderful properties in terms of decoupling, accessibility, learnability, and uniformity. I can imagine that many kinds of bulk string operations could benefit from this interface:

    t.transform('crc32')
    t.transform('md5')
    t.transform('des_encode', key=0x10ab03b78495d2)
    t.transform('substitution', name='guido', home='netherlands')
    t.transform('huffman')

Raymond Hettinger From aleaxit at yahoo.com Sat Nov 8 11:09:34 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sat Nov 8 11:09:44 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <1068230662.15995.159.camel@anthem> <200311072344.46848.aleaxit@yahoo.com> Message-ID: <200311081709.35052.aleaxit@yahoo.com> On Saturday 08 November 2003 10:53, Martin v. Löwis wrote: > Alex Martelli writes: > > I dunno -- it seems that (on a toy case where an 'in' test takes 0.25 > > usec and an .isdigit takes 0.52 to 0.55) we can shave the time to 0.39, > > about in-between, by avoiding the generation of a bound-method. Now of > > course saving 25% or so isn't huge, but maybe it's still worth it...? > > If you can avoid creating bound methods in the general case, that > would be a good thing.
> Even avoiding them for strings only would
> be valuable, although I would then ask that you extend your strategy
> to lists.

Lists are mutable, which makes "creating bound methods" (or the equivalent thereof) absolutely unavoidable -- e.g.:

xxx = somelist.somemethod
" alter somelist at will "
yyy = xxx( )

xxx needs to be able to refer back to somelist at call time, clearly. This problem doesn't necessarily apply to method calls _on immutable objects_ -- as long as their results are not affected by other mutable "global" aspects of "the environment" in ways which also depend on the object they were originally called on. The is... methods of strings would be just perfect -- were it not for the influence of locale. Consider isdigit, which isn't influenced by locale. When x.isdigit is ACCESSED, we can direct that access through a getter, which, upon examining x's value at that time, KNOWS what the call will have to return -- whenever the call happens. So, the getter can return a callable that always returns True when called, or one that always returns False when called -- no need to create *new* callable objects for either, we can just keep two callables around for the purpose and incref them as needed. Few situations are as favourable as this one -- immutable object, no arguments, just two possible constant-returning callables needed. I just think it might be worth taking advantage of these rare circumstances, where feasible, to avoid wasting a little bit of performance. I think that this can be done in 2.3.* without changing Python-observable behavior in any way whatsoever -- just that if, e.g., we do it for both isdigit and isspace (the two non-locale-dependent string is* methods, I believe), we'll need 4 callables rather than 2 so that their __name__ and __doc__ attributes can be indistinguishable from the current versions thereof. > It's not crucial, but it would be an incompatible change to change it.
> > However, this is irrelevant with respect to bound methods. The > locale-awareness is in the code of the function, so if you manage to > invoke that at the point of the call (instead of caching its result), > then it would still be compatible. Nope, because the locale-dependent part needs to be applied to the actual string on which, e.g., isupper is being called. Therefore, since locale-dependency applies at call-time, we need a way _at call-time_ to get to the actual string... i.e., a bound-method or its equivalent, alas. Only when attribute-fetch-time behavior can be substituted for call-time behavior, is the above optimization feasible. Alex From aleaxit at yahoo.com Sat Nov 8 11:43:25 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sat Nov 8 11:43:34 2003 Subject: [Python-Dev] operator.isMappingType In-Reply-To: <001101c3a5f3$0c3f0b00$66b52c81@oemcomputer> References: <001101c3a5f3$0c3f0b00$66b52c81@oemcomputer> Message-ID: <200311081743.25977.aleaxit@yahoo.com> On Saturday 08 November 2003 13:22, Raymond Hettinger wrote: > >>> import operator > >>> map(operator.isMappingType, [(), [], '', u'', {}]) > > [True, True, True, True, True] > > We did not resolve this when it came up before. Would there be any > objections to my removing operator.isMappingType()? No objections from me. Either it should be made to do something useful (and I don't know how unless the 'basemapping' abstract type I mentioned is introduced), or it should be removed -- having it in its current state seems worst. Alex From barry at python.org Sat Nov 8 12:22:04 2003 From: barry at python.org (Barry Warsaw) Date: Sat Nov 8 12:22:10 2003 Subject: [Python-Dev] Small change to python-bugs-list Message-ID: <1068312124.15995.204.camel@anthem> It seems pretty redundant for the subject header of messages to this list to have both the SF added [ python-Bugs-XXXXXX ] prefix and the [Python-bugs-list] prefix added by Mailman. I removed the latter.
-Barry From python at rcn.com Sat Nov 8 12:34:05 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 8 12:34:18 2003 Subject: [Python-Dev] FW: [Python-checkins] python/dist/src/Doc/whatsnew whatsnew24.tex, 1.5, 1.6 Message-ID: <000001c3a61e$82451120$49bc958d@oemcomputer> > ! A new built-in function, \function{reversed(seq)}, takes a sequence > ! and returns an iterator that returns the elements of the sequence > ! in reverse order. > ! > ! \begin{verbatim} > ! >>> for i in reversed([1,2,3]): > ! ... print i > ! ... > ! 3 > ! 2 > ! 1 > ! \end{verbatim} > ! > ! Note that \function{reversed()} only accepts sequences, not arbitrary > ! iterators. If you want to reverse an iterator, convert it to > ! a list or tuple with \function{list()} or \function{tuple()}. > ! > ! \begin{verbatim} > ! >>> input = open('/etc/passwd', 'r') > ! >>> for line in reversed(list(input)): > ! ... print line > ! ... > ! root:*:0:0:System Administrator:/var/root:/bin/tcsh > ! ... > ! \end{verbatim} It would be nice to present the new features in light of what makes them desirable. "for elem in reversed(mylist)" wins in readability, speed, and memory performance over "mylist.reverse(); for elem in mylist" or "for elem in mylist[::-1]". The readability win is predicated on the notion that half-open intervals are easier to understand in the forwards direction. 'xrange(n//2, 0, -1)' is not as instantly understandable as reversed(xrange(1, n//2)). Using the newer form, anyone can quickly identify the first element, last element, and number of steps. > + \item The list type gained a \method{sorted(iterable)} method that > + returns the elements of the iterable as a sorted list. It also accepts > + the \var{cmp}, \var{key}, and \var{reverse} keyword arguments, same as > + the \method{sort()} method. 
An example usage: > + > + \begin{verbatim} > + >>> L = [9,7,8,3,2,4,1,6,5] > + >>> list.sorted(L) > + [1, 2, 3, 4, 5, 6, 7, 8, 9] > + >>> L > + [9, 7, 8, 3, 2, 4, 1, 6, 5] > + >>> > + \end{verbatim} > + > + Note that the original list is unchanged; the list returned by > + \method{sorted()} is a newly-created one. The key points here are that 1) any iterable may be used as an input and 2) list.sorted() is an in-line expression which allows it to be used in function arguments, lambda expressions, list comprehensions, and for-loop specifications:

    genTodoList(today, list.sorted(tasks, key=prioritize))
    getlargest = lambda x: list.sorted(x)[-1]
    x = [myfunc(v) for v in list.sorted(mydict.itervalues())]
    for key in list.sorted(mydict): . . .

> + \item The \module{heapq} module is no longer implemented in Python, > + having been converted into C. And it now runs about 10 times faster, which makes it viable for industrial-strength applications. > \item The \module{random} module has a new method called > \method{getrandbits(N)} Formerly, there was no O(n) method for generating large random numbers. The new method supports random.randrange() so that arbitrarily large numbers can be generated (important for public key cryptography and prime number generation).
This is a) completely different from encoding, where you learn the encoding only at run-time, e.g. from a MIME header or a config file. b) creates a different way to do the same thing; There should be one-- and preferably only one --obvious way to do it. > t.transform('crc32') Better write this as crc32.transform(t) > t.transform('md5') Better md5.transform(t) > t.transform('des_encode', key=0x10ab03b78495d2) Better des.encrypt(t, key=0x10ab03b78495d2). For des, there are two operations for string conversion, encrypt and decrypt; putting the direction of the operation in the transform name sux. > t.transform('substitution', name='guido', home='netherlands') Better t.substitute(name='guido', home='netherlands') > t.transform('huffman') Better huffman.transform(t) They are *not* uniform, as you have to remember the various parameters. Regards, Martin From martin at v.loewis.de Sat Nov 8 15:51:35 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sat Nov 8 15:51:42 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311081709.35052.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <200311072344.46848.aleaxit@yahoo.com> <200311081709.35052.aleaxit@yahoo.com> Message-ID: Alex Martelli writes: > Lists are mutable, which makes "creating bound methods" (or the equivalent > thereof) absolutely unavoidable -- e.g.: > xxx = somelist.somemethod > " alter somelist at will " > yyy = xxx( ) > > xxx needs to be able to refer back to somelist at call time, clearly. It depends on the source code. In your example, I agree it is unavoidable. In the much more common case of yyy = somelist.somemethod() one could call the code of somemethod without creating a bound method, and, in some cases, without creating the argument tuple. It would be good if, for >>> def x(a): ... a.append(1) ... 
the code could change from

  2           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                1 (append)
              6 LOAD_CONST               1 (1)
              9 CALL_FUNCTION            1
             12 POP_TOP
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

to

  2           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               2 (append)
              6 LOAD_CONST               1 (1)
              9 CALL_METHOD              1
             12 POP_TOP
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

where CALL_METHOD would read the method name from the stack. Unfortunately, that would be a semantic change; a __getattr__ would not be called anymore. Perhaps that can be changed to

  2           0 LOAD_FAST                0 (a)
              3 LOAD_METHOD              1 (append)
              6 LOAD_CONST               1 (1)
              9 CALL_METHOD              1
             12 POP_TOP
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

where LOAD_METHOD has the option of returning a fast_method object (which exists only once per type and method), and CALL_METHOD would check whether there is a fast_method object on the stack, and then explicitly pop "self" from the stack as well. > Few situations are as favourable as this one -- immutable object, no > arguments, just two possible constant-returning callables needed. Most cases are as favourable as this one. If you immediately call the bound method, and then discard the bound-method-object, there is no point in creating it first. The exception is the getattr-style computation of callables, where getattr cannot know that the result is going to be called right away. Regards, Martin From aleaxit at yahoo.com Sat Nov 8 19:00:58 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sat Nov 8 19:01:07 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: References: <1068230662.15995.159.camel@anthem> <200311081709.35052.aleaxit@yahoo.com> Message-ID: <200311090100.58703.aleaxit@yahoo.com> On Saturday 08 November 2003 21:51, Martin v. Löwis wrote: ... > > Lists are mutable, which makes "creating bound methods" (or the > > equivalent thereof) absolutely unavoidable -- e.g.: [[ I meant -- but didn't say out loud!-) -- "without changing the current bytecode-level logic". The change I proposed and experimented with for strings' is...
methods is localized to stringobject.c and requires changing nothing except the details of string objects' implementation ]] > > xxx = somelist.somemethod > > " alter somelist at will " > > yyy = xxx( ) > > > > xxx needs to be able to refer back to somelist at call time, clearly. > > It depends on the source code. In your example, I agree it is > unavoidable. In the much more common case of > > yyy = somelist.somemethod() > > one could call the code of somemethod without creating a bound method, > and, in some cases, without creating the argument tuple. It would be Yes, if different bytecode was generated, this would of course be possible. > would not be called anymore. Perhaps that can be changed to >
>   2           0 LOAD_FAST                0 (a)
>               3 LOAD_METHOD              1 (append)
>               6 LOAD_CONST               1 (1)
>               9 CALL_METHOD              1
>              12 POP_TOP
>              13 LOAD_CONST               0 (None)
>              16 RETURN_VALUE
>
> where LOAD_METHOD has the option of returning a fast_method object > (which exists only once per type and method), and CALL_METHOD > would check whether there is a fast_method object on the stack, and > then explicitly pop "self" from the stack as well. Yes, if LOAD_METHOD was also able to return a perfectly generic object (just in case the attribute named 'append' was not in fact a method), and CALL_METHOD could fall back to today's CALL_FUNCTION's functionality. I'm not sure what's supposed to happen to 'self' if LOAD_METHOD cannot push a fastmethod object but needs to push something else instead -- would the something else (anything but a fastmethod) also consume the 'self' then (whether to ignore it or merge it into a boundmethod)? It does look like this could work, and on a wide range of typical method-call uses. > > Few situations are as favourable as this one -- immutable object, no > > arguments, just two possible constant-returning callables needed. > > Most cases are as favourable as this one.
If you immediately call the Yes, for the kind of bytecode-level change you're proposing, I do believe most method-calls do follow this pattern. > bound method, and then discard the bound-method-object, there is no > point in creating it first. The exception is the getattr-style > computation of callables, where getattr cannot know that the result is > going to be called right away. ...or, no doubt, other special descriptors with getters playing dirty tricks. But I do agree that these are still likely to be a tiny fraction of use cases. My proposal was very narrow and safe -- yours is very broad, but by that very characteristic I think it _may_ make a difference to a certain pie-throwing-related bet, which no doubt wouldn't be the case for mine. So, Guido may well be more interested in your idea than in mine, given he's the one directly involved in the pie issues... Alex From bac at OCF.Berkeley.EDU Sat Nov 8 19:11:23 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Nov 8 19:11:29 2003 Subject: [Python-Dev] Time for py3k@python.org or a Py3K Wiki? In-Reply-To: <16278.33180.5190.95094@montanaro.dyndns.org> References: <16278.33180.5190.95094@montanaro.dyndns.org> Message-ID: <3FAD862B.9020302@ocf.berkeley.edu> Skip Montanaro wrote: > These various discussions are moving along a bit too rapidly for me to keep > up. We have been discussing language issues which are going to impact > Python 3.0, either by deprecating current language constructs which can't be > eliminated until then (e.g., the global statement) or by tossing around > language construct ideas which will have to wait until then for their > implementation (other mechanisms for variable access in outer scopes). > Unfortunately, I'm afraid these things are going to get lost in the sea of > other python-dev topics and be forgotten about when the time is ripe. > The Summaries can help with this (this is why whenever an idea comes up for Py3k I try to mention it), but read below for worries on this.
> Maybe this would be a good time to create a py3k@python.org mailing list > with more restrictions than python-dev (posting by members only? membership > by invitation?) so we can more easily separate these ideas from shorter term > issues and keep track of them in a separate Mailman archive. I'd suggest > starting a Wiki, but that seems a bit too "global". You can restrict Wiki > mods in MoinMoin to users who are logged in, but I'm not sure you can > restrict signups very well. > I am working on the next Summary and I am drowning here. Thanks to PEP 289 and PEP 323 I was able to basically do a quick overview and just point to the PEPs for generator expressions and reiterability/copying iterators, respectively. But I might have to summarize the 'global' discussion which is just immense. The problem is that I am the one doing the summary. Not only might I misunderstand something, but it will most likely have a slightly skewed view toward my thinking. I think Skip is right in having a separate place for *very* long-term discussions separate from immediate concerns. Long-term stuff does not need to be followed by everyone nor does everyone care about immediate issues like whether something should be backported. A layer of separation might be nice. Or perhaps a list for maintenance and another for new ideas. I can see having that division work as well. Dividing into more than two lists, though, would quickly turn into a logistical nightmare when ideas need to shift to another list. 
-Brett From martin at v.loewis.de Sat Nov 8 19:28:53 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Nov 8 19:29:09 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311090100.58703.aleaxit@yahoo.com> References: <1068230662.15995.159.camel@anthem> <200311081709.35052.aleaxit@yahoo.com> <200311090100.58703.aleaxit@yahoo.com> Message-ID: <3FAD8A45.1020901@v.loewis.de> Alex Martelli wrote: > [[ I meant -- but didn't say out loud!-) -- "without changing the current > bytecode-level logic". The change I proposed and experimented with > for strings' is... methods is localized to stringobject.c and requires > changing nothing except the details of string objects' implementation ]] Then I probably don't understand what you are suggesting. What would LOAD_ATTR do if the object is a string, the attribute is "isdigit", and you were allowed to assume that the result won't depend on factors that may change over time? Regards, Martin From bac at OCF.Berkeley.EDU Sat Nov 8 19:33:30 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Nov 8 19:33:33 2003 Subject: [Python-Dev] string substitution fxn in a new module (was: Can we please have a better dict interpolation syntax?) In-Reply-To: <200310231538.h9NFcIW02840@12-236-54-216.client.attbi.com> References: <200310230136.h9N1afs19446@oma.cosc.canterbury.ac.nz> <16279.56778.309781.129469@montanaro.dyndns.org> <1066921335.11634.103.camel@anthem> <16279.62016.628120.971560@montanaro.dyndns.org> <200310231538.h9NFcIW02840@12-236-54-216.client.attbi.com> Message-ID: <3FAD8B5A.3000704@ocf.berkeley.edu> Guido van Rossum wrote: > I have too much on my plate (spent too much on generator expressions > lately :-). > > I am bowing out of the variable substitution discussion after noting > that putting it in a module would be a great start (like for sets). > This idea seemed to die for no apparent reason. 
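The kind of dict-driven substitution helper being discussed can be sketched in a few lines (the function name and the $-marker syntax here are illustrative assumptions only, not the API that was actually on the table):

```python
import re

def substitute(template, mapping):
    # Replace each $name marker with str(mapping['name']); a missing
    # name raises KeyError, which a real module would surely refine.
    def _repl(match):
        return str(mapping[match.group(1)])
    return re.sub(r"\$(\w+)", _repl, template)

print(substitute("Hello $name, you have $count messages",
                 {"name": "Guido", "count": 3}))
# -> Hello Guido, you have 3 messages
```

Being a plain function, it composes with arguments, lambdas, and comprehensions the same way the interpolation syntaxes under discussion would.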
Fred, Skip, and Barry all liked the idea of adding the string substitution code to a module (one idea for a name was textutils) and Guido obviously seems receptive to the idea. Do people feel like moving forward with a new module? -Brett From aleaxit at yahoo.com Sun Nov 9 05:43:32 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 9 05:43:41 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <3FAD8A45.1020901@v.loewis.de> References: <1068230662.15995.159.camel@anthem> <200311090100.58703.aleaxit@yahoo.com> <3FAD8A45.1020901@v.loewis.de> Message-ID: <200311091143.32121.aleaxit@yahoo.com> On Sunday 09 November 2003 01:28, Martin v. Löwis wrote: > Alex Martelli wrote: > > [[ I meant -- but didn't say out loud!-) -- "without changing the > > current bytecode-level logic". The change I proposed and experimented > > with for strings' is... methods is localized to stringobject.c and > > requires changing nothing except the details of string objects' > > implementation ]] > > Then I probably don't understand what you are suggesting. What would > LOAD_ATTR do if the object is a string, the attribute is "isdigit", and > you were allowed to assume that the result won't depend on factors that > may change over time?
The LOAD_ATTR attribute, using exactly the machinery it uses today, gets to PyString_Type's tp_getattro slot, which is unchanged:

    PyObject_GenericGetAttr,                    /* tp_getattro */

Only one slot in PyString_Type is changed at all:

    string_getsets,                             /* tp_getset */

and it's changed from being 0 as it is now to pointing to:

    static PyGetSetDef string_getsets[] = {
        {"isdigit", (getter)string_isdigit_getter, 0, isdigit_getter__doc__},
        /* other getsets snipped */
        {0}
    };

string_isdigit_getter is quite similar to today's string_isdigit *EXCEPT* that instead of returning True or False (via PyBool_FromLong(1) etc) it returns one of two nullary callables which will always return True or respectively False when called:

    static PyObject * _isdigit_return_true = 0;
    static PyObject * _isdigit_return_false = 0;

    static PyObject *
    _isdigit_true_returner(PyObject* ignore_self)
    {
        Py_RETURN_TRUE;
    }

    static PyObject *
    _isdigit_false_returner(PyObject* ignore_self)
    {
        Py_RETURN_FALSE;
    }

    static PyMethodDef _str_bool_returners[] = {
        {"isdigit", (PyCFunction)_isdigit_false_returner, METH_NOARGS},
        {"isdigit", (PyCFunction)_isdigit_true_returner, METH_NOARGS},
        /* other "bool returners" snipped */
        {0}
    };

    static PyObject*
    _return_returner(PyObject** returner, PyMethodDef *returner_method_def)
    {
        if(!*returner)
            *returner = PyCFunction_New(returner_method_def, 0);
        Py_INCREF(*returner);
        return *returner;
    }

so string_isdigit_getter uses

    return _return_returner(&_isdigit_return_true, _str_bool_returners+1);

where string_isdigit would instead use

    return PyBool_FromLong(1);

That's all there is to my proposal (we'd have another pair of 'bool returners' for isspace -- I think there are no other is...
methods of strings suitable for this, given locale dependency of letter/upper/lower concepts) -- just a simple way to exploit descriptors to avoid creating bound-method objects -- with a speedup of 30% compared with the current implementations of isdigit and isspace (but the `in` operator is yet another 30% faster in both cases). Your proposal is vastly more ambitious and interesting, it seems to me. Alex From skip at manatee.mojam.com Sun Nov 9 08:00:47 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 9 08:00:58 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311091300.hA9D0luu028669@manatee.mojam.com>

Bug/Patch Summary
-----------------
562 open / 4322 total bugs (+62)
191 open / 2445 total patches (+15)

New Bugs
--------
Unhelpful error message from cgi module (2003-11-02) http://python.org/sf/834840
[2.3.2] zipfile test failure on AIX 5.1 (2003-11-03) http://python.org/sf/835145
[2.3.2] bz2 test failure on AIX 4.3.2, Tru64 UNIX (2003-11-03) http://python.org/sf/835176
socket object method "makefile" has wrong doc (2003-11-03) http://python.org/sf/835300
[2.3.2] test_socket failure on IRIX 6.5 (2003-11-03) http://python.org/sf/835338
logging.StreamHandler encodes log message in UTF-8 (2003-11-03) http://python.org/sf/835353
MacPython builds with DESTROOT need fixup (2003-11-04) http://python.org/sf/835790
strftime month name is encoded somehow (2003-11-04) http://python.org/sf/836035
socket.send() on behaves as nonblocking when timeout is set (2003-11-04) http://python.org/sf/836058
email generator can give bad output (2003-11-04) http://python.org/sf/836293
Windows installer 2.3.2 leaves old version in control panel (2003-11-05) http://python.org/sf/836515
pyport.h redeclares gethostname() if SOLARIS is defined (2003-11-06) http://python.org/sf/837046
Tk.quit and sys.exit cause Fatal Error (2003-11-06) http://python.org/sf/837234
id() for large ptr should return a long (2003-11-06) http://python.org/sf/837242
cryptic os.spawnvpe() return code (2003-11-06) http://python.org/sf/837577
socket.gethostbyname raises gaierror, not herror (2003-11-07) http://python.org/sf/837929
Unloading extension modules not always safe (2003-11-07) http://python.org/sf/838140
PackageManager does not clean up after itself (2003-11-07) http://python.org/sf/838144
MacPython for Panther additions includes IDLE (2003-11-08) http://python.org/sf/838616

New Patches
-----------
Update htmllib to HTML 4.01 (2003-11-04) http://python.org/sf/836088
Build changes for AIX (2003-11-05) http://python.org/sf/836434
assert should not generate code if optimized (2003-11-05) http://python.org/sf/836879
Avoid "apply" warnings in "logging", still works in 1.52 (2003-11-05) http://python.org/sf/836942
make pty.fork() allocate a controlling tty (2003-11-08) http://python.org/sf/838546

Closed Bugs
-----------
Named groups limitation in sre (2003-08-25) http://python.org/sf/794819
pyclbr.readmodule_ex() (2003-10-11) http://python.org/sf/821818
_set_cloexec of tempfile.py uses incorrect error handling (2003-10-11) http://python.org/sf/821896
object.h misdocuments PyDict_SetItemString (2003-10-21) http://python.org/sf/827856
Docstring for pyclbr.readmodule() is incorrect (2003-10-28) http://python.org/sf/831969
Bad Security Advice in CGI Documentation (2003-10-29) http://python.org/sf/832515
Incorrect priority 'in' and '==' (2003-10-31) http://python.org/sf/833905

Closed Patches
--------------
Added HTTP{,S}ProxyConnection (2002-02-08) http://python.org/sf/515003
Add traceback.format_exc (2003-01-30) http://python.org/sf/677887
Fix for former/latter confusion in Extending documentation (2003-10-06) http://python.org/sf/819012
Implementation PEP 322: Reverse Iteration (2003-11-01) http://python.org/sf/834422

From doko at cs.tu-berlin.de Sun Nov 9 15:01:57 2003 From: doko at cs.tu-berlin.de (Matthias Klose) Date: Sun Nov 9 15:04:12 2003 Subject: [Python-Dev] python icons?
Message-ID: <16302.40245.488709.729747@gargle.gargle.HOWL> Wanting to add an icon for gnome/KDE menus for a binary python package. There are no images in the distribution itself, and not many on the website. Looking for something like http://www.python.org/cgi-bin/moinmoin/ in standard resolutions like 64x64, 48x48, 32x32 and 16x16. Maybe something like this could be added to the Misc directory in the tarball. Matthias From iusty at k1024.org Sun Nov 9 17:44:45 2003 From: iusty at k1024.org (Iustin Pop) Date: Sun Nov 9 17:42:46 2003 Subject: [Python-Dev] tempfile.mktemp and os.path.exists Message-ID: <20031109224445.GA26291@saytrin.hq.k1024.org> Hello, The tempfile.mktemp function uses os.path.exists to test whether a file already exists. Since this returns false for broken symbolic links, wouldn't it be better if the function would actually do an os.lstat on the filename? I know the function is not safe by definition, but this issue could (with a low probability) cause the file to actually be created in another directory, as the non-existent target of the symlink, instead of in the given directory (the one in which the symlink resides). Regards, Iustin Pop From tdelaney at avaya.com Sun Nov 9 17:54:41 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Sun Nov 9 17:54:48 2003 Subject: [Python-Dev] other "magic strings" issues Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> > From: python-dev-bounces+tdelaney=avaya.com@python.org > > I guess the tests should be faster, yes, but I would still > want _iterables_ for ascii_* and digits. > > One issue with allowing "if char in string.letters:" is that > these days this will not raise if the alleged 'char' is more > than one character -- it will give True for (e.g.) 'ab', False > for (e.g.) 'foobar', since it tests _substrings_. # inside string.py or equivalent ... 
import sets ascii_letters = sets.Set(ascii_letters) Hmm - we'd have the iterability, individual characters and speed, but lose iterating in order. I'm sure there's things out there that rely on iterating over ascii_letters in order ... ;) Tim Delaney From guido at python.org Sun Nov 9 21:11:57 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 9 21:12:15 2003 Subject: [Python-Dev] tempfile.mktemp and os.path.exists In-Reply-To: Your message of "Mon, 10 Nov 2003 00:44:45 +0200." <20031109224445.GA26291@saytrin.hq.k1024.org> References: <20031109224445.GA26291@saytrin.hq.k1024.org> Message-ID: <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> > Hello, > > The tempfile.mktemp function uses os.path.exists to test whether a file > already exists. Since this returns false for broken symbolic links, > wouldn't it be better if the function would actually do an os.lstat on > the filename? > > I know the function is not safe by definition, but this issue could > (with a low probability) cause the file to actually be created in > another directory, as the non-existent target of the symlink, instead of > in the given directory (the one in which the symlink resides). > > Regards, > Iustin Pop Sounds like a good suggestion; I'll see if I can check something in. (However, given that there already exists an attack on this function, does fixing this actually make any difference?) --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Mon Nov 10 03:18:15 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Mon Nov 10 03:18:22 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> Message-ID: <200311100918.15810.aleaxit@yahoo.com> On Sunday 09 November 2003 11:54 pm, Delaney, Timothy C (Timothy) wrote: ... 
> ascii_letters = sets.Set(ascii_letters) > > Hmm - we'd have the iterability, individual characters and speed, but lose > iterating in order. I'm sure there's things out there that rely on > iterating over ascii_letters in order ... ;) Yes, that's my main use case -- presenting results to the user, so they need to be in alphabetic order (ascii_lowercase actually, but it's much the same). Anyway, Guido has already pronounced on such enhancements as "Too Clever", so we have to keep ascii_lowercase &c as plain strings without any enhancements and keep the "false positives" &c on 'in' checks. Alex From mwh at python.net Mon Nov 10 05:34:40 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 10 05:34:45 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <1068225424.15995.146.camel@anthem> (Barry Warsaw's message of "Fri, 07 Nov 2003 12:17:05 -0500") References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> Message-ID: <2msmkwy0jj.fsf@starship.python.net> Barry Warsaw writes: > I would love it if what happened really was something like: > >>>> from socket import * >>>> print AF_UNIX > socket.AF_UNIX >>>> from errno import * >>>> print EEXIST > errno.EEXIST I've had this idea too. I like it, I think. The signal module could use it too... Cheers, mwh -- I have a feeling that any simple problem can be made arbitrarily difficult by imposing a suitably heavy administrative process around the development. 
-- Joe Armstrong, comp.lang.functional From mwh at python.net Mon Nov 10 05:38:05 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 10 05:38:08 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <200311071708.02744.aleaxit@yahoo.com> (Alex Martelli's message of "Fri, 7 Nov 2003 17:08:02 +0100") References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> Message-ID: <2moevky0du.fsf@starship.python.net> Alex Martelli writes: > From Barry's discussion of the problem of "magic strings" as arguments to > .encode / .decode , I was reminded of a blog entry, > > http://www.brunningonline.net/simon/blog/archives/000803.html > > which mentions another case of "magic strings" that might perhaps be > (optionally but suggestedly) changed into more-readable attributes (in > this case, clearly attributes of the 'file' type): mode arguments to 'file' > calls. Simon Brunning, the author of that blog entry, argues that > > myFile = file(filename, 'rb') > > (while of course we're going to keep accepting it forever) is not quite as > readable and maintainable as, e.g.: > > myFile = file(filename, file.READ + file.BINARY) > > Just curious -- what are everybody's feelings about that idea? I'm > about +0 on it, myself -- I doubt I'd remember to use it (too much C > in my past...:-) but I see why others would prefer it. I think I prefer Guido's idea that when a function argument is almost always constant you should really have two functions and /F's (?) idea that there should be a 'textfile' function: textfile(path[, mode='r'[, encoding='ascii']]) -> file object or similar. Cheers, mwh -- Need to Know is usually an interesting UK digest of things that happened last week or might happen next week. [...] This week, nothing happened, and we don't care. 
-- NTK Now, 2000-12-29, http://www.ntk.net/ From skip at pobox.com Sat Nov 8 07:34:07 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Nov 10 08:42:09 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.9851.671401.447992@grendel.zope.com> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> Message-ID: <16300.58047.526545.28711@montanaro.dyndns.org> Fred> Frankly, that doesn't bother me, especially given that they've Fred> always been in the string module. But I count more than 4 Fred> constants that should be kept: Fred> ascii_letters Fred> ascii_lowercase Fred> ascii_uppercase Fred> digits Fred> hexdigits Fred> octdigits Fred> whitespace Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since I'm sure there are other punctuation characters which would turn up in unicode. Fred> All of these could reasonably live on both str and unicode if Fred> that's not considered pollution. But if they live in a module, Fred> there's no reason not to keep string around for that purpose. If they are going to be attached to a class, why not to basestring? Fred> (I don't object to making them class attributes; I object to creating Fred> a new module for them.) Agreed. If they stay in a module, I'd prefer they just stay in string. That creates the minimum amount of churn in people's code. Anyone who's been converting to string methods has had to leave all the above constants alone anyway. Skip From fdrake at acm.org Mon Nov 10 09:25:06 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Mon Nov 10 09:25:31 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: <16300.58047.526545.28711@montanaro.dyndns.org> References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> Message-ID: <16303.40898.410595.383833@grendel.zope.com> Skip Montanaro writes: > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > I'm sure there are other punctuation characters which would turn up in > unicode. Ah, yes. > If they are going to be attached to a class, why not to basestring? That makes sense for ascii_* and *digits, perhaps. whitespace and punctuation definitely change for Unicode, so it's less clear that the values belong in a base class. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From guido at python.org Mon Nov 10 10:34:53 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 10 10:35:03 2003 Subject: [Python-Dev] other "magic strings" issues In-Reply-To: Your message of "Mon, 10 Nov 2003 10:34:40 GMT." <2msmkwy0jj.fsf@starship.python.net> References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <1068225424.15995.146.camel@anthem> <2msmkwy0jj.fsf@starship.python.net> Message-ID: <200311101534.hAAFYrB15503@12-236-54-216.client.attbi.com> > > I would love it if what happened really was something like: > > > >>>> from socket import * > >>>> print AF_UNIX > > socket.AF_UNIX > >>>> from errno import * > >>>> print EEXIST > > errno.EEXIST > > I've had this idea too. I like it, I think. The signal module could > use it too...
Yes, that would be cool for many enums.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon Nov 10 10:39:07 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 10:39:13 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: Your message of "Mon, 10 Nov 2003 10:38:05 GMT." <2moevky0du.fsf@starship.python.net>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <2moevky0du.fsf@starship.python.net>
Message-ID: <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com>

> I think I prefer Guido's idea that when a function argument is almost
> always constant you should really have two functions and /F's (?)
> idea that there should be a 'textfile' function:
>
>     textfile(path[, mode='r'[, encoding='ascii']]) -> file object
>
> or similar.

I'm not so sure about that in this case. There are quite a few places
where one writes a wrapper for open() that takes a mode and passes it
on to the real open(). Having to distinguish between multiple open()
functions would complexify this.

OTOH my experimental standard I/O replacement (nondist/sandbox/sio)
does a similar thing, by providing different constructors for
different functionality (buffering, text translation, low-level I/O
basis).
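[The 'textfile' constructor quoted above was only ever sketched as a signature in this thread; a minimal illustration of the idea, built on the stdlib codecs module. The function name and defaults are taken from the quoted signature; everything else here is an assumption, not anyone's actual proposal.]

```python
import codecs

def textfile(path, mode="r", encoding="ascii"):
    # Hypothetical helper from the thread: a text-mode open() that takes
    # the encoding as an explicit argument instead of smuggling encoding
    # information through open()'s mode string.
    codecs.lookup(encoding)  # fail early on an unknown encoding name
    return codecs.open(path, mode, encoding=encoding)
```

[A wrapper that merely forwards a mode argument -- Guido's objection below -- would still work against this, since the signature stays open()-compatible.]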
--Guido van Rossum (home page: http://www.python.org/~guido/)

From dan at sidhe.org Mon Nov 10 10:44:56 2003
From: dan at sidhe.org (Dan Sugalski)
Date: Mon Nov 10 10:40:27 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <16303.40898.410595.383833@grendel.zope.com>
References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> <16303.40898.410595.383833@grendel.zope.com>
Message-ID: 

On Mon, 10 Nov 2003, Fred L. Drake, Jr. wrote:

> > Skip Montanaro writes:
> > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since
> > I'm sure there are other punctuation characters which would turn up in
> > unicode.
>
> Ah, yes.
>
> > If they are going to be attached to a class, why not to basestring?
>
> That makes sense for ascii_* and *digits, perhaps.

Digits change for Unicode as well. Plus they get potentially...
interesting in some cases, where the digit-ness of a character is arguably
contextually driven, but I think that can be ignored. Most of the time, at
least.
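[Dan's point is easy to check at the interpreter -- an added illustration, not from the original mail: Unicode recognizes far more digit characters than the ten in string.digits.]

```python
import string

# U+00B2 SUPERSCRIPT TWO has the Unicode digit property,
# yet it is not one of the ten characters in string.digits.
assert "2" in string.digits
assert u"\u00b2".isdigit()
assert u"\u00b2" not in string.digits
```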
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

From mwh at python.net Mon Nov 10 10:56:01 2003
From: mwh at python.net (Michael Hudson)
Date: Mon Nov 10 10:56:08 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com> (Guido van Rossum's message of "Mon, 10 Nov 2003 07:39:07 -0800")
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <2moevky0du.fsf@starship.python.net> <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com>
Message-ID: <2md6c0xlny.fsf@starship.python.net>

Guido van Rossum writes:

>> I think I prefer Guido's idea that when a function argument is almost
>> always constant you should really have two functions and /F's (?)
>> idea that there should be a 'textfile' function:
>>
>>     textfile(path[, mode='r'[, encoding='ascii']]) -> file object
>>
>> or similar.
>
> I'm not so sure about that in this case. There are quite a few places
> where one writes a wrapper for open() that takes a mode and passes it
> on to the real open().

I may just be being thick today but I can't think of many. Most of
the time passing in an already open file object would be better
interface, surely? Well, there's things like the codec writers, but
textfile would hopefully subsume them.

> Having to distinguish between multiple open() functions would
> complexify this.
>
> OTOH my experimental standard I/O replacement (nondist/sandbox/sio)
> does a similar thing, by providing different constructors for
> different functionality (buffering, text translation, low-level I/O
> basis).

Does text translation cover unicode issues here?

Cheers,
mwh

-- 
Never meddle in the affairs of NT. It is slow to boot and quick to
crash.
-- Stephen Harris
-- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html

From fdrake at acm.org Mon Nov 10 11:01:48 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon Nov 10 11:02:05 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: 
References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> <16303.40898.410595.383833@grendel.zope.com>
Message-ID: <16303.46700.213857.424250@grendel.zope.com>

Dan Sugalski writes:
> Digits change for Unicode as well. Plus they get potentially...
> interesting in some cases, where the digit-ness of a character is arguably
> contextually driven, but I think that can be ignored. Most of the time, at
> least.

That depends on how we define "digits" for this purpose. I've always
thought of the *digits strings as true constants; others may disagree.
I understand that the digit-ness of a Unicode character is defined in
more interesting ways than simply the ASCII characters 0-9.

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From dan at sidhe.org Mon Nov 10 11:18:10 2003
From: dan at sidhe.org (Dan Sugalski)
Date: Mon Nov 10 11:13:42 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <16303.46700.213857.424250@grendel.zope.com>
References: <002701c3a55c$97088c80$bfb42c81@oemcomputer> <200311071859.hA7IxVY02856@12-236-54-216.client.attbi.com> <16299.60650.800354.930018@grendel.zope.com> <200311072241.hA7Mf7M03135@12-236-54-216.client.attbi.com> <16300.9113.720680.750981@grendel.zope.com> <200311072302.hA7N2ve03300@12-236-54-216.client.attbi.com> <16300.9851.671401.447992@grendel.zope.com> <16300.58047.526545.28711@montanaro.dyndns.org> <16303.40898.410595.383833@grendel.zope.com> <16303.46700.213857.424250@grendel.zope.com>
Message-ID: 

On Mon, 10 Nov 2003, Fred L. Drake, Jr. wrote:

> > Dan Sugalski writes:
> > Digits change for Unicode as well. Plus they get potentially...
> > interesting in some cases, where the digit-ness of a character is arguably
> > contextually driven, but I think that can be ignored. Most of the time, at
> > least.
>
> That depends on how we define "digits" for this purpose. I've always
> thought of the *digits strings as true constants; others may disagree.

Fair enough. The languages that use non-latin alphabets all have
characters for digits, though many allow the use of latin digits as well.
I suppose it's a matter of taste as to whether the non-latin digit
characters are treated as true digits or not.

There's also the issue of interpreting numeric constants in general if
you open up the set of digits with Unicode--it could be considered odd to
allow kanji characters that are tagged as digits to not be considered
digits for numeric constants or string->number conversions.
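[The contextual "digit-ness" Dan describes is exactly what the stdlib unicodedata module records per character -- a small added illustration, not part of the original mail.]

```python
import unicodedata

# U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit (category 'Nd')
# with numeric value 3, though it never appears in string.digits.
arabic_three = u"\u0663"
assert unicodedata.category(arabic_three) == "Nd"
assert unicodedata.decimal(arabic_three) == 3
```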
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

From guido at python.org Mon Nov 10 11:34:28 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 11:34:40 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: Your message of "Mon, 10 Nov 2003 15:56:01 GMT." <2md6c0xlny.fsf@starship.python.net>
References: <000901c3a501$8fb10800$1535c797@oemcomputer> <1068219089.15995.128.camel@anthem> <200311071649.27884.aleaxit@yahoo.com> <200311071708.02744.aleaxit@yahoo.com> <2moevky0du.fsf@starship.python.net> <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com> <2md6c0xlny.fsf@starship.python.net>
Message-ID: <200311101634.hAAGYSW15612@12-236-54-216.client.attbi.com>

> >> textfile(path[, mode='r'[, encoding='ascii']]) -> file object
> >>
> >> or similar.
> >
> > I'm not so sure about that in this case. There are quite a few places
> > where one writes a wrapper for open() that takes a mode and passes it
> > on to the real open().
>
> I may just be being thick today but I can't think of many. Most of
> the time passing in an already open file object would be better
> interface, surely? Well, there's things like the codec writers, but
> textfile would hopefully subsume them.

Here's a pattern that I use frequently in unit tests:

    def makefile(self, data, mode="wb"):
        fn = tempfile.mktemp()
        self.tempfilenames.append(fn)
        f = open(fn, mode)
        f.write(data)
        f.close()
        return fn

> > Having to distinguish between multiple open() functions would
> > complexify this.
> >
> > OTOH my experimental standard I/O replacement (nondist/sandbox/sio)
> > does a similar thing, by providing different constructors for
> > different functionality (buffering, text translation, low-level I/O
> > basis).
>
> Does text translation cover unicode issues here?
Yes, the framework should support Unicode encoding/decoding too (but
the implementation doesn't do much of this -- have a look).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Mon Nov 10 12:16:34 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Nov 10 12:16:47 2003
Subject: [Python-Dev] other "magic strings" issues
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com>
Message-ID: <16303.51186.409765.238472@montanaro.dyndns.org>

Tim> # inside string.py or equivalent ...
Tim> import sets
Tim> ascii_letters = sets.Set(ascii_letters)

Tim> Hmm - we'd have the iterability, individual characters and speed,
Tim> but lose iterating in order. I'm sure there's things out there that
Tim> rely on iterating over ascii_letters in order ... ;)

Actually, I suspect that in most cases you wouldn't have speed unless
sets.Set() is rewritten in C. See my previous post with the timeit.py
results.

Skip

From michael at petroni.cc Mon Nov 10 14:38:56 2003
From: michael at petroni.cc (Michael Petroni)
Date: Mon Nov 10 14:39:00 2003
Subject: [Python-Dev] socket listen problem under aix
Message-ID: <3FAFE950.5020705@petroni.cc>

hi!

sorry for posting here as a non-member and non-developer, but i've a
problem that is (maybe) a bug:

i'm running python 2.2.3 under aix 4.3.3 compiled with gcc version
2.9-aix51-020209.

subsequent accept calls in the socket library block after a defined
number of calls depending on the accept queue size. the call then never
returns, a connection to the server port gets a timeout and netstat -a
still shows the port as listening.
see the following example code:

----
import socket
queue_size = 6
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("", 7111))
s.listen(queue_size)
while 1:
    (c, addr) = s.accept()
    c.close()
----

depending on "queue_size" the loop blocks after n calls:

size  calls
1     1
2     3
3     4
4     6
5     7
6     9

i've tried the same code on various other systems with different python
versions -> no problem at all. looks like that some resources for the
tcp connection queue are not freed any more.

have i found a bug or did i miss something?

sorry for the inconvenience once more and thx.

mike

From guido at python.org Mon Nov 10 14:49:36 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 14:50:25 2003
Subject: [Python-Dev] socket listen problem under aix
In-Reply-To: Your message of "Mon, 10 Nov 2003 20:38:56 +0100." <3FAFE950.5020705@petroni.cc>
References: <3FAFE950.5020705@petroni.cc>
Message-ID: <200311101949.hAAJnaO15861@12-236-54-216.client.attbi.com>

> i'm running python 2.2.3 under aix 4.3.3 compiled with gcc version
> 2.9-aix51-020209.
>
> subsequent accept calls in the socket library block after a defined
> number of calls depending on the accept queue size. the call then never
> returns, a connection to the server port gets a timeout and netstat -a
> still shows the port as listening.
>
> see the following example code:
>
> ----
> import socket
> queue_size = 6
> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> s.bind(("", 7111))
> s.listen(queue_size)
> while 1:
>     (c, addr) = s.accept()
>     c.close()
> ----
>
> depending on "queue_size" the loop blocks after n calls:
>
> size  calls
> 1     1
> 2     3
> 3     4
> 4     6
> 5     7
> 6     9
>
> i've tried the same code on various other systems with different python
> versions -> no problem at all. looks like that some resources for the
> tcp connection queue are not freed any more.

Almost certainly the problem is either in AIX or in your understanding
of how sockets work, and not in Python's socket module.
The socket module just calls the underlying system calls; it doesn't
introduce this kind of problem by itself (but it doesn't prevent you
from making a bogus sequence of calls either).

If you want help debugging this issue, comp.lang.python would be a
more appropriate place to ask. (My immediate question after seeing
your code above is, what is the client doing?)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From vladimir.marangozov at optimay.com Mon Nov 10 16:12:35 2003
From: vladimir.marangozov at optimay.com (Marangozov, Vladimir (Vladimir))
Date: Mon Nov 10 16:12:41 2003
Subject: [Python-Dev] Re: other "magic strings" issues
Message-ID: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com>

Hi,

[Guido]
> I do think that keeping the string module around without all the
> functions it historically contained would be a mistake, confusing
> folks. This error is pretty clear:
>
> >>> import string
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ImportError: No module named string
> >>>
>
> But this one is much more mystifying:
>
> >>> import string
> >>> print string.join(["a", "b"], ".")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'module' object has no attribute 'join'
> >>>

I am trying to understand what's the bottom line of this thread.
It looks like people are suggesting that the venerable string module
should vanish + provide its functions as object attributes.

Well, I have to say that I actually like the fact that I can be
procedural with strings and not object-oriented. Having all str
functions as object attributes is too much OO for my mind with regard
to this basic type. And too much OOrientation isn't always simple to
grasp (despite that we can have anything as an object attribute now
and regardless some nice pipe-like serialized string constructs
achieved with attributes).
Put it another way, it's good to have all string functions being
attributes to a single well-known object, that object being the
'string' module, instead of spreading it all over... So add the
attributes if you wish so (I respect OO minds), but don't zap
the module (i.e. please respect mine ;-).

Cheers,
Vladimir

From iusty at k1024.org Mon Nov 10 16:25:05 2003
From: iusty at k1024.org (Iustin Pop)
Date: Mon Nov 10 16:24:05 2003
Subject: [Python-Dev] tempfile.mktemp and os.path.exists
In-Reply-To: <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com>
References: <20031109224445.GA26291@saytrin.hq.k1024.org> <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com>
Message-ID: <20031110212505.GB5361@saytrin.hq.k1024.org>

On Sun, Nov 09, 2003 at 06:11:57PM -0800, Guido van Rossum wrote:
> > The tempfile.mktemp function uses os.path.exists to test whether a file
> > already exists. Since this returns false for broken symbolic links,
> > wouldn't it be better if the function would actually do an os.lstat on
> > the filename?
> >
> > I know the function is not safe by definition, but this issue could
> > (with a low probability) cause the file to actually be created in
> > another directory, as the non-existent target of the symlink, instead of
> > in the given directory (the one in which the symlink resides).
> Sounds like a good suggestion; I'll see if I can check something in.

The fix is trivial (IMHO). A patch is attached.

> > (However, given that there already exists an attack on this function,
> does fixing this actually make any difference?)

Not really, but it is defensive programming (since the module is
security-oriented). Maybe you want a non-existent name for a block
device or a pipe (which mkstemp doesn't provide).

I happened to look into the module to see if I can replace some
hand-written functions with the ones in the module and I saw that
mktemp() could be improved maybe.
Regards,
Iustin Pop

-------------- next part --------------
diff -urN old/tempfile.py new/tempfile.py
--- old/tempfile.py 2003-11-10 23:07:46.000000000 +0200
+++ new/tempfile.py 2003-11-10 23:22:57.000000000 +0200
@@ -338,7 +338,9 @@
     for seq in xrange(TMP_MAX):
         name = names.next()
         file = _os.path.join(dir, prefix + name + suffix)
-        if not _os.path.exists(file):
+        try:
+            _os.lstat(file)
+        except _os.error:
             return file

     raise IOError, (_errno.EEXIST, "No usable temporary filename found")

From guido at python.org Mon Nov 10 16:30:12 2003
From: guido at python.org (Guido van Rossum)
Date: Mon Nov 10 16:30:19 2003
Subject: [Python-Dev] tempfile.mktemp and os.path.exists
In-Reply-To: Your message of "Mon, 10 Nov 2003 23:25:05 +0200." <20031110212505.GB5361@saytrin.hq.k1024.org>
References: <20031109224445.GA26291@saytrin.hq.k1024.org> <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> <20031110212505.GB5361@saytrin.hq.k1024.org>
Message-ID: <200311102130.hAALUCT16049@12-236-54-216.client.attbi.com>

> > Sounds like a good suggestion; I'll see if I can check something in.
> The fix is trivial (IMHO). A patch is attached.

Now there you are wrong, my friend. :-)

> > (However, given that there already exists an attack on this function,
> > does fixing this actually make any difference?)
> Not really, but it is defensive programming (since the module is
> security-oriented). Maybe you want a non-existent name for a block
> device or a pipe (which mkstemp doesn't provide).

I use it all the time for situations where I have to name a file that
an external program is going to create for me.

> I happened to look into the module to see if I can replace some
> hand-written functions with the ones in the module and I saw that
> mktemp() could be improved maybe.
> Regards,
> Iustin Pop
>
> --zhXaljGHf11kAtnf
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: attachment; filename="tempfile.patch"
>
> diff -urN old/tempfile.py new/tempfile.py
> --- old/tempfile.py 2003-11-10 23:07:46.000000000 +0200
> +++ new/tempfile.py 2003-11-10 23:22:57.000000000 +0200
> @@ -338,7 +338,9 @@
>      for seq in xrange(TMP_MAX):
>          name = names.next()
>          file = _os.path.join(dir, prefix + name + suffix)
> -        if not _os.path.exists(file):
> +        try:
> +            _os.lstat(file)
> +        except _os.error:
>              return file
>
>      raise IOError, (_errno.EEXIST, "No usable temporary filename found")

This fix would break on non-Unix platforms (the module should work
everywhere). Fortunately I already checked something in that *does*
work across platforms. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com Mon Nov 10 16:31:47 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Nov 10 16:30:46 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com >
Message-ID: <5.1.0.14.0.20031110161755.030b1540@mail.telecommunity.com>

At 10:12 PM 11/10/03 +0100, Marangozov, Vladimir (Vladimir) wrote:
>Put it another way, it's good to have all string functions being
>attributes to a single well-known object, that object being the
>'string' module, instead of spreading it all over... So add the
>attributes if you wish so (I respect OO minds), but don't zap
>the module (i.e. please respect mine ;-).

Actually, even in Python 2.2, you can access the same functions as
'str.whatever', e.g.:

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> str.upper("foo")
'FOO'
>>> str.join(" ",["1","2","3"])
'1 2 3'
>>> str.split("x y z")
['x', 'y', 'z']
>>> str.count("a+b+c","+")
2

In fact, the only items missing from 'str' as opposed to 'string' in 2.2
are:

Constants
---------
ascii_letters
ascii_lowercase
ascii_uppercase
digits
hexdigits
letters
lowercase
octdigits
printable
punctuation
uppercase
whitespace

Functions and Exceptions
------------------------
capwords (actually, the same as str.title)
joinfields (alias for join, so str.join really suffices)
index_error
maketrans
atof, atof_error
atoi, atoi_error
atol, atol_error

So, the actual discussion is mostly about what to do with the constants,
as the functions are already pretty much available in 'str'. Note that
since 'str' is a built-in, it doesn't have to be imported, and it's three
less characters to type. So, if you prefer a non-object style for
strings, you could still do it if string went away. For legacy code
support, you could probably even do:

    sys.modules['string'] = str

in some cases. :)

From aleaxit at yahoo.com Mon Nov 10 16:51:10 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Mon Nov 10 16:52:49 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com>
References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com>
Message-ID: <200311102251.10904.aleaxit@yahoo.com>

On Monday 10 November 2003 10:12 pm, Marangozov, Vladimir (Vladimir) wrote:
   ...
> Put it another way, it's good to have all string functions being
> attributes to a single well-known object, that object being the
> 'string' module, instead of spreading it all over... So add the

Not sure anybody wants to "spread it all over", for whatever "it".
str.whatever should be usable where string.whatever is usable now, so,
what would the problem be...?
As for being able to call, when appropriate:
    something.amethod(somestring, whatever)
rather than _having_ to call
    somestring.amethod(whatever)
I _do_ sympathize with this. str.methodname, being an unbound method,
may NOT be usable quite as freely ("quite as polymorphically", in
OO-speak:-) as string.method was recently. E.g. :

>>> import string
>>> string.upper(u'ciao')
u'CIAO'
>>> string.upper('ciao')
'CIAO'
>>> str.upper('ciao')
'CIAO'
>>> str.upper(u'ciao')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'

in other words, string.upper is currently callable on ANY object which
internally defines an .upper() method, whether that object is a string
or not; str.upper instead does typechecking on its first argument -- you
can only call it on a bona fide instance of str or a subclass, not
polymorphically in the usual Python sense of signature-based polymorphism.

So, if I have a sequence with some strings and some unicode objects I
cannot easily get a correspondent sequence with each item uppercased
_except_ with string.upper...:

>>> map(string.upper, ('ciao', u'ciao'))
['CIAO', u'CIAO']
>>> map(str.upper, ('ciao', u'ciao'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'
>>> map(unicode.upper, ('ciao', u'ciao'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: descriptor 'upper' requires a 'unicode' object but received a 'str'

To be honest I don't currently have any real use case that's quite like
this (i.e., based on a mix of string and unicode objects), but I DO have
cases in completely different domains where I end up coding the like of
(sigh...):

def fooper(obj):
    return obj.foop()
foopresults = map(fooper, lotsofobjects)

or equivalently:

foopresults = map(lambda obj: obj.foop(), lotsofobjects)

or also (probably best for this specific use case):

foopresults = [ obj.foop() for obj in lotsofobjects ]

map may not be the best example, because it's old-ish and most
replaceable with list comprehensions (optionally with zip), itertools,
etc. But I _do_ need an "easily expressed callable" for _many_ perfectly
current and indeed future (2.4) idioms. E.g., "order the items of
lotsobjs in increasing order of their .foop() results" in 2.4 would be

lotsobjs.sort(key=lambda obj: obj.foop())

...and we're back to wishing for a way to pass a nonlambda-callable.

E.g. a string-related example would be "order the strings in list
lotsastrings (which may be all plain strings, or all unicode strings, on
different calls of this overall function) in
case-insensitive-alphabetical order". In 2.4 _with_ the string module
that's a snap:

lotsastrings.sort(key=string.upper)

_without_ string.upper's quiet and strong polymorphism, we'd be back to
lambda, or a tiny def for the equivalent of string.upper, or nailing
down the exact type involved, leading perhaps to nasty code such as

lotsastrings.sort(key=type(lotsastrings[0]).upper)

(not ADVOCATING this by any means -- on the contrary, pointing it out as
a danger of having such callables ONLY available as unbound methods and
thus requiring the exact type...). But it does not seem to me that
keeping module string as it is now is necessarily the ideal solution to
this small quandary.
It works for those methods which strings _used_ to have in 1.5.2 -- try,
e.g., string.title -- and you're hosed again. _Extending_ module string
doesn't seem like a pleasant option either -- and if we did we'd _still_
leave exactly the same problem open for non-string objects on which we'd
like to get a polymorphic callable that's normally a method (key=
parameter in sort, all the 'func' and 'pred' parameters to itertools
functions, ...).

Rather, why not think of a slightly more general solution...? We could
have an object -- say "callmethod", although I'm sure better names can
easily be found by this creative crowd;-) -- with functionality roughly
equivalent to the following Python code...:

class MethodCaller(object):
    def __getattr__(self, name):
        def callmethod(otherself, *args, **kwds):
            return getattr(otherself, name)(*args, **kwds)
        return callmethod
callmethod = MethodCaller()

Now, the ability to obtain callables for each of the above examples
becomes available -- with parametric polymorphism just like Python
normally offers. Performance with this implementation would surely be
bad (but then, string.upper(s) is over twice as slow as s.upper() and I
don't hear complaints on that...:-) but maybe a more clever
implementation might partly compensate... _if_, that is, there IS any
interest at all in the idea, of course!

Alex

From djc at object-craft.com.au Mon Nov 10 17:45:54 2003
From: djc at object-craft.com.au (Dave Cole)
Date: Mon Nov 10 17:45:59 2003
Subject: [Python-Dev] socket listen problem under aix
In-Reply-To: <3FAFE950.5020705@petroni.cc>
References: <3FAFE950.5020705@petroni.cc>
Message-ID: <1068504354.10481.11.camel@echidna.object-craft.com.au>

On Tue, 2003-11-11 at 06:38, Michael Petroni wrote:
> hi!
>
> sorry for posting here as a non-member and non-developer, but i've a
> problem that is (maybe) a bug:
>
> i'm running python 2.2.3 under aix 4.3.3 compiled with gcc version
> 2.9-aix51-020209.
>
> subsequent accept calls in the socket library block after a defined
> number of calls depending on the accept queue size. the call then never
> returns, a connection to the server port gets a timeout and netstat -a
> still shows the port as listening.
>
> see the following example code:
>
> ----
> import socket
> queue_size = 6
> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> s.bind(("", 7111))
> s.listen(queue_size)
> while 1:
>     (c, addr) = s.accept()
>     c.close()
> ----

It may not have any effect but try changing the socket call to this:

    socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)

I recently wrote a non-blocking select loop server (in C) on AIX 4.3.3
and the program would run for hours then fail in strange ways. When I
changed the socket() protocol argument from zero to IPPROTO_TCP the
problems went away.

It is a long shot, but it is worth a try.

- Dave

-- 
http://www.object-craft.com.au

From iusty at k1024.org Mon Nov 10 17:59:40 2003
From: iusty at k1024.org (Iustin Pop)
Date: Mon Nov 10 17:57:45 2003
Subject: [Python-Dev] tempfile.mktemp and os.path.exists
In-Reply-To: <200311102130.hAALUCT16049@12-236-54-216.client.attbi.com>
References: <20031109224445.GA26291@saytrin.hq.k1024.org> <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> <20031110212505.GB5361@saytrin.hq.k1024.org> <200311102130.hAALUCT16049@12-236-54-216.client.attbi.com>
Message-ID: <20031110225940.GC5361@saytrin.hq.k1024.org>

> Now there you are wrong, my friend. :-)
>
> This fix would break on non-Unix platforms (the module should work
> everywhere). Fortunately I already checked something in that *does*
> work across platforms. :-)

Thanks for reminding me - sometimes I forget that, even if I cherish the
portability of python!
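[Dave's suggestion above amounts to a one-argument change in Michael's original example -- spelled out here for clarity; whether it actually helps on AIX is, as he says, a long shot.]

```python
import socket

# Pass the protocol explicitly instead of letting it default to 0,
# so the socket is unambiguously created as TCP.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
print(s.proto)  # 6, the IANA protocol number for TCP
s.close()
```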
Iustin Pop From dingy at shvns.com Mon Nov 10 18:43:36 2003 From: dingy at shvns.com (Ding Yong) Date: Mon Nov 10 19:42:24 2003 Subject: [Python-Dev] Re: Python-Dev Digest, Vol 4, Issue 33 References: Message-ID: <004001c3a7e4$75d7b9c0$f065a8c0@dingyong> ----- Original Message ----- From: To: Sent: Tuesday, November 11, 2003 12:57 AM Subject: Python-Dev Digest, Vol 4, Issue 33 > Send Python-Dev mailing list submissions to > python-dev@python.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.python.org/mailman/listinfo/python-dev > or, via email, send a message with subject or body 'help' to > python-dev-request@python.org > > You can reach the person managing the list at > python-dev-owner@python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-Dev digest..." > > > Today's Topics: > > 1. python icons? (Matthias Klose) > 2. tempfile.mktemp and os.path.exists (Iustin Pop) > 3. RE: other "magic strings" issues (Delaney, Timothy C (Timothy)) > 4. Re: tempfile.mktemp and os.path.exists (Guido van Rossum) > 5. Re: other "magic strings" issues (Alex Martelli) > 6. Re: other "magic strings" issues (Michael Hudson) > 7. Re: other "magic strings" issues (Michael Hudson) > 8. Re: other "magic strings" issues (Skip Montanaro) > 9. Re: other "magic strings" issues (Fred L. Drake, Jr.) > 10. Re: other "magic strings" issues (Guido van Rossum) > 11. Re: other "magic strings" issues (Guido van Rossum) > 12. Re: other "magic strings" issues (Dan Sugalski) > 13. Re: other "magic strings" issues (Michael Hudson) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 9 Nov 2003 21:01:57 +0100 > From: Matthias Klose > Subject: [Python-Dev] python icons? 
> To: python-dev@python.org > Message-ID: <16302.40245.488709.729747@gargle.gargle.HOWL> > Content-Type: text/plain; charset=us-ascii > > Wanting to add an icon for gnome/KDE menus for a binary python > package. There are no images in the distribution itself, and not many > on the website. Looking for something like > http://www.python.org/cgi-bin/moinmoin/ in standard resolutions like > 64x64, 48x48, 32x32 and 16x16. Maybe something like this could be > added to the Misc directory in the tarball. > > Matthias > > > > > ------------------------------ > > Message: 2 > Date: Mon, 10 Nov 2003 00:44:45 +0200 > From: Iustin Pop > Subject: [Python-Dev] tempfile.mktemp and os.path.exists > To: python-dev@python.org > Message-ID: <20031109224445.GA26291@saytrin.hq.k1024.org> > Content-Type: text/plain; charset=us-ascii > > Hello, > > The tempfile.mktemp function uses os.path.exists to test whether a file > already exists. Since this returns false for broken symbolic links, > wouldn't it be better if the function would actually do an os.lstat on > the filename? > > I know the function is not safe by definition, but this issue could > (with a low probability) cause the file to actually be created in > another directory, as the non-existent target of the symlink, instead of > in the given directory (the one in which the symlink resides). > > Regards, > Iustin Pop > > > > ------------------------------ > > Message: 3 > Date: Mon, 10 Nov 2003 09:54:41 +1100 > From: "Delaney, Timothy C (Timothy)" > Subject: RE: [Python-Dev] other "magic strings" issues > To: > Message-ID: > <338366A6D2E2CA4C9DAEAE652E12A1DEDF64B6@au3010avexu1.global.avaya.com> > Content-Type: text/plain; charset="iso-8859-1" > > > From: python-dev-bounces+tdelaney=avaya.com@python.org > > > > I guess the tests should be faster, yes, but I would still > > want _iterables_ for ascii_* and digits. 
> > > > One issue with allowing "if char in string.letters:" is that > > these days this will not raise if the alleged 'char' is more > > than one character -- it will give True for (e.g.) 'ab', False > > for (e.g.) 'foobar', since it tests _substrings_. > > # inside string.py or equivalent ... > > import sets > > ascii_letters = sets.Set(ascii_letters) > > Hmm - we'd have the iterability, individual characters and speed, but lose iterating in order. I'm sure there's things out there that rely on iterating over ascii_letters in order ... ;) > > Tim Delaney > > > > ------------------------------ > > Message: 4 > Date: Sun, 09 Nov 2003 18:11:57 -0800 > From: Guido van Rossum > Subject: Re: [Python-Dev] tempfile.mktemp and os.path.exists > To: Iustin Pop > Cc: python-dev@python.org > Message-ID: <200311100211.hAA2BvK14648@12-236-54-216.client.attbi.com> > > > Hello, > > > > The tempfile.mktemp function uses os.path.exists to test whether a file > > already exists. Since this returns false for broken symbolic links, > > wouldn't it be better if the function would actually do an os.lstat on > > the filename? > > > > I know the function is not safe by definition, but this issue could > > (with a low probability) cause the file to actually be created in > > another directory, as the non-existent target of the symlink, instead of > > in the given directory (the one in which the symlink resides). > > > > Regards, > > Iustin Pop > > Sounds like a good suggestion; I'll see if I can check something in. > > (However, given that there already exists an attack on this function, > does fixing this actually make any difference?) 
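Iustin's lstat-based check is easy to sketch. `name_is_free` below is a hypothetical helper written in present-day syntax, not the change that actually went in; the point is only that os.lstat sees a dangling symlink where os.path.exists does not:

```python
import errno
import os

def name_is_free(path):
    """True if nothing occupies `path`, counting broken symlinks.

    os.path.exists() follows symlinks and reports False for a dangling
    one, so a mktemp-style loop relying on it can hand out a name that
    is really a symlink pointing into another directory.  os.lstat()
    examines the link entry itself, closing that hole.
    """
    try:
        os.lstat(path)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return True
        raise
    return False
```

With a dangling link in place, os.path.exists(link) returns False while name_is_free(link) also returns False, because lstat still finds the link entry — exactly the disagreement Iustin describes.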
> > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > ------------------------------ > > Message: 5 > Date: Mon, 10 Nov 2003 09:18:15 +0100 > From: Alex Martelli > Subject: Re: [Python-Dev] other "magic strings" issues > To: "Delaney, Timothy C (Timothy)" , > > Message-ID: <200311100918.15810.aleaxit@yahoo.com> > Content-Type: text/plain; charset="iso-8859-1" > > On Sunday 09 November 2003 11:54 pm, Delaney, Timothy C (Timothy) wrote: > ... > > ascii_letters = sets.Set(ascii_letters) > > > > Hmm - we'd have the iterability, individual characters and speed, but lose > > iterating in order. I'm sure there's things out there that rely on > > iterating over ascii_letters in order ... ;) > > Yes, that's my main use case -- presenting results to the user, so they need > to be in alphabetic order (ascii_lowercase actually, but it's much the same). > > Anyway, Guido has already pronounced on such enhancements as "Too > Clever", so we have to keep ascii_lowercase &c as plain strings without any > enhancements and keep the "false positives" &c on 'in' checks. > > > Alex > > > > > ------------------------------ > > Message: 6 > Date: Mon, 10 Nov 2003 10:34:40 +0000 > From: Michael Hudson > Subject: Re: [Python-Dev] other "magic strings" issues > To: python-dev@python.org > Message-ID: <2msmkwy0jj.fsf@starship.python.net> > Content-Type: text/plain; charset=us-ascii > > Barry Warsaw writes: > > > I would love it if what happened really was something like: > > > >>>> from socket import * > >>>> print AF_UNIX > > socket.AF_UNIX > >>>> from errno import * > >>>> print EEXIST > > errno.EEXIST > > I've had this idea too. I like it, I think. The signal module could > use it too... > > Cheers, > mwh > > -- > I have a feeling that any simple problem can be made arbitrarily > difficult by imposing a suitably heavy administrative process > around the development. 
-- Joe Armstrong, comp.lang.functional > > > > ------------------------------ > > Message: 7 > Date: Mon, 10 Nov 2003 10:38:05 +0000 > From: Michael Hudson > Subject: Re: [Python-Dev] other "magic strings" issues > To: python-dev@python.org > Message-ID: <2moevky0du.fsf@starship.python.net> > Content-Type: text/plain; charset=us-ascii > > Alex Martelli writes: > > > From Barry's discussion of the problem of "magic strings" as arguments to > > .encode / .decode , I was reminded of a blog entry, > > > > http://www.brunningonline.net/simon/blog/archives/000803.html > > > > which mentions another case of "magic strings" that might perhaps be > > (optionally but suggestedly) changed into more-readable attributes (in > > this case, clearly attributes of the 'file' type): mode arguments to 'file' > > calls. Simon Brunning, the author of that blog entry, argues that > > > > myFile = file(filename, 'rb') > > > > (while of course we're going to keep accepting it forever) is not quite as > > readable and maintainable as, e.g.: > > > > myFile = file(filename, file.READ + file.BINARY) > > > > Just curious -- what are everybody's feelings about that idea? I'm > > about +0 on it, myself -- I doubt I'd remember to use it (too much C > > in my past...:-) but I see why others would prefer it. > > I think I prefer Guido's idea that when a function argument is almost > always constant you should really have two functions and /F's (?) > idea that there should be a 'textfile' function: > > textfile(path[, mode='r'[, encoding='ascii']]) -> file object > > or similar. > > Cheers, > mwh > > -- > Need to Know is usually an interesting UK digest of things that > happened last week or might happen next week. [...] This week, > nothing happened, and we don't care. > -- NTK Now, 2000-12-29, http://www.ntk.net/ > > > > ------------------------------ > > Message: 8 > Date: Sat, 8 Nov 2003 06:34:07 -0600 > From: Skip Montanaro > Subject: Re: [Python-Dev] other "magic strings" issues > To: "Fred L. 
Drake, Jr." > Cc: Guido van Rossum , python-dev@python.org > Message-ID: <16300.58047.526545.28711@montanaro.dyndns.org> > Content-Type: text/plain; charset=us-ascii > > > Fred> Frankly, that doesn't bother me, especially given that they've > Fred> always been in the string module. But I count more than 4 > Fred> constants that should be kept: > > Fred> ascii_letters > Fred> ascii_lowercase > Fred> ascii_uppercase > Fred> digits > Fred> hexdigits > Fred> octdigits > Fred> whitespace > > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > I'm sure there are other punctuation characters which would turn up in > unicode. > > Fred> All of these could reasonably live on both str and unicode if > Fred> that's not considered pollution. But if they live in a module, > Fred> there's no reason not to keep string around for that purpose. > > If they are going to be attached to a class, why not to basestring? > > Fred> (I don't object to making them class attributes; I object to creating > Fred> a new module for them.) > > Agreed. If they stay in a module, I'd prefer they just stay in string. > That creates the minimum amount of churn in people's code. Anyone who's > been converting to string methods has had to leave all the above constants > alone anyway. > > Skip > > > > ------------------------------ > > Message: 9 > Date: Mon, 10 Nov 2003 09:25:06 -0500 > From: "Fred L. Drake, Jr." > Subject: Re: [Python-Dev] other "magic strings" issues > To: skip@pobox.com > Cc: python-dev@python.org > Message-ID: <16303.40898.410595.383833@grendel.zope.com> > Content-Type: text/plain; charset=us-ascii > > > Skip Montanaro writes: > > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > > I'm sure there are other punctuation characters which would turn up in > > unicode. > > Ah, yes. > > > If they are going to be attached to a class, why not to basestring? > > That makes sense for ascii_* and *digits, perhaps. 
whitespace and > punctuation definitely change for Unicode, so it's less clear that the > values belong in a base class. > > > -Fred > > -- > Fred L. Drake, Jr. > PythonLabs at Zope Corporation > > > > ------------------------------ > > Message: 10 > Date: Mon, 10 Nov 2003 07:34:53 -0800 > From: Guido van Rossum > Subject: Re: [Python-Dev] other "magic strings" issues > To: Michael Hudson > Cc: python-dev@python.org > Message-ID: <200311101534.hAAFYrB15503@12-236-54-216.client.attbi.com> > > > > I would love it if what happened really was something like: > > > > > >>>> from socket import * > > >>>> print AF_UNIX > > > socket.AF_UNIX > > >>>> from errno import * > > >>>> print EEXIST > > > errno.EEXIST > > > > I've had this idea too. I like it, I think. The signal module could > > use it too... > > Yes, that would be cool for many enums. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > ------------------------------ > > Message: 11 > Date: Mon, 10 Nov 2003 07:39:07 -0800 > From: Guido van Rossum > Subject: Re: [Python-Dev] other "magic strings" issues > To: Michael Hudson > Cc: python-dev@python.org > Message-ID: <200311101539.hAAFd8H15525@12-236-54-216.client.attbi.com> > > > I think I prefer Guido's idea that when a function argument is almost > > always constant you should really have two functions and /F's (?) > > idea that there should be a 'textfile' function: > > > > textfile(path[, mode='r'[, encoding='ascii']]) -> file object > > > > or similar. > > I'm not so sure about that in this case. There are quite a few places > where one writes a wrapper for open() that takes a mode and passes it > on to the real open(). Having to distinguish between multiple open() > functions would complexify this. > > OTOH my experimental standard I/O replacement (nondist/sandbox/sio) > does a similar thing, by providing different constructors for > different functionality (buffering, text translation, low-level I/O > basis).
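Both shapes Guido contrasts can be put side by side. `reporting_open` is an invented example of the pass-through wrapper pattern, and `textfile` is a sketch of /F's hypothetical constructor, written against the modern built-in open() (which did eventually grow an encoding parameter along these lines):

```python
def reporting_open(path, mode='r'):
    """A typical wrapper: the mode string is opaque to the wrapper and
    passes straight through to the real open() -- the convenience that
    splitting open() into several constructors would complicate."""
    print('opening %r with mode %r' % (path, mode))
    return open(path, mode)

def textfile(path, mode='r', encoding='ascii'):
    """Sketch of the proposed text-only constructor: there is no 'b'
    flag to forget, and the encoding is explicit rather than implied."""
    if 'b' in mode:
        raise ValueError('textfile() opens text; use open() for binary')
    return open(path, mode, encoding=encoding)
```

The wrapper's convenience is exactly the cost of the split: every reporting_open-style wrapper in existing code would need to learn which of the new constructors to forward to.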
> > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > ------------------------------ > > Message: 12 > Date: Mon, 10 Nov 2003 10:44:56 -0500 (EST) > From: Dan Sugalski > Subject: Re: [Python-Dev] other "magic strings" issues > To: "Fred L. Drake, Jr." > Cc: skip@pobox.com, python-dev@python.org > Message-ID: > Content-Type: TEXT/PLAIN; charset=US-ASCII > > On Mon, 10 Nov 2003, Fred L. Drake, Jr. wrote: > > > > > Skip Montanaro writes: > > > Don't forget 'punctuation'. Maybe it should be 'ascii_punctuation', since > > > I'm sure there are other punctuation characters which would turn up in > > > unicode. > > > > Ah, yes. > > > > > If they are going to be attached to a class, why not to basestring? > > > > That makes sense for ascii_* and *digits, perhaps. > > Digits change for Unicode as well. Plus they get potentially... > interesting in some cases, where the digit-ness of a character is arguably > contextually driven, but I think that can be ignored. Most of the time, at > least. > > Dan > > --------------------------------------"it's like this"------------------- > Dan Sugalski even samurai > dan@sidhe.org have teddy bears and even > teddy bears get drunk > > > > > ------------------------------ > > Message: 13 > Date: Mon, 10 Nov 2003 15:56:01 +0000 > From: Michael Hudson > Subject: Re: [Python-Dev] other "magic strings" issues > To: python-dev@python.org > Message-ID: <2md6c0xlny.fsf@starship.python.net> > Content-Type: text/plain; charset=us-ascii > > Guido van Rossum writes: > > >> I think I prefer Guido's idea that when a function argument is almost > >> always constant you should really have two functions and /F's (?) > >> idea that there should be a 'textfile' function: > >> > >> textfile(path[, mode='r'[, encoding='ascii']]) -> file object > >> > >> or similar. > > > > I'm not so sure about that in this case. 
There are quite a few places > > where one writes a wrapper for open() that takes a mode and passes it > > on to the real open(). > > I may just be being thick today but I can't think of many. Most of > the time passing in an already open file object would be a better > interface, surely? Well, there's things like the codec writers, but > textfile would hopefully subsume them. > > > Having to distinguish between multiple open() functions would > > complexify this. > > > > OTOH my experimental standard I/O replacement (nondist/sandbox/sio) > > does a similar thing, by providing different constructors for > > different functionality (buffering, text translation, low-level I/O > > basis). > > Does text translation cover unicode issues here? > > Cheers, > mwh > > -- > Never meddle in the affairs of NT. It is slow to boot and quick to > crash. -- Stephen Harris > -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html > > > > ------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > > > End of Python-Dev Digest, Vol 4, Issue 33 > ***************************************** From eppstein at ics.uci.edu Mon Nov 10 20:48:06 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Mon Nov 10 20:48:10 2003 Subject: [Python-Dev] Re: other "magic strings" issues References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> Message-ID: In article <200311102251.10904.aleaxit@yahoo.com>, Alex Martelli wrote: > >>> map(string.upper, ('ciao', u'ciao')) > ['CIAO', u'CIAO'] > > >>> map(str.upper, ('ciao', u'ciao')) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode' > > >>> map(unicode.upper, ('ciao', u'ciao')) > Traceback (most recent call last): > File "", line 1, in ?
> TypeError: descriptor 'upper' requires a 'unicode' object but received a 'str' > > > To be honest I don't currently have any real use case that's quite like this > (i.e., based on a mix of string and unicode objects), but I DO have cases > in completely different domains where I end up coding the like of (sigh...): Actually I had exactly this case recently: I had an object that needed to store a pointer to a function for normalizing item names prior to looking them up in a dictionary, and most of the time (but not always) that function was lower(). But I wanted to handle both str and unicode, so I wrote a one-line function: def lower(x): return x.lower() > ...and we're back to wishing for a way to pass a nonlambda-callable. E.g. > a string-related example would be "order the strings in list lotsastrings > (which may be all plain strings, or all unicode strings, on different calls > of this overall function) in case-insensitive-alphabetical order". In 2.4 > _with_ the string module that's a snap: > > lotsastrings.sort(key=string.upper) Is that really alphabetical? It seems like it orders them based on the ordinal value of the characters, which doesn't work so well for unicodes. The last time I needed this I couldn't figure out how to get a reasonable case-insensitive-alphabetical order in pure python, so I used PyObjC's NSString.localizedCaseInsensitiveCompare_ instead; a pure Python solution that works as well as that one would be welcome. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From tim.one at comcast.net Mon Nov 10 21:48:50 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 10 21:48:55 2003 Subject: [Python-Dev] More fun with Python shutdown Message-ID: Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak references. Now Zope3 dies with a segfault when it's shut down, which makes its adoption of Python 2.3.2 a bit less attractive . 
The problem isn't really understood. I hope that once it is, there will be a simple way to avoid it under 2.3.2. Jim filed a bug report with a fix to the symptom here: http://www.python.org/sf/839548 It's another case where things go crazy during the second call of PyGC_Collect in Py_Finalize. Alas, we haven't found a simpler failing test case than "Zope3" yet. For bafflement value, I'll give a cmdline-parameterized snippet here that displays at least 4 distinct behaviors at shutdown, although a segfault isn't one of them: """ import weakref import os class C(object): def hi(self, w=os.write): w(1, 'hi 1\n') print 'hi 2' def pp(c=C()): c.hi() import sys exec "import %s as somemodule" % sys.argv[1] in globals() del sys somemodule.c1 = C() somemodule.awr = weakref.ref(somemodule.c1, lambda ignore, pp=pp: pp()) del C, pp """ Here are the ways it behaves (on Windows, anyway): C:\Code\python\PCbuild>python temp4.py tempfile hi 1 hi 2 C:\Code\python\PCbuild>python temp4.py math # curiously, __main__ the same C:\Code\python\PCbuild>python temp4.py __builtin__ hi 1 C:\Code\python\PCbuild>python temp4.py sys hi 1 Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in at 0x006B6C70> ignored C:\Code\python\PCbuild> The only one I can't make any sense of is __builtin__: the weakref callback is certainly invoked then, but its print statement neither produces output nor raises an exception. Note that the exception in the "sys" example has nothing to do with the "os.write" default-arg value. That's really the print statement, complaining because sys.stdout is None by the time shutdown gets there. 
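The `w=os.write` default argument in the snippet above is the standard defence for code that may run this late: anything the callback needs must be captured at definition time, because by the second PyGC_Collect the modules' globals (sys.stdout included) may already have been rebound to None. A minimal illustration of the pattern, with invented names:

```python
import os
import weakref

class Tracked(object):
    pass

def _announce(label, _write=os.write):
    # os.write and fd 1 are captured here at definition time, so this
    # body performs no module-global lookups that teardown could break.
    _write(1, ('finalized: %s\n' % label).encode('ascii'))

def track(obj, label):
    # Capture the label now; the callback only receives the dead ref.
    return weakref.ref(obj, lambda ref, label=label: _announce(label))

t = Tracked()
wr = track(t, 'demo object')
del t  # fires the callback immediately here; the same code path would
       # also survive firing during Py_Finalize's final collections
```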
From tismer at tismer.com Mon Nov 10 21:52:16 2003 From: tismer at tismer.com (Christian Tismer) Date: Mon Nov 10 21:52:18 2003 Subject: [Python-Dev] Making python C-API thread safe (try 2) In-Reply-To: References: <5.1.1.6.0.20030911142317.02b88640@telecommunity.com> <5.1.1.6.0.20030911130607.02426ec0@telecommunity.com> <5.1.1.6.0.20030911130607.02426ec0@telecommunity.com> <5.1.1.6.0.20030911142317.02b88640@telecommunity.com> <5.1.1.6.0.20030911162016.02027750@telecommunity.com> Message-ID: <3FB04EE0.50901@tismer.com> Not having read c.l.py for too long, some comments, anyway... A.M. Kuchling wrote: > On Fri, 12 Sep 2003 07:56:55 +0300, > Harri Pesonen wrote: > >>I don't know, I got mail about writing a PEP. It is clear that it would >>not be accepted, because it would break the existing API. The change is >>so big that I think that it has to be called a different language. > > > It would just be a different implementation of the same language. Jython > has different garbage collection characteristics from CPython, but they > still implement the same language; Stackless Python is still Python. This is only half the truth. Of course, you can run all of your code in Stackless without change. But as soon as you have become familiar with it, your programming style changes so drastically that you never will want to go back. I realized this late, after my first "eat your own dogfood" project. Stackless dramatically simplifies your coding style. This seems to be an irreversible process. I will provide examples. >>because this is too important to be ignored. Python *needs* to be >>free-threading... While written so heartily, and I can understand this very much, it appears to be very, very wrong, since it does not address general needs. I admit: There are situations where you need this, and you would easily pay the extra of an at least 20-30 % overhead for being free-threaded. But this isn't common-case.
Python's model of object sharing enforces such a costly scheme at the moment. In most cases, I strongly believe that this is not necessary, basically. The fact that access to almost any Python object is possible at almost any time is not a feature, but an artifact. Having to protect any mutable object at any time is a consequence of this. This protection currently either has to be the GIL, or builtin protection for the objects. I guess that in most cases, you would want to have almost completely disjoint object spaces without any implicit sharing of mutables. You would provide extra communication primitives in order to share certain objects, instead. This way, most of the free threading issues would vanish, in favor of a limited set of controlled, shared objects, while most of the rest would just run unrestricted. Playing with such derivatives will be one of the strengths of the PyPy project, which has the ability to try alternatives as one of its major goals. In CPython, you currently don't have many more alternatives than to run disjoint processes, which are communicating by exchanging pickled objects. (Which is, IMHO, not the worst solution at all!) > On the other hand, considering that the last free threading packages were > for 1.4, and no one has bothered to update them, the community doesn't seem > to find the subject as important as you do. :) I think it is worthwhile to be considered as a special alternative. Making it *the* requirement is surely not the right goal for a tool that has to fulfil everybody's needs. My suggestion is to add this as a feature request to PyPy, together with some effort supporting it. If PyPy is going to be as flexible as we claimed many times, then it should be possible to derive a version with the desired properties. But this is meant to be a challenge for Harri, for instance. all the best -- chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From greg at cosc.canterbury.ac.nz Mon Nov 10 23:01:51 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 10 23:02:33 2003 Subject: [Python-Dev] Deprecating obsolete builtins In-Reply-To: <20031106035837.GB7212@epoch.metaslash.com> Message-ID: <200311110401.hAB41pd17180@oma.cosc.canterbury.ac.nz> Neal Norwitz : > For the most part, I meant to remove them (including intern) > altogether in the long run. In 2.4, I only meant to officially > deprecate them with a warning. intern() doesn't seem particularly > useful or commonly used. If the implementation of string comparison is somehow changed so that explicit interning is no longer necessary for efficient lookup of dynamically-constructed names, then intern() can go. But until then, the functionality needs to be available somehow -- you might not need it often, but when you do, there's no substitute for it. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From vladimir.marangozov at optimay.com Tue Nov 11 05:13:50 2003 From: vladimir.marangozov at optimay.com (Marangozov, Vladimir (Vladimir)) Date: Tue Nov 11 05:14:15 2003 Subject: [Python-Dev] Re: other "magic strings" issues Message-ID: <6CC39F01DF9C56438FC6B7473A989B63055C18@geex2ku01.agere.com> Hi, > [me] > > Put it another way, it's good to have all string functions being > > attributes to a single well-known object, that object being the > > 'string' module, instead of spreading it all over... 
So add the [Alex] > Not sure anybody wants to "spread it all over", for whatever "it". > str.whatever should be usable where string.whatever is usable > now, so, what would the problem be...? "Should" is a bit of an overstatement, provided that Python lived happily without all string functions as attributes for 10+ years. Now you've grown OO and appreciate having all functions as attributes and that's fine. No one has objected to enlarging the set of attributes. The objection is towards deprecating the 'string' module, thus closing the door for a procedural approach to strings. And if I say that 95% of all programmers don't care about string polymorphism or Unicode, that is probably true as well, so no point in arguing that o.upper() is better than string.upper(o). o.upper() is really StringType.upper(o) under the hood, which is the same as import string / string.upper(o). Both StringType and 'string' act as function packages (containers). Yes, I see you coming with arguments that they aren't really the same because of subtleties like Unicode, etc. but that's irrelevant for those 95% of the people who aren't heavily invested in strings and simply don't care. The catch is that if we favor the OO approach and deprecate 'string', we deprecate one explicit way of spelling things, which is import string / string.upper(o). This has been adopted and is widely used. Python has always tried to balance purity with practicality and OO in Python is still perceived as optional, especially for the newcomer who needs to write a couple of quick scripts to get the job done. I am not sure we have to favor the OO reasoning for everything. There are also backward compatibility issues arising from deprecating 'string' but I believe this is manageable. 'string' can be aliased to StringType so that it is backwards compatible. Removing the 'string' module as a name completely would be a bit of a challenge though...
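The "subtlety" being waved away is the one Alex's map() example made concrete: string.upper was a plain function that delegated to o.upper(), so it worked on any string-like object, while str.upper is a descriptor welded to one concrete type. string.upper is gone from modern Python, so a one-line stand-in is used below, with bytes playing the role of the second string type:

```python
def upper(o):
    # what string.upper() effectively did: delegate to the object's
    # own method, whatever its concrete type
    return o.upper()

# The plain function is polymorphic across string-like types...
assert [upper(s) for s in ('ciao', b'ciao')] == ['CIAO', b'CIAO']

# ...while the unbound method accepts only its own type:
assert str.upper('ciao') == 'CIAO'
try:
    str.upper(b'ciao')
except TypeError:
    pass  # descriptor 'upper' requires a 'str' object
else:
    raise AssertionError('str.upper accepted a non-str')
```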
Cheers, Vladimir From jim at zope.com Tue Nov 11 05:32:01 2003 From: jim at zope.com (Jim Fulton) Date: Tue Nov 11 05:33:29 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: References: Message-ID: <3FB0BAA1.5040607@zope.com> Tim Peters wrote: > Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak > references. Now Zope3 dies with a segfault when it's shut down, which makes > its adoption of Python 2.3.2 a bit less attractive . My main concern at this point is getting a 2.3.3 that doesn't have this behavior. In the worst case, I think I could create a version of weak dicts that avoided the symptom, by avoiding attribute accesses in weakref callbacks. > The problem isn't really understood. I hope that once it is, there will be > a simple way to avoid it under 2.3.2. Jim filed a bug report with a fix to > the symptom here: > > http://www.python.org/sf/839548 The theory is that it occurs when a cycle involving a class is broken by calling the tp_clear slot on a heap type. I verified this by setting a gdb break point in Zope 3 and verifying that type_clear was called while a type still had a ref count much higher than 1. From a purely theoretical point of view, the current behavior is wrong. There is clearly an invariant that tp_mro is not None and type_clear violates this. The fix (setting the mro to () in type_clear) is pretty straightforward. My assumption is that it's possible for this to occur at times other than shutdown, although, perhaps, wildly unlikely. What's especially poorly understood is how to make it happen in a smaller test program. > It's another case where things go crazy during the second call of > PyGC_Collect in Py_Finalize. Alas, we haven't found a simpler failing test > case than "Zope3" yet.
> > For bafflement value, I'll give a cmdline-parameterized snippet here that > displays at least 4 distinct behaviors at shutdown, although a segfault > isn't one of them: BTW, with a debug build, I get an assertion error rather than a segfault. > """ > import weakref > import os > > class C(object): > def hi(self, w=os.write): > w(1, 'hi 1\n') > print 'hi 2' > > def pp(c=C()): > c.hi() > > import sys > exec "import %s as somemodule" % sys.argv[1] in globals() > del sys > > somemodule.c1 = C() > somemodule.awr = weakref.ref(somemodule.c1, lambda ignore, pp=pp: pp()) > > del C, pp > """ > > Here are the ways it behaves (on Windows, anyway): > > C:\Code\python\PCbuild>python temp4.py tempfile > hi 1 > hi 2 > > C:\Code\python\PCbuild>python temp4.py math # curiously, __main__ the same > > C:\Code\python\PCbuild>python temp4.py __builtin__ > hi 1 > > C:\Code\python\PCbuild>python temp4.py sys > hi 1 > Exception exceptions.AttributeError: "'NoneType' object has no attribute > 'write'" in at 0x006B6C70> ignored > > C:\Code\python\PCbuild> > > The only one I can't make any sense of is __builtin__: the weakref callback > is certainly invoked then, but its print statement neither produces output > nor raises an exception. When trying to debug this in Zope 3, I similarly noticed that prints in the weakref callback produced no output. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From mwh at python.net Tue Nov 11 06:40:02 2003 From: mwh at python.net (Michael Hudson) Date: Tue Nov 11 06:40:06 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: (David Eppstein's message of "Mon, 10 Nov 2003 17:48:06 -0800") References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> Message-ID: <2mhe1buoa5.fsf@starship.python.net> David Eppstein writes: >> ...and we're back to wishing for a way to pass a nonlambda-callable. E.g. >> a string-related example would be "order the strings in list lotsastrings >> (which may be all plain strings, or all unicode strings, on different calls >> of this overall function) in case-insensitive-alphabetical order". In 2.4 >> _with_ the string module that's a snap: >> >> lotsastrings.sort(key=string.upper) > > Is that really alphabetical? It seems like it orders them based on the > ordinal value of the characters, which doesn't work so well for unicodes. > The last time I needed this I couldn't figure out how to get a > reasonable case-insensitive-alphabetical order in pure python, so I used > PyObjC's NSString.localizedCaseInsensitiveCompare_ instead; a pure > Python solution that works as well as that one would be welcome. The locale module has some things in this direction -- strxfrm and strcoll, maybe? -- but I don't know what they do with unicode & doubt they even exist on OS X. Cheers, mwh -- Do I do everything in C++ and teach a course in advanced swearing? 
-- David Beazley at IPC8, on choosing a language for teaching From ark at acm.org Tue Nov 11 10:48:15 2003 From: ark at acm.org (Andrew Koenig) Date: Tue Nov 11 10:48:22 2003 Subject: [Python-Dev] question about PEP 323 (copyable iterators) Message-ID: <004601c3a86b$38b8fb80$6402a8c0@arkdesktop> Early in PEP 323, there is a claim that an iterator is considered copyable if it has a __copy__ method. The following example in the PEP illustrates that claim: def tee(it): it = iter(it) try: copier = it.__copy__ except AttributeError: # non-copyable iterator, do all the needed hard work # [snipped!] else: return it, copier() Later in the PEP, there is an example that suggests that an iterator should be considered copyable only if its __copy__ method can be called: class enumerate(object): def __init__(self, it): self.it = iter(it) self.i = -1 # next and __iter__ methods snipped from the original def __copy__(self): result = self.__class__.new() result.it = self.it.__copy__() result.i = self.i return result Here, class enumerate always has a __copy__ method, even if the iterator that is being enumerated doesn't. In other words, if you use class enumerate on an iterator that isn't copyable, you get an iterator with a __copy__ method that isn't copyable. Is that behavior really right? I would think that you would have to do something like this: class enumerate(object): def __init__(self, it): self.it = iter(it) self.i = -1 try it.__copy__ except AttributeError: pass else: self.__copy__ = self.conditional_copy def conditional_copy(self): result = self.__class__.new() result.it = self.it.__copy__() result.i = self.i return result Am I missing something? From tim at zope.com Tue Nov 11 12:07:20 2003 From: tim at zope.com (Tim Peters) Date: Tue Nov 11 12:08:26 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: <3FB0BAA1.5040607@zope.com> Message-ID: [Jim Fulton, on ] > ... 
The theory is that it occurs when a cycle involving a class is broken > by calling the tp_clear slot on a heap type. I verified this by > setting a gdb break point in Zope 3 and verifying that type_clear was > called while a type still had a ref count much higher than 1. > > From a purely theoretical point of view, the current behavior is > wrong. It is, but a segfault is more than just pure theory . > There is clearly an invariant that tp_mro is not None and > type_clear violates this. The fix (setting the mro to () in > type_clear) is pretty straightforward. The invariant is that tp_mro is not NULL so long as anyone may reference it. tp_clear believes that tp_mro will never be referenced again, but it's demonstrably wrong in that belief. The real bug lies there: why is its belief wrong? You patched it so that tp_mro doesn't become NULL, thus avoiding the immediate segfault, but until we understand *why* the invariant got violated, it's unclear that the patch is "a fix". Code is still accessing the MRO after tp_clear is called, but now instead of a segfault it's going to see an empty MRO. That's also (and clearly so, at least to me) incorrect: code that tries to access a class's MRO should see the MRO the programmer intended, and no sane class has an empty tuple for its MRO. So I think the "tp_mro <- ()" patch exchanges gross breakage for subtler breakage. > My assumption is that it's possible for this to occur at times other > than shutdown, although, perhaps, wildly unlikely. In the absence of real understanding, who knows. If it is possible before shutdown, then the importance of not exposing user code to a made-up MRO skyrockets, IMO. > What's especially poorly understood is how to make it happen in a > smaller test program. > ... > BTW, with a debug build, I get an assertion error rather than a > segfault. Which assertion fails then? That may be a good clue toward truly understanding what's causing this.
>> """
>> import weakref
>> import os
>>
>> class C(object):
>>     def hi(self, w=os.write):
>>         w(1, 'hi 1\n')
>>         print 'hi 2'
>>
>> def pp(c=C()):
>>     c.hi()
>>
>> import sys
>> exec "import %s as somemodule" % sys.argv[1] in globals()
>> del sys
>>
>> somemodule.c1 = C()
>> somemodule.awr = weakref.ref(somemodule.c1, lambda ignore, pp=pp: pp())
>>
>> del C, pp
>> """
...
>> C:\Code\python\PCbuild>python temp4.py __builtin__
>> hi 1
...
>> The only one I can't make any sense of is __builtin__: the weakref >> callback is certainly invoked then, but its print statement neither >> produces output nor raises an exception. > When trying to debug this in Zope 3, I similarly noticed that prints > in the weakref callback produced no output. I'm not sure this one's worth pursuing. Your problem occurred during the second call to gc in finalization, and the sys module has been gutted by that point. In particular, sys.stdout has been cleared, so a print statement can't work then. The only mystery to me wrt this is why it didn't raise an exception, like the >> Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in <function <lambda> at 0x006B6C70> ignored raised when calling that little program with "sys" instead of "__builtin__". From guido at python.org Tue Nov 11 12:13:19 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 11 12:13:41 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: Your message of "Tue, 11 Nov 2003 11:40:02 GMT." <2mhe1buoa5.fsf@starship.python.net> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> Message-ID: <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> > The locale module has some things in this direction -- strxfrm and > strcoll, maybe? -- but I don't know what they do with unicode & doubt > they even exist on OS X. IMO, locale and Unicode shouldn't be mentioned in the same sentence.
At least the part of the locale that defines properties of characters is subsumed in Unicode in a way that doesn't require you to specify the locale. (Of course the locale is still important in defining things like conventions for formatting numbers and dates.) --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at yahoo.com Tue Nov 11 12:21:53 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Tue Nov 11 12:22:00 2003 Subject: [Python-Dev] question about PEP 323 (copyable iterators) In-Reply-To: <004601c3a86b$38b8fb80$6402a8c0@arkdesktop> References: <004601c3a86b$38b8fb80$6402a8c0@arkdesktop> Message-ID: <200311111821.53479.aleaxit@yahoo.com> On Tuesday 11 November 2003 04:48 pm, Andrew Koenig wrote:
> Early in PEP 323, there is a claim that an iterator is considered copyable
> if it has a __copy__ method. The following example in the PEP illustrates
> that claim:
>
> def tee(it):
>     it = iter(it)
>     try: copier = it.__copy__
>     except AttributeError:
>         # non-copyable iterator, do all the needed hard work
>         # [snipped!]
>     else:
>         return it, copier()
>
> Later in the PEP, there is an example that suggests that an iterator should
> be considered copyable only if its __copy__ method can be called:

Very good point -- thanks!

> Here, class enumerate always has a __copy__ method, even if the iterator
> that is being enumerated doesn't. In other words, if you use class
> enumerate on an iterator that isn't copyable, you get an iterator with a
> __copy__ method that isn't copyable.

Right.

> Is that behavior really right? I would think that you would have to do
> something like this:

Special methods are normally defined on the type, not on the instance. So, a per-instance conditional definition of __copy__ does not appear to be right. Rather, I think I should rework the above example as:

def tee(it):
    it = iter(it)
    try: return it, it.__copy__()
    except (AttributeError, TypeError):
        # non-copyable iterator, do all the needed hard work
        # [snipped!]
i.e., an iterator is copyable if it has a __copy__ method that can be called without arguments and won't raise AttributeError or TypeError (other exceptions are not expected and would therefore propagate). This will allow "wrappers" such as enumerate to do their job most simply. (We could allow only TypeError and not AttributeError, but that would complicate both suppliers of __copy__ such as enumerate and consumers of it such as tee). Alex From bh at intevation.de Tue Nov 11 12:22:00 2003 From: bh at intevation.de (Bernhard Herzog) Date: Tue Nov 11 12:22:12 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: (Tim Peters's message of "Tue, 11 Nov 2003 12:07:20 -0500") References: Message-ID: <6qad72ddmv.fsf@salmakis.intevation.de> "Tim Peters" writes: >> When trying to debug this in Zope 3, I similarly noticed that prints >> in the weakref callback produced no output. > > I'm not sure this one's worth pursuing. Your problem occurred during the > second call to gc in finalization, and the sys module has been gutted by > that point. In particular, sys.stdout has been cleared, so a print > statement can't work then. The only mystery to me wrt this is why it didn't > raise an exception, like the > >>> Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in <function <lambda> at 0x006B6C70> ignored > > raised when calling that little program with "sys" instead of "__builtin__". Perhaps because sys.stderr has also been cleared?

Python 2.3.2 (#2, Oct 6 2003, 19:39:48)
[GCC 3.3.2 20030908 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class C(object):
...     def __del__(self):
...         print "__del__"
...
>>> import sys
>>> sys.stdout = None
>>> c = C()
>>> del c
Exception exceptions.AttributeError: "'NoneType' object has no attribute 'write'" in > ignored
>>> sys.stderr = None
>>> c = C()
>>> del c
>>>

Bernhard -- Intevation GmbH http://intevation.de/ Sketch http://sketch.sourceforge.net/ Thuban http://thuban.intevation.org/ From jim at zope.com Tue Nov 11 12:25:19 2003 From: jim at zope.com (Jim Fulton) Date: Tue Nov 11 12:26:20 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: References: Message-ID: <3FB11B7F.4040407@zope.com> Tim Peters wrote: > [Jim Fulton, on ] > >>... >>The theory is that it occurs when a cycle involving a class is broken >>by calling the tp_clear slot on a heap type. I verified this by >>setting a gdb break point in Zope 3 and verifying that type_clear was >>called while a type still had a ref count much higher than 1. >> >>From a purely theoretical point of view, the current behavior is >>wrong. > > > It is, but a segfault is more than just pure theory . I don't know what your point is here. > >>There is clearly an invariant that tp_mro is not None and >>type_clear violates this. The fix (setting the mro to () in >>type_clear) is pretty straightforward. > > > The invariant is that tp_mro is not NULL so long as anyone may reference it. tp_clear believes that tp_mro will never be referenced again, but it's demonstrably wrong in that belief. The real bug lies there: why is its belief wrong? I thought that tp_clear was called to break cycles. Surely, if a class is in a cycle, there are references to it. Why would one assume that none of these references are instances? > You patched it so that tp_mro doesn't become NULL, thus avoiding the > immediate segfault, but until we understand *why* the invariant got > violated, it's unclear that the patch is "a fix". Code is still accessing > the MRO after tp_clear is called, but now instead of a segfault it's going > to see an empty MRO.
That's also (and clearly so, at least to me) > incorrect: code that tries to access a class's MRO should see the MRO the > programmer intended, and no sane class has an empty tuple for its MRO. So I > think the "tp_mro <- ()" patch exchanges gross breakage for subtler > breakage. Surely, the original intent is to break something. ;) I'd much rather get an attribute error than a segfault or an equally fatal C assertion error. >>BTW, with a debug build, I get an assertion error rather than a >>segfault. > > Which assertion fails then? That may be a good clue toward truly understanding what's causing this. The assertion that mro is not NULL. :) See PyObject_GenericGetAttr. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From tim.one at comcast.net Tue Nov 11 12:32:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Nov 11 12:32:28 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: <6qad72ddmv.fsf@salmakis.intevation.de> Message-ID: [Bernhard Herzog] > Perhaps because sys.stderr has also been cleared? Sounds good to me. Now go back and figure out the real problem . From pje at telecommunity.com Tue Nov 11 12:47:42 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 11 12:50:03 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <3FB11B7F.4040407@zope.com> References: Message-ID: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> At 12:25 PM 11/11/03 -0500, Jim Fulton wrote: >Tim Peters wrote: >>[Jim Fulton, on ] >> >>>... >>>The theory is that it occurs when a cycle involving a class is broken >>>by calling the tp_clear slot on a heap type. I verified this by >>>setting a gdb break point in Zope 3 and verifying that type_clear was >>>called while a type still had a ref count much higher than 1. >> From a purely theoretical point of view, the current behavior is >>>wrong.
>> >>It is, but a segfault is more than just pure theory . > >I don't know what your point is here. It's a joke, laugh. :) >>>There is clearly an invariant that tp_mro is not None and >>>type_clear violates this. The fix (setting the mro to () in >>>type_clear) is pretty straightforward. >> >>The invariant is that tp_mro is not NULL so long as anyone may reference it. >>tp_clear believes that tp_mro will never be referenced again, but it's >>demonstrably wrong in that belief. The real bug lies there: why is its >>belief wrong? > >I thought that tp_clear was called to break cycles. Surely, if a class is >in a cycle, there are references to it. Why would one assume that none >of these references are instances? Actually, the funny thing here is that it's unlikely that the cycle a type is in involves its base classes. The only way I know of in pure Python to have such a cycle is to set an attribute of the base class to refer to the subclass, which means that clearing each type's dictionary (and other metaclass-defined slots, if any) should be sufficient to break the cycle, without touching tp_mro. >>You patched it so that tp_mro doesn't become NULL, thus avoiding the >>immediate segfault, but until we understand *why* the invariant got >>violated, it's unclear that the patch is "a fix". Code is still accessing >>the MRO after tp_clear is called, but now instead of a segfault it's going >>to see an empty MRO. That's also (and clearly so, at least to me) >>incorrect: code that tries to access a class's MRO should see the MRO the >>programmer intended, and no sane class has an empty tuple for its MRO. So I >>think the "tp_mro <- ()" patch exchanges gross breakage for subtler >>breakage. > >Surely, the original intent is to break something. ;) >I'd much rather get an attribute error than a segfault or an >equally fatal C assertion error. What's baffling me is what code is accessing the class after tp_clear is
It can't be a __del__ method, or the cycle collector wouldn't be calling tp_clear, right? Or does it run __del__ methods during shutdown? From tim at zope.com Tue Nov 11 13:01:52 2003 From: tim at zope.com (Tim Peters) Date: Tue Nov 11 13:03:46 2003 Subject: [Python-Dev] RE: More fun with Python shutdown In-Reply-To: <3FB11B7F.4040407@zope.com> Message-ID: [Jim] >>> From a purely theoretical point of view, the current behavior is >>> wrong. [Tim] >> It is, but a segfault is more than just pure theory . [Jim] > I don't know what your point is here. I didn't know what you were trying to communicate by "From a purely theoretical point of view". That's all. A segault isn't a theoretical nit, it's a serious bug. Your phrasing appeared to imply that it wasn't a serious bug ("wrong" is synonymous with "bug" to me here). > ... > I thought that tp_clear was called to break cycles. Yes. > Surely, if a class is in a cycle, there are references to it. Yes. > Why would one assume that none of these references are instances? I don't think anyone is assuming that. The assumption is that nobody will *access* the class's MRO slot again. That's not the same as assuming there are no instances. It may be in part be a bad assumption that dead instances can't execute any methods ever again, fed by that gc refuses to break cycles if an object in the cycle contains a __del__ method. If weakrefs supply another path for executing from the grave, then the problem is deeper than the patch addresses. > ... > Surely, the original intent is top break something. ;) > I'd much rather get an attribute error than a segfault or an > equally fatal C assertion error. My goal on Python-Dev isn't just to stop Zope3 from segfaulting, feeding it mysterious AttributeErrors instead. That may be good enough for your current purposes, but it leaves the language in a still-sickly state. 
For example, I've suggested here before that the second call of gc from finalization may be a bad idea in general, because the interpreter is in a damaged (largely torn-down) state at that time. That would address a larger class of shutdown problems, and Zope isn't unique in seeing new shutdown problems under 2.3.2 (there have been other reports on c.l.py, but so far only of the "weird information-free msgs from threads at shutdown" flavor that we first saw in the Zope3 test suite, before cleaning up the stale threads). But we don't understand *this* problem well enough yet, and you raised the real possibility that this one can bite before shutdown. In that case a robust fix necessarily costs more than just commenting out the second gc call (which, all by itself, would have been enough to stop your segfaults so far too). >> Which assertion fails then? That may be a good clue toward truly >> understanding what's causing this. > The assertion that mro is not NULL. :) LOL -- that shed a lot of light . From eppstein at ics.uci.edu Tue Nov 11 13:09:05 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Tue Nov 11 13:09:08 2003 Subject: [Python-Dev] Re: other "magic strings" issues References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> Message-ID: In article <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com>, Guido van Rossum wrote: > > The locale module has some things in this direction -- strxfrm and > > strcoll, maybe? -- but I don't know what they do with unicode & doubt > > they even exist on OS X. > > IMO, locale and Unicode shouldn't be mentioned in the same sentence. > At least the part of the locale that defines properties of characters > is subsumed in Unicode in a way that doesn't require you to specify > the locale. 
(Of course the locale is still important in defining > things like conventions for formatting numbers and dates.) The locale (as a concept) is also important in determining a unicode collation ordering, but it sounds like locale (as a Python module) doesn't do that. Ok, it sounds like I am stuck with PyObjC's NSString.localizedCaseInsensitiveCompare_, since Python's built-in cmp(unicode,unicode) sucks and locale doesn't provide an alternative. Are there any plans to add better collation ordering for unicode in future Python versions? Googling finds statements like http://mail.python.org/pipermail/i18n-sig/2001-May/000929.html (over two years ago, saying this has been on the plate for some time already then) but not much recent. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From jim at zope.com Tue Nov 11 13:42:13 2003 From: jim at zope.com (Jim Fulton) Date: Tue Nov 11 13:43:12 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> References: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> Message-ID: <3FB12D85.8040005@zope.com> Phillip J. Eby wrote: > At 12:25 PM 11/11/03 -0500, Jim Fulton wrote: > >> Tim Peters wrote: >> ... >> Surely, the original intent is to break something. ;) >> I'd much rather get an attribute error than a segfault or an >> equally fatal C assertion error. > > > What's baffling me is what code is accessing the class after tp_clear is > called. It can't be a __del__ method, or the cycle collector wouldn't > be calling tp_clear, right? Or does it run __del__ methods during > shutdown? No, it's not a del. An object is being accessed in a weakref callback. The object being accessed is *not* the object being accessed by the weakref.
It's an object that had a dictionary that contained the weakref:

class SurrogateRegistry(object):
    """Surrogate registry
    """

    def __init__(self):
        self._surrogates = {}

        def _remove(k, selfref=weakref.ref(self)):
            self = selfref()
            if self is not None:
                try:
                    del self._surrogates[k]
                except KeyError:
                    pass

        self._remove = _remove

This thing is similar to a WeakKeyDictionary. The _remove function is used as a callback when creating weakrefs of things stored as keys in the _surrogates dictionary. Now, it turns out that this function is called at a point where tp_clear has been called on the class. The problem occurs when the callback tries to do self._surrogates. (BTW, my workaround is:

class SurrogateRegistry(object):
    """Surrogate registry
    """

    def __init__(self):
        self._surrogates = surrogates = {}

        def _remove(k):
            try:
                del surrogates[k]
            except KeyError:
                pass

        self._remove = _remove

which avoids accessing "self", but creates a strong reference, and thus a cycle, from the weakref objects to the _surrogates dict, which is acceptable for my needs.) Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From martin at v.loewis.de Tue Nov 11 14:25:53 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Nov 11 14:26:08 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> Message-ID: <3FB137C1.9000903@v.loewis.de> Guido van Rossum wrote: >>The locale module has some things in this direction -- strxfrm and >>strcoll, maybe? -- but I don't know what they do with unicode & doubt >>they even exist on OS X. > > > IMO, locale and Unicode shouldn't be mentioned in the same sentence.
> At least the part of the locale that defines properties of characters > is subsumed in Unicode in a way that doesn't require you to specify > the locale. (Of course the locale is still important in defining > things like conventions for formatting numbers and dates.) In particular, locale also matters for collation. So the desire to collate Unicode strings properly is reasonable, but you need to know what locale to use for collation. With Python's current locale model, one would convert the Unicode string to the locale's encoding, and then perform collation. Of course, with an ICU wrapper, you could have multiple simultaneous locales, and collate Unicode strings without converting them into byte strings first. http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ Regards, Martin From guido at python.org Tue Nov 11 14:56:35 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 11 14:56:50 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: Your message of "Tue, 11 Nov 2003 20:25:53 +0100." <3FB137C1.9000903@v.loewis.de> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> Message-ID: <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> > >>The locale module has some things in this direction -- strxfrm and > >>strcoll, maybe? -- but I don't know what they do with unicode & doubt > >>they even exist on OS X. > > > Guido van Rossum wrote: > > IMO, locale and Unicode shouldn't be mentioned in the same sentence. > > At least the part of the locale that defines properties of characters > > is subsumed in Unicode in a way that doesn't require you to specify > > the locale. (Of course the locale is still important in defining > > things like conventions for formatting numbers and dates.) [MvL] > In particular, locale also matters for collation. 
So the desire to > collate Unicode strings properly is reasonable, but you need to know > what locale to use for collation. With Python's current locale model, > one would convert the Unicode string to the locale's encoding, and > then perform collation. Ouch. Seems you're right. > Of course, with an ICU wrapper, you could have multiple simultaneous > locales, and collate Unicode strings without converting them into byte > strings first. > > http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ Is that something we could move into the std lib? --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Nov 11 15:05:54 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Nov 11 15:06:21 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> Message-ID: <3FB14122.708@v.loewis.de> Guido van Rossum wrote: >>http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ > > > Is that something we could move into the std lib? It's incomplete. When it is completed, yes, perhaps. However, ICU itself is *really* large (including the Unicode character database, encoding tables for all encodings of the world, and locale data for all languages), so we would need to ship that as well, or require that it is pre-existing on a system (possible for Linux, unrealistic for Windows). More realistically, we could expose wcscoll(3) where available, which would extend the Python locale model to Unicode (assuming the C library uses Unicode in wchar_t). 
Regards, Martin From guido at python.org Tue Nov 11 15:09:02 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 11 15:09:11 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: Your message of "Tue, 11 Nov 2003 21:05:54 +0100." <3FB14122.708@v.loewis.de> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> <3FB14122.708@v.loewis.de> Message-ID: <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> > >>http://cvs.sourceforge.net/viewcvs.py/python-codecs/picu/ > > > > Is that something we could move into the std lib? > > It's incomplete. When it is completed, yes, perhaps. However, > ICU itself is *really* large (including the Unicode character > database, encoding tables for all encodings of the world, and > locale data for all languages), so we would need to ship that > as well, or require that it is pre-existing on a system (possible > for Linux, unrealistic for Windows). How big would ICU binaries for Windows be? I don't mind bloating the Windows installer by a few MB. As long as it doesn't have to land in CVS... > More realistically, we could expose wcscoll(3) where available, > which would extend the Python locale model to Unicode (assuming > the C library uses Unicode in wchar_t). I don't know what that is, but if you recommend it, I support it. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Nov 11 15:21:40 2003 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Nov 11 15:21:54 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> <3FB14122.708@v.loewis.de> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> Message-ID: <3FB144D4.8060307@v.loewis.de> Guido van Rossum wrote: > How big would ICU binaries for Windows be? I don't mind bloating the > Windows installer by a few MB. As long as it doesn't have to land in > CVS... See ftp://www-126.ibm.com/pub/icu/2.6.1/icu-2.6.1.zip I haven't actually downloaded it because of size (9MB); the zip file may contain header files and the like which we shouldn't ship. >>More realistically, we could expose wcscoll(3) where available, [...] > I don't know what that is, but if you recommend it, I support it. See http://www.opengroup.org/onlinepubs/007908799/xsh/wcscoll.html It goes along with wcsxfrm and wcscmp for efficient collation, and parallels strcoll, strxfrm, and strcmp for wchar_t. Regards, Martin From theller at python.net Tue Nov 11 15:22:43 2003 From: theller at python.net (Thomas Heller) Date: Tue Nov 11 15:22:54 2003 Subject: [Python-Dev] More fun with Python shutdown In-Reply-To: (Tim Peters's message of "Mon, 10 Nov 2003 21:48:50 -0500") References: Message-ID: "Tim Peters" writes: > Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak > references. Now Zope3 dies with a segfault when it's shut down, which makes > its adoption of Python 2.3.2 a bit less attractive . 
> > The problem isn't really understood. I hope that once it is, there will be > a simple way to avoid it under 2.3.2. Jim filed a bug report with a fix to > the symptom here: > > http://www.python.org/sf/839548 Is the problem I currently have the same, I also use weakrefs (although Jim's patch doesn't seem to help)? It is triggered when I have set the gc threshold to small values in a 2.3.2 debug build under Windows. When some containers in my program are destroyed Python crashes with an access violation in _Py_ForgetReference() because op->_ob_next and op->_ob_prev are both NULL:

void
_Py_ForgetReference(register PyObject *op)
{
#ifdef SLOW_UNREF_CHECK
    register PyObject *p;
#endif
    if (op->ob_refcnt < 0)
        Py_FatalError("UNREF negative refcnt");
    if (op == &refchain ||
        op->_ob_prev->_ob_next != op ||
        op->_ob_next->_ob_prev != op)
        Py_FatalError("UNREF invalid object");

First I suspected buggy gc support in an extension module I have but the crash also occurs when I remove it. Thomas PS: Here is the stack trace as displayed in MSVC6:

_Py_ForgetReference(_object * 0x01101bd0) line 2001 + 15 bytes
_Py_Dealloc(_object * 0x01101bd0) line 2021 + 9 bytes
delete_garbage(_gc_head * 0x0012d640, _gc_head * 0x1e1783e0) line 516 + 81 bytes
collect(int 0) line 625 + 13 bytes
collect_generations() line 673 + 9 bytes
_PyObject_GC_Malloc(unsigned int 24) line 1061
_PyObject_GC_New(_typeobject * 0x1e186c00 _PyListIter_Type) line 1070 + 12 bytes
list_iter(_object * 0x01101c08) line 2414 + 10 bytes
PyObject_GetIter(_object * 0x01101c08) line 2161 + 7 bytes
eval_frame(_frame * 0x008aa278) line 2077 + 9 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b8be40, _object * 0x00b7af50, _object * 0x00000000, _object * * 0x008e3808, int 0, _object * * 0x008e3808, int 1, _object * * 0x00000000, int 0, _object * 0x00000000) line 2663 + 9 bytes
fast_function(_object * 0x00b966c0, _object * * * 0x0012da24, int 2, int 0, int 1) line 3532 + 68 bytes
call_function(_object * * * 0x0012da24, int 256) line
3458 + 25 bytes
eval_frame(_frame * 0x008e36a8) line 2116 + 13 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b8bb28, _object * 0x00b7af50, _object * 0x00000000, _object * * 0x01117d6c, int 1, _object * * 0x00000000, int 0, _object * * 0x0111712c, int 1, _object * 0x00000000) line 2663 + 9 bytes
function_call(_object * 0x011028d0, _object * 0x01117d58, _object * 0x00000000) line 509 + 64 bytes
PyObject_Call(_object * 0x011028d0, _object * 0x01117d58, _object * 0x00000000) line 1755 + 15 bytes
PyObject_CallFunction(_object * 0x011028d0, char * 0x1e1c63b8) line 1797 + 15 bytes
handle_callback(_PyWeakReference * 0x01114f68, _object * 0x011028d0) line 684 + 18 bytes
PyObject_ClearWeakRefs(_object * 0x01101bd0) line 750 + 13 bytes
subtype_dealloc(_object * 0x01101bd0) line 656 + 9 bytes
_Py_Dealloc(_object * 0x01101bd0) line 2022 + 7 bytes
list_dealloc(PyListObject * 0x00a0f930) line 214 + 153 bytes
_Py_Dealloc(_object * 0x00a0f930) line 2022 + 7 bytes
dict_dealloc(_dictobject * 0x01100380) line 708 + 108 bytes
_Py_Dealloc(_object * 0x01100380) line 2022 + 7 bytes
subtype_dealloc(_object * 0x010f4f18) line 680 + 81 bytes
_Py_Dealloc(_object * 0x010f4f18) line 2022 + 7 bytes
PyDict_DelItem(_object * 0x01100428, _object * 0x00a73368) line 583 + 81 bytes
PyObject_GenericSetAttr(_object * 0x010f4ee0, _object * 0x00a73368, _object * 0x00000000) line 1529 + 13 bytes
PyObject_SetAttr(_object * 0x010f4ee0, _object * 0x00a73368, _object * 0x00000000) line 1289 + 18 bytes
eval_frame(_frame * 0x008f8b38) line 1760 + 15 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b48d90, _object * 0x00b42188, _object * 0x00000000, _object * * 0x00893c70, int 5, _object * * 0x00893c84, int 0, _object * * 0x00000000, int 0, _object * 0x00000000) line 2663 + 9 bytes
fast_function(_object * 0x00b57980, _object * * * 0x0012e070, int 5, int 5, int 0) line 3532 + 68 bytes
call_function(_object * * * 0x0012e070, int 4) line 3458 + 25 bytes
eval_frame(_frame * 0x00893b08) line 2116 + 13 bytes
fast_function(_object * 0x00b79fb0, _object * * * 0x0012e218, int 5, int 5, int 0) line 3518 + 9 bytes
call_function(_object * * * 0x0012e218, int 4) line 3458 + 25 bytes
eval_frame(_frame * 0x008958e8) line 2116 + 13 bytes
PyEval_EvalCodeEx(PyCodeObject * 0x00b48188, _object * 0x00b42188, _object * 0x00000000, _object * * 0x011194cc, int 4, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0, _object * 0x00000000) line 2663 + 9 bytes
function_call(_object * 0x00b4cea8, _object * 0x011194b8, _object * 0x00000000) line 509 + 64 bytes
PyObject_Call(_object * 0x00b4cea8, _object * 0x011194b8, _object * 0x00000000) line 1755 + 15 bytes
PyEval_CallObjectWithKeywords(_object * 0x00b4cea8, _object * 0x011194b8, _object * 0x00000000) line 3346 + 17 bytes
PyObject_CallObject(_object * 0x00b4cea8, _object * 0x011194b8) line 1746 + 15 bytes
_CallPythonObject(void * 0x0012e3e4, char * 0x10010e00, _object * 0x00b4cea8, _object * 0x00a9fbc0, void * * 0x0012e41c) line 178 + 14 bytes
i_CallPythonObject(_object * 0x00b4cea8, _object * 0x00a9fbc0, void * * 0x0012e40c) line 213 + 26 bytes

From tim at zope.com Tue Nov 11 15:41:11 2003 From: tim at zope.com (Tim Peters) Date: Tue Nov 11 15:42:16 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com> Message-ID:
> What's baffling me is what code is accessing the class after tp_clear > is called. It can't be a __del__ method, or the cycle collector > wouldn't be calling tp_clear, right? Or does it run __del__ methods > during shutdown? Jim explained -- as best we can without a finite test case to nail it. There does seem to be an assumption that a class object won't get collected if any instance of the class is still around. "Because" the class object would have a reference to it from the class instance, so that a live class instance keeps the class alive. But, if the class object and all remaining instances are all in one cycle, and that cycle is unreachable from outside, and the class doesn't define a __del__ method, then I *expect* gc would try to clean up the dead cycle. In that case, gc starts calling tp_clear slots in a seemingly arbitrary order. If the destruction of a class instance then happened to trigger a weakref callback which in turn tried to access an attribute of the class, and the class had already been through its tp_clear, then a NULL-pointer dereference (due to the cleared tp_mro slot) would be unavoidable. But if that's what's happening, then tricks like the one on the table may not be enough to stop segfaults: replacing tp_mro with an empty tuple only "works" so long as the class object hasn't also been thru its tp_dealloc routine. Once it goes thru tp_dealloc, the memory is recyclable heap trash, and tp_mro may or may not retain the bits that "look like" a pointer to an empty tuple by the time some weakref callback triggers an access to them. In a release build it's likely that the "pointer to an empty tuple" will survive across deallocation for at least a little while, because tp_mro isn't near an end of the object (so is unlikely to get overridden by malloc's or pymalloc's internal bookkeeping pointers). It's a crapshoot, though. A complication in all this is that Python's cyclic gc never calls tp_dealloc or tp_free directly! 
The only cleanup slot it calls directly is tp_clear.  Deallocations still occur only as side effects of refcounts falling to 0, as tp_clear actions break cycles (and execute Py_DECREFs along the way).

This protects against a class's tp_dealloc (but not tp_clear) getting called while instances still exist, even if they're all in one cycle.  But "still exist" gets fuzzy then.  Here's a cute one:

"""
class C(object):
    pass

def pp():
    import winsound
    winsound.Beep(2000, 500)

import weakref
wr = weakref.ref(C, lambda ignore, pp=pp: pp())

del C # this isn't enough to free C: C is still in at least two cycles
"""

C:\Python23>python temp5.py
Fatal Python error: Interpreter not initialized (version mismatch?)

abnormal program termination

C:\Python23>

That one is due to the weakref callback getting called after Py_Finalize does initialized = 0; so that the "import winsound" fails (I gave up trying to print things in callbacks).

From theller at python.net  Tue Nov 11 16:05:31 2003
From: theller at python.net (Thomas Heller)
Date: Tue Nov 11 16:05:43 2003
Subject: [Python-Dev] More fun with Python shutdown
In-Reply-To: (Thomas Heller's message of "Tue, 11 Nov 2003 21:22:43 +0100")
References: 
Message-ID: 

Thomas Heller writes:

> "Tim Peters" writes:
>
>> Jim (Fulton) refactored oodles of Zope3 to make heavier use of weak
>> references.  Now Zope3 dies with a segfault when it's shut down, which
>> makes its adoption of Python 2.3.2 a bit less attractive.
>>
>> The problem isn't really understood.  I hope that once it is, there will
>> be a simple way to avoid it under 2.3.2.  Jim filed a bug report with a
>> fix to the symptom here:
>>
>> http://www.python.org/sf/839548
>
> Is the problem I currently have the same, I also use weakrefs (although
> Jim's patch doesn't seem to help)?
>
> It is triggered when I have set the gc threshold to small values in a
> 2.3.2 debug build under Windows.
> When some containers in my program are
> destroyed Python crashes with an access violation in
> _Py_ForgetReference() because op->_ob_next and
> _op->_ob_prev are both NULL:
>
> void
> _Py_ForgetReference(register PyObject *op)
> {
> #ifdef SLOW_UNREF_CHECK
>         register PyObject *p;
> #endif
>         if (op->ob_refcnt < 0)
>                 Py_FatalError("UNREF negative refcnt");
>         if (op == &refchain ||
>             op->_ob_prev->_ob_next != op || op->_ob_next->_ob_prev != op)
>                 Py_FatalError("UNREF invalid object");

Here is the smallest program I can currently come up with that triggers this bug.  Most of the code is extracted from Patrick O'Brian's dispatcher module on activestate's cookbook site, it creates weak references to bound methods by dissecting them into im_self and im_func.

This program only prints "A" before crashing, so it does occur *before* interpreter shutdown.

Thomas

-----
import weakref
import gc

gc.set_threshold(1)

connections = {}
_boundMethods = weakref.WeakKeyDictionary()

def safeRef(object):
    selfkey = object.im_self
    funckey = object.im_func
    if not _boundMethods.has_key(selfkey):
        _boundMethods[selfkey] = weakref.WeakKeyDictionary()
    if not _boundMethods[selfkey].has_key(funckey):
        _boundMethods[selfkey][funckey] = \
            BoundMethodWeakref(boundMethod=object)
    return _boundMethods[selfkey][funckey]

class BoundMethodWeakref:
    def __init__(self, boundMethod):
        def remove(object, self=self):
            _removeReceiver(receiver=self)
        self.weakSelf = weakref.ref(boundMethod.im_self, remove)
        self.weakFunc = weakref.ref(boundMethod.im_func, remove)

def _removeReceiver(receiver):
    for senderkey in connections.keys():
        for signal in connections[senderkey].keys():
            receivers = connections[senderkey][signal]
            try:
                receivers.remove(receiver)
            except:
                pass
            _cleanupConnections(senderkey, signal)

################

class X(object):
    def test(self):
        pass

def test():
    print "A"
    safeRef(X().test)
    print "B"

if __name__ == "__main__":
    test()
-----

From pje at
telecommunity.com (Phillip J. Eby)
Date: Tue Nov 11 18:36:08 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: 
References: <5.1.1.6.0.20031111124123.02f48b90@telecommunity.com>
Message-ID: <5.1.1.6.0.20031111182245.028bed40@telecommunity.com>

At 03:41 PM 11/11/03 -0500, Tim Peters wrote:
>[Phillip J. Eby]
> > ...
> > Actually, the funny thing here is that it's unlikely that the cycle a
> > type is in involves its base classes.
>
>Well, all new-style classes are in cycles with bases:
>
> >>> class C(object): pass
>..
> >>> object.__subclasses__()[-1]   # so C is reachable from object
><class '__main__.C'>

I thought this was done with weak references.

> >>> C.__mro__   # and object is reachable from C
>(<class '__main__.C'>, <type 'object'>)
> >>>
>
>For that matter, since the first element of the MRO is the class itself, a

Oops.  I forgot about that.

>A complication in all this is that Python's cyclic gc never calls tp_dealloc
>or tp_free directly!  The only cleanup slot it calls directly is tp_clear.
>Deallocations still occur only as side effects of refcounts falling to 0, as
>tp_clear actions break cycles (and execute Py_DECREFs along the way).
>
>This protects against a class's tp_dealloc (but not tp_clear) getting called
>while instances still exist, even if they're all in one cycle.  But "still
>exist" gets fuzzy then.

Hm.  So what if tp_clear didn't mess with the MRO, except to decref its self-reference in the MRO?  tp_dealloc would have to decref the MRO tuple then, and deal with the off-by-one refcount for the type that would result from the tuple's deallocation.  Could that work?

From tim.one at comcast.net  Tue Nov 11 19:01:08 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Nov 11 19:01:16 2003
Subject: [Python-Dev] More fun with Python shutdown
In-Reply-To: 
Message-ID: 

> http://www.python.org/sf/839548

[Thomas Heller]
> Is the problem I currently have the same,

Probably not.

> I also use weakrefs (although
> Jim's patch doesn't seem to help)?
I guess your problem and Jim's both have in common that you and Zope3 use assignment statements too.

> It is triggered when I have set the gc threshold to small values in a
> 2.3.2 debug build under Windows.  When some containers in my program
> are destroyed Python crashes with an access violation in
> _Py_ForgetReference() because op->_ob_next and
> _op->_ob_prev are both NULL:

That's a list of "all objects".  Deallocating an object removes it from that list.  Trying to deallocate it a second time tries to remove it from the list a second time, which barfs in just this way.

> PS: Here is the stack trace as displayed in MSVC6:
>
> _Py_ForgetReference(_object * 0x01101bd0) line 2001 + 15 bytes
> _Py_Dealloc(_object * 0x01101bd0) line 2021 + 9 bytes
...
> _Py_Dealloc(_object * 0x01101bd0) line 2022 + 7 bytes

Bingo: _Py_Dealloc with the same object pointer appears twice in the stack.  That's almost certainly a bug in Python, but is almost certainly unrelated to the problem Jim is having.

I was able to make your test case substantially smaller.  The key is that the "remove" callback triggers gc.  Apart from that, it doesn't matter at all what "remove" does.  I don't know what the bug is, though, and since the last of these consumed more than a day to track down and fix, I don't anticipate having time to do that again:

"""
import weakref
import gc

_boundMethods = weakref.WeakKeyDictionary()

def safeRef(object):
    selfkey = object.im_self
    funckey = object.im_func
    _boundMethods[selfkey] = weakref.WeakKeyDictionary()
    _boundMethods[selfkey][funckey] = BoundMethodWeakref(object)

class BoundMethodWeakref:
    def __init__(self, boundMethod):
        def remove(object):
            gc.collect()
        self.weakSelf = weakref.ref(boundMethod.im_self, remove)

class X(object):
    def test(self):
        pass

def test():
    print "A"
    safeRef(X().test)
    print "B"

if __name__ == "__main__":
    test()
"""

As far as I can get without stopping:  It's dying when the anonymous bound method (X().test) is getting cleaned up.
That decrefs the anonymous X(), marking the end of its life too, which triggers a weakref callback, which calls gc.collect() (in your original program, a .keys() method created a list, which was enough to trigger gc because you set the gc threshold to 1).  The anonymous X() then shows up in gc's list of garbage, and the Py_DECREF in this part of gc:

    if ((clear = op->ob_type->tp_clear) != NULL) {
        Py_INCREF(op);
        clear(op);
        Py_DECREF(op);
    }

then knocks the refcount on the anonymous X() back to 0 a second time, triggering the fatal attempt to deallocate an object that's already in the process of being deallocated.

This *may* be a deep problem.  gc doesn't expect that the refcount on anything it knows about is already 0 at the time gc gets started.  The way Python works, anything whose refcount falls to 0 is recycled without cyclic gc's help.  Nevertheless, the anonymous X() container *is* in gc's lists when gc starts here, with a refcount of 0, and gc correctly concludes that X() isn't reachable from "outside".  That's why it tries to delete X() itself.

Anyway, the only thing weakrefs have to do with this is that they managed to trigger gc between the time a gc-tracked container became dead and the time the container untracked itself from gc.

I'll note that the anonymous bound method object *did* untrack itself from gc before the fatal part began.  Hmm.  subtype_dealloc() *also* untracked the anonymous X() before the fatal part began, but then it *re*tracked it:

    /* UnTrack and re-Track around the trashcan macro, alas */
    /* See explanation at end of function for full disclosure */
    PyObject_GC_UnTrack(self);
    ++_PyTrash_delete_nesting;
    Py_TRASHCAN_SAFE_BEGIN(self);
    --_PyTrash_delete_nesting;
    _PyObject_GC_TRACK(self); /* We'll untrack for real later */

It's just a few lines later that the suicidal weakref callback gets triggered.
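The failure shape Tim dissects here -- a weakref callback that forces a collection while its referent is mid-teardown -- can be sketched in modern Python, where current CPython releases handle the sequence without the double deallocation (a hedged sketch in today's syntax, not the 2.3-era code under discussion):

```python
# Sketch (assumes current CPython semantics): a weakref callback that
# calls gc.collect() while its referent is being torn down -- the same
# shape as Thomas's reduced test case, minus the crash.
import gc
import weakref

events = []

class X(object):
    pass

def remove(ref):
    # The callback receives the (already dead) weakref, not the object.
    events.append(ref() is None)
    gc.collect()          # force a collection from inside the callback

x = X()
wr = weakref.ref(x, remove)
del x                     # refcount hits 0; the callback fires here

assert events == [True]   # callback ran once, with a dead reference
assert wr() is None
```

This also illustrates the point made further down the thread: the callback is handed the dead weakref, not the object itself, so it cannot resurrect the referent the way a __del__ method can.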
The good news is that Guido must have spent days in all trying to bulletproof subtype_dealloc(), so it's not like a bug in this part of the code is a big surprise.  It's possible that temporarily incref'ing self before the PyObject_ClearWeakRefs() call would be a correct fix (that would prevent gc from believing the object is collectible, and offhand I don't see anything other than PyObject_ClearWeakRefs here that could trigger a round of gc).

If that's a correct analysis, this is a very serious bug:  double-deallocation will normally go undetected in a release build, and will lead to memory corruption.  It will happen only when a weakref callback happens to trigger gc, *and* the object being torn down at the time happens to be in a generation gc collects at the time gc is triggered.  So the conditions that trigger it are rare and unpredictable, and the effects of the memory corruption it leads to are equally bad (anything can happen, at any time later).

From greg at cosc.canterbury.ac.nz  Tue Nov 11 19:13:53 2003
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue Nov 11 19:14:08 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: 
Message-ID: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz>

Tim Peters:

> If the destruction of a class instance then happened to trigger a
> weakref callback which in turn tried to access an attribute of the
> class, and the class had already been through its tp_clear, then a
> NULL-pointer dereference (due to the cleared tp_mro slot) would be
> unavoidable.

The crux of this seems to be that, now that we have weak references, __del__ methods are not the only thing that can trigger execution of arbitrary Python code when an object becomes unreferenced.

Maybe the GC should also refuse to collect cycles in which any member is referenced by a weak reference with an associated callback?

The alternative is to accept that arbitrary Python code can be called while the GC is in the midst of breaking a cycle.
In that case, it's unacceptable for any object's tp_clear to set a Python pointer to NULL, or do anything else that would render the object no longer a valid Python object.

That would be enough to stop segfaults, but it still wouldn't entirely solve the problem at hand, because the fact is there's no way to break the self-cycle in a class's MRO without rendering it unusable as a class object for at least some purposes.

Which makes me think that the only safe thing to do is treat a weak-ref-with-callback as tantamount to a __del__ method for GC purposes.

> But if that's what's happening, then tricks like the one on the table
> may not be enough to stop segfaults: replacing tp_mro with an empty
> tuple only "works" so long as the class object hasn't also been thru
> its tp_dealloc routine.

But that can't happen until the object's refcount has dropped to zero, in which case it can't be touched any longer by Python code.  I don't think there's any worry with this.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From tim at zope.com  Tue Nov 11 23:17:59 2003
From: tim at zope.com (Tim Peters)
Date: Tue Nov 11 23:18:51 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <5.1.1.6.0.20031111182245.028bed40@telecommunity.com>
Message-ID: 

[Tim]
>> >>> class C(object): pass
>> ..
>> >>> object.__subclasses__()[-1]   # so C is reachable from object
>> >>>

[Phillip J. Eby]
> I thought this was done with weak references.

Ouch, yes.  My apologies -- I keep forgetting that one.

>> For that matter, since the first element of the MRO is the class
>> itself [self-cycle]

> Oops.  I forgot about that.

OK, I'll settle for a tie in the forgetfulness contest.

> Hm.
> So what if tp_clear didn't mess with the MRO, except to decref
> its self-reference in the MRO?  tp_dealloc would have to decref the
> MRO tuple then, and deal with the off-by-one refcount for the type
> that would result from the tuple's deallocation.  Could that work?

Until we have a finite test case that reproduces Jim's problem, I don't know.  It's possible.  My intuition remains that hacking the tp_mro slot is patching a symptom of a deeper problem that's going to keep coming back in other guises.

BTW, lying about true refcounts is fraught with subtle dangers.  If you were the one who had to fiddle the ZODB3 cache to work with Python's cyclic gc, you'd have a better gut appreciation for that.

From tim.one at comcast.net  Wed Nov 12 00:22:26 2003
From: tim.one at comcast.net (Tim Peters)
Date: Wed Nov 12 00:22:37 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Greg Ewing]
> The crux of this seems to be that, now that we have weak references,
> __del__ methods are not the only thing that can trigger execution of
> arbitrary Python code when an object becomes unreferenced.

- "this" needs clarification.  Thomas Heller's bug didn't involve cycles, but I think that bug has no real intersection with Jim's woes.  Some of the shutdown glitches I've displayed here, as well as the ones people have griped about on c.l.py, also weren't related to weakref callbacks.  There's more than one (and more than two ...) distinct glitches here.

- It is indeed the callbacks-- not weakrefs per se --that are the cause of *most* of these things.

- weakref callbacks are easier to live with than __del__ methods in one (and maybe only one) respect: when the death of X triggers a weakref callback C, C isn't passed X, but X.__del__ is.  So a weakref callback can't resurrect X, but X.__del__ can.
I'm not sure how much comfort to take from that, since a weakref callback could presumably resurrect other trash in its dead object's cycle. > Maybe the GC should also refuse to collect cycles in which any member > is referenced by a weak reference with an associated callback? I've been meaning to think about that, but haven't been able to make more time for it. It should be possible to construct motivating examples. > The alternative is to accept that arbitrary Python code can be called > while the GC is in the midst of breaking a cycle. Bingo -- that's my fear. It's hard to say why in advance, but every time we've found a spot where arbitrary Python code *can* run during gc, we've eventually been screwed royally on that spot. Hell, last time we pissed away most of a week because PyObject_HasAttr (then used to ask whether __del__ exists; no longer used) ended up making massive changes to a Zope database as a side effect of indirectly calling the object's class's __getattr__ hook, mutating the Python object graph massively in the process as a side effect in turn of all the crap ZODB was doing to materialize ghosts. gc has to have a patch of unshifting ground to stand on. > In that case, it's unacceptable for any object's tp_clear to set > a Python pointer to NULL, or do anything else that would render the > object no longer a valid Python object. I expect it's worse than just that (since it always has been worse than just that in the past, although nobody has been able to predict exactly how for every case in advance). > That would be enough to stop segfaults, but it still wouldn't entirely > solve the problem at hand, because the fact is there's no way to break > the self-cycle in a class's MRO without rendering it unusable as a > class object for at least some purposes. 
Phil Eby suggested a hack for that specific one (decrement the refcount, and that's all -- the MRO holds an "illegitimate" self-reference then; wave hands, pray, and maybe it doesn't break something else). > Which makes me think that the only safe thing to do is treat a > weak-ref-with-callback as tantamount to a __del__ method for GC > purposes. Quite possibly so. >> But if that's what's happening, then tricks like the one on the table >> may not be enough to stop segfaults: replacing tp_mro with an empty >> tuple only "works" so long as the class object hasn't also been thru >> its tp_dealloc routine. > But that can't happen until the object's refcount has dropped to zero, > in which case it can't be touched any longer by Python code. Probably so. It depends not so much on principle as on the parts of the code where we cheat (e.g., if it were always true that refcount-dropped-to-0 implies can't-be-touched-again-by-Python-code, then what is it that gets passed to x.__del__()? x does -- but we cheat). From greg at cosc.canterbury.ac.nz Wed Nov 12 00:54:35 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Nov 12 00:54:51 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: Message-ID: <200311120554.hAC5sZn26866@oma.cosc.canterbury.ac.nz> Tim: > - weakref callbacks are easier to live with than __del__ methods in > one (and maybe only one) respect: when the death of X triggers > a weakref callback C, C isn't passed X, but X.__del__ is. So a > weakref callback can't resurrect X, but X.__del__ can. The object causing trouble doesn't need to be the one that died, e.g. doing a tp_clear on X causes Y to die which triggers a weakref callback which references X by some route. Resurrection of X isn't an issue, because it's not dead yet -- it is, however, in the process of being indiscriminately torn apart by the GC, messing up who-knows-what invariant that the callback might be relying on. 
So I can't see that the lack of possibility of resurrection helps much at all.

> e.g., if it were always true that refcount-dropped-to-0 implies
> can't-be-touched-again-by-Python-code, then what is it that gets
> passed to x.__del__()?  x does -- but we cheat

But (I hope, at least!) it's guaranteed that the x.__del__() call is completed before any of the C-level deallocation code for x is begun...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From aleaxit at yahoo.com  Wed Nov 12 03:14:08 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 12 03:14:14 2003
Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ?
Message-ID: <200311120914.08946.aleaxit@yahoo.com>

Somebody just wrote to help@python.org asking for guidance in resolving some conflicts in comments in 2.3.2 files regarding sleepycat versions we support.

In Modules/Setup we say:
"""
The earliest supported version of that library is 3.0, the latest
supported version is 4.0 (4.1 is specifically not supported,
"""

In README we say:
"""
Only versions 3.1 through 4.1 of Sleepycat's libraries provide
the necessary API
"""

In setup.py we say:
"""
The earliest supported version of that library is 3.1, the latest
supported version is 4.2 ... 3.1 is only partially supported
"""

I believe that setup.py is accurate, README slightly out of date, Modules/Setup way out of date -- but I thought that double checking couldn't possibly hurt.  So, can I confirm this to the help@python.org querant, and fix the comments in README (should it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial support" for 3.1?) and Modules/Setup (presumably with a pointer to setup.py)?
Alex

From aleaxit at yahoo.com  Wed Nov 12 03:55:49 2003
From: aleaxit at yahoo.com (Alex Martelli)
Date: Wed Nov 12 03:55:56 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: <3FB144D4.8060307@v.loewis.de>
References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> <3FB144D4.8060307@v.loewis.de>
Message-ID: <200311120955.49289.aleaxit@yahoo.com>

On Tuesday 11 November 2003 09:21 pm, Martin v. Löwis wrote:
> Guido van Rossum wrote:
> > How big would ICU binaries for Windows be?  I don't mind bloating the
> > Windows installer by a few MB.  As long as it doesn't have to land in
> > CVS...
>
> See
>
> ftp://www-126.ibm.com/pub/icu/2.6.1/icu-2.6.1.zip
>
> I haven't actually downloaded it because of size (9MB); the zip file
> may contain header files and the like which we shouldn't ship.

I have downloaded it, and it's a sources zipfile (needs to be unpacked with the -a option to unzip, on Linux).  I'm not quite sure of how to estimate the size of the Windows binaries since I don't have a decent Windows system to build it on at the moment.
For a Linux-on-386 build, I see:

[alex@lancelot source]$ size /usr/local/lib/libicu*.so.26.1
   text    data     bss     dec     hex filename
8449053    3948       4 8453005  80fb8d /usr/local/lib/libicudata.so.26.1
 875940   14528     908  891376   d99f0 /usr/local/lib/libicui18n.so.26.1
  51426    4296       8   55730    d9b2 /usr/local/lib/libicuio.so.26.1
 145377    4160       4  149541   24825 /usr/local/lib/libicule.so.26.1
  29860    1244       4   31108    7984 /usr/local/lib/libiculx.so.26.1
  26339    1004       4   27347    6ad3 /usr/local/lib/libicutoolutil.so.26.1
 664190   21100     356  685646   a764e /usr/local/lib/libicuuc.so.26.1

and zipping just these .so.26.1 files to gain an idea of their overall compressibility gives me:

[alex@lancelot source]$ zip fup /usr/local/lib/libicu*.so.26.1
  adding: usr/local/lib/libicudata.so.26.1 (deflated 54%)
  adding: usr/local/lib/libicui18n.so.26.1 (deflated 65%)
  adding: usr/local/lib/libicuio.so.26.1 (deflated 64%)
  adding: usr/local/lib/libicule.so.26.1 (deflated 70%)
  adding: usr/local/lib/libiculx.so.26.1 (deflated 65%)
  adding: usr/local/lib/libicutoolutil.so.26.1 (deflated 55%)
  adding: usr/local/lib/libicuuc.so.26.1 (deflated 60%)
[alex@lancelot source]$ ll fup.zip
-rw-rw-r--    1 alex     alex      4790641 Nov 12 09:53 fup.zip

I'm sure I've forgotten something, but I hope the sizes are roughly indicative and about 5MB compressed, 10MB on disk, are more or less what we could be adding to the Python windows installer if it came with ICU.  Perhaps somebody with a decent Windows platform can measure this more accurately!-)

Alex

From Boris.Boutillier at arteris.net  Wed Nov 12 04:38:35 2003
From: Boris.Boutillier at arteris.net (Boris Boutillier)
Date: Wed Nov 12 04:38:44 2003
Subject: [Python-Dev] New flag to differentiate Builtins and extensions classes ?
Message-ID: <3FB1FF9B.7040508@arteris.net>

I looked into the archives and didn't see any debate on the question; I hope I didn't miss something.

My point concerns limitations on extension modules due to checks aimed at the builtins.  The main point is settable extension classes.
In Python's code there are some checks against TPFLAGS_HEAPTYPE; extension modules shouldn't have this flag, so the normal type->tp_setattro doesn't allow the user to set new attributes on your extension classes.  There is a way around it: write a special metaclass which redefines setattr.

In the extension module I'm writing (I'm porting some Python code to Python-C for speed issues) the user can set attributes and slots on my classes.  What I need is the complete type->tp_setattro behaviour, without the check.  I didn't see a way to get this behaviour using only the Python API (is re-readying the type a workaround?), so I copy-pasted all the code needed to make update_slots work (ouch, 2500 lines).  This is now almost working; every kind of attribute can be set except __setattr__ itself: the hackcheck prevents the user from calling another __setattr__ from his new setattr.

Example of my extension class hierarchy:

class A(object)
class B(A)

In the extension there is a tp->setattro on B; if the user wants to redefine it, he can't call the A __setattr__:

def myBSetattr(self, k, v):
    super(B, self).__setattr__(k, v)
    ## Do here my special stuff

This won't work: the hackcheck will see some kind of hack here, 'you can't call the A.__setattr__ function from a B object'.

First question: is there a known way around this?

Possible improvements:
In the Python code there are checks in several functions to see that you are not modifying builtin classes; unfortunately this code also affects extension modules.  I think the HEAPTYPE flag is used abusively in different cases: in type_setattro, object_set_bases and object_set_classes, the checks have nothing to do with the true HEAPTYPE definition as stated in the comments in Include/object.h.  It is used, I think, only because it is the only flag that makes a difference between builtin and user classes.  Unfortunately, with this flag, extension classes fall into the 'builtin' part.
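The metaclass workaround mentioned above can be sketched in pure Python (a hedged sketch in modern class syntax; `SettableMeta` is a made-up name, and since pure-Python classes are heap types anyway, this only mirrors the delegation shape -- the hard part Boris describes lives in C, where non-heap types are rejected):

```python
# Hypothetical sketch of a "special metaclass which redefines setattr".
# A C extension type would need the same effect at the tp_setattro level.
class SettableMeta(type):
    def __setattr__(cls, name, value):
        # Delegate to the full type-level attribute-setting machinery.
        type.__setattr__(cls, name, value)

class A(metaclass=SettableMeta):
    pass

class B(A):
    pass

B.extra = 42                     # set a new attribute on the class itself
assert B.extra == 42
assert type(B) is SettableMeta   # the metaclass is inherited by subclasses
```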
A way to solve the problem without backward compatibility problems would be to have a new TPFLAGS_SETABLE flag, defaulting to 0 for builtin/extension classes and 1 for user Python classes.  This flag would be checked in place of the heaptype one where relevant.

I'm ready to write the code for this if there are some positive votes; I won't bother if everybody is against it.

Boris

From bh at intevation.de  Wed Nov 12 06:51:08 2003
From: bh at intevation.de (Bernhard Herzog)
Date: Wed Nov 12 06:51:16 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Wed, 12 Nov 2003 13:13:53 +1300 (NZDT)")
References: <200311120013.hAC0Drd25804@oma.cosc.canterbury.ac.nz>
Message-ID: <6qn0b1yfdf.fsf@salmakis.intevation.de>

Greg Ewing writes:

> Maybe the GC should also refuse to collect cycles in which any member
> is referenced by a weak reference with an associated callback?

Wouldn't it be possible to call the callbacks of all weakrefs that point to a cycle about to be destroyed before that destruction begins?

Bernhard

--
Intevation GmbH                    http://intevation.de/
Sketch                             http://sketch.sourceforge.net/
Thuban                             http://thuban.intevation.org/

From mwh at python.net  Wed Nov 12 07:41:44 2003
From: mwh at python.net (Michael Hudson)
Date: Wed Nov 12 07:41:49 2003
Subject: [Python-Dev] Re: other "magic strings" issues
In-Reply-To: (David Eppstein's message of "Tue, 11 Nov 2003 10:09:05 -0800")
References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com>
Message-ID: <2mr80du5br.fsf@starship.python.net>

David Eppstein writes:

> Ok, it sounds like I am stuck with PyObjC's
> NSString.localizedCaseInsensitiveCompare_, since Python's built-in
> cmp(unicode,unicode) sucks and locale doesn't provide an alternative.

"sucks" is too strong.
Maybe there should be better collation support but I don't think we should change the default comparison to do it. Cheers, mwh -- The ability to quote is a serviceable substitute for wit. -- W. Somerset Maugham From mwh at python.net Wed Nov 12 07:43:40 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 12 07:43:43 2003 Subject: [Python-Dev] New flag to differentiate Builtins and extensions classes ? In-Reply-To: <3FB1FF9B.7040508@arteris.net> (Boris Boutillier's message of "Wed, 12 Nov 2003 10:38:35 +0100") References: <3FB1FF9B.7040508@arteris.net> Message-ID: <2mn0b1u58j.fsf@starship.python.net> Boris Boutillier writes: > I look into the archives and didn't see any debate on the question, > hope I didn't miss something. Apart from your four(?) posts on the subject and various replies from me and Guido? Cheers, mwh -- This proposal, if accepted, will probably mean a heck of a lot of work for somebody. But since I don't want it accepted, I don't care. -- Laura Creighton, PEP 666 From barry at python.org Wed Nov 12 07:45:00 2003 From: barry at python.org (Barry Warsaw) Date: Wed Nov 12 07:45:11 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <200311120914.08946.aleaxit@yahoo.com> References: <200311120914.08946.aleaxit@yahoo.com> Message-ID: <1068641100.31989.85.camel@anthem> On Wed, 2003-11-12 at 03:14, Alex Martelli wrote: > I believe that setup.py is accurate, README slightly out of date, > Modules/Setup way out of date -- but I thought that double > checking couldn't possibly hurt. So, can I confirm this to the > help@python.org querant, and fix the comments in README (should > it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial support" > for 3.1?) and Modules/Setup (presumably with a pointer to setup.py)? 
Greg can give the definitive answer here, but my understanding is that the bsddb wrapper in Python 2.3 probably requires at least BerkeleyDB 3.3.11, supports up to 4.1.25, with the latter recommended (if it were up to me, at least :). The wrapper in Python 2.3.x probably does not support BerkeleyDB 4.2.x. -Barry From aleaxit at yahoo.com Wed Nov 12 08:07:48 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 12 08:08:02 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <1068641100.31989.85.camel@anthem> References: <200311120914.08946.aleaxit@yahoo.com> <1068641100.31989.85.camel@anthem> Message-ID: <200311121407.48286.aleaxit@yahoo.com> On Wednesday 12 November 2003 01:45 pm, Barry Warsaw wrote: > On Wed, 2003-11-12 at 03:14, Alex Martelli wrote: > > I believe that setup.py is accurate, README slightly out of date, > > Modules/Setup way out of date -- but I thought that double > > checking couldn't possibly hurt. So, can I confirm this to the > > help@python.org querant, and fix the comments in README (should > > it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial > > support" for 3.1?) and Modules/Setup (presumably with a pointer to > > setup.py)? > > Greg can give the definitive answer here, but my understanding is that > the bsddb wrapper in Python 2.3 probably requires at least BerkeleyDB > 3.3.11, supports up to 4.1.25, with the latter recommended (if it were > up to me, at least :). The wrapper in Python 2.3.x probably does not > support BerkeleyDB 4.2.x. Hmmm -- that's bad, because 2.3's setup.py does appear to be looking for 4.2 with priority, so, if that's installed on the user's machine, we might be looking for trouble... 
Alex From aleaxit at yahoo.com Wed Nov 12 08:12:35 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Wed Nov 12 08:12:43 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <2mr80du5br.fsf@starship.python.net> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <2mr80du5br.fsf@starship.python.net> Message-ID: <200311121412.35765.aleaxit@yahoo.com> On Wednesday 12 November 2003 01:41 pm, Michael Hudson wrote: > David Eppstein writes: > > Ok, it sounds like I am stuck with PyObjC's > > NSString.localizedCaseInsensitiveCompare_, since Python's built-in > > cmp(unicode,unicode) sucks and locale doesn't provide an alternative. > > "sucks" is too strong. Maybe there should be better collation support > but I don't think we should change the default comparison to do it. That seems sensible to me. However, if we do get stuck with a "comparison function", then sorting may not be quite as smooth (the cf would need to be called for each comparison); it might be better to be able to get something suitable for passing to key= -- i.e., the equivalent of C's strxfrm(), rather than of strcoll(), if one had to choose. Alex From guido at python.org Wed Nov 12 11:08:12 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 12 11:08:24 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib sets.py, 1.47, 1.48 In-Reply-To: Your message of "Wed, 12 Nov 2003 07:21:22 PST." References: Message-ID: <200311121608.hACG8Cd20609@12-236-54-216.client.attbi.com> > Modified Files: > sets.py > Log Message: > Improve backwards compatibility code to handle True/False. 
>
> Index: sets.py
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Lib/sets.py,v
> retrieving revision 1.47
> retrieving revision 1.48
> diff -C2 -d -r1.47 -r1.48
> *** sets.py 8 Sep 2003 19:16:36 -0000 1.47
> --- sets.py 12 Nov 2003 15:21:20 -0000 1.48
> ***************
> *** 74,77 ****
> --- 74,81 ----
>           if not predicate(x):
>               yield x
> + try:
> +     True, False
> + except NameError:
> +     True, False = (0==0, 0!=0)
>
>   __all__ = ['BaseSet', 'Set', 'ImmutableSet']

What's this doing in the 2.4 CVS? --Guido van Rossum (home page: http://www.python.org/~guido/) From eppstein at ics.uci.edu Wed Nov 12 12:02:14 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Wed Nov 12 12:02:17 2003 Subject: [Python-Dev] Re: other "magic strings" issues References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> Message-ID: In article <2mr80du5br.fsf@starship.python.net>, Michael Hudson wrote: > David Eppstein writes: > > > Ok, it sounds like I am stuck with PyObjC's > > NSString.localizedCaseInsensitiveCompare_, since Python's built-in > > cmp(unicode,unicode) sucks and locale doesn't provide an alternative. > > "sucks" is too strong. Maybe there should be better collation support > but I don't think we should change the default comparison to do it. Let me be more specific. Since we have such useful hashing-based dictionary data structures in Python, we don't often need cmp for binary search trees, so the main reason for comparing unicodes (as far as I can tell) is to put them in a logical order for displaying to humans. cmp(unicode,unicode) does a very bad job of this, whenever there are non-ascii characters involved. Its existence tricks you into thinking Python has a useful unicode comparison function when it doesn't.
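[David's point is easy to see concretely. A minimal sketch, not from the original mail, written in modern syntax: the default comparison is raw code-point order, so an umlaut sorts after the entire ASCII range.]

```python
# Default string comparison goes code point by code point: 'y' is U+0079,
# 'a with umlaut' is U+00E4, so the German word lands last -- nowhere near
# where a German collation would put it.
words = [u'Universit\xe4t', u'University']
assert sorted(words) == [u'University', u'Universit\xe4t']
```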
-- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From theller at python.net Wed Nov 12 12:13:44 2003 From: theller at python.net (Thomas Heller) Date: Wed Nov 12 12:14:04 2003 Subject: [Python-Dev] More fun with Python shutdown In-Reply-To: (Tim Peters's message of "Tue, 11 Nov 2003 19:01:08 -0500") References: Message-ID: "Tim Peters" writes: > That's almost certainly a bug in Python, but is almost certainly unrelated > to the problem Jim is having. > > I was able to make your test case substantially smaller. The key is that > the "remove" callback triggers gc. Apart from that, it doesn't matter at all > what "remove" does. I don't know what the bug is, though, and since the > last of these consumed more than a day to track down and fix, I don't > anticipate having time to do that again: Thanks. I've submitted a bug http://www.python.org/sf/840829 for it. I have the impression that I'm not able to fix the bug myself, although I consider it a critical bug since it basically makes weakref callbacks unusable because gc can occur at any time. My workaround for now is to disable gc as the first action in the callback and enable it again as the last action, but I'm unconvinced that this really helps in all cases.
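[Thomas's workaround can be sketched roughly like this -- a minimal reconstruction from his description, not his actual code; the names Target and callback are made up for the illustration.]

```python
import gc
import weakref

done = []

class Target(object):
    pass

def callback(ref):
    # the workaround: keep the cyclic gc from running while the callback
    # body executes, then restore its previous state
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        done.append(True)  # the real cleanup work would go here
    finally:
        if was_enabled:
            gc.enable()

t = Target()
r = weakref.ref(t, callback)
del t  # in CPython the refcount hits zero here and the callback runs
```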
Thomas From trentm at ActiveState.com Wed Nov 12 14:27:26 2003 From: trentm at ActiveState.com (Trent Mick) Date: Wed Nov 12 14:31:42 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311120955.49289.aleaxit@yahoo.com>; from aleaxit@yahoo.com on Wed, Nov 12, 2003 at 09:55:49AM +0100 References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> <3FB144D4.8060307@v.loewis.de> <200311120955.49289.aleaxit@yahoo.com> Message-ID: <20031112112725.A6879@ActiveState.com> [Martin] > > ftp://www-126.ibm.com/pub/icu/2.6.1/icu-2.6.1.zip [Alex Martelli wrote] > For a Linux-on-386 build, I see:
>
> [alex@lancelot source]$ size /usr/local/lib/libicu*.so.26.1
>     text    data    bss     dec    hex filename
>  8449053    3948      4 8453005 80fb8d /usr/local/lib/libicudata.so.26.1
>   875940   14528    908  891376  d99f0 /usr/local/lib/libicui18n.so.26.1
>    51426    4296      8   55730   d9b2 /usr/local/lib/libicuio.so.26.1
>   145377    4160      4  149541  24825 /usr/local/lib/libicule.so.26.1
>    29860    1244      4   31108   7984 /usr/local/lib/libiculx.so.26.1
>    26339    1004      4   27347   6ad3 /usr/local/lib/libicutoolutil.so.26.1
>   664190   21100    356  685646  a764e /usr/local/lib/libicuuc.so.26.1
>
> ...
>
> Perhaps somebody with a decent Windows platform can measure
> this more accurately!-)

For a Windows build (on Win2K compiled with VC++ 6):

Directory of D:\trentm\tmp\icu\bin

12/11/2003 11:21a .
12/11/2003 11:21a ..
12/11/2003 11:18a 20,480 ctestfw.dll
12/11/2003 11:19a 16,384 decmn.exe
12/11/2003 11:19a 20,480 derb.exe
12/11/2003 11:20a 16,384 genbrk.exe
12/11/2003 11:19a 16,384 genccode.exe
12/11/2003 11:19a 16,384 gencmn.exe
12/11/2003 11:19a 20,480 gencnval.exe
12/11/2003 11:20a 49,152 genidna.exe
12/11/2003 11:19a 20,480 gennames.exe
12/11/2003 11:19a 32,768 gennorm.exe
12/11/2003 11:19a 49,152 genpname.exe
12/11/2003 11:19a 32,768 genprops.exe
12/11/2003 11:19a 69,632 genrb.exe
12/11/2003 11:19a 16,384 gentest.exe
12/11/2003 11:19a 20,480 gentz.exe
12/11/2003 11:19a 24,576 genuca.exe
12/11/2003 11:20a 8,495,104 icudt26l.dll
12/11/2003 11:19a 692,224 icuin26.dll
12/11/2003 11:21a 57,344 icuio26.dll
12/11/2003 11:20a 90,112 icule26.dll
12/11/2003 11:21a 40,960 iculx26.dll
12/11/2003 11:19a 32,768 icutu26.dll
12/11/2003 11:18a 585,728 icuuc26.dll
12/11/2003 11:19a 40,960 makeconv.exe
12/11/2003 11:20a 32,768 pkgdata.exe
12/11/2003 11:21a 45,056 uconv.exe
26 File(s) 10,555,392 bytes

Note that I am just stoopidly compiling and reporting here. :) Trent -- Trent Mick TrentM@ActiveState.com From martin at v.loewis.de Wed Nov 12 15:31:42 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 12 15:32:03 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> Message-ID: David Eppstein writes: > Let me be more specific. Since we have such useful hashing-based > dictionary data structures in Python, we don't often need cmp for > binary search trees, so the main reason for comparing unicodes (as far > as I can tell) is to put them in a logical order for displaying to > humans. cmp(unicode,unicode) does a very bad job of this, whenever > there are non-ascii characters involved.
Its existence tricks you into > thinking Python has a useful unicode comparison function when it > doesn't. It's useful for sorting, but not for collation. Comparing!=Collating. That said, locale.strcoll does what you want. Regards, Martin From martin at v.loewis.de Wed Nov 12 15:34:59 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 12 15:35:28 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <3FB137C1.9000903@v.loewis.de> <200311111956.hABJuZh18034@12-236-54-216.client.attbi.com> <3FB14122.708@v.loewis.de> <200311112009.hABK92M18120@12-236-54-216.client.attbi.com> Message-ID: Guido van Rossum writes: > > More realistically, we could expose wcscoll(3) where available, > > which would extend the Python locale model to Unicode (assuming > > the C library uses Unicode in wchar_t). > > I don't know what that is, but if you recommend it, I support it. I should have remembered this time machine. locale.strcoll already uses wcscoll if the platform supports it, so locale.strcoll should be used for locale-aware collation. locale.strxfrm does not (yet) support Unicode; I'm uncertain whether it should (as you typically use this when presenting sorted lists to the user; displaying them will certainly take much longer than sorting them). 
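[Alex's strxfrm-as-key idea looks like this in practice. This is a sketch, not code from the thread: the ordering you get depends entirely on the active LC_COLLATE locale, and the always-available "C" locale plus ASCII-only strings are used here only so the example is self-contained (as Martin notes, strxfrm did not accept Unicode at the time).]

```python
import locale

locale.setlocale(locale.LC_COLLATE, 'C')
words = ['banana', 'Apple', 'apple']
# strxfrm is applied once per item; the transformed keys then compare
# with plain string comparison, instead of a strcoll call per pair
words.sort(key=locale.strxfrm)
# in the C locale this is simply bytewise order: 'A' < 'a' < 'b'
```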
Regards, Martin From eppstein at ics.uci.edu Wed Nov 12 17:13:29 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Wed Nov 12 17:13:32 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> Message-ID: <87474468.1068646409@dhcp31-56.ics.uci.edu> On 11/12/03 9:31 PM +0100 "Martin v. Löwis" wrote: > That said, locale.strcoll does what you want. It does?

>>> locale.strcoll(unicode('Universität','utf8'),u'University')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
>>> locale.setlocale(locale.LC_COLLATE,'en_US')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/locale.py", line 372, in setlocale
    return _setlocale(category, locale)
locale.Error: locale setting not supported

Even if locale would allow me to set a locale, which locale should I set, in order to allow all unicodes (not just e.g. iso-8859-1, but all of them) to be collated in some reasonable order? -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From martin at v.loewis.de Wed Nov 12 18:09:51 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 12 18:10:14 2003 Subject: [Python-Dev] Re: other "magic strings" issues In-Reply-To: <87474468.1068646409@dhcp31-56.ics.uci.edu> References: <6CC39F01DF9C56438FC6B7473A989B63055C13@geex2ku01.agere.com> <200311102251.10904.aleaxit@yahoo.com> <2mhe1buoa5.fsf@starship.python.net> <200311111713.hABHDJt17461@12-236-54-216.client.attbi.com> <2mr80du5br.fsf@starship.python.net> <87474468.1068646409@dhcp31-56.ics.uci.edu> Message-ID: David Eppstein writes: > It does?
Sure:

Python 2.3 (#26, Aug 1 2003, 09:50:29)
[GCC 3.3 20030226 (prerelease) (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL,"")
'LC_CTYPE=de_DE@euro;LC_NUMERIC=de_DE@euro;LC_TIME=de_DE@euro;LC_COLLATE=C;LC_MONETARY=de_DE@euro;LC_MESSAGES=de_DE@euro;LC_PAPER=de_DE@euro;LC_NAME=de_DE@euro;LC_ADDRESS=de_DE@euro;LC_TELEPHONE=de_DE@euro;LC_MEASUREMENT=de_DE@euro;LC_IDENTIFICATION=de_DE@euro'
>>> locale.strcoll(u"universit\xe4t",u"University")
32
>>> locale.setlocale(locale.LC_ALL,"en_US")
'en_US'
>>> locale.strcoll(u"universit\xe4t",u"University")
-24

> Even if locale would allow me to set a locale, which locale should I > set, in order to allow all unicodes (not just e.g. iso-8859-1, but all > of them) to be collated in some reasonable order? Define "reasonable order". There is no "reasonable order" independent of the language. In German, it is just not reasonable to have Japanese characters. Most Germans cannot tell Katakana from Hiragana, so it just does not matter to them how those collate. Likewise, I guess most Japanese won't see a difference between an umlaut and a circumflex. Regards, Martin From greg at cosc.canterbury.ac.nz Wed Nov 12 18:48:20 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Nov 12 18:49:14 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <6qn0b1yfdf.fsf@salmakis.intevation.de> Message-ID: <200311122348.hACNmKV03619@oma.cosc.canterbury.ac.nz> Bernhard Herzog : > Wouldn't it be possible to call the callbacks of all weakrefs that point > to a cycle about to be destroyed before that destruction begins? I'm not sure that would be a good idea, for the same reasons that it wouldn't be a good idea to do the same for __del__ methods. Something might depend on them being called in the right order, or in not being called too soon.
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim at zope.com Wed Nov 12 23:47:37 2003 From: tim at zope.com (Tim Peters) Date: Wed Nov 12 23:48:40 2003 Subject: [Python-Dev] Re: More fun with Python shutdown In-Reply-To: <6qn0b1yfdf.fsf@salmakis.intevation.de> Message-ID: [Bernhard Herzog] > Wouldn't it be possible to call the callbacks of all weakrefs that > point to a cycle about to be destroyed before that destruction begins? Yes, but GC couldn't also go on to call tp_clear then -- without deeper changes, the objects would have to leak. Suppose objects I and J have (strong) references to each other -- they form a two-object cycle. Suppose I also holds a weakref to J, with a callback to a method of I. Suppose the cycle becomes unreachable. GC detects that. It can also (with small changes to current code) detect that J has a weakref-associated callback, and invoke it. But when the callback returns, GC must stop trying to make progress: at that point it knows absolutely nothing anymore about the object graph, because there's absolutely nothing a callback can't do. In particular, because the callback in the example is a method of I, it has full access to I (via the callback's "self" argument), and because I has a strong reference to J, it also has full access to J. The callback can resurrect either or both the objects, and/or install new weakref callbacks on either or both, or even break the strong-reference cycle manually so that normal refcounting completely destroys I before the callback returns (although there's an obscure technical reason for why the callback can't completely destroy J before it returns -- I and J are different in this one respect).
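[Tim's setup is concrete enough to write down. A sketch of the object graph he describes -- the names I and J follow the text; this is illustration, not code from the thread:]

```python
import weakref

class J(object):
    pass

class I(object):
    def callback(self, ref):
        # as a bound method, this runs with full access to self (the I
        # instance), and through self.j to J as well
        pass

i = I()
j = J()
i.j = j  # strong reference I -> J
j.i = i  # strong reference J -> I: the two-object cycle
i.wr = weakref.ref(j, i.callback)  # I's weakref to J, callback a method of I
```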
If GC went on to, for example, execute tp_clear on I or J, tp_clear can leave behind an accessible (if the callback resurrected it) insane object, where "insane" means one that a user-- whether in innocence or by hostile design doesn't matter --can exploit to crash the interpreter. For example, Jim has proven that a new-style class object is insane in this way after its tp_clear is invoked, and it's extremely easy to provoke one into segfaulting. Of course that's right out -- we're trying to repair a current segfault, not supply subtler ways to create segfaults. We also have to do this within the boundaries of what can be sold for a bugfix release, so gross changes in semantics are also right out. In particular, we've never said that tp_clear has to leave an object in a usable state, so it would be a hard sell to start to demand that in a bugfix release. Still, I want this to work. There's a saving grace here that __del__ methods don't have: if a __del__ method resurrects an object, there's nothing to stop the __del__ method from getting called again (when the refcount falls to 0 again). But weakref callbacks are *already* one-shot things: a given weakref callback destroys itself as part of the process of getting invoked. So once we've invoked a weakref callback for J, that callback is history. Sick code *in* the callback could install *another* weakref callback on J, so we have to be careful, but J's original callbacks are gone forever, and in almost all code will leave J callback-free. As above, GC cannot go on to call tp_clear after invoking a callback. However, after invoking all the callbacks, it *could* start another "mini" gc cycle, taking the list of cyclic trash as its starting point (as "the generation" to be collected). This is the only way it can know what the post-callback state of the object graph is. In all sane code, this mini-gc run will discover that (a) all this stuff is still cyclic trash, and (b) none of it has weakref-callbacks anymore. 
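[The one-shot behavior Tim leans on is easy to check -- a small illustration in current CPython, not from the original mail:]

```python
import weakref

calls = []

class T(object):
    pass

t = T()
r = weakref.ref(t, lambda ref: calls.append('fired'))
del t  # refcount drops to zero; the callback runs exactly once

# the weakref is now dead and its callback is gone for good --
# nothing will ever invoke it a second time
```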
*Then* it's safe to run through the list calling tp_clear methods. In sick code (code that resurrects objects via a weakref callback, or registers new weakref callbacks to dead objects via a weakref callback), the mini gc run will automatically remove the resurrected objects from current consideration (they'll move to an older generation as a matter of course). It may even discover that nothing is trash anymore. If so, no harm done: because we haven't called tp_clear on anything, nothing has been damaged. If there's some trash left with (necessarily) new weakref callbacks, we're back to where we started. We *could* proceed the same way then, but I'm afraid that would give actively hostile code a way to put gc into a never-ending loop. Instead I'd simply move those objects into the next generation, and let gc end then. Again, because we haven't called tp_clear on anything, nothing has been damaged in this case either. A subtlety: instead of doing the "mini gc pass", why not just move the leftover objects into an older generation and let gc return right away then? The problem: any weakref callback in any cyclic trash would stop a complete invocation of gc from removing any trash then. A perfectly ordinary, non-hostile program, that happened to create lots of weakref callbacks in cyclic trash could then get into a state where every time gc runs, it finds one of these things, and despite that the app never does anything sick (like resurrecting in a callback), gc would never make any progress. The true purpose of the "mini gc pass" is to ensure that gc does make progress in sane code, and no matter how quickly and sustainedly it creates dead cycles containing weakref callbacks. Terminology subtlety: the "mini" in "mini gc pass" refers to that the generation it starts with is presumably small, not to that this pass has an especially easy time of it. It still has to do all the work of deducing liveness and deadness from scratch. 
There are no shortcuts it can take here, simply because there's nothing a callback can't do. However, this pass should go quickly: it starts with what *was* entirely trash in cycles, and it's probably still entirely trash in cycles. This is maximally easy for Python's kind of cyclic gc (it chases all and only the objects in the dead cycles then -- it doesn't have to visit any objects outside the dead cycles, *unless* the cycles aren't truly dead anymore). So for sane programs, it adds gc time proportional to the number of pointers in the dead cycles, independent of the total number of objects. All cyclic trash found by all gc invocations consumes a little more time too, because we have to ask each trash object whether it has an associated weakref callback. In most programs, most of the time, the answer will be "no". From tim at zope.com Thu Nov 13 01:44:34 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 01:45:37 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown Message-ID: The following program provokes a segfault before shutdown in a release build, or, in a debug build, triggers Assertion failed: mro != NULL, file C:\Code\python\Objects\object.c, line 1225 This is on current 2.4 trunk, so includes the fix checked in on Wednesday for "Thomas Heller's bug". In the "it figures" department: I was never able to provoke Jim's problem on purpose. I was trying to provoke a different failure here, and never got to the point of finishing the code for that purpose. Heh.

"""
import gc
import weakref

alist = []

class J(object):
    pass

class II(object):
    __slots__ = 'J', 'wr'

    def resurrect(self, ignore):
        alist.append(self.J)

I = II()
J.I = I
I.J = J
I.wr = weakref.ref(J, I.resurrect)
del I, J, II

gc.collect()
print alist
"""

It's trying to resolve self.J in the callback at the time it dies.
Unlike Jim's scenario, the failure here is due to that II is in an insane state (the class containing the callback code, not some other class) -- but close enough for me. I doubt the __slots__ declaration is necessary, but it *is* necessary for II to be a new-style class. If you make II an old-style class instead, you get a different surprise in the callback: because tp_clear has already been called on I too the way things work today, and old-style classes look in the instance dict first, the attempt to reference self.J raises AttributeError. There's no way to guess that might happen from staring at the Python code, though (and remember that this is before shutdown! we're all too eager to overlook shutdown failures, but even if we weren't this one is just a result of regular garbage collection while the interpreter and all modules are in perfect shape). The suggested approach in the long earlier email should repair both the segfault and the AttributeError-out-of-thin-air surprises. It would instead result in J's resurrection (with J wholly intact; and I and II would also resurrect, since J has a strong reference to I, and I to II). The specific invocation of gc in which this occurred wouldn't be able to collect anything (at all, even if there were a million other objects in vanilla trash cycles at the time -- they wouldn't get collected until a later run of gc, one that didn't resurrect dead cycles). From tim at zope.com Thu Nov 13 02:17:34 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 02:18:35 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown In-Reply-To: Message-ID: [Tim] > ... > The suggested approach in the long earlier email should repair both > the segfault and the AttributeError-out-of-thin-air surprises. ...
> The specific invocation of gc in which this occurred wouldn't be able > to collect anything (at all, even if there were a million other objects > in vanilla trash cycles at the time -- they wouldn't get collected > until a later run of gc, one that didn't resurrect dead cycles). Sorry, not so -- the "mini gc pass" of the same gc invocation would collect all million of the other objects in vanilla trash cycles. It's only weakref callbacks sick enough to install brand new weakref callbacks on dead objects that would prevent the other trash from getting collected in the same gc invocation. There wasn't anything like that in the segfaulting program. It's also possible that we could change the weakref implementation to refuse to allow creating new weakrefs while a weakref callback was in progress. But that would be a new restriction; it wouldn't save gc much work (the mini gc pass would still have to do full live-dead analysis on the leftover trash; it would only save that pass from asking the "survivors" whether they grew any new weakref callbacks); and reporting an exception that occurs during gc happens by calling Py_FatalError (it's extreme). From edloper at gradient.cis.upenn.edu Thu Nov 13 04:01:07 2003 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Thu Nov 13 03:00:03 2003 Subject: [Python-Dev] add a list.stablesort() method? Message-ID: <3FB34853.4070500@gradient.cis.upenn.edu> Python's list.sort() method has gone through many different incarnations, some of which have been stable, and some of which have not. As of Python 2.3, list.sort() *is* stable, but we're told not to rely on that behavior. [1] In particular, it might change for future versions/alternate implementations of Python. Given that, would it make sense to add a list.stablesort() method? For the current implementation of cPython, it would just be another name for list.sort(). But adding a new name for it has two advantages: 1. 
If we discover a faster sorting algorithm that's not stable, then future versions of Python can switch list.sort() to use that, but list.stablesort() will still be available for anyone who needs a stable sort. 2. It explicitly marks (to the reader) which sort operations are relying on stability. The main disadvantages that I can think of are: 1. It adds a new method to the list object, which probably won't get used all that often (most tasks don't call for stable sorts). 2. You can already implement a stablesort procedure in Python (albeit less efficiently than the c implementation). [2] 3. If we do add a non-stable sort in the future, we'll need to maintain 2 separate sorting algorithms in listobj.c. Does this seem like a reasonable addition? -Edward [1] [2] From greg at electricrain.com Thu Nov 13 03:30:48 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Thu Nov 13 03:30:56 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <1068641100.31989.85.camel@anthem> References: <200311120914.08946.aleaxit@yahoo.com> <1068641100.31989.85.camel@anthem> Message-ID: <20031113083048.GH26081@zot.electricrain.com> On Wed, Nov 12, 2003 at 07:45:00AM -0500, Barry Warsaw wrote: > On Wed, 2003-11-12 at 03:14, Alex Martelli wrote: > > > I believe that setup.py is accurate, README slightly out of date, > > Modules/Setup way out of date -- but I thought that double > > checking couldn't possibly hurt. So, can I confirm this to the > > help@python.org querant, and fix the comments in README (should > > it say 3.1 through 4.2, or 3.2 through 4.2, given the "only partial support" > > for 3.1?) and Modules/Setup (presumably with a pointer to setup.py)? > > Greg can give the definitive answer here, but my understanding is that > the bsddb wrapper in Python 2.3 probably requires at least BerkeleyDB > 3.3.11, supports up to 4.1.25, with the latter recommended (if it were > up to me, at least :). 
The wrapper in Python 2.3.x probably does not > support BerkeleyDB 4.2.x. > > -Barry 3.2 - 4.2 should work. 3.1 is too old and not worth the effort to get to work properly again if it's even possible. I just removed checks and mention of support for it in 2.4cvs. I added the support for compiling with 4.2.x before 2.3.2 was released. sleepycat gave me a beta 4.2; with luck they'll actually release it for real soon. The python 2.3.3 windows binary distribution should be compiled using 4.1.25 to maintain perfect compatibility with python 2.3-2.3.2. -greg From brian at sweetapp.com Thu Nov 13 04:00:16 2003 From: brian at sweetapp.com (Brian Quinlan) Date: Thu Nov 13 03:57:29 2003 Subject: [Python-Dev] add a list.stablesort() method? In-Reply-To: <3FB34853.4070500@gradient.cis.upenn.edu> Message-ID: <002e01c3a9c4$8f469e80$21795418@dell8200> > As of Python 2.3, list.sort() *is* stable, but we're told not to > rely on that behavior. [1] In particular, it might change for > future versions/alternate implementations of Python. You missed Guido's pronouncement on this issue: http://mail.python.org/pipermail/python-dev/2003-October/038773.html The bottom line is: "OK, I pronounce on this: Python's list.sort() shall be stable." Cheers, Brian From jim at zope.com Thu Nov 13 06:08:11 2003 From: jim at zope.com (Jim Fulton) Date: Thu Nov 13 06:13:02 2003 Subject: [Python-Dev] Re: Provoking Jim's MRO segfault before shutdown In-Reply-To: References: Message-ID: <3FB3661B.7050909@zope.com> Tim Peters wrote: ... > It's trying to resolve self.J in the callback at the time it dies. Unlike > Jim's scenario, the failure here is due to that II is in an insane state (the > class containing the callback code, not some other class) -- but close > enough for me. This is exactly like my scenario. The class containing the callback is hosed. In my scenario, I wasn't resurrecting anything though. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From tim at zope.com Thu Nov 13 08:04:17 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 08:04:33 2003 Subject: [Python-Dev] RE: Provoking Jim's MRO segfault before shutdown In-Reply-To: <3FB3661B.7050909@zope.com> Message-ID: [Tim] > ... >> It's trying to resolve self.J in the callback at the time it dies. >> Unlike Jim's scenario, the failure here is due to that II is in an >> insane state (the class containing the callback code, not some other >> class) -- but close enough for me. [Jim Fulton] > This is exactly like my scenario. The class containing the callback > is hosed. Ah! I misunderstood. Great, then. > In my scenario, I wasn't resurrecting anything though. Right, I was trying to provoke a different (but related) problem. But it doesn't matter what the method is named, or what it's trying to do -- it's dying before it gets to the part that would have resurrected something ... for example, this segfaults too:

"""
import gc
import weakref

class J(object):
    pass

class II(object):
    def happy_happy_joy_joy(self, ignore):
        print self.bunny_rabbit

I = II()
I.unused = J
I.wr = weakref.ref(J, I.happy_happy_joy_joy)
del I, J, II

print "the sun shines"
gc.collect()
print "on all the little children"
"""

Comment out the "del" instead, and then all the little children get to enjoy Mr. Sunshine for the few microseconds it takes to see Mr. Segfault during shutdown instead. Random curiosity: note that this version doesn't set up a cycle *between* J and I (the "J.I = I" line from the original was cut here). It's unclear what "purpose" J serves in this version. Nevertheless, if "I.unused = J" is also removed, the segfault goes away, and it just delivers a bunny_rabbit AttributeError instead. As is, the strong reference from I to J nudges gc into calling tp_clear on II before breaking cycles causes the refcount on J to fall to 0.
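[For contrast with these segfaulting programs, the sane case -- a *reachable* weakref whose callback fires when a trash cycle is collected -- behaves well in current CPython. A sketch of today's behavior, which reflects the fixes that grew out of this discussion, not the 2003 code being debugged:]

```python
import gc
import weakref

log = []

class C(object):
    pass

def on_collect(ref):
    log.append('collected')

a = C()
b = C()
a.partner = b
b.partner = a  # a two-object reference cycle
r = weakref.ref(a, on_collect)  # r itself stays reachable at module level
del a, b       # the cycle is unreachable now; only the cycle collector can free it
gc.collect()   # frees the pair; the reachable weakref's callback fires
```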
From barry at python.org Thu Nov 13 09:13:19 2003 From: barry at python.org (Barry Warsaw) Date: Thu Nov 13 09:13:27 2003 Subject: [Python-Dev] which sleepycat versions do we support in 2.3.* ? In-Reply-To: <20031113083048.GH26081@zot.electricrain.com> References: <200311120914.08946.aleaxit@yahoo.com> <1068641100.31989.85.camel@anthem> <20031113083048.GH26081@zot.electricrain.com> Message-ID: <1068732799.3723.17.camel@anthem> On Thu, 2003-11-13 at 03:30, Gregory P. Smith wrote: > I added the support for compiling with 4.2.x before 2.3.2 was released. > sleepycat gave me a beta 4.2; with luck they'll actually release it for > real soon. Cool! I didn't realize that. -Barry From barry at python.org Thu Nov 13 09:22:40 2003 From: barry at python.org (Barry Warsaw) Date: Thu Nov 13 09:22:53 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown In-Reply-To: References: Message-ID: <1068733359.3723.19.camel@anthem> On Thu, 2003-11-13 at 02:17, Tim Peters wrote: > until a later run of gc, one that didn't resurrect dead cycles). > > Sorry, not so -- the "mini gc pass" of the same gc invocation would collect > all million of the other objects in vanilla trash cycles. It's only weakref > callbacks sick enough to install brand new weakref callbacks on dead objects > that would prevent the other trash from getting collected in the same gc > invocation. There wasn't anything like that in the segfaulting program. When Python's shutting down, will there /be/ another GC invocation? -Barry From tim at zope.com Thu Nov 13 10:12:14 2003 From: tim at zope.com (Tim Peters) Date: Thu Nov 13 10:12:35 2003 Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown In-Reply-To: <1068733359.3723.19.camel@anthem> Message-ID: [Tim] >> Sorry, not so -- the "mini gc pass" of the same gc invocation would >> collect all million of the other objects in vanilla trash cycles.
>> It's only weakref callbacks sick enough to install brand new weakref
>> callbacks on dead objects that would prevent the other trash from
>> getting collected in the same gc invocation. There wasn't anything
>> like that in the segfaulting program.

[Barry Warsaw]
> When Python's shutting down, will there /be/ another GC invocation?

New in 2.3, gc is forced twice by Py_Finalize. But it's quite possible for a weakref callback that itself installs new weakref callbacks to objects in unreachable (dead) cycles, and then resurrects those dead objects, to create a situation where no number of gc collections can suffice, not under the proposed scheme, nor under the current scheme, nor under any scheme -- the programmer has then set things up so that, no matter how often we try to clean up the trash, their code keeps resurrecting part of it, then pretends to kill it off again, etc etc. So it's always (under any scheme) possible to write code that will leave a weakref callback uncalled at the time Python does its C-level exit(). But at best, I think that's pathological code. It's not a plausible use case, except to ensure that it's not a way to crash the interpreter.

Under the proposed scheme, there's no issue here *except* for code that (ab)uses weakref callbacks to install new weakref callbacks in their bodies, and attaches the callbacks to objects that are unreachable from outside a dead clump of cyclic trash containing both the object running the original weakref callback and the object that triggered the weakref callback.

BTW, I think Python should drop its second call of garbage collection in Py_Finalize, and *possibly* its first call too. The second call happens after modules have been torn down, so callbacks or __del__ methods run then are quite likely to suffer unexpected exceptions (module globals are None, sys.stdout no longer exists, etc).
That second call is what triggered Jim's original segfault; was the cause of the mysterious chain of information-free messages when the Zope3 test suite finished (before we cleaned up forgotten daemon threads); and is the cause of similar new shutdown irritations reported on c.l.py.

The first call in Py_Finalize suffers a different problem: because the global C-level "initialized" flag has been set false by the time it's called, any Python-level code run as a result of garbage collection that tries to load a module gets a baffling (to the user) Py_FatalError complaining that Python isn't initialized. I stumbled into that one by accident while trying to reproduce Jim's problem, and that's the only report of it I know of. So I'm not excited about that one, but a Py_FatalError at shutdown is sure going to attract attention when somebody else stumbles into it.

From tim at zope.com Thu Nov 13 13:03:19 2003
From: tim at zope.com (Tim Peters)
Date: Thu Nov 13 13:03:46 2003
Subject: [Python-Dev] subtype_dealloc needs rethinking
In-Reply-To: <3FB3661B.7050909@zope.com>
Message-ID:

We've got multiple segfault problems associated with weakref callbacks, and multiple problems of that kind coming from subtype_dealloc alone. Here's a piece of test_weakref.py in my 2.4 checkout; the first part of the test got fixed yesterday; the second part has not been checked in yet, because it still fails (in a release build it corrupts memory and that may not be visible; in a debug build it reliably segfaults, due to double deallocation):

"""
    def test_sf_bug_840829(self):
        # "weakref callbacks and gc corrupt memory"
        # subtype_dealloc erroneously exposed a new-style instance
        # already in the process of getting deallocated to gc,
        # causing double-deallocation if the instance had a weakref
        # callback that triggered gc.
        # If the bug exists, there probably won't be an obvious symptom
        # in a release build.
        # In a debug build, a segfault will occur
        # when the second attempt to remove the instance from the "list
        # of all objects" occurs.
        import gc

        class C(object):
            pass

        c = C()
        wr = weakref.ref(c, lambda ignore: gc.collect())
        del c

        # There endeth the first part. It gets worse.
        del wr

        c1 = C()
        c1.i = C()
        wr = weakref.ref(c1.i, lambda ignore: gc.collect())

        c2 = C()
        c2.c1 = c1
        del c1  # still alive because c2 points to it

        # Now when subtype_dealloc gets called on c2, it's not enough just
        # that c2 is immune from gc while the weakref callbacks associated
        # with c2 execute (there are none in this 2nd half of the test, btw).
        # subtype_dealloc goes on to call the base classes' deallocs too,
        # so any gc triggered by weakref callbacks associated with anything
        # torn down by a base class dealloc can also trigger double
        # deallocation of c2.
        del c2
"""

There are two identifiable (so far) problems in subtype_dealloc (note that these have nothing to do with Jim's current woes -- those are a different problem with weakref callbacks, and he hasn't yet hit the problems I'm talking about here -- but he will, eventually).

1. A weakref callback can resurrect self, but the code isn't aware of that now. It's not *easy* to resurrect self, and we probably thought it wasn't possible, but it is: if self is in a dead cycle, and the weakref callback invokes a method of an object in that cycle, self is visible to the callback (following the cycle links), and so self can get resurrected by the callback. The callback doesn't have to specifically try to resurrect self, it can happen as a side effect of resurrecting anything in the cycle from which self is reachable.

2. Unlike other dealloc routines, subtype_dealloc leaves the object, with refcnt 0, tracked by gc.
That's the cause of the now seemingly endless sequence of ways to provoke double deallocation: when a weakref callback is invoked at any time while subtype_dealloc is executing (whether the callback is associated with self, or with anything that dies as a result of any base class cleanup calls), and if gc happens to trigger while the callback is executing, and self happens to be in a generation gc is collecting, then the tracked refcount=0 self looks like garbage to gc, so gc does

    incref
    call tp_clear
    decref

on it, and the decref knocks the refcount back down to 0 again thus triggering another deallocation (while the original deallocation is still in progress).

To avoid #2, one of these two must be true:

A. self is untracked at the time gc happens.

B. self has a refcount > 0 at the time gc happens (e.g., the usual "temporarily resurrect" trick).

I checked in a 0-byte change yesterday that repaired the first half of the test case, using #A (I simply moved the line that retracks self below the *immediate* weakref callback). But that same approach can't work for the rest of subtype_dealloc, for reasons you explained in a comment at the end of the function. Doing something of the #B flavor appears so far to work (meaning it fixes the rest of the test case, and hasn't triggered a new problem yet):

"""
Index: typeobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/typeobject.c,v
retrieving revision 2.251
diff -u -r2.251 typeobject.c
--- typeobject.c	12 Nov 2003 20:43:28 -0000	2.251
+++ typeobject.c	13 Nov 2003 17:57:07 -0000
@@ -667,6 +667,17 @@
 		goto endlabel;
 	}

+	/* We're still not out of the woods: anything torn down by slots
+	 * or a base class dealloc may also trigger gc in a weakref callback.
+	 * For reasons explained at the end of the function, we have to
+	 * keep self tracked now.  The only other way to make gc harmless
+	 * is to temporarily resurrect self.  We couldn't do that before
+	 * calling PyObject_ClearWeakRefs because that function raises
+	 * an exception if its argument doesn't have a refcount of 0.
+	 */
+	assert(self->ob_refcnt == 0);
+	self->ob_refcnt = 1;
+
 	/* Clear slots up to the nearest base with a different tp_dealloc */
 	base = type;
 	while ((basedealloc = base->tp_dealloc) == subtype_dealloc) {
@@ -693,6 +704,8 @@
 		_PyObject_GC_UNTRACK(self);

 	/* Call the base tp_dealloc() */
+	assert(self->ob_refcnt == 1);
+	self->ob_refcnt = 0;
 	assert(basedealloc);
 	basedealloc(self);
"""

I'm not sure those asserts *can't* trigger, though (well, actually, I'm sure they can, if a weakref callback resurrects self -- but that's a different problem), and the code is getting obscure. Maybe that comes with the territory. So fresh eyeballs would help.

The problems with resurrection are related to Jim's problem, in that tp_clear can leave behind insane objects, and those can kill us whether a callback provokes the insanity directly (as in Jim's case), or a resurrected insane object gets provoked sometime later. I sketched a different scheme for solving those in a long msg yesterday (it doesn't involve subtype_dealloc; it involves changing gc to be much more aware of the problems weakref callbacks can create).

From tim.one at comcast.net Thu Nov 13 15:17:53 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Nov 13 15:17:58 2003
Subject: [Python-Dev] subtype_dealloc needs rethinking
In-Reply-To:
Message-ID:

[Tim]
> ...
> There are two identifiable (so far) problems in subtype_dealloc

Make that one; I'm convinced the first is bogus.

> ...
> 1. A weakref callback can resurrect self, but the code isn't aware of
> that now.

I no longer believe that's possible.
While a weakref callback can resurrect objects in dead cycles, a weakref callback called *as a result* of anything subtype_dealloc does cannot resurrect the object subtype_dealloc is tearing down (because self's refcount is legitimately 0 then -- Python code can't get to self, even if it was in a dead cycle, and the dying object isn't passed to the callback either).

That just leaves subtype_dealloc with its problem of allowing gc to believe that self is collectible. I'm feeling more confident about that too after staring at the code more, but a complete fix remains strained.

From oussoren at cistron.nl Thu Nov 13 04:59:48 2003
From: oussoren at cistron.nl (Ronald Oussoren)
Date: Thu Nov 13 15:23:10 2003
Subject: [Python-Dev] Re: More fun with Python shutdown
In-Reply-To: <200311122348.hACNmKV03619@oma.cosc.canterbury.ac.nz>
References: <200311122348.hACNmKV03619@oma.cosc.canterbury.ac.nz>
Message-ID: <1DB15604-15C0-11D8-A0F2-0003931CFE24@cistron.nl>

On 13 nov 2003, at 0:48, Greg Ewing wrote:
> Bernhard Herzog :
>
>> Wouldn't it be possible to call the callbacks of all weakrefs that
>> point to a cycle about to be destroyed before that destruction begins?
>
> I'm not sure that would be a good idea, for the same reasons that it
> wouldn't be a good idea to do the same for __del__ methods. Something
> might depend on them being called in the right order, or in not being
> called too soon.

But isn't the order in which they are called undefined (for cycles)? Another option would be to record what callbacks you will do and call them after completing the destruction of the cycle.
Ronald

From tim at zope.com Thu Nov 13 17:18:19 2003
From: tim at zope.com (Tim Peters)
Date: Thu Nov 13 17:19:29 2003
Subject: [Python-Dev] subtype_dealloc needs rethinking
In-Reply-To: <20031113213711.GA12902@vicky.ecs.soton.ac.uk>
Message-ID:

[Armin Rigo]
> If all these ways involve the GC,

Jim's problem does not, but all the "subtype_dealloc vs weakref callback vs cyclic gc" segfaults did.

> a solution that would avoid similar problems in potentially other
> deallocators might be to fix the GC instead:
>
>> incref
>> call tp_clear
>> decref
>
> This is the only place where the GC explicitly changes reference
> counters. It could just be skipped for objects with null refcount.

I really don't like that, because gc isn't broken -- an object with refcount 0 is trash by any reasonable meaning of the word. What I intend to do in 2.4 instead is include a new assert near the start of gc, to verify that none of the refcounts it sees are 0 coming in. That should never happen, the way Python's gc works.

> As the GC is the only piece of code that should be able to handle
> objects with refcounts of zero (apart from deallocators, but we assume
> these ones know what they are doing) this would fix the
> double-deallocation issue

I agree that it would.

> without making subtype_dealloc even more hairy.

But subtype_dealloc will never be simple or clear, so if obscure cruft has to be added, I'd rather add it there. Adding a strange special case to gc would spread the obscurity, but it's not a goal to make everything at least a little obscure. Thanks to Neil Schemenauer, the gc code today is remarkably clean and clear. Thanks to Guido, subtype_dealloc is about as clear as it can be.
I just checked in another patch for the sequence of problems Thomas Heller is seeing, and I think the final result leaves subtype_dealloc exactly as obscure as it was in 2.3: all this "real fix" amounts to is moving down a line of code to near the end of the function (that's the line retracking self with GC -- it used to do this long before it was necessary to do it, and now it delays doing it until it's actually needed, which is beyond all the code where it's dangerous to do it).

From guido at python.org Fri Nov 14 11:03:39 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 11:13:22 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: Your message of "Fri, 14 Nov 2003 02:28:44 PST."
References:
Message-ID: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com>

> Index: modulefinder.py
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Lib/modulefinder.py,v
> retrieving revision 1.7
> retrieving revision 1.8
> diff -C2 -d -r1.7 -r1.8
> *** modulefinder.py	18 Jul 2003 15:31:40 -0000	1.7
> --- modulefinder.py	14 Nov 2003 10:28:42 -0000	1.8
> ***************
> *** 211,215 ****
>           return
>       modules = {}
> !     suffixes = [".py", ".pyc", ".pyo"]
>       for dir in m.__path__:
>           try:
> --- 211,220 ----
>           return
>       modules = {}
> !     # 'suffixes' used to be a list hardcoded to [".py", ".pyc", ".pyo"].
> !     # But we must also collect Python extension modules - although
> !     # we cannot separate normal dlls from Python extensions.
> !     suffixes = []
> !     for triple in imp.get_suffixes():
> !         suffixes.append(triple[0])
>       for dir in m.__path__:
>           try:

Have you tested freeze after this? I'm not sure that receiving extension module files won't confuse it.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at python.net Fri Nov 14 12:09:23 2003
From: theller at python.net (Thomas Heller)
Date: Fri Nov 14 12:09:35 2003
Subject: [Python-Dev] Version number in the release-maint23 branch
Message-ID:

I'd like to change the version number in the CVS release-maint23 branch to be able to do correct version checks. Currently it is this:

/* Version parsed out into numeric values */
#define PY_MAJOR_VERSION	2
#define PY_MINOR_VERSION	3
#define PY_MICRO_VERSION	2
#define PY_RELEASE_LEVEL	PY_RELEASE_LEVEL_FINAL
#define PY_RELEASE_SERIAL	0

/* Version as a string */
#define PY_VERSION		"2.3.2+"

Is it ok to change it to the following:

/* Version parsed out into numeric values */
#define PY_MAJOR_VERSION	2
#define PY_MINOR_VERSION	3
#define PY_MICRO_VERSION	3
#define PY_RELEASE_LEVEL	PY_RELEASE_LEVEL_ALPHA
#define PY_RELEASE_SERIAL	0

/* Version as a string */
#define PY_VERSION		"2.3.3a0"

Thomas

From mwh at python.net Fri Nov 14 12:18:06 2003
From: mwh at python.net (Michael Hudson)
Date: Fri Nov 14 12:18:13 2003
Subject: [Python-Dev] Version number in the release-maint23 branch
In-Reply-To: (Thomas Heller's message of "Fri, 14 Nov 2003 18:09:23 +0100")
References:
Message-ID: <2mislmrhrl.fsf@starship.python.net>

Thomas Heller writes:
> Is it ok to change it to the following:

Yes.

Cheers,
mwh  :-)

--
MARVIN: Do you want me to sit in a corner and rust, or just fall apart
        where I'm standing?
          -- The Hitch-Hikers Guide to the Galaxy, Episode 2

From fdrake at acm.org Fri Nov 14 12:18:43 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Nov 14 12:19:22 2003
Subject: [Python-Dev] Version number in the release-maint23 branch
In-Reply-To:
References:
Message-ID: <16309.3699.852855.738740@grendel.zope.com>

Thomas Heller writes:
> I'd like to change the version number in the CVS release-maint23 branch
> to be able to do correct version checks.
...
> Is it ok to change it to the following:

Yes.

-Fred

--
Fred L.
Drake, Jr.
PythonLabs at Zope Corporation

From theller at python.net Fri Nov 14 11:59:36 2003
From: theller at python.net (Thomas Heller)
Date: Fri Nov 14 12:27:14 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> (Guido van Rossum's message of "Fri, 14 Nov 2003 08:03:39 -0800")
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com>
Message-ID:

Guido van Rossum writes:
> Have you tested freeze after this? I'm not sure that receiving
> extension module files won't confuse it.

From what I remember, freeze has never 'worked' for me on windows - maybe I didn't try hard enough. Apart from that, modulefinder also finds extension modules in other ways, so I would guess freeze must be able to handle them. So, I would like to leave testing freeze to people and on platforms where it actually is used. If this means that this change must be backed out again in the 2.3 branch, so be it.

Thomas

From pp64 at cornell.edu Fri Nov 14 12:29:02 2003
From: pp64 at cornell.edu (Pavel Pergamenshchik)
Date: Fri Nov 14 12:29:10 2003
Subject: [Python-Dev] Getting socket information from socket objects
Message-ID: <20031114122902.2a08b3e3.pp64@cornell.edu>

Hi. It appears that the easiest way to retrieve family/type/protocol fields from socket objects is this:

def getsockinfo(sock):
    s = `sock._sock`
    sp = s[1:-1].split(",")[1:]
    g = {}
    d = {}
    for i in sp:
        exec i.strip() in g, d
    return (d["family"], d["type"], d["protocol"])

Wouldn't it be nice to have accessors for these fields? My particular use-case is Windows-specific (IO completion port proactor), so winsock API provides this, but I'd rather avoid that crud.
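With accessors of the kind asked for here, the repr-parsing hack collapses to plain attribute lookups. A sketch, where the attribute names `family`, `type`, and `proto` are guesses at the eventual API rather than anything in 2.3 (attributes along these lines were in fact added to socket objects in later Python versions):

```python
import socket

def getsockinfo(sock):
    # Proposed replacement for the repr-parsing hack: read the
    # fields directly. Attribute names are assumptions here.
    return (sock.family, sock.type, sock.proto)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
info = getsockinfo(s)
s.close()
```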
Also, exporting getsockaddrarg in socketmodule.c CAPI would be useful, although the only use I can think of is implementing Windows' ConnectEx (which I am doing).

From skip at pobox.com Fri Nov 14 12:49:46 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Nov 14 12:50:06 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To:
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com>
Message-ID: <16309.5562.644105.6880@montanaro.dyndns.org>

>> Have you tested freeze after this? I'm not sure that receiving
>> extension module files won't confuse it.

Thomas> From what I remember, freeze has never 'worked' for me on
Thomas> windows - maybe I didn't try hard enough.

Maybe freeze should be deprecated in 2.4. There are other third-party packages (Gordon McMillan's installer and Thomas's py2exe) which do a better job anyway. Does either one use freeze under the covers?

Skip

From theller at python.net Fri Nov 14 13:13:17 2003
From: theller at python.net (Thomas Heller)
Date: Fri Nov 14 13:13:32 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: <16309.5562.644105.6880@montanaro.dyndns.org> (Skip Montanaro's message of "Fri, 14 Nov 2003 11:49:46 -0600")
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org>
Message-ID:

Skip Montanaro writes:
> >> Have you tested freeze after this? I'm not sure that receiving
> >> extension module files won't confuse it.
>
> Thomas> From what I remember, freeze has never 'worked' for me on
> Thomas> windows - maybe I didn't try hard enough.
>
> Maybe freeze should be deprecated in 2.4. There are other third-party
> packages (Gordon McMillan's installer and Thomas's py2exe) which do a
> better job anyway. Does either one use freeze under the covers?

Not that I know of (although I'm not sure how installer does it under *nix).
But freeze has two advantages (from reading the sources):
- it should be able to work everywhere where a C compiler is available
- it is able to create true, single file executables.

Thomas

From guido at python.org Fri Nov 14 13:14:01 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 13:14:15 2003
Subject: [Python-Dev] Getting socket information from socket objects
In-Reply-To: Your message of "Fri, 14 Nov 2003 12:29:02 EST." <20031114122902.2a08b3e3.pp64@cornell.edu>
References: <20031114122902.2a08b3e3.pp64@cornell.edu>
Message-ID: <200311141814.hAEIE1d05005@12-236-54-216.client.attbi.com>

> It appears that the easiest way to retrieve family/type/protocol
> fields from socket objects is this:
> def getsockinfo(sock):
>     s = `sock._sock`
>     sp = s[1:-1].split(",")[1:]
>     g = {}
>     d = {}
>     for i in sp:
>         exec i.strip() in g, d
>     return (d["family"], d["type"], d["protocol"])
> Wouldn't it be nice to have accessors for these fields? My
> particular use-case is Windows-specific (IO completion port
> proactor), so winsock API provides this, but I'd rather avoid that
> crud.

Sounds like a good idea. Upload your patches to SF!

> Also, exporting getsockaddrarg in socketmodule.c CAPI would be
> useful, although the only use I can think of is implementing
> Windows' ConnectEx (which I am doing)

I'm unclear on what you propose here; again, a working patch on SF showing what you propose would help.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Nov 14 13:15:02 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 13:15:09 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: Your message of "Fri, 14 Nov 2003 11:49:46 CST."
<16309.5562.644105.6880@montanaro.dyndns.org>
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org>
Message-ID: <200311141815.hAEIF2j05017@12-236-54-216.client.attbi.com>

> Maybe freeze should be deprecated in 2.4.

That might be a good idea.

> There are other third-party packages (Gordon McMillan's installer
> and Thomas's py2exe) which do a better job anyway. Does either one
> use freeze under the covers?

No. (Though py2exe uses modulefinder, which is why that's in Lib rather than in Tools/freeze. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Nov 14 13:18:30 2003
From: guido at python.org (Guido van Rossum)
Date: Fri Nov 14 13:18:38 2003
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8
In-Reply-To: Your message of "Fri, 14 Nov 2003 19:13:17 +0100."
References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org>
Message-ID: <200311141818.hAEIIU305052@12-236-54-216.client.attbi.com>

> But freeze has two advantages (from reading the sources):
> - it should be able to work everywhere where a C compiler is available

Well, it also uses Make, although I suppose you could easily change it to create a script for some other build tool, as long as it's scriptable.

> - it is able to create true, single file executables.

Not on Windows unless you have a static build of Python. And not on Unix either unless you have static builds of all extension modules.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From python at rcn.com Fri Nov 14 19:46:59 2003
From: python at rcn.com (Raymond Hettinger)
Date: Fri Nov 14 19:48:04 2003
Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate
In-Reply-To: <20031113225056.GA11305@vicky.ecs.soton.ac.uk>
Message-ID: <004601c3ab11$fa73e980$5204a044@oemcomputer>

[Armin Rigo]
> from heapq import *
> def isorted(iterable):
>     heap = list(iterable)
>     heapify(heap)
>     while heap:
>         yield heappop(heap)
>
> This generator is similar to the new list.sorted() but starts yielding
> elements after only O(n) operations (in heapify). Certainly not a
> candidate for itertools, but it could be added to heapqmodule.c.
> There are numerous cases where this kind of lazy-sorting is
> interesting, if done reasonably efficiently (unsurprisingly, this is
> known as Heap Sort).

How much of the iterator can be consumed before it becomes preferable (in terms of speed and memory) to have used iter(list.sort())?

My guess is that the break-even point for speed is around 10% depending on how much order already exists in the underlying list.

In terms of memory, I think list.sort() always beats the above implementation.

Raymond Hettinger

From tim at zope.com Fri Nov 14 23:17:45 2003
From: tim at zope.com (Tim Peters)
Date: Fri Nov 14 23:18:02 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
Message-ID:

I think I have a reasonably elegant scheme to nail this. It's best described as modifications to what cyclic gc already does. So here's a summary, with the new steps identified by [NEW]:

Cyclic gc first finds the maximal subset S of the objects in the current generation such that no object in S is reachable from outside of S. S is the (possibly empty) set of cyclic trash in the current generation.

Next partition S into 5 ([NEW] -- is 3 today) disjoint sets:

1. Objects with __del__ methods.

2. Objects not in #1 reachable from an object in #1.

3.
[NEW] Objects not in #1 or #2 with an associated weakref callback.

4. [NEW] Objects not in #1, #2 or #3 reachable from an object in #3.

5. Objects not in one of the other sets.

Then:

A. Call tp_clear on each object in set 5 (set 5 may mutate while this is going on, so that needs some care). If an object's refcount isn't 0 after calling its tp_clear, move it to the next older generation (that doesn't preclude that a later tp_clear in this step may reclaim it).

B. [NEW] Invoke the callbacks associated with the objects still in set 3. This also needs some care, as the deallocations occurring in step #A may remove objects from set 3, or even just remove the weak references to them so that the objects in set 3 are still there, but no longer have an associated callback. I expect we'd have to contrive code to make that happen, but we have to be safe against every possibility. The callbacks invoked during this step may also remove callbacks from objects in set 3 we haven't yet gotten to, or even add new callbacks to objects in sets 1 through 4.

C. [NEW] Move the objects still remaining in sets 3 and 4 to the youngest generation.

D. Move the objects still remaining in set 1 to gc.garbage.

E. Move the objects still remaining in set 2 to the next (older) generation.

That's telegraphic, and is bursting with subtleties. Here are notes on the new subtleties:

+ A key observation is that running weakref callbacks on the objects in set 3 can't have any effect on the objects in set 5, nor can the states of the objects in set 5 affect what a callback may want to do. This is so because no object in set 5 is reachable from an object in set 3: a callback can neither consult nor alter a set 5 object. So clearing set 5 first (in step A) is harmless, and should allow most cyclic trash in most programs to get collected ASAP.
+ Clearing the objects in set 5 first is desirable also because doing so may break enough links that objects in sets 1 thru 4 get deallocated naturally (meaning via the usual refcount-falls-to-0 route). Note that it's quite possible that objects in sets 1 thru 4 are reachable from objects in set 5 -- it's the other direction where reachability can't hold (by construction of the partition, not by luck).

+ By the start of B, tp_clear hasn't been called on anything reachable from sets 3 or 4, so the callbacks "see" wholly intact objects. Nothing visible to the callbacks has been torn down: __dicts__ are still fully populated, __mro__ slots are still as they were, etc. Step B doesn't do any tp_clear itself either, so the only mutations that occur are those performed by the callbacks. If a callback destroys a piece of state some other callback wanted, that's entirely on the user's head.

+ Because a weakref callback destroys itself after it's called, in non-pathological programs no object in set 3 or 4 will have a weakref callback associated with it at the end of step B. We cannot go on to call tp_clear on these objects, because the instant the first callback returns, we have no idea anymore which of these objects are still part of cyclic trash (the callbacks can resurrect any or all of them, ditto add new callbacks to any/all). Determining whether they are still trash requires doing live/dead analysis over from scratch. Simply moving them into *some* generation ensures that they'll get analyzed again on a future run of cyclic gc. Moving them into the youngest generation is done because they almost certainly are (in almost all programs, almost all of the time) still cyclic trash, and without new weakref callbacks. Putting them in the youngest generation allows them to get reclaimed on the next gc invocation.
In steady state for a sane program creating a sustained stream of cyclic trash with associated weakref callbacks, this delays their collection by one gc invocation: the reclamation throughput should equal the rate of trash creation, but there's a one-invocation reclamation latency introduced at the start. There's no new latency in invoking the callbacks.

+ Because we still won't collect cyclic trash with __del__ methods, or cyclic trash reachable from such trash, we do the partitioning in such a way that weakref callbacks on such trash don't get called at all -- we're not even going to try to reclaim them, so it may be surprising if their callbacks get invoked. OTOH, it may be desired that their callbacks get invoked despite that gc will never try to reclaim them on its own. Tough luck. The callbacks will get invoked if and when the user breaks enough cycles in gc.garbage to avoid running afoul of the __del__ restriction.

Objections? Great objections are of two kinds: (1) it won't work; and (2) it can't be sold for a bugfix release. Note that 2.3.2 is segfaulting today, so *something* has to be done for a bugfix release. I don't believe this scheme alters any defined semantics, and to the contrary makes it possible to say for the first time that objects visible to callbacks are never in mysteriously (and undefinedly so) partly-destroyed states. Objecting that the order of callback invocation isn't defined doesn't hold, because the order isn't defined in 2.3.2 either. Tempting as it may be, a scheme that refused to collect cyclic trash with associated weakref callbacks would be an incompatible change; Jim also has a use case for that (a billion lines of Zope3).
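The common case the scheme is careful to preserve -- a callback on an object in a dead cycle, with the weakref itself held from *outside* the cycle -- can be sketched as follows; the callback fires exactly once, and nothing it can reach has been torn down when it runs (this sketch shows the intended behavior, runnable on a Python where the fixes landed):

```python
import gc
import weakref

class C(object):
    pass

fired = []

def cb(ref):
    # Under the scheme described above, callbacks for set-3 objects run
    # before tp_clear touches anything reachable from them.
    fired.append(True)

a = C()
b = C()
a.partner = b
b.partner = a              # a <-> b is a cycle, dead once the names go away
wr = weakref.ref(a, cb)    # the weakref (and its callback) live outside it

del a, b
gc.collect()               # collects the cycle; the callback must run safely
assert fired == [True]
assert wr() is None
```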
From tim.one at comcast.net Sat Nov 15 03:26:52 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Nov 15 03:27:04 2003
Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate
In-Reply-To: <004601c3ab11$fa73e980$5204a044@oemcomputer>
Message-ID:

[Armin Rigo]
>> from heapq import *
>> def isorted(iterable):
>>     heap = list(iterable)
>>     heapify(heap)
>>     while heap:
>>         yield heappop(heap)
>>
>> This generator is similar to the new list.sorted() but starts
>> yielding elements after only O(n) operations (in heapify).
>> ...

[Raymond Hettinger]
> How much of the iterator can be consumed before it becomes preferable
> (in terms of speed and memory) to have used iter(list.sort())?
>
> My guess is that the break-even point for speed is around 10%
> depending on how much order already exists in the underlying list.

This depends so much on the speed of the heap implementation. When it gets into the log-time part, a high multiplicative constant due to fixed overheads makes a slow heap run like a fast heap would if the latter were working on an *exponentially* larger list.

I just tried on my laptop, under 2.3.2, with lists of a million random floats. That's a bad case for list.sort() (there's no order to exploit, and it wastes some compares trying to find order to exploit), and is an average case for a heapsort. Even if I only asked for *just* the first element of the sorted result, using sort() and peeling off the first element was about 25% faster than using heapify followed by one heappop. That says something about how dramatic the overheads are in calling Python-coded heap functions (well, it also says something about the amount of effort I put into optimizing list.sort()).

There are deeper problems the heap approach has to fight:

1. A heapsort does substantially more element compares than a mergesort, and element compares are expensive in Python, so that's hard to overcome.

2.
Heapsort has terrible spatial locality, and blowing the cache becomes even more important than comparison speed as the number of elements grows large. One of the experiments I did when writing the 2.3 sort was to compare a straight mergesort to an enhanced version of "weak-heap sort". Both of those do close to the theoretical minimum number of compares on random data. Despite that the mergesort moved more memory around, the always-sequential data access in the mergesort left it much faster than the cache-hostile weak-heap sort. A regular heapsort isn't as cache-hostile as a weak-heap sort, but it's solidly on the cache-hostile side of sorting algorithms, and does more compares too.

There's another way to get an iterative sort: do an ordinary recursive top-down mergesort, but instead of shuffling sublists in place, *generate* the merge of the subsequences (which are themselves generators, etc). That's a very elegant sort, with the remarkable property that the first element of the final result is generated after doing exactly N-1 compares, which achieves the theoretical minimum for finding the smallest element. Getting result elements after that takes O(log(N)) additional compares each. No array storage is needed beyond the original input list (which isn't changed), but there are O(N) generators hiding in the runtime stack. Alas, for that reason it's impractical for large lists, and the overheads are deadly for short lists. It does enjoy the advantage of beauty .

> In terms of memory, I think list.sort() always beats the above
> implementation.

That can't be -- the heap method only requires a fixed (independent of N) and small amount of working storage.
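The generator-based mergesort described above can be sketched as follows (an illustrative reconstruction in present-day syntax, not Tim's code; note it slices the input for clarity, so unlike the ideal version it does copy list cells):

```python
_END = object()   # sentinel marking an exhausted sub-generator

def _merge(a, b):
    # Lazily merge two sorted iterators: one compare per yielded element.
    x, y = next(a, _END), next(b, _END)
    while x is not _END and y is not _END:
        if y < x:
            yield y
            y = next(b, _END)
        else:                     # ties prefer the left side, keeping it stable
            yield x
            x = next(a, _END)
    while x is not _END:
        yield x
        x = next(a, _END)
    while y is not _END:
        yield y
        y = next(b, _END)

def lazy_mergesort(seq):
    # Top-down mergesort that *generates* the merge instead of moving
    # elements in place.  Pulling the first result makes each of the
    # N-1 internal merge nodes do exactly one compare: N-1 compares total.
    if len(seq) <= 1:
        return iter(list(seq))
    mid = len(seq) // 2
    return _merge(lazy_mergesort(seq[:mid]), lazy_mergesort(seq[mid:]))
```

The O(N) generators Tim mentions are exactly the suspended `_merge` frames; they live until their output is fully consumed.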
list.sort() may need to allocate O(N) additional temp bytes under the covers (to create a working area for doing merges; it can be expected to allocate 2*N temp bytes for a random array of len N, which is its worst case; if there's a lot of pre-existing order in the input array, it can sometimes get away without allocating any temp space). From arigo at tunes.org Sat Nov 15 06:38:17 2003 From: arigo at tunes.org (Armin Rigo) Date: Sat Nov 15 06:42:08 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? Message-ID: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Hello, Just asking because I'm not sure about this rule: is it ok if I just make a check-in without first posting a SF bug or patch report for small bugs with an obvious solution ? In this case: >>> import heapq >>> heapq.heappop(5) Segmentation fault Armin From python at rcn.com Sat Nov 15 07:26:47 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 07:27:10 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? In-Reply-To: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Message-ID: <002001c3ab73$bd3a2900$183ac797@oemcomputer> > Just asking because I'm not sure about this rule: is it ok if I just make > a > check-in without first posting a SF bug or patch report for small bugs > with an > obvious solution ? Just fix it. Raymond From tim.one at comcast.net Sat Nov 15 07:32:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 15 07:32:35 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? In-Reply-To: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Message-ID: [Armin Rigo] > Just asking because I'm not sure about this rule: is it ok if I just > make a check-in without first posting a SF bug or patch report for > small bugs with an obvious solution ? Even large bugs. The question is much more whether it's likely that the change will be controversial. 
If you're an expert in an area, and want to fix what's obviously a bug, without introducing another bug in the process, and in a way that's obviously an improvement, it's not going to be controversial, and everyone saves time and effort if you just do it. Some of those "obviously"s may be obvious only *to* an expert in the area, but that's OK too -- the non-experts in the area wouldn't follow a report or discussion anyway. > In this case: > > >>> import heapq > >>> heapq.heappop(5) > Segmentation fault It depends on what you do. If, for example, you created a new standard SegfaultError exception, and used a platform-specific memory protection gimmick to raise that instead on your box but not others, you could reasonably expect that to be a controversial change on at least two counts. Then you should bring it up for discussion before doing it. If instead you want to say that, in this context, an integer N should act the same way as range(N) would have acted, and have heappop return 0, then you'd be judged insane if you checked that in, and I'd probably revoke your checkin privileges for your own good . If you want to raise TypeError in this case, great, just do it. From tim.one at comcast.net Sat Nov 15 07:41:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 15 07:41:49 2003 Subject: [Python-Dev] RE: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: Message-ID: > Modified Files: > heapqmodule.c > Log Message: > Verify heappop argument is a list. 
> > Index: heapqmodule.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Modules/heapqmodule.c,v
> retrieving revision 1.1
> retrieving revision 1.2
> diff -C2 -d -r1.1 -r1.2
> *** heapqmodule.c 8 Nov 2003 10:24:38 -0000 1.1
> --- heapqmodule.c 15 Nov 2003 12:33:01 -0000 1.2
> ***************
> *** 120,123 ****
> --- 120,128 ----
> int n;
>
> + if (!PyList_Check(heap)) {
> + PyErr_SetString(PyExc_ValueError, "heap argument must be a list");
> + return NULL;
> + }

Now *that's* controversial: the complaint is about the type of the argument, so it should raise TypeError instead. Curiously, the Python version of this module raised a pretty mysterious AttributeError.

From python at rcn.com Sat Nov 15 08:00:48 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 08:01:12 2003 Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate In-Reply-To: Message-ID: <002301c3ab78$7da69440$183ac797@oemcomputer>

> [Armin Rigo]
> >> from heapq import *
> >> def isorted(iterable):
> >>     heap = list(iterable)
> >>     heapify(heap)
> >>     while heap:
> >>         yield heappop(heap)
> >>
> > In terms of memory, I think list.sort() always beats the above
> > implementation.
>
> That can't be -- the heap method only requires a fixed (independent of N)
> and small amount of working storage. list.sort() may need to allocate O(N)
> additional temp bytes under the covers (to create a working area for doing
> merges; it can be expected to allocate 2*N temp bytes for a random array of
> len N, which is its worst case; if there's a lot of pre-existing order in
> the input array, it can sometimes get away without allocating any temp
> space).

The isorted() generator shown above operates on a copy of the data while list.sort() works in-place. So, my take on it was that isorted() always used 2*N while list.sort() used 2*N only in the worst case.
Raymond

From python at rcn.com Sat Nov 15 08:32:38 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 08:33:01 2003 Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate In-Reply-To: <20031115123758.GB26321@vicky.ecs.soton.ac.uk> Message-ID: <002701c3ab7c$f068cbc0$183ac797@oemcomputer>

> Getting the 25 smallest elements:
>
> min_and_remove_repeatedly(lst, 25)          7.4
> list(itertools.islice(heapsort(lst), 25))   1.05
> list(itertools.islice(isorted(lst), 25))    1.03
> list.sorted(lst)[:25]                       6.65
>
> Getting all elements:
>
> list(heapsort(lst))   22.49
> list(isorted(lst))    26.06
> list.sorted(lst)       6.65

Can you find out at what value of N the time for the heap approach matches the time for the list.sorted() approach? I'm interested to see how close it comes to my original 10% estimate.

> While heapsort is not much faster than the Python-coded isorted using the
> C heappop, if there is interest I can submit it to SF.

Without a much larger speed-up I would recommend against it. This is doubly true for the cases where N==1 or N > len(lst)//10, which are dominated by min() or list.sorted(). Why add a function that is usually the wrong way to do it? The situation is further unbalanced against the heap approach when the problem becomes "get the 25 largest" or for cases where the record comparison costs are more expensive.
Raymond

From guido at python.org Sat Nov 15 10:43:55 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 15 10:44:03 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: Your message of "Sat, 15 Nov 2003 04:33:04 PST." References: Message-ID: <200311151543.hAFFhtv13945@12-236-54-216.client.attbi.com>

> + if (!PyList_Check(heap)) {
> + PyErr_SetString(PyExc_ValueError, "heap argument must be a list");
> + return NULL;
> + }
> +

As Tim suggested, this should be a TypeError.
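For the record, the eventual behavior is an ordinary exception instead of a crash. A quick check, written against the module as it later shipped (the C version raises TypeError; the old pure-Python version, as Tim notes, raised a mysterious AttributeError):

```python
import heapq

heap = []
heapq.heappush(heap, 3)
heapq.heappush(heap, 1)

try:
    heapq.heappop(5)              # not a list: must raise, never segfault
    error_kind = None
except (TypeError, AttributeError) as exc:
    error_kind = type(exc).__name__   # TypeError from the C module

smallest = heapq.heappop(heap)        # normal use is unaffected
```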
--Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8 at osu.edu Sat Nov 15 12:23:02 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Sat Nov 15 11:25:10 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: <200311151543.hAFFhtv13945@12-236-54-216.client.attbi.com> References: <200311151543.hAFFhtv13945@12-236-54-216.client.attbi.com> Message-ID: <200311151223.02746.fincher.8@osu.edu> On Saturday 15 November 2003 10:43 am, Guido van Rossum wrote: > > + if (!PyList_Check(heap)) { > > + PyErr_SetString(PyExc_ValueError, "heap argument must be a list"); > > + return NULL; > > + } > > + > > As Tim suggested, this should be a TypeError. If only lists are allowed, wouldn't we be better off with a better interface than the current one? I thought the point of the current interface was that we could use containers other than lists as long as they defined pop and append methods. Jeremy From anthony at interlink.com.au Sat Nov 15 11:46:06 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Nov 15 11:47:06 2003 Subject: [Python-Dev] Version number in the release-maint23 branch In-Reply-To: Message-ID: <200311151646.hAFGk6V2012644@localhost.localdomain> >>> Thomas Heller wrote > I'd like to change the version number in the CVS release-maint23 branch > to be able to do correct version checks. > [ switch from e.g. 2.3.2+ to 2.3.3a0 straight after release of 2.3.2 ] Should we make this official? In that case, after a major release, should the version go from, e.g. 2.4b1 -> 2.4b2 -> 2.4c1 -> 2.4 -> 2.4.1a0 ? Or should that only happen on the maint branch, and the trunk would go to 2.5a0? Consistently-hobgoblinish, Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Sat Nov 15 11:51:28 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Nov 15 11:52:10 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? 
In-Reply-To: <20031115113817.GA16190@vicky.ecs.soton.ac.uk> Message-ID: <200311151651.hAFGpSFH012700@localhost.localdomain>

>>> Armin Rigo wrote
> Hello,
>
> Just asking because I'm not sure about this rule: is it ok if I just make a
> check-in without first posting a SF bug or patch report for small bugs with an
> obvious solution ?

One twist to this - as someone who does release management, I'd prefer that if the bug has been in a released version of Python, it has a bug # that can be referenced in the NEWS file for a release. If, as in this case, it's in stuff that's never been released (I assume the bug is in Raymond's new C-code heapq module), I don't particularly care. If others agree with this, perhaps it should go in the developer docs on the website...

Anthony -- Anthony Baxter It's never too late to have a happy childhood.

From raymond.hettinger at verizon.net Sat Nov 15 13:19:39 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Nov 15 13:20:56 2003 Subject: [Python-Dev] set() and frozenset() Message-ID: <001401c3aba5$28a8a3c0$183ac797@oemcomputer>

The C implementation of sets is now ready for your experimentation and comment. It fulfills most of Greg's proposal at: http://www.python.org/peps/pep-0218.html

The files are at: nondist\sandbox\setobj
Build it with: python setup.py build -g install
Test it with: python_d test_set.py

The differences from sets.py are:

. The containers are now named set() and frozenset(). The lowercase convention is used to make the names consistent with other builtins like list() and dict(). User feedback said that the name ImmutableSet was unwieldy, so frozenset() was chosen as a more succinct alternative.

. There is no set.update() method because that duplicated set.union_update().

. There is no automatic conversion from the mutable type to the non-mutable type. User feedback revealed that this was never needed. Also, removing it simplified the code considerably.
The result is more straightforward and a lot less magical. David Eppstein provided code examples demonstrating that set() and frozenset() are just right for implementing common graph algorithms and NFA/DFAs.

. The __deepcopy__() method will be implemented in copy.py instead of setmodule.c. This is consistent with other builtin containers and keeps all the deepcopying knowledge in one place. Also, the code is much simpler in pure python and I wanted to avoid importing the copy module inside setobject.c.

. The __getstate__() and __setstate__() methods were replaced by __reduce__(). Pickle sizes were made much smaller by saving just the keys instead of key:True pairs.

. There is no equivalent of BaseSet. This saves adding another builtin and it is not a burden to write isinstance(s, (set, frozenset)).

The difference from PEP 218 is:

. There is no special syntax for constructing sets. Once generator expressions are implemented, special notations become superfluous. It is simple enough to write: s = set(iterable).

Though the implementation is basically done and ready for you guys to experiment with, I still have a few open items:

. Expand the unittests to include all of the applicable tests from the existing test_sets.py
. Refactor the error exits to use goto and XDECREF.
. Do one more detailed (line-by-line) review.
. Write the docs.
. Recast the extension module to be a builtin object.

Note, the original sets.py will be left unchanged so that code written for it will continue to run without modification. For those interested in speed and pickle size, it is simple enough to search-and-replace "sets.Set" with "set".
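The points above are easy to see in a short session (illustrative; this matches the builtins as they eventually shipped):

```python
s = set("abracadabra")            # mutable; duplicates collapse
f = frozenset(s)                  # immutable snapshot, hashable

seen = {f: "first"}               # a frozenset can key a dict...
try:
    hash(s)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False       # ...a mutable set cannot

s.add("z")                        # no automatic conversion: mutating s
                                  # never touches the frozenset made from it
```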
Raymond Hettinger

From python at rcn.com Sat Nov 15 14:02:01 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 14:02:29 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: <200311151223.02746.fincher.8@osu.edu> Message-ID: <001901c3abaa$f4058ba0$183ac797@oemcomputer>

> I thought the point of the current interface was that
> we could use containers other than lists as long as they defined pop and
> append methods.

It would need __len__(), __getitem__(), __setitem__(), append(), and pop(). Right now, any list or subclass of list will do. That helps the current implementation run faster. I think polymorphism is more important for the contents of the container than for the container itself. The objects inside the container need only define __le__().

Raymond

From python at rcn.com Sat Nov 15 14:48:45 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 15 14:49:09 2003 Subject: [Python-Dev] set() and frozenset() addenda In-Reply-To: <001401c3aba5$28a8a3c0$183ac797@oemcomputer> Message-ID: <001e01c3abb1$7b4534c0$183ac797@oemcomputer>

[My previous note]
> The differences from sets.py are:

Also, there is no _repr(sorted=True) method. That need is already met by list.sorted(s).

Raymond

From barry at python.org Sat Nov 15 15:09:08 2003 From: barry at python.org (Barry Warsaw) Date: Sat Nov 15 15:09:14 2003 Subject: [Python-Dev] Version number in the release-maint23 branch In-Reply-To: <200311151646.hAFGk6V2012644@localhost.localdomain> References: <200311151646.hAFGk6V2012644@localhost.localdomain> Message-ID: <1068926947.990.99.camel@anthem>

On Sat, 2003-11-15 at 11:46, Anthony Baxter wrote:
> >>> Thomas Heller wrote
> > I'd like to change the version number in the CVS release-maint23 branch
> > to be able to do correct version checks.
> > [ switch from e.g. 2.3.2+ to 2.3.3a0 straight after release of 2.3.2 ]
>
> Should we make this official?
> In that case, after a major release, should
> the version go from, e.g. 2.4b1 -> 2.4b2 -> 2.4c1 -> 2.4 -> 2.4.1a0? Or
> should that only happen on the maint branch, and the trunk would go to
> 2.5a0?

Yes, I think so. After a release, branch to 2.x.1a0 and trunk to 2.x+1a0

-Barry

From guido at python.org Sat Nov 15 16:56:39 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 15 16:56:52 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules heapqmodule.c, 1.1, 1.2 In-Reply-To: Your message of "Sat, 15 Nov 2003 14:02:01 EST." <001901c3abaa$f4058ba0$183ac797@oemcomputer> References: <001901c3abaa$f4058ba0$183ac797@oemcomputer> Message-ID: <200311152156.hAFLud014391@12-236-54-216.client.attbi.com>

> > I thought the point of the current interface was that we could use
> > containers other than lists as long as they defined pop and append
> > methods.
>
> It would need __len__(), __getitem__(), __setitem__(), append(), and
> pop(). Right now, any list or subclass of list will do. That helps the
> current implementation run faster.
>
> I think polymorphism is more important for the contents of the container
> than for the container itself. The objects inside the container need
> only define __le__().

Well, of course. There *is* the theoretical objection that the old heapq.py would work with any mutable sequence supporting append() and pop() -- but I expect that is indeed purely a theoretical objection. When I first introduced heapq.py, I briefly considered making it a list subclass, but it didn't seem worth it (especially since the class version would likely be slower). But maybe for the C implementation this makes more sense, especially since it only allows lists or list subclasses anyway...?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Sat Nov 15 17:28:46 2003 From: martin at v.loewis.de (Martin v.
Löwis) Date: Sat Nov 15 17:29:11 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8 In-Reply-To: <16309.5562.644105.6880@montanaro.dyndns.org> References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org> Message-ID:

Skip Montanaro writes:
> Maybe freeze should be deprecated in 2.4. There are other third-party
> packages (Gordon McMillan's installer and Thomas's py2exe) which do a better
> job anyway.

I very much question that these other packages are "better", in all possible respects. In terms of usability for the developer, perhaps, but not in terms of quality of the resulting binary. So please keep freeze.

Regards, Martin

From martin at v.loewis.de Sat Nov 15 17:30:13 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Sat Nov 15 17:31:04 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8 In-Reply-To: <200311141818.hAEIIU305052@12-236-54-216.client.attbi.com> References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org> <200311141818.hAEIIU305052@12-236-54-216.client.attbi.com> Message-ID:

Guido van Rossum writes:
> > - it is able to create true, single file executables.
>
> Not on Windows unless you have a static build of Python. And not on
> Unix either unless you have static builds of all extension modules.

Anybody using freeze should be able to arrange that these conditions are met. It is even possible to freeze Tcl into the resulting binary.

Regards, Martin

From magnus at hetland.org Sat Nov 15 18:18:03 2003 From: magnus at hetland.org (Magnus Lie Hetland) Date: Sat Nov 15 18:18:22 2003 Subject: [Python-Dev] Re: set() and frozenset() Message-ID: <20031115231803.GA21142@idi.ntnu.no>

Great to see that these two will be in place soon!
--
Magnus Lie Hetland    "In this house we obey the laws of
http://hetland.org     thermodynamics!"  Homer Simpson

From tim.one at comcast.net Sat Nov 15 18:32:30 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 15 18:32:27 2003 Subject: list.sort, was Re: [Python-Dev] decorate-sort-undecorate In-Reply-To: <002301c3ab78$7da69440$183ac797@oemcomputer> Message-ID:

[Armin Rigo]
>>>> from heapq import *
>>>> def isorted(iterable):
>>>>     heap = list(iterable)
>>>>     heapify(heap)
>>>>     while heap:
>>>>         yield heappop(heap)

[Raymond Hettinger]
>>> In terms of memory, I think list.sort() always beats the above
>>> implementation.

[Tim]
>> That can't be -- the heap method only requires a fixed (independent
>> of N) and small amount of working storage. list.sort() may need to
>> allocate O(N) additional temp bytes under the covers (to create a
>> working area for doing merges; it can be expected to allocate 2*N
>> temp bytes for a random array of len N, which is its worst case; if
>> there's a lot of pre-existing order in the input array, it can
>> sometimes get away without allocating any temp space).

[Raymond]
> The isorted() generator shown above operates on a copy of the data
> while list.sort() works in-place. So, my take on it was the
> isorted() always used 2*N while list.sort() used 2*N only in the
> worst case.

Ah. But that's comparing apples and donkeys: Armin's example works on any iterable, while list.sort() only works on lists. I assumed that by "list.sort()" you meant "the obvious method *based* on list.sort() also accepting any iterable", i.e.,

    def isorted(iterable):
        copy = list(iterable)
        copy.sort()
        for x in copy:
            yield x

Then it's got all the space overhead of the list copy in Armin's version, plus the additional hidden temp memory allocated by sort. Something to note: most applications that only want the "first N" or "last N" values in sorted order know N in advance, and that's highly exploitable.
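The "N known in advance" trick keeps a heap of only N items, so memory stays O(N) no matter how long the input stream is. An illustrative sketch (`n_largest` is a made-up name, not the N-best queue from the heapq test suite):

```python
import heapq
from itertools import islice

def n_largest(n, iterable):
    """Return the n largest items of any iterable, using O(n) memory."""
    it = iter(iterable)
    best = list(islice(it, n))        # the first n items seed the heap
    heapq.heapify(best)               # min-heap: best[0] is the weakest winner
    if not best:
        return best
    for item in it:
        if item > best[0]:
            heapq.heapreplace(best, item)   # evict the weakest winner
    best.sort(reverse=True)
    return best
```

Each stream element costs one compare against the current weakest winner, plus O(log N) heap work only when it displaces one -- which is why this beats sorting the whole input when N is small.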
David Eppstein and I had a long thread about that here a while back. The example of implementing an "N-best queue" in the heapq test suite is a much better use of heaps when N is known, accepting an iterable directly (without turning it into a list first), and using storage for only N items. When N is (as is typical) much smaller than the total number of elements, that method can beat the pants off list.sort() even with the Python implementation of heaps. Indeed, Guido and I used that method for production code in Zope's full-text search subsystem (find the N best matches to a search query over some 10-200K documents). David presented a method that ran even faster, provided it was coded just right, based on doing quicksort-like partitioning steps on a buffer of about 3*N values. That also uses total space proportional to N (independent of the total number of incoming elements). A heap-based N-best queue would probably beat that again now that heaps are implemented in C. OTOH, if we implemented a quicksort-like partitioning routine in C too ... (it also suffers from gobs of fiddly little integer arithmetic and simple array indexing, which screams in C). From anthony at interlink.com.au Sun Nov 16 03:07:44 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Nov 16 03:08:34 2003 Subject: [Python-Dev] sqlite into std library for 2.4? Message-ID: <200311160807.hAG87ion025129@localhost.localdomain> I'd like to suggest we include sqlite in the standard library for 2.4. It's maintained, is a full-featured SQL database with a very small footprint and very little needed in the way of dead chickens to get it up and running. Anyone else? 
Anthony From skip at manatee.mojam.com Sun Nov 16 08:01:10 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 16 08:01:18 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311161301.hAGD1AJi023783@manatee.mojam.com> Bug/Patch Summary ----------------- 580 open / 4343 total bugs (+68) 196 open / 2455 total patches (+19) New Bugs -------- Typos in the docs (Extending/Embedding + Python/C API) (2003-11-09) http://python.org/sf/838938 Document that highly recursive data cannot be pickled (2003-11-09) http://python.org/sf/839075 attempt to access sys.argv when it doesn't exist (2003-11-10) http://python.org/sf/839151 interators broken for weak dicts (2003-11-10) http://python.org/sf/839159 SimpleHTTPServer reports wrong content-length for text files (2003-11-10) http://python.org/sf/839496 Bug in type's GC handling causes segfaults (2003-11-10) http://python.org/sf/839548 String formatting operator % badly documented (2003-11-10) http://python.org/sf/839585 Windows non-MS compiler doc updates (2003-11-10) http://python.org/sf/839709 MacPython installer: disk image does not mount from NFS (2003-11-11) http://python.org/sf/839865 Incorrect shared library build (2003-11-11) http://python.org/sf/840065 weakref callbacks and gc corrupt memory (2003-11-12) http://python.org/sf/840829 xmlrpclib chokes on Unicode keys (2003-11-13) http://python.org/sf/841757 -O breaks bundlebuilder --standalone (2003-11-13) http://python.org/sf/841800 PackMan database for panther misses devtools dep (2003-11-14) http://python.org/sf/842116 logging.shutdown() exception (2003-11-14) http://python.org/sf/842170 Digital Unix build fails to create ccpython.o (2003-11-14) http://python.org/sf/842171 optparser help formatting nit (2003-11-14) http://python.org/sf/842213 xmlrpclib and backward compatibility (2003-11-14) http://python.org/sf/842600 Windows mis-installs to network drive (2003-11-14) http://python.org/sf/842629 New Patches ----------- Footnote on 
bug in Mailbox with Windows text-mode files (2003-11-09) http://python.org/sf/838910 Cross building python for mingw32 (2003-11-13) http://python.org/sf/841454 Differentiation between Builtins and extension classes (2003-11-13) http://python.org/sf/841461 One more patch for --enable-shared (2003-11-13) http://python.org/sf/841807 reflect the removal of mpz (2003-11-14) http://python.org/sf/842567 NameError in the example of sets module (2003-11-15) http://python.org/sf/842994 doc fixes builtin super and string.replace (2003-11-16) http://python.org/sf/843088 Closed Bugs ----------- Closed Patches -------------- imaplib : Add support for the THREAD command (2003-08-31) http://python.org/sf/798297 invalid use of setlocale (2003-09-11) http://python.org/sf/804543 From barry at python.org Sun Nov 16 12:27:13 2003 From: barry at python.org (Barry Warsaw) Date: Sun Nov 16 12:27:25 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libfuncs.tex,1.151,1.152 In-Reply-To: References: Message-ID: <1069003633.990.106.camel@anthem> On Sun, 2003-11-16 at 11:17, rhettinger@users.sourceforge.net wrote: > Update of /cvsroot/python/python/dist/src/Doc/lib > In directory sc8-pr-cvs1:/tmp/cvs-serv13946/Doc/lib > > Modified Files: > libfuncs.tex > Log Message: > * Migrate set() and frozenset() from the sandbox. > * Install the unittests, docs, newsitem, include file, and makefile update. > * Exercise the new functions whereever sets.py was being used. > > Includes the docs for libfuncs.tex. Separate docs for the types are > forthcoming. Okay, I must have missed the discussion on these, but why are these so important that they should be in builtins? -Barry From DavidA at ActiveState.com Sun Nov 16 14:01:11 2003 From: DavidA at ActiveState.com (David Ascher) Date: Sun Nov 16 13:41:28 2003 Subject: [Python-Dev] sqlite into std library for 2.4? 
In-Reply-To: <200311160807.hAG87ion025129@localhost.localdomain> References: <200311160807.hAG87ion025129@localhost.localdomain> Message-ID: <3FB7C977.7090004@ActiveState.com> Anthony Baxter wrote: > I'd like to suggest we include sqlite in the standard library for 2.4. > > It's maintained, is a full-featured SQL database with a very small footprint > and very little needed in the way of dead chickens to get it up and running. FYI, it will be part of PHP 5, IIRC. --da From eppstein at ics.uci.edu Sun Nov 16 13:42:09 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Sun Nov 16 13:42:12 2003 Subject: [Python-Dev] Re: set() and frozenset() References: <001401c3aba5$28a8a3c0$183ac797@oemcomputer> Message-ID: In article <001401c3aba5$28a8a3c0$183ac797@oemcomputer>, "Raymond Hettinger" wrote: > The C implementation of sets is now ready for your experimentation and > comment. If fulfills most of Greg's proposal at: > http://www.python.org/peps/pep-0218.html ... > The differences from sets.py are: > > . The containers are now named set() and frozenset(). The lowercase > convention is used to make the names consistent with other builtins like > list() and dict(). User feedback said that the name ImmutableSet was > unwieldy (?sp), so frozenset() was chosen as a more succinct > alternative. I for one found it difficult to remember whether it was Immutable or Immutible. > . There is no automatic conversion from the mutable type to the > non-mutable type. User feedback revealed that this was never needed. More than never needed, I would find it confusing to put a set into a dictionary or whatever and then find that some other object has been put there in its place. > Also, removing it simplified the code considerably. The result is more > straight-forward and a lot less magical. David Eppstein provided code > examples demonstrating that set() and frozenset() are just right for > implementing common graph algorithms and NFA/DFAs. 
Well, I used Set and ImmutableSet, but yes, they're very useful. Thanks to Raymond for adding the backward compatibility to Python 2.2 needed for me to try this out. FWIW, I wrote another one yesterday, using a set partition refinement technique for recognizing chordal graphs; the code is at http://www.ics.uci.edu/~eppstein/PADS/Chordal.py, with subroutines in LexBFS.py, PartitionRefinement.py, and Sequence.py. The same partition refinement technique shows up in other algorithms including DFA minimization and would be quite painful without sets. Sets seem to me to be as fundamental a data structure as lists and dictionaries, and I'm enthusiastic about this becoming built in and faster. I would have liked to see {1,2,3} type syntax for sets, but the set/frozenset issue makes that a little problematic and perhaps the new iterator expressions make it unnecessary.

--
David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science

From allison at sumeru.stanford.EDU Sun Nov 16 13:59:25 2003 From: allison at sumeru.stanford.EDU (Dennis Allison) Date: Sun Nov 16 13:59:37 2003 Subject: [Python-Dev] sqlite into std library for 2.4? In-Reply-To: <200311160807.hAG87ion025129@localhost.localdomain> Message-ID:

I've not used it -- I just downloaded it to test -- but it looks like a very good candidate for inclusion.

-d

On Sun, 16 Nov 2003, Anthony Baxter wrote:
> > I'd like to suggest we include sqlite in the standard library for 2.4.
> >
> > It's maintained, is a full-featured SQL database with a very small footprint
> > and very little needed in the way of dead chickens to get it up and running.
>
> Anyone else?
> Anthony > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/allison%40sumeru.stanford.edu > From gh at ghaering.de Sun Nov 16 14:17:38 2003 From: gh at ghaering.de (=?ISO-8859-1?Q?Gerhard_H=E4ring?=) Date: Sun Nov 16 14:17:44 2003 Subject: [Python-Dev] sqlite into std library for 2.4? In-Reply-To: <200311160807.hAG87ion025129@localhost.localdomain> References: <200311160807.hAG87ion025129@localhost.localdomain> Message-ID: <3FB7CD52.1080200@ghaering.de> Anthony Baxter wrote: > I'd like to suggest we include sqlite in the standard library for 2.4. > > It's maintained, is a full-featured SQL database with a very small footprint > and very little needed in the way of dead chickens to get it up and running. I'm the (currently only active) PySQLite maintainer, so I think I'm qualified to comment on this ;) Before we can think about including this into the Python distribution there are two things I'd need to do: - code cleanup and documentation (inline documentation is quite sparse) - writing documentation (the PySQLite documentation is quite outdated, and doesn't cover the advanced nonstandard features, like writing aggregates/functions in Python, etc.) Inclusion in the Python standard library means an API freeze. I'm not sure all of PySQLite has the best interfaces, yet. One solution could be to only document the parts where we consider the API *stable*. Last, but not least, I don't see the tremendous benefit of a simple embedded SQL database in the Python standard distribution. Sure, Windows users would have to download one thing less, but for Unix users nothing much will change, because we'd most probably still require an existing SQLite installation. And SQLite is nothing that you can expect being installed, anyway, like BSDdb is.
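[Editorial note: for readers unfamiliar with the embedded-database model under discussion, this is roughly what it looks like through the DB-API. The sketch uses the `sqlite3` module name that eventually did land in the standard library (Python 2.5); the 2003-era PySQLite module was named differently but exposed a similar DB-API 2.0 surface.]

```python
import sqlite3

# An in-memory database: no server process, no configuration --
# none of the "dead chickens" a client/server SQL database needs.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE langs (name TEXT, year INTEGER)")
cur.execute("INSERT INTO langs VALUES (?, ?)", ("Python", 1991))
con.commit()

cur.execute("SELECT name, year FROM langs")
rows = cur.fetchall()
print(rows)        # [('Python', 1991)]
con.close()
```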
So, more or less, Unix users will only save downloading PySQLite separately. -- Gerhard From tismer at tismer.com Sun Nov 16 20:02:48 2003 From: tismer at tismer.com (Christian Tismer) Date: Sun Nov 16 20:02:56 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 Message-ID: <3FB81E38.9000505@tismer.com> Hi friends, over the weekend, I hacked quite a lot on Stackless with Python 2.2.3, in order to get rid of refcounting problems with thread pickling. It turned out that code objects created wrong refcounts when unpickling them. I debugged this down to the very end, until I was sure my stuff is doing it right. Then I added a small function that recomputes the actual total refcounts from the chained list of all objects, and it turned out to be correct (and also my pickling), but _Py_RefTotal is different. Before I invest more time into this, please let me know: Is this a known problem which is solved by moving to Python 2.3.2, or should I try to find the bug? I know this is hard to debug for anybody but me, since pickling of code objects is a Stackless only feature. The key might be here:

void _Py_NewReference(PyObject *op)
{
    _Py_RefTotal++;
    op->ob_refcnt = 1;
    op->_ob_next = refchain._ob_next;
    op->_ob_prev = &refchain;
    refchain._ob_next->_ob_prev = op;
    refchain._ob_next = op;
#ifdef COUNT_ALLOCS
    inc_count(op->ob_type);
#endif
}

It might be that at some place, this function is used when the refcount is not zero, but I don't know. This would get _Py_RefTotal and the real refcounts out of sync. Many thanks -- chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From greg at electricrain.com Sun Nov 16 21:05:21 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Nov 16 21:05:27 2003 Subject: [Python-Dev] sqlite into std library for 2.4? In-Reply-To: <3FB7CD52.1080200@ghaering.de> References: <200311160807.hAG87ion025129@localhost.localdomain> <3FB7CD52.1080200@ghaering.de> Message-ID: <20031117020521.GB3366@zot.electricrain.com> On Sun, Nov 16, 2003 at 08:17:38PM +0100, Gerhard Häring wrote: > > Inclusion in the Python standard library means an API freeze. I'm not > sure all of PySQLite has the best interfaces, yet. One solution could be > to only document the parts where we consider the API *stable*. > > Last, but not least, I don't see the tremendous benefit of a simple > embedded SQL database in the Python standard distribution. Sure, Windows > users would have to download one thing less, but for Unix users nothing > much will change, because we'd most probably still require an existing > SQLite installation. And SQLite is nothing that you can expect being > installed, anyway, like BSDdb is. So, more or less, Unix users will only > save downloading PySQLite separately. Agreed. I love SQLite (though i've not yet used it with python) but I don't think it needs to be bundled as part of the standard dist. It's an easy add-on. Perhaps it could just get a mention and a hyperlink in the python documentation (where?) as a suggested embedded SQL database. One thing that would change my mind about inclusion is if a python library similar to 'SQLObject' or 'orm' were in good enough shape to be included at the same time. Both provide an object oriented abstraction to a database preventing you from needing to write any SQL in most cases; similar to perl's Class::DBI package.
-g From jeremy at zope.com Sun Nov 16 21:47:12 2003 From: jeremy at zope.com (Jeremy Hylton) Date: Sun Nov 16 21:50:31 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: References: Message-ID: <1069037232.6983.1.camel@localhost.localdomain> On Fri, 2003-11-14 at 23:17, Tim Peters wrote: > Objections? None here, but you knew that. Everyone seems to be interested in this topic, though. How hard is the implementation going to be? Jeremy From tim at zope.com Sun Nov 16 22:24:33 2003 From: tim at zope.com (Tim Peters) Date: Sun Nov 16 22:24:33 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <1069037232.6983.1.camel@localhost.localdomain> Message-ID: [Tim, sketches a Grand Scheme for making weakref callbacks from cyclic garbage wholly sane, then asks ..] >> Objections? [Jeremy Hylton] > None here, but you knew that. Great! > Everyone seems to be interested in this topic, though. Then I expect everyone to volunteer to test the patch . > How hard is the implementation going to be? I just made a patch while running my final tests, so I have a pretty solid proof that what I sketched was implementable . It's exactly the scheme I described, and the coding went smoothly because it was something that could be (and was) fully thought-out in advance. That doesn't rule out conceptual or coding errors, though. Now I'll stop typing until I know whether all the tests pass ... OK, here's the patch: http://www.python.org/sf/843455 I asked especially for Neal's (Mr. GC) and Fred's (Mr. WeakRef) reviews, but all reviews are welcome. 
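[Editorial note: for readers following the thread, the basic weakref-callback machinery at issue works like this — a minimal sketch independent of the cyclic-gc patch, relying on CPython's immediate refcount-based collection.]

```python
import weakref

class C:
    pass

events = []

def callback(ref):
    # Invoked when the referent dies while the weakref is still alive.
    # By the time the callback runs, the weakref is already dead.
    events.append(ref() is None)

c = C()
wr = weakref.ref(c, callback)
del c              # referent dies -> callback fires immediately in CPython
print(events)      # [True]
print(wr())        # None: the weakref no longer resolves
```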
From rohit.nadhani at tallysolutions.com Mon Nov 17 04:22:05 2003 From: rohit.nadhani at tallysolutions.com (RN) Date: Mon Nov 17 04:30:39 2003 Subject: [Python-Dev] Variable Scope Message-ID: I have 2 Python scripts that contain the following lines:

test.py
-------
from testmod import *
a1 = 10
modfunc()

testmod.py
----------
def modfunc():
    print a1

When I run test.py, it returns the following error:

File "testmod.py", line 2, in modfunc
    print a1
NameError: global name 'a1' is not defined

My intent is to make a1 a global variable - so that I can access its value in all functions of imported modules. What should I do? Thanks in advance, Rohit From martin at v.loewis.de Mon Nov 17 05:15:29 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Mon Nov 17 05:17:15 2003 Subject: [Python-Dev] Variable Scope In-Reply-To: References: Message-ID: "RN" writes: > My intent is to make a1 a global variable - so that I can access its value > in all functions of imported modules. What should I do? Please post the question to python-list@python.org. python-dev is for the development *of* Python, not for the development *with* Python. Regards, Martin From arigo at tunes.org Mon Nov 17 06:03:08 2003 From: arigo at tunes.org (Armin Rigo) Date: Mon Nov 17 06:07:28 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <3FB81E38.9000505@tismer.com> References: <3FB81E38.9000505@tismer.com> Message-ID: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Hello Christian, On Mon, Nov 17, 2003 at 02:02:48AM +0100, Christian Tismer wrote: > I debugged this down to the very end, until I was sure > my stuff is doing it right. Then I added a small function > that recomputes the actual total refcounts from the > chained list of all objects, and it turned out to be > correct (and also my pickling), but _Py_RefTotal is different.
I found a few places that manipulate ob_refcnt directly without worrying about keeping _Py_RefTotal or other debugging information in sync: * classobject.c:instance_dealloc(), for __del__ * stringobject.c, for interned strings * typeobject.c:slot_tp_del(), for __del__ too I bet you could also find these easily, but maybe it should be regarded as a bug list. At any rate, the __del__ tricks will indeed make some counters invalid. A bientot, Armin. From mwh at python.net Mon Nov 17 07:24:25 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 17 07:24:35 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> (Armin Rigo's message of "Mon, 17 Nov 2003 11:03:08 +0000") References: <3FB81E38.9000505@tismer.com> <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Message-ID: <2my8ufp4hy.fsf@starship.python.net> Armin Rigo writes: > Hello Christian, > > On Mon, Nov 17, 2003 at 02:02:48AM +0100, Christian Tismer wrote: >> I debugged this down to the very end, until I was sure >> my stuff is doing it right. Then I added a small function >> that recomputes the actual total refcounts from the >> chained list of all objects, and it turned out to be >> correct (and also my pickling), but _Py_RefTotal is different. > > I found a few places that manipulate ob_refcnt directly without > worrying about keeping _Py_RefTotal or other debugging information > in sync: Um, don't most of these places at least *try* to keep _Py_RefTotal in sync? I am aware of a few places that get this wrong, but the fixes weren't obvious to me. > * classobject.c:instance_dealloc(), for __del__ One way of getting _Py_RefTotal out of sync is resurrecting objects in __del__ methods. Another is some bizarre interaction with the trashcan machinery (don't recall what, sorry, may also be different with 2.2 vs 2.3).
> * stringobject.c, for interned strings > > * typeobject.c:slot_tp_del(), for __del__ too > > I bet you could also find these easily, but maybe it should be regarded as a > bug list. I think these are bugs. > At any rate, the __del__ tricks will indeed make some counters > invalid. Which __del__ tricks specifically? Cheers, mwh -- Strangely enough I saw just such a beast at the grocery store last night. Starbucks sells Javachip. (It's ice cream, but that shouldn't be an obstacle for the Java marketing people.) -- Jeremy Hylton, 29 Apr 1997 From Jack.Jansen at cwi.nl Mon Nov 17 10:18:59 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Mon Nov 17 10:18:57 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib modulefinder.py, 1.7, 1.8 In-Reply-To: <200311141815.hAEIF2j05017@12-236-54-216.client.attbi.com> References: <200311141603.hAEG3do04761@12-236-54-216.client.attbi.com> <16309.5562.644105.6880@montanaro.dyndns.org> <200311141815.hAEIF2j05017@12-236-54-216.client.attbi.com> Message-ID: <5E99A2D2-1911-11D8-80BE-0030655234CE@cwi.nl> On 14 Nov 2003, at 19:15, Guido van Rossum wrote: >> Maybe freeze should be deprecated in 2.4. > > That might be a good idea. > >> There are other third-party packages (Gordon McMillan's installer >> and Thomas's py2exe) which do a better job anyway. Does either one >> use freeze under the covers? > > No. (Though py2exe uses modulefinder, which is why that's in Lib > rather than in Tools/freeze. :-) And so do the freeze tools on the Mac (BuildApplication and I think also bundlebuilder). -- Jack Jansen http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From python at rcn.com Mon Nov 17 10:31:15 2003 From: python at rcn.com (Raymond Hettinger) Date: Mon Nov 17 10:31:43 2003 Subject: [Python-Dev] Small bug -- direct check-in allowed? 
In-Reply-To: <200311162352.hAGNqTYA002118@localhost.localdomain> Message-ID: <003201c3ad1f$d750e420$e841fea9@oemcomputer> > >>> "Raymond Hettinger" wrote > > I think that adds an unnecessary level of indirection. SF helps when it > > comes to tracking, public discussion, patch evolution, the approval > > process, etc. However, for direct fixes, I think the check-in message > > is sufficient. > > I disagree - if I hit a bug and want to see if it's fixed, often the > entry in Misc/NEWS is far too brief to be useful. Not everyone has a > CVS checkout of Python that they can check against. For big bugs, having a SF entry or detailed news entry is reasonable. But for buglets, there is a PITA factor that goes with opening an SF report, fixing the bug, referencing the SF report in the checkin, referencing the checkin in the SF report, and immediately closing the report. That PITA factor is a cost that will be paid by every active developer and, IMO, gives very little gain. Beyond cluttering the bugs list, it can become an obstacle to getting the bugs fixed at all. I am certain that adding more administrative overhead will make it less likely that someone will bother with an otherwise quick fix. That isn't just laziness, the volunteers often only have a few minutes to deal with something they happen to see. Also, volunteers don't want to feel like their time is being wasted. I, for one, would loathe having to go back through all of my checkins and create/edit/reference/close a related SF report. It would be a boring day-long project that would suck and yet add almost nothing. I'm sure there are a few who value having all those references but I'm unwilling to transfer that burden onto the tiny group of people who volunteer their time fixing little buglets everywhere. If there is someone who places value on the references and also has checkin privileges, then there is nothing stopping them from reading each checkin and establishing a new SF entry for it.
I think they would be wasting their time, but if that is *their* itch, then they are welcome to scratch it. Making me scratch their itch is another matter entirely. Raymond From tim at zope.com Mon Nov 17 11:06:57 2003 From: tim at zope.com (Tim Peters) Date: Mon Nov 17 11:07:26 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: Message-ID: [Tim, on ] > ... > It's exactly the scheme I described, and the coding went smoothly > because it was something that could be (and was) fully thought-out in > advance. That doesn't rule out conceptual or coding errors, though. As I noted on the patch in the wee hours, "conceptual errors" wins. I out-thought a wrong thing, but one that happened to be good enough to fix all the new test cases: it doesn't really matter which objects are reachable from the objects whose deaths trigger callbacks, what really matters is which objects are reachable from the callbacks themselves. The test cases were so incestuous (objects all pointing to each other) that those turned out to be the same sets, but that's not a necessary outcome -- although it appears to be a likely outcome. Here's one that's surprising after the patch:

"""
import weakref, gc

class C:
    def cb(self, ignore):
        print self.__dict__

c1, c2 = C(), C()
c2.me = c2
c2.c1 = c1
c2.wr = weakref.ref(c1, c2.cb)
del c1, c2
print 'about to collect'
gc.collect()
print 'collected'
"""

The callback triggers on the death of c1 then, but c1 isn't in a cycle at all (it's hanging *off* a cycle), and c2 isn't reachable from c1. But c2 is reachable from the callback. c2 is in a self-cycle via c2.me, and in another via c2.wr (which indirectly points back to c2 via the weakref's bound method object c2.cb). After the patch, c1 ends up in the set of objects with an associated weakref callback, but c2 isn't reachable from that set so tp_clear is called on c2.
That destroys c2's __dict__ before the callback can get invoked, so when c1 dies the callback sees a tp_clear'ed c2:

about to collect
{}
collected

I know it's hard for people to get excited about an empty dict . But that's not the point: the point is that if it's possible to expose an object that's been tp_clear'ed to Python code, then *anything* can happen. For example, this minor variation segfaults after the patch, right after printing "about to collect":

"""
import weakref, gc

class C(object):
    def cb(self, ignore):
        print self.__dict__

class D:
    pass

c1, c2 = D(), C()
c2.me = c2
c2.c1 = c1
c2.wr = weakref.ref(c1, c2.cb)
del c1, c2, C, D
print 'about to collect'
gc.collect()
print 'collected'
"""

That class C was reachable from c1 in the first example protected C from getting tp_clear'ed at all, which was something the patch was trying to accomplish. But by giving c1 a different class, C's tp_clear immunity went away, but C is still reachable from the callback. Boom. So what's reachable from a callback? If the callback is not *itself* part of the garbage getting collected, then it acts like an external root, and so nothing reachable from the callback is part of the garbage getting collected either. gc has no worries then. If the callback itself is part of the garbage getting collected, then the weakref holding the callback must also be part of the garbage getting collected (else the weakref holding the callback would act as an external root, preventing the callback from being part of the garbage being collected too). My thought then was that a simpler scheme could simply call tp_clear on the trash weakrefs first. Calling tp_clear on a weakref just throws away the associated callbacks (if any) unexecuted, and if they don't get run then we have no reason to care what's reachable from them anymore.
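[Editorial note: one wrinkle with discarding callbacks is that a callback object can itself be the referent of another weakref, so dropping the last reference to it can fire further callbacks. A sketch in plain Python — no cycles involved, relying on CPython's immediate refcounting, and noting that a weakref holds a *strong* reference to its callback:]

```python
import weakref

class Target:
    pass

log = []
target = Target()

def first_cb(ref):
    log.append("first_cb ran")

# wr1 holds a strong reference to first_cb (weakrefs keep their
# callbacks alive).
wr1 = weakref.ref(target, first_cb)

def second_cb(ref):
    log.append("second_cb ran: first_cb was collected")

# The callback function itself can be weakly referenced:
wr2 = weakref.ref(first_cb, second_cb)

del first_cb    # still alive: wr1's strong reference keeps it around
del wr1         # the weakref dies, dropping first_cb -> second_cb fires
print(log)      # ['second_cb ran: first_cb was collected']
```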
The fly in that ointment appears to be that a callback can itself be the target of a weakref, so that when the callback is thrown away, it can trigger calling another callback. At that point I fell asleep muttering unspeakable oaths. From tismer at tismer.com Mon Nov 17 11:18:26 2003 From: tismer at tismer.com (Christian Tismer) Date: Mon Nov 17 11:17:48 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> References: <3FB81E38.9000505@tismer.com> <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Message-ID: <3FB8F4D2.8030301@tismer.com> Armin Rigo wrote: ... > I found a few places that manipulate ob_refcnt directly without worrying about > keeping _Py_RefTotal or other debugging information in sync: > > * classobject.c:instance_dealloc(), for __del__ > > * stringobject.c, for interned strings > > * typeobject.c:slot_tp_del(), for __del__ too > > I bet you could also find these easily, but maybe it should be regarded as a > bug list. At any rate, the __del__ tricks will indeed make some counters > invalid. Many thanks, this was very helpful. I will probably fix those cases which affect my code and submit patches. I was just not sure whether this is a known problem, maybe already solved. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From tim.one at comcast.net Mon Nov 17 11:36:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 11:36:40 2003 Subject: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <20031117110308.GA31680@vicky.ecs.soton.ac.uk> Message-ID: [Armin] > I found a few places that manipulate ob_refcnt directly without > worrying about keeping _Py_RefTotal or other debugging information > in sync: In which codebase? (2.3.2, 2.3 maint, 2.4, ...?) > * classobject.c:instance_dealloc(), for __del__ instance_dealloc endures outrageous convolution trying to keep _Py_RefTotal (and friends) correct, although the code is very different between 2.3.2 and 2.2.3. In 2.3.2: - If the instance isn't resurrected, then the instance goes away without fiddling _Py_RefTotal at all. That's correct. - If it is resurrected, then /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, * but _Py_NewReference bumped it again, so that's a wash. * If Py_TRACE_REFS, _Py_NewReference re-added self to the * object chain, so no more to do there either. * If COUNT_ALLOCS, the original decref bumped tp_frees, and * _Py_NewReference bumped tp_allocs: both of those need to * be undone. */ By "the original decref" it means the Py_DECREF that caused the instance's refcount to fall to 0 in the first place (thus getting us into instance_dealloc). > * stringobject.c, for interned strings Easy to believe that one's screwed up . > * typeobject.c:slot_tp_del(), for __del__ too At least in 2.3.2, that's enduring the same convolutions as instance_dealloc trying to keep this stuff right. From tim at zope.com Mon Nov 17 12:12:16 2003 From: tim at zope.com (Tim Peters) Date: Mon Nov 17 12:12:58 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <2my8ufp4hy.fsf@starship.python.net> Message-ID: [Michael Hudson] > ... > One way of getting _Py_RefTotal out of sync is resurrecting objects in > __del__ methods. Oops! 
That's right:

"""
from sys import gettotalrefcount as g

class C:
    def __del__(self):
        alist.append(self)

alist = []
c1, c2, c3 = C(), C(), C()
del c1, c2, c3
while 1:
    print g(), len(alist),
    del alist[:]
"""

g() goes up by 3 each time around the loop. /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, * but _Py_NewReference bumped it again, so that's a wash. Heh. If you ignore the new reference(s) that resurrected the thing, I suppose that would be true. It should (2.3.2) do _Py_DEC_REFTOTAL; to make up for the extra increment done by _Py_NewReference; likewise in slot_tp_del (BTW, the macro expands to nothing if Py_REF_DEBUG isn't defined). From mwh at python.net Mon Nov 17 12:40:27 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 17 12:40:35 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: (Tim Peters's message of "Mon, 17 Nov 2003 12:12:16 -0500") References: Message-ID: <2my8uenbas.fsf@starship.python.net> "Tim Peters" writes: > [Michael Hudson] >> ... >> One way of getting _Py_RefTotal out of sync is resurrecting objects in >> __del__ methods. > > Oops! That's right: [snip evidence] This is also why running test_descr in a loop still bumps sys.gettotalrefcount() by 3 or so each time. > /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, > * but _Py_NewReference bumped it again, so that's a wash. > > Heh. If you ignore the new reference(s) that resurrected the thing, I > suppose that would be true. It should (2.3.2) do > > _Py_DEC_REFTOTAL; > > to make up for the extra increment done by _Py_NewReference; likewise in > slot_tp_del (BTW, the macro expands to nothing if Py_REF_DEBUG isn't > defined). Is it that easy? I remember fooling a little with this, but not successfully. It's just possible that I got confused, though. (Confused by finalizer issues? How could that be?)
FWIW, my foolings were with new-style objects -- but from what you say in another post, it's unsurprising to find isomorphic problems with old-style classes (as in your example). Cheers, mwh -- Java is a WORA language! (Write Once, Run Away) -- James Vandenberg (on progstone@egroups.com) & quoted by David Rush on comp.lang.scheme From nas-python at python.ca Mon Nov 17 12:54:57 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Mon Nov 17 12:52:45 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: References: Message-ID: <20031117175456.GA22498@mems-exchange.org> On Mon, Nov 17, 2003 at 11:06:57AM -0500, Tim Peters wrote: > it doesn't really matter which objects are reachable from the > objects whose deaths trigger callbacks, what really matters is > which objects are reachable from the callbacks themselves. Right, it's all about what that nasty user code can do. :-) > So what's reachable from a callback? If the callback is not *itself* part > of the garbage getting collected, then it acts like an external root, and so > nothing reachable from the callback is part of the garbage getting collected > either. gc has no worries then. Okay. > If the callback itself is part of the garbage getting collected, then the > weakref holding the callback must also be part of the garbage getting > collected (else the weakref holding the callback would act as an external > root, preventing the callback from being part of the garbage being collected > too). > > My thought then was that a simpler scheme could simply call tp_clear on the > trash weakrefs first. Calling tp_clear on a weakref just throws away the > associated callbacks (if any) unexecuted, and if they don't get run then we > have no reason to care what's reachable from them anymore. This I don't get. Don't people want the callbacks to be called? I don't see how a weakref callback is different than a __del__ method. 
While the object is not always reachable from the callback it could be (e.g. the callback could be a method). The fact that callbacks are one shot doesn't seem to help either since the callback can create a new callback. Neil From tim.one at comcast.net Mon Nov 17 13:02:13 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 13:02:07 2003 Subject: [Stackless] Re: [Python-Dev] _Py_RefTotal wrong in Py 2.2.3 In-Reply-To: <2my8uenbas.fsf@starship.python.net> Message-ID: [Michael Hudson] > This is also why running test_descr in a loop still bumps > sys.gettotalrefcount() by 3 or so each time. Ah, so it's critical then . [Tim] >> /* If Py_REF_DEBUG, the original decref dropped _Py_RefTotal, >> * but _Py_NewReference bumped it again, so that's a wash. >> >> Heh. If you ignore the new reference(s) that resurrected the thing, >> I suppose that would be true. It should (2.3.2) do >> >> _Py_DEC_REFTOTAL; >> >> to make up for the extra increment done by _Py_NewReference; >> likewise in slot_tp_del (BTW, the macro expands to nothing if >> Py_REF_DEBUG isn't defined). > Is it that easy? In 2.3.2, it should be. The code is more convoluted in 2.2.3. I don't care about 2.2.n anymore, though, so I'm not going to spend any time looking at that. > I remember fooling a little with this, but not successfully. It's just > possible that I got confused, though. (Confused by finalizer > issues? How could that be?) I hate finalizers. I'm learning to hate weakref callbacks too. > FWIW, my foolings were with new-style objects -- but from what you say > in another post, it's unsurprising to find isomorphic problems with > old-style classes (as in your example). Right, Guido did copy+paste of masses of old-style object code into the new-style object code. One or two new bugs were introduced that way that I know of, long since fixed. This one is a case of duplicating a bug, and it looks to be as shallow as they get. 
Whoever did the last rework of string interning clearly wasn't thinking about all these "special builds" at all, so that may be trickier. From tim at zope.com Mon Nov 17 14:20:30 2003 From: tim at zope.com (Tim Peters) Date: Mon Nov 17 14:21:44 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <20031117175456.GA22498@mems-exchange.org> Message-ID: [Tim] >> If the callback itself is part of the garbage getting collected, >> then the weakref holding the callback must also be part of the >> garbage getting collected (else the weakref holding the callback >> would act as an external root, preventing the callback from being >> part of the garbage being collected too). >> >> My thought then was that a simpler scheme could simply call tp_clear >> on the trash weakrefs first. Calling tp_clear on a weakref just >> throws away the associated callbacks (if any) unexecuted, and if >> they don't get run then we have no reason to care what's reachable >> from them anymore. [Neil Schemenauer] > This I don't get. Don't people want the callbacks to be called? The one person I know who cares about this a lot is Jim, and he was happy to have his callbacks raise mystery exceptions, just not segfaults . But if he doesn't care whether his callbacks "do something" in this context, he can't care whether they don't get run at all in this context either. When a weakref goes away, its callback (if any) goes away too, unexecuted, cyclic gc or not. If the weakref is part of cyclic trash, then clearing it up first is defensible -- that may have happened in 2.3.2 already, as the order in which gc invokes tp_clear is mostly accidental. If I can force the order in such a way as to reliably prevent disasters, that's a good tradeoff. If the user doesn't want the possibility for weakref callbacks not to get invoked, then they have to ensure that the weakref itself outlives the object whose death triggers that weakref's callback. 
They have to do that today too, with or without cyclic gc:

>>> def cb(ignore): return 1/0
...
>>> import weakref
>>> class C: pass
...
>>> c = C()
>>> wr = weakref.ref(c, cb)
>>> del wr
>>> del c
>>>

Once the weakref is cleared, the callback is history. When a weakref is part of a trash cycle, may as well clear it first. > I don't see how a weakref callback is different than a __del__ > method. While the object is not always reachable from the callback > it could be (e.g. the callback could be a method). The fact that > callbacks are one shot doesn't seem to help either since the > callback can create a new callback. It's the one-shot business that (I think) makes them easier to live with, in conjunction with that a callback vanishes if the weakref holding it goes away. A __del__ method never goes away. While a callback *can* install new callbacks, all over the place, I don't expect that real code does that. For code that doesn't, gc can make good progress. Java's flavor of __del__ method executes at most once: if an object is resurrected by its finalizer, that object's finalizer will never be run again (unless invoked explicitly by the user). That allows Java's gc to make good progress in the presence of resurrecting finalizers too: finalizers (if any) in cycles are run in an arbitrary order, and if any were run gc has to give up on finishing tearing down the objects (it can't know whether finalizers have resurrected objects until gc runs again). In the absence of resurrection, though, the next time gc runs, all the objects it ran finalizers on before are almost certainly still trash, and it can reclaim the memory without running dangerous finalizers again first. The patch I posted for weakrefs took a similar approach. Java doesn't allow adding callbacks to its elaborate weakrefs, though.
It's more like the way we treat gc.garbage: you can optionally specify a ReferenceQueue object with a Java weakref, and when the referenced object is dead the weakref is added to the queue, for user inspection (well, I guess it's a little different for Java's "phantom references", but who cares ...). So I've been moving to a scheme where we treat finalizers like Java treats weakrefs, and we treat weakref callbacks like Java treats finalizers . The Java weakref facilities would be a lot easier for gc to live with, but too late for that. Jim emphatically doesn't want to poll gc.garbage looking for weakrefs that appear in cycles. Maybe "tough luck" is the best response we can come up with to that, but cycles are getting very easy to create in Python by accident, so I don't really want to settle for that. OTOH, people can write __del__ methods that don't provoke leaks, and I suspect they could learn how to write weakrefs that don't provoke leaks too (assuming we changed Python to treat "has a weakref callback" the same as "has a __del__ method"). One way to do that was mentioned above, ensuring that a weakref outlives the object whose death triggers the weakref's callback. Or ensuring the reverse. It's only letting them die "at the same time" in a trash cycle that creates trouble. If the weakref and that object are both in the same clump of cyclic trash, it's unpredictable what happens in 2.3.2. If the weakref suffers tp_clear() first, the callback won't get invoked; if the object suffers tp_clear() first, the callback will get invoked -- but may lead to segfaults or lesser surprises. We can certainly repair that by treating objects with callbacks the same as objects with __del__ methods when they're in cyclic trash, and that's an easy change to the implementation. Then the objects with callbacks, and everything reachable from them, leak unless/until the user snaps enough cycles in gc.garbage.
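[Editorial note: the ordering rule described here — the callback fires only when the weakref outlives its referent — is easy to demonstrate outside of cycles. A sketch relying on CPython's immediate refcount-based collection:]

```python
import weakref

class C:
    pass

fired = []

# Case 1: the weakref outlives the referent -> the callback runs.
c = C()
wr = weakref.ref(c, lambda ref: fired.append("case 1"))
del c          # referent dies first; wr is still alive, so the callback fires

# Case 2: the weakref dies first -> its callback is discarded, never run.
c = C()
wr = weakref.ref(c, lambda ref: fired.append("case 2"))
del wr         # weakref dies first; the callback vanishes with it
del c          # the referent's death now triggers nothing

print(fired)   # ['case 1']
```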
I don't have a feel for how much trouble it would be to avoid running afoul of that. Jim has so far presented it as an unacceptable burden.

Another scheme is to just run all the weakref callbacks associated with trash cycles, without tp_clear'ing anything first. Then run gc again to figure out what's still trash, and repeat until no more weakref callbacks in trash cycles exist. If the weakref implementation is changed to forbid creating a new weakref callback while a weakref callback is executing, that gc-loop must eventually terminate (after the first try even in most code that does manage to put weakref callbacks in trash cycles).

Beats me ...

From mwh at python.net Mon Nov 17 15:11:24 2003
From: mwh at python.net (Michael Hudson)
Date: Mon Nov 17 15:12:13 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: (Tim Peters's message of "Mon, 17 Nov 2003 14:20:30 -0500")
References:
Message-ID: <2mu152n4b7.fsf@starship.python.net>

"Tim Peters" writes:

> Another scheme is to just run all the weakref callbacks associated with
> trash cycles, without tp_clear'ing anything first. Then run gc again to
> figure out what's still trash, and repeat until no more weakref callbacks in
> trash cycles exist. If the weakref implementation is changed to forbid
> creating a new weakref callback while a weakref callback is executing, that
> gc-loop must eventually terminate (after the first try even in most code
> that does manage to put weakref callbacks in trash cycles).

Maybe I'm misunderstanding, but in the presence of threads might that not create much confusion? I'm envisaging

1) object reaches refcount 0
2) weakref callback gets called
3) thread switch happens
4) new thread attempts to create a weakref callback, which fails
5) programmer goes insane

Or am I missing something?

Cheers, mwh

-- There's an aura of unholy black magic about CLISP. It works, but I have no idea how it does it. I suspect there's a goat involved somewhere.
-- Johann Hibschman, comp.lang.scheme From tim.one at comcast.net Mon Nov 17 15:29:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 15:29:12 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <2mu152n4b7.fsf@starship.python.net> Message-ID: [Tim] >> Another scheme is to just run all the weakref callbacks associated >> with trash cycles, without tp_clear'ing anything first. Then run gc >> again to figure out what's still trash, and repeat until no more >> weakref callbacks in trash cycles exist. If the weakref >> implementation is changed to forbid creating a new weakref callback >> while a weakref callback is executing, that gc-loop must eventually >> terminate (after the first try even in most code that does manage to >> put weakref callbacks in trash cycles). [Michael Hudson] > Maybe I'm misunderstanding, but in the presence of threads might that > not create much confusion? I'm envisaging > > 1) object reaches refcount 0 > 2) weakred callback gets called > 3) thread switch happens > 4) new thread attempts to create a weakref callback, which fails > 5) programmer goes insane > > Or am I missing something? Nope -- it's a downside to that scheme, probably fatal. From jim at zope.com Mon Nov 17 15:33:19 2003 From: jim at zope.com (Jim Fulton) Date: Mon Nov 17 15:34:35 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: References: Message-ID: <3FB9308F.7010305@zope.com> Tim Peters wrote: > [Tim] > >>>If the callback itself is part of the garbage getting collected, >>>then the weakref holding the callback must also be part of the >>>garbage getting collected (else the weakref holding the callback >>>would act as an external root, preventing the callback from being >>>part of the garbage being collected too). >>> >>>My thought then was that a simpler scheme could simply call tp_clear >>>on the trash weakrefs first. 
>>>Calling tp_clear on a weakref just
>>>throws away the associated callbacks (if any) unexecuted, and if
>>>they don't get run then we have no reason to care what's reachable
>>>from them anymore.
>
> [Neil Schemenauer]
>
>>This I don't get. Don't people want the callbacks to be called?

As Tim pointed out, not if the weakref object dies before the object it references. I agree with Tim that if both the weakref and the object it references are in a cycle, it makes sense to remove the weakrefs first.

...

> We can certainly repair that by treating objects with callbacks the same as
> objects with __del__ methods when they're in cyclic trash, and that's an
> easy change to the implementation. Then the objects with callbacks, and
> everything reachable from them, leak unless/until the user snaps enough
> cycles in gc.garbage.

I think this would be really really bad.

> I don't have a feel for how much trouble it would be to avoid running afoul
> of that. Jim has so far presented it as an unacceptable burden.

There's a big difference between __del__ and weakref callbacks. The __del__ method is "internal" to a design. When you design a class with a __del__ method, you know you have to avoid including the class in cycles.

Now, suppose you have a design that has no __del__ methods but that does use cyclic data structures. You reason about the design, run tests, and convince yourself you don't have a leak.

Now, suppose some external code creates a weak ref to one of your objects. All of a sudden, you start leaking. You can look at your code all you want and you won't find a reason for the leak.

To protect yourself against this, you'd need a way of preventing weakrefs to your class instances.

Jim

-- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org

From nas-python at python.ca Mon Nov 17 16:46:45 2003
From: nas-python at python.ca (Neil Schemenauer)
Date: Mon Nov 17 16:44:35 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
References: <20031117175456.GA22498@mems-exchange.org>
Message-ID: <20031117214645.GA23186@mems-exchange.org>

On Mon, Nov 17, 2003 at 02:20:30PM -0500, Tim Peters wrote:
> When a weakref goes away, its callback (if any) goes away too,
> unexecuted, cyclic gc or not.

I did not know that.

> It's the one-shot business that (I think) makes them easier to
> live with, in conjunction with the fact that a callback vanishes
> if the weakref holding it goes away. A __del__ method never goes
> away. While a callback *can* install new callbacks, all over the
> place, I don't expect that real code does that. For code that
> doesn't, gc can make good progress.

That sounds pragmatic and Pythonic.

> Jim emphatically doesn't want to poll gc.garbage looking for
> weakrefs that appear in cycles. Maybe "tough luck" is the best
> response we can come up with to that, but cycles are getting very
> easy to create in Python by accident, so I don't really want to
> settle for that.

Agreed. It sucks to have to make things a lot more inconvenient just because it's theoretically possible for people to make the system behave badly.

> Another scheme is to just run all the weakref callbacks associated
> with trash cycles, without tp_clear'ing anything first. Then run
> gc again to figure out what's still trash, and repeat until no
> more weakref callbacks in trash cycles exist.

Repeatedly running the GC sounds like trouble to me. I think it would be better to move everything reachable from them into the youngest generation, finish the GC pass and then run them. I haven't been thinking about this as hard as you have though, so perhaps I'm missing some subtlety.
I have to wonder if anyone would care if __del__ methods were one-shot as well. As a user, I would rather have one-shot __del__ methods and not have to deal with gc.garbage. It would be nice if we could treat both kinds of finalizers consistently. Unfortunately I can't think of a way of noting that the __del__ method was already run. I suppose if __del__ methods continued to work the way they do, people could just use weakref callbacks to do finalization.

Neil

From fred at zope.com Mon Nov 17 16:58:01 2003
From: fred at zope.com (Fred L. Drake, Jr.)
Date: Mon Nov 17 16:58:24 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <20031117214645.GA23186@mems-exchange.org>
References: <20031117175456.GA22498@mems-exchange.org> <20031117214645.GA23186@mems-exchange.org>
Message-ID: <16313.17513.84542.827027@grendel.zope.com>

Neil Schemenauer writes:
> I did not know that.

The callback is intended to be a notification that the referenced object has gone away for anyone who's still interested. To "lose interest", you can just throw away your reference.

> I suppose if __del__ methods continued to work the way they do,
> people could just use weakref callbacks to do finalization.

Sigh. So then everyone would wonder why the destructor registration is done through the weakref module. And constructors would assign a weakref with a callback to an attribute on self. Sounds nasty.

-Fred

-- Fred L. Drake, Jr. PythonLabs at Zope Corporation

From tim at zope.com Mon Nov 17 16:59:25 2003
From: tim at zope.com (Tim Peters)
Date: Mon Nov 17 17:00:22 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <3FB9308F.7010305@zope.com>
Message-ID:

[Jim Fulton]
> ...
> There's a big difference between __del__ and weakref callbacks.
> The __del__ method is "internal" to a design. When you design a
> class with a __del__ method, you know you have to avoid including the
> class in cycles.
> Now, suppose you have a design that has no __del__ methods but
> that does use cyclic data structures. You reason about the design,
> run tests, and convince yourself you don't have a leak.
>
> Now, suppose some external code creates a weak ref to one of your
> objects. All of a sudden, you start leaking. You can look at your
> code all you want and you won't find a reason for the leak.

I think that's an excellent argument -- thanks.

> To protect yourself against this, you'd need a way of preventing
> weakrefs to your class instances.

Not just to them, but also to anything in a cycle with one of your class instances. This may include the class itself, or instance bound method objects I got hold of as "a callable" from somewhere else, and where I had no idea that your class is involved. It becomes intractable then for both the class designer and the weakref user.

The patch I posted seemed correct for the problem it was solving. Unfortunately, that wasn't the real problem . However, instead of identifying the transitive closure of objects reachable from trash objects with a weakref callback, it could compute the transitive closure of objects reachable from (all) the callbacks associated with trash objects having a (at least one) weakref callback. Don't call tp_clear on those objects, and everything callbacks see will be wholly intact. Apart from a pile of new hair to compute that complicated set instead, the rest of the patch is probably fine.

The other plausible idea is fixing the glitch with the simpler-at-first "do tp_clear on trash weakref objects first" idea. The problem with that is that doing tp_clear on a weakref (or proxy) object ends up decref'ing the callback, and the callback may *itself* have a weak reference to it, so that decref'ing the callback triggers a different callback, and again arbitrary Python code starts running in the middle of gc.
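The closure computation described here can be sketched at the Python level with gc.get_referents; the real patch does this in C inside gcmodule, and `reachable_from` is just an illustrative name:

```python
import gc

def reachable_from(roots):
    # Transitive closure of everything reachable from `roots`,
    # following the same edges cyclic gc traverses (tp_traverse).
    seen = {}
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        stack.extend(gc.get_referents(obj))
    return list(seen.values())

inner = {"payload": [1, 2, 3]}
outer = [inner]
closure = reachable_from([outer])
assert any(o is inner for o in closure)             # the dict is reachable
assert any(o is inner["payload"] for o in closure)  # and so is the list inside it
```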
From tismer at tismer.com Mon Nov 17 17:46:47 2003
From: tismer at tismer.com (Christian Tismer)
Date: Mon Nov 17 17:46:10 2003
Subject: [Python-Dev] more on pickling
Message-ID: <3FB94FD7.1030508@tismer.com>

Hi again,

trying to pickle bound python methods, I'm now running into another problem. It seems to give a problem when asking for an attribute of a bound method:

>>> class a:
...     def x(self): pass
>>> a.x    # good so far
>>> a().x  # very good
>>> a.x.__reduce__    # naaaaah? Sounds bad
>>> a().x.__reduce__  # very bad.
>>> a.x.__reduce__ == a().x.__reduce__
1

So I have the impression these methods lose their relationship to their originating object. Is this behavior by intent, i.e. is it impossible to write a working __reduce__ method for a bound class method?

thanks again - chris

-- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From tim at zope.com Mon Nov 17 17:57:06 2003
From: tim at zope.com (Tim Peters)
Date: Mon Nov 17 17:57:27 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <20031117214645.GA23186@mems-exchange.org>
Message-ID:

[Tim]
>> ... but cycles are getting very easy to create in Python by accident,
>> so I don't really want to settle for that [push cyclic trash with
>> weakref callbacks into gc.garbage]

[Neil Schemenauer]
> Agreed.

Good! I haven't worked with you for a year -- let's party .

> It sucks to have to make things a lot more inconvenient just
> because it's theoretically possible for people to make the
> system behave badly.
I don't know how it happened, but sometime over the last few years I've switched from thinking "well, ya, they could do that, but no real code would" to "if they can do that, they will -- and especially if they're hostile". I didn't even have to take a job at Elemental Security to enjoy this personality adjustment . >> Another scheme is to just run all the weakref callbacks associated >> with trash cycles, without tp_clear'ing anything first. Then run >> gc again to figure out what's still trash, and repeat until no >> more weakref callbacks in trash cycles exist. > Repeatedly running the GC sounds like trouble to me. Me too. > I think it would be better to move everything reachable from them > into the youngest generation, finish the GC pass and then run them. > I haven't been thinking about this as hard as you have though, so > perhaps I'm missing some subtlety. That's essentially what my SF patch does, but with a maddeningly wrong idea for "them" (in "move everything reachable from them"). I think it could be repaired by computing the objects reachable from the callbacks (instead of computing the objects reachable from the objects *with* callbacks). That gets hairier, though, and there's one more thing ... > I have to wonder if anyone would care if __del__ methods were > one-shot as well. As a user, I would rather have one-shot __del__ > methods and not have to deal with gc.garbage. Are you sure? All Java programmers I've heard talk about it say that finalizers in Java are so bloody useless they don't use them at all. Maybe that's a good thing. Part of the problem is that the order of finalization isn't defined, and a program that appears to run fine under testing can fail horribly in real life when the conditions feeding gc change a bit and provoke a different order of finalization. 
That's the primary reason I was loathe to run __del__ methods in an arbitrary order: horrid order-dependent bugs can easily escape non-exhaustive testing, and there's no feasible way for the user to provoke all N! ways of running N finalizers in a cycle even if they want to get exhaustive.

For that reason, I'm growing increasingly fond of the idea of clearing the trash weakrefs first. If no callbacks get invoked, the order they're not invoked in probably doesn't matter . The technical hangup with that one right now is that clearing a weakref decrefs the callback, which can make the callback object die, and the callback object can itself have a weakref (with a different callback) pointing to *it*. In that case, arbitrary Python code gets executed during gc, and in an arbitrary order again. There must be a hack to worm around that.

> It would be nice if we could treat both kinds of finalizers consistently.
> Unfortunately I can't think of a way of noting that the __del__ method
> was already run.

One bit in the object would be enough. Alas, that "one bit" turns out to be 4 bytes, and I've lost count of how many useful one-bit flags we've failed to add over the years for fear of losing those bytes for the first time.

> I suppose if __del__ methods continued to work the way they do,
> people could just use weakref callbacks to do finalization.

If they can ensure the weakref outlives the object, maybe. Another barrier is that the weakref callback doesn't expose the object that died: it's presumed to already be trash, and, in the absence of trash cycles, *is* already trash by the time the callback is invoked. So getting at "self" is a puzzle for a weakref callback pointing at self. A binding for self can be installed as a default argument for the callback, but then the self that appears in the function object keeps self alive for as long as the callback is alive! Then the only way for self to go away is for the whole shebang to vanish in a trash cycle.
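A quick sketch of that default-argument trap (illustrative names; CPython refcount semantics assumed): binding self into the callback pins the object, and when the weakref then dies first, the callback is discarded unexecuted:

```python
import weakref

class Obj(object):
    pass

log = []
o = Obj()
# The default argument gives the callback a *strong* reference to o.
cb = lambda ref, obj=o: log.append("died")
wr = weakref.ref(o, cb)
probe = weakref.ref(o)      # plain weakref, no callback, just to watch o

del o
assert probe() is not None  # o survives: cb's default argument pins it

del cb, wr                  # wr held the last ref to cb; dropping wr frees
                            # cb, which drops its default arg, and o dies
assert probe() is None
assert log == []            # the callback vanished before o did: never ran
```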
So finalization of an object isn't what Python's weakref callbacks were aiming at, and it's a real strain to use them for that. Python callbacks were designed to let other objects know that a given object went away; that's what weak dicts need to know, for example.

From nas-python at python.ca Mon Nov 17 18:35:02 2003
From: nas-python at python.ca (Neil Schemenauer)
Date: Mon Nov 17 18:32:51 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
References: <20031117214645.GA23186@mems-exchange.org>
Message-ID: <20031117233502.GA23672@mems-exchange.org>

On Mon, Nov 17, 2003 at 05:57:06PM -0500, Tim Peters wrote:
> That's the primary reason I was loathe to run __del__ methods in
> an arbitrary order: horrid order-dependent bugs can easily escape
> non-exhaustive testing

Very good point. I had forgotten about that issue.

> For that reason, I'm growing increasingly fond of the idea of clearing the
> trash weakrefs first. If no callbacks get invoked, the order they're not
> invoked in probably doesn't matter . The technical hangup with that
> one right now is that clearing a weakref decrefs the callback, which can
> make the callback object die, and the callback object can itself have a
> weakref (with a different callback) pointing to *it*. In that case,
> arbitrary Python code gets executed during gc, and in an arbitrary order
> again. There must be a hack to worm around that.

A hack you say? Create a list that references itself (i.e. append itself). Append all the unreachable callbacks to it and remove them from the weakrefs. Put the list in the youngest generation. The next gc should clean it up.
Neil

From greg at cosc.canterbury.ac.nz Mon Nov 17 19:53:20 2003
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon Nov 17 19:53:36 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To:
Message-ID: <200311180053.hAI0rKM07940@oma.cosc.canterbury.ac.nz>

Tim Peters :

> The other plausible idea is fixing the glitch with the simpler-at-first "do
> tp_clear on trash weakref objects first" idea. The problem with that is
> that doing tp_clear on a weakref (or proxy) object ends up decref'ing the
> callback, and the callback may *itself* have a weak reference to it, so that
> decref'ing the callback triggers a different callback, and again arbitrary
> Python code starts running in the middle of gc.

If the second weakref is from inside the cycle, its callback doesn't need to be called, by the same reasoning that applies to the first one. If the second weakref is from outside the cycle, its callback can't reach anything inside the cycle by strong refs, otherwise the cycle wouldn't be garbage. So calling its callback can safely be deferred until after the cycle has been torn down.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
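A sketch of the outside-the-cycle case (names illustrative; this shows the behavior current CPython eventually settled on, where a callback held from outside a trash cycle is still invoked once the cycle is collected):

```python
import gc
import weakref

class Node(object):
    def __init__(self):
        self.peer = None

a, b = Node(), Node()
a.peer, b.peer = b, a          # a reference cycle: only cyclic gc can free it

log = []
# The weakref (and its callback) live *outside* the cycle.
wr = weakref.ref(a, lambda ref: log.append("cycle collected"))

del a, b                       # the cycle is now unreachable
gc.collect()                   # cyclic gc tears it down...
assert wr() is None
assert log == ["cycle collected"]  # ...and the external callback still ran
```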
I remember reading once about the finalization scheme used in a particular Smalltalk implementation (I think it was ParcPlace) in which an object requiring finalization registers another object to be notified after it has died. This seems to be more or less equivalent to what we have with weakref callbacks. It might be worth studying how they deal with reference cycles in their system, since the same solution may well apply to us. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From nas-python at python.ca Mon Nov 17 21:11:24 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Mon Nov 17 21:09:15 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <200311180121.hAI1Lji08119@oma.cosc.canterbury.ac.nz> References: <200311180121.hAI1Lji08119@oma.cosc.canterbury.ac.nz> Message-ID: <20031118021124.GA24070@mems-exchange.org> On Tue, Nov 18, 2003 at 02:21:45PM +1300, Greg Ewing wrote: > I remember reading once about the finalization scheme used in a > particular Smalltalk implementation (I think it was ParcPlace) in > which an object requiring finalization registers another object to be > notified after it has died. I think that may be called "guardians". Neil From tim.one at comcast.net Mon Nov 17 21:45:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Nov 17 21:45:20 2003 Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc In-Reply-To: <200311180121.hAI1Lji08119@oma.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > How often does a finalizer really *need* access to the entire object > that triggered the finalization, and not just some part of its state? 
> > I remember reading once about the finalization scheme used in a > particular Smalltalk implementation (I think it was ParcPlace) in > which an object requiring finalization registers another object to be > notified after it has died. > > This seems to be more or less equivalent to what we have with weakref > callbacks. It might be worth studying how they deal with reference > cycles in their system, since the same solution may well apply to us. It appears that most (not all) Smalltalks pass a shallow copy of "self" to the method registered for finalization, and the original self is truly unreachable then. If you've got time for research, go for it. My experience is that you generally can't get answers to such obscure questions without studying source code, and then the precise answers depend on accidental implementation details. Takes a long time, and I don't have it. Here are "The Rules" for Dolphin Smalltalk, which has "real" finalization and weak references. Good luck : The Rules The co-ordination and initiation of Finalization and the murdering of Weak References are the responsibility of the memory manager, and are performed during a garbage collection (GC) cycle, according to the following rules (you may want to skip this advanced topic): Any objects which are directly reachable down a chain of strong references from the "roots of the world" will survive the GC, and will NOT be queued for finalization. Any objects which are NOT directly reachable by following a chain of strong references from one of the roots of the world, are candidates for finalization during a particular GC cycle. Any weakly referencing objects which contain finalization candidates identified as above, are candidates for a bereavement notification following the GC cycle, and will have their pointers to those candidates changed to pointers to the corpse object during this GC cycle, regardless of whether those objects are actually queued for finalization during this GC cycle. 
Any weakling which has suffered one or more bereavements during a GC cycle which is also a member of a class marked with the mourning special behaviour bit (termed a mourning weakling), will receive an #elementsExpired: message telling them how many of such losses the garbage collector inflicted on them. A bereavement notification candidate will only actually be queued for such a notification if it is a member of a class bearing the mourning special behaviour mark (applied by sending the class #makeMourner).

Mourning weaklings queued for bereavement notifications will receive an #elementsExpired: message before any of the objects they previously referenced has actually been finalized. This ordering is necessary in order that when objects are queued for finalization, they do not have any non-circular references, strong or weak, because a pre-condition for finalization is that an object must be about to expire.

A mourning weakling which has suffered bereavements during a GC cycle, but which would otherwise be garbage collected itself, is rescued until after it has been sent an #elementsExpired: message. If such objects still have no references after processing the #elementsExpired: message, then they will be garbage collected as normal.

A finalization candidate will only actually be queued for finalization if it bears the finalization mark (applied by sending #beFinalizable). Should a finalization candidate contain other finalizable objects, then even if those contained finalizable objects are only strongly referenced from the original finalization candidate, then they will not be finalized during the current GC cycle, but will instead survive until at least the completion of the container's #finalize (and probably until the next full GC cycle is complete, should they be circularly referenced).
This guarantees that when an object is finalized, any objects which it "owns" (directly or indirectly) will not yet have been finalized, and should therefore be in a valid state. Where a finalizable object, call it A, references another finalizable object, call it B, then B is guaranteed to be finalized before A. Indeed A cannot be finalized until B has been finalized.

Where a circular reference exists between two finalizable objects, then the order in which those objects are actually finalized is undefined (though they will not be finalized in the same cycle). An example of where such a situation might arise is where there is a finalizable parent which strongly references all its children, and those children are finalizable and have a back pointer to the parent. Although conceptually there is a parent-child relationship, there is no way for the memory manager to determine which should be finalized first (indeed it is not necessarily clear). Where this is the case, #finalize methods must be coded defensively, and not depend on ordering.

Any object in the finalization queue which is not actively being finalized will have no other references in the image.

You may be wondering why these complex rules are necessary, why not just finalize every candidate marked as requiring finalization? Well, the rules are designed to ensure that objects queued for finalization remain valid until their finalization is complete. If we simply queued every candidate for finalization, then we could not guarantee that constituent objects had not already been finalized. This would make coding #finalize methods horribly complicated and confusing.

Bereavement notifications are not sent to all weaklings by default, because the necessity of rescuing GC'able weak objects to receive the notification can potentially extend the lifetime of large groups of weak objects referenced by other weak objects (e.g. weak tree nodes) due to a "cascading rescue" effect.
Cascading rescues significantly degrade the operation of the system because they may prevent garbage weaklings from being collected for many many GC cycles.

The memory manager must ensure that an object does not receive a #finalize message until there are no strong references to it (which are not circular), and we need to take account of strong references from objects which are queued for finalization in the same garbage collection cycle. Even if an object to be finalized is only referenced from another object to be finalized in the same cycle, we must delay its finalization until the next cycle, so that parents are finalized before children, otherwise the parent may not be in a valid state during its finalize. It is not acceptable to have the order of finalization depend purely on the order in which the objects are visited during garbage collection.

Where a finalizable object is circularly referenced (perhaps indirectly), we must ensure that it can be garbage collected - so this precludes simply marking any candidates for finalization, and then only actually finalising those which are unreferenced, because this would mean that circularly referencing finalizable objects (phew!) would never be garbage collected. In fact it is possible that an indirect circular reference could exist between two finalizable objects, and where this is the case there is no general mechanism for deciding which to finalize first, since there is no notion of ownership.

This complexity is probably one of the reasons that some other Smalltalks do not support finalization of objects directly. They have only weak references and implement finalization with them: Any object which is not directly reachable through a strong pointer chain is garbage collected, and any weak references are "nilled". The weakly referencing objects which suffer bereavements are informed, and it is up to them to perform finalization actions on behalf of the objects that they have lost.
This is typically achieved by having a copy of the finalizable objects, and using them as 'executors'. This approach makes garbage collection simpler, but is inefficient and requires more complex Smalltalk classes to support it. Furthermore, it does not address the finalization ordering problem. If you want to implement such finalization in Dolphin, you do so quite easily using mourning weak objects, because the Dolphin facilities are a superset.

From tim.one at comcast.net Mon Nov 17 22:09:17 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Nov 17 22:09:07 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: <20031117233502.GA23672@mems-exchange.org>
Message-ID:

[Neil Schemenauer]
> A hack you say? Create a list that references itself (i.e. append
> itself). Append all the unreachable callbacks to it and remove them
> from the weakrefs. Put the list in the youngest generation. The
> next gc should clean it up.

Alas, we don't know we can get enough space for a list, and if we can't we're stuck. Maybe Py_FatalError would be OK then, but I'd rather not. I think I can abuse the weakref objects themselves to hold "the list", though. Heh. Now *that's* a hack .

From tismer at tismer.com Mon Nov 17 23:05:02 2003
From: tismer at tismer.com (Christian Tismer)
Date: Mon Nov 17 23:05:10 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
Message-ID: <3FB99A6E.5070000@tismer.com>

Hi again, again!

After hours of investigating why my instance method __reduce__ doesn't work, I found out the following:

instancemethod_getattro does this:

    if (PyType_HasFeature(tp, Py_TPFLAGS_HAVE_CLASS)) {
        if (tp->tp_dict == NULL) {
            if (PyType_Ready(tp) < 0)
                return NULL;
        }
        descr = _PyType_Lookup(tp, name);
    }

    f = NULL;
    if (descr != NULL) {
        f = TP_DESCR_GET(descr->ob_type);
        if (f != NULL && PyDescr_IsData(descr))
            return f(descr, obj, (PyObject *)obj->ob_type);
    }

Why, please can someone explain, why does it ask for PyDescr_IsData ???
I think this is wrong. I'm defining a __reduce__ method, and it doesn't
provide a tp_descr_set, as defined in...

int
PyDescr_IsData(PyObject *d)
{
	return d->ob_type->tp_descr_set != NULL;
}

but for what reason is this required???
This thingie is going wrong both in Py 2.2.3 and in Py 2.3.2, so I guess
there is something going wrong at a very basic level.
I'd like to fix that, but I need to understand what the intent of this
code has been. Can somebody, perhaps the author, explain why this is
this way?

thanks so much -- chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From guido at python.org Tue Nov 18 01:04:20 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Nov 18 01:04:36 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
In-Reply-To: Your message of "Tue, 18 Nov 2003 05:05:02 +0100." <3FB99A6E.5070000@tismer.com>
References: <3FB99A6E.5070000@tismer.com>
Message-ID: <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net>

> instancemethod_getattro
>
> does this:
>
> 	if (PyType_HasFeature(tp, Py_TPFLAGS_HAVE_CLASS)) {
> 		if (tp->tp_dict == NULL) {
> 			if (PyType_Ready(tp) < 0)
> 				return NULL;
> 		}
> 		descr = _PyType_Lookup(tp, name);
> 	}
>
> 	f = NULL;
> 	if (descr != NULL) {
> 		f = TP_DESCR_GET(descr->ob_type);
> 		if (f != NULL && PyDescr_IsData(descr))
> 			return f(descr, obj, (PyObject *)obj->ob_type);
> 	}
>
> [...] why does it ask for PyDescr_IsData ???

It's the general pattern: a data descriptor on the class can override
an attribute on the instance, but a method descriptor cannot. You'll
find this in PyObject_Generic{Get,Set}Attr() too, and in
type_getattro().
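The precedence rule described above can be observed from pure Python. A minimal sketch (modern Python 3 spelling; the class names are made up for illustration): a plain function is a non-data descriptor (it defines __get__ but not __set__), so an instance attribute of the same name shadows it, while a data descriptor such as property keeps control:

```python
class WithMethod:
    def f(self):
        return "from the class"

obj = WithMethod()
obj.__dict__["f"] = lambda: "from the instance"
# non-data descriptor on the class: the instance dict wins
print(obj.f())

class WithProperty:
    @property
    def f(self):
        # property defines __get__ and __set__, so it is a data descriptor
        return "from the class"

prop_obj = WithProperty()
prop_obj.__dict__["f"] = "from the instance"
# data descriptor on the class: the instance dict entry is ignored
print(prop_obj.f)
```

The first print shows "from the instance", the second "from the class": exactly the asymmetry the PyDescr_IsData() test implements.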
This is so that if you define a method in a class, you can override it
by setting an instance variable of the same name; this was always
possible for classic classes and I don't see why it shouldn't work for
new-style classes. But it should also be possible to put a descriptor
on the class that takes complete control.

The case you quote is about delegating bound method attributes to
function attributes, but the same reasoning applies generally, I would
think: unless the descriptor is a data descriptor, the function
attribute should have precedence, IOW a function attribute should be
able to override a method on a bound instance. Here's an example of the
difference:

class C:
    def f(s): pass
    f.__repr__ = lambda: "42"

print C().f.__repr__()

This prints "42". If you comment out the PyDescr_IsData() call, it
will print "<bound method C.f of <__main__.C instance at ...>>".

I'm not entirely clear what goes wrong in your case.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Tue Nov 18 01:08:58 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Tue Nov 18 01:09:12 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: <3FB94FD7.1030508@tismer.com>
References: <3FB94FD7.1030508@tismer.com>
Message-ID: 

Christian Tismer writes:

>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?

I don't think it is impossible; see also python.org/sf/558238

However, I would make pickling of bound methods "built-in", i.e. by
pickle explicitly recognizing bound methods, or using copy_reg, as
Konrad suggests.

If you really want to use __reduce__, you probably have to make sure
it isn't delegated to the function object.
Regards,
Martin

From tim.one at comcast.net Tue Nov 18 01:28:16 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Nov 18 01:28:08 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: 
Message-ID: 

There's a new version of the patch at:

    http://www.python.org/sf/843455

trying to force the "do tp_clear() on trash weakrefs first" idea to work.
All the tests we've discussed here survive it fine (including the ones that
broke the first patch, and there are corresponding unittests for all these
cases in the new patch), but there are several combinations of extreme
endcase complication that haven't yet been tested (except in my head).

I haven't yet been able to convince myself that the following does or does
not have a slow memory leak after the patch (this can be hard to tell on
Win9x!  the system allocator is so strange):

"""
import gc, weakref

def boom(ignore):
    print 'boom'

while 1:
    class C(object):
        def callback(self, ignore):
            self.k

    class D(C):
        pass

    class E(object):
        def __del__(self):
            print 'del',

    c1, c2 = C(), D()
    c1.wr = weakref.ref(c2, c1.callback)
    c2.wr = weakref.ref(c1, c2.callback)
    c1.c = c2
    c2.c = c1
    C.objs = [c1, c2]
    C.wr = weakref.ref(D, boom)
    D.wr = weakref.ref(E, boom)
    C.E = E()
    print '.',
    assert gc.garbage == []
"""

Try that under 2.3.2 instead, and it will eventually segfault -- but not as
soon as you expect!  It typically goes thru about 8 rounds of gc on my box
before it blows up -- it may be a memory corruption bug there.

From tim at zope.com Tue Nov 18 10:18:48 2003
From: tim at zope.com (Tim Peters)
Date: Tue Nov 18 10:19:13 2003
Subject: [Python-Dev] Making weakref callbacks safe in cyclic gc
In-Reply-To: 
Message-ID: 

[Tim, on ...]

> I haven't yet been able to convince myself that the following does or
> does not have a slow memory leak after the patch ...

It doesn't -- when I got up today, it was still chugging along, and was
using less memory than when I went to sleep.
If it weren't for the fact that I was running it on a Win98SE box, we could
conclude that cyclic trash is now collected faster than it's generated
<wink>.

From tim at zope.com Tue Nov 18 11:39:36 2003
From: tim at zope.com (Tim Peters)
Date: Tue Nov 18 11:41:28 2003
Subject: [Python-Dev] Provoking Jim's MRO segfault before shutdown
In-Reply-To: 
Message-ID: 

[Barry Warsaw, from last week]
> When Python's shutting down, will there /be/ another GC invocation?

This doesn't appear to be an issue in the current version of the patch.
Nothing is systemically delayed until "the next" GC invocation anymore.
Weakref callbacks triggered *by* a weakref callback going away are
excruciatingly suppressed until near the end of a gc run under the patch,
but they're allowed to trigger before gc returns.  That may create more
cyclic trash, which won't be discovered before the next gc invocation, but
that would have been true even if the callbacks-on-callbacks hadn't been
temporarily suppressed (i.e., it was already that way).

From raymond.hettinger at verizon.net Tue Nov 18 16:17:09 2003
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Tue Nov 18 16:17:39 2003
Subject: [Python-Dev] Removing operator.isMappingType
Message-ID: <000201c3ae19$53de5140$a4b82c81@oemcomputer>

My previous posting on this didn't get resolved. The issue is that the
function doesn't work:

>>> map(operator.isMappingType, ['', u'', (), [], {}])
[True, True, True, True, True]

If someone thinks this should not be removed, please speak up.

Raymond Hettinger

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20031118/24fb088f/attachment.html

From raymond.hettinger at verizon.net Tue Nov 18 16:50:17 2003
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Tue Nov 18 16:50:52 2003
Subject: [Python-Dev] __reversed__ protocol
Message-ID: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>

At one point, PEP 322 had proposed checking to see if an object defined
__reversed__ and, if not available, then proceeding normally using
__getitem__ and __len__. While the idea had supporters, it got taken
out because Guido worried that it would be abused by being applied to
general iterables like generators and objects returned by itertools.

So, an improved version of the idea is to check for __reversed__ but
only use it when the object also defines __len__. That precludes the
abuses but leaves the protocol open for the normal use cases. The
simple patch is listed below.

Guido doesn't have time for this now and asked me to present it to you
guys. What do you guys think?

Raymond

diff -c -r1.10 enumobject.c
*** enumobject.c	7 Nov 2003 15:38:08 -0000	1.10
--- enumobject.c	18 Nov 2003 21:39:51 -0000
***************
*** 174,181 ****
  	if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq))
  		return NULL;
  
! 	/* Special case optimization for xrange and lists */
! 	if (PyRange_Check(seq) || PyList_Check(seq))
  		return PyObject_CallMethod(seq, "__reversed__", NULL);
  
  	if (!PySequence_Check(seq)) {
--- 174,181 ----
  	if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq))
  		return NULL;
  
! 	if (PyObject_HasAttrString(seq, "__reversed__") &&
! 	    PyObject_HasAttrString(seq, "__len__"))
  		return PyObject_CallMethod(seq, "__reversed__", NULL);
  
  	if (!PySequence_Check(seq)) {

From bac at OCF.Berkeley.EDU Tue Nov 18 17:07:27 2003
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Tue Nov 18 17:07:25 2003
Subject: [Python-Dev] __reversed__ protocol
In-Reply-To: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
References: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
Message-ID: <3FBA981F.1040908@ocf.berkeley.edu>

Raymond Hettinger wrote:

> At one point, PEP 322 had proposed checking to see if an object defined
> __reversed__ and if not available, then proceeding normally using
> __getitem__ and __len__.  While the idea had supporters, it got taken
> out because Guido worried that it would be abused by being applied to
> general iterables like generators and objects returned by itertools.
> 
> So, an improved version of the idea is to check for __reversed__ but
> only use it when the object also defines __len__.  That precludes the
> abuses but leaves the protocol open for the normal use cases.  The
> simple patch is listed below.
> 
> Guido doesn't have time for this now and asked me to present it to you
> guys.  What do you guys think?
> 

With 'reversed' now a built-in, it seems reasonable to have some magic
method support for it. Then again, it does add one more thing to have to
be aware of that is not necessarily needed.

As for the solution in terms of the problem, I think it is a great way to
handle it. It should cause people to think more about supporting
__reversed__ than if they had not had to define __len__ as well.

So, with 'reversed' in the language, I am +0 on adding this, with a
slight leaning toward +1 if I come across a personal need for 'reversed'
itself.

-Brett

From fincher.8 at osu.edu Tue Nov 18 18:10:38 2003
From: fincher.8 at osu.edu (Jeremy Fincher)
Date: Tue Nov 18 17:12:51 2003
Subject: [Python-Dev] __reversed__ protocol
In-Reply-To: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
References: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer>
Message-ID: <200311181810.38561.fincher.8@osu.edu>

On Tuesday 18 November 2003 04:50 pm, Raymond Hettinger wrote:
> Guido doesn't have time for this now and asked me to present it to you
> guys.
What do you guys think? I think it's a great idea. Jeremy From guido at python.org Tue Nov 18 18:06:05 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 18:07:55 2003 Subject: [Python-Dev] __reversed__ protocol In-Reply-To: Your message of "Tue, 18 Nov 2003 16:50:17 EST." <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer> References: <000f01c3ae1d$f5278b80$a4b82c81@oemcomputer> Message-ID: <200311182306.hAIN65t13220@c-24-5-183-134.client.comcast.net> > diff -c -r1.10 enumobject.c > *** enumobject.c 7 Nov 2003 15:38:08 -0000 1.10 > --- enumobject.c 18 Nov 2003 21:39:51 -0000 > *************** > *** 174,181 **** > if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq)) > return NULL; > > ! /* Special case optimization for xrange and lists */ > ! if (PyRange_Check(seq) || PyList_Check(seq)) > return PyObject_CallMethod(seq, "__reversed__", NULL); > > if (!PySequence_Check(seq)) { > --- 174,181 ---- > if (!PyArg_UnpackTuple(args, "reversed", 1, 1, &seq)) > return NULL; > > ! if (PyObject_HasAttrString(seq, "__reversed__") && > ! PyObject_HasAttrString(seq, "__len__")) > return PyObject_CallMethod(seq, "__reversed__", NULL); > > if (!PySequence_Check(seq)) { Note that the two HasAttrString calls can be quite a bit more expensive than the PyRange_Check and PyList_Check calls... --Guido van Rossum (home page: http://www.python.org/~guido/) From wade at treyarch.com Tue Nov 18 19:12:13 2003 From: wade at treyarch.com (Wade Brainerd) Date: Tue Nov 18 19:12:22 2003 Subject: [Python-Dev] generator/microthread syntax Message-ID: <3FBAB55D.5070807@treyarch.com> Hello, I'm working on a game engine using Python as the scripting language and have a question about generators. I'm using what I guess are called 'microthreads' as my basic script building block, and I'd like to know if there is some kind of syntax that could make them clearer, either something in Python already or something that could be added. Here's an example script that illustrates the problem. 
from jthe import *

def darlene_ai(self):
    while True:
        for x in wait_until_near(player.po.w,self.po.w): yield None

        begin_cutscene(self)

        for x in wait_face_each_other(player.po,self.po): yield None

        if not player.inventory.has_key("papers"):
            for x in say("Hi, I'm Darlene! I found these papers,\ndid you lose them?"): yield None
        else:
            for x in say("Hey, I'm new to this town, wanna go out sometime?"): yield None

        end_cutscene(self)

        if not player.inventory.has_key("papers"):
            spawn(give_item("papers"))

        for x in wait(2.5): yield None

Now in our in-house script language the above code would look very
similar, only without the

    for x in <generator>: yield None

constructs. Instead, subroutines with a wait_ prefix execute yield
statements which are automatically propagated up the call stack all the
way to the thread manager.

Is there anything to be done about this in Python? I can see it
implemented three ways:

1. A new declaration for the caller function. yield statements propagate
up the call stack automatically until the first non-microthread function
is found.

    microthread darlene_ai(self):
        ...

2. A special kind of exception. The wait_ function throws an exception
containing the current execution context, which is caught by the thread
manager and then later resumed. Generators would not be used at all.

3. A new yield-like keyword, which assumes that the argument is a
generator and whose definition is to return the result of
argument.next() until it catches a StopIteration exception, at which
point it continues. This is just shorthand for the for loop, and would
look something like:

    def darlene_ai(self):
        while True:
            wait until_near(player.po.w,self.po.w)

Anyway, thanks for your time, and for the amazing language and modules.
-Wade From greg at cosc.canterbury.ac.nz Tue Nov 18 19:27:59 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue Nov 18 19:28:05 2003 Subject: [Python-Dev] generator/microthread syntax In-Reply-To: <3FBAB55D.5070807@treyarch.com> Message-ID: <200311190027.hAJ0Rxb17547@oma.cosc.canterbury.ac.nz> Wade Brainerd : > Instead, subroutines with a wait_ prefix execute yield > statements which are automatically propogated up the call stack all the > way to the thread manager. > > Is there anything to be done about this in Python? Python generators aren't really designed for use as general-purpose coroutines, and trying to use them as such is messy. You might like to investigate Stackless Python, which has real microthreads that *are* designed for the sort of thing you're doing. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From pje at telecommunity.com Tue Nov 18 19:42:55 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Nov 18 19:43:10 2003 Subject: [Python-Dev] generator/microthread syntax In-Reply-To: <3FBAB55D.5070807@treyarch.com> Message-ID: <5.1.1.6.0.20031118192603.032e9cb0@telecommunity.com> At 04:12 PM 11/18/03 -0800, Wade Brainerd wrote: >Hello, I'm working on a game engine using Python as the scripting language >and have a question about generators. >I'm using what I guess are called 'microthreads' as my basic script >building block, and I'd like to know if there is some kind of syntax that >could make them clearer, either something in Python already or something >that could be added. > >Here's an example script that illustrates the problem. 
>
>from jthe import *
>
>def darlene_ai(self):
>    while True:
>        for x in wait_until_near(player.po.w,self.po.w): yield None
>
>        begin_cutscene(self)
>
>        for x in wait_face_each_other(player.po,self.po): yield None
>
>        if not player.inventory.has_key("papers"):
>            for x in say("Hi, I'm Darlene! I found these papers,\ndid you
>            lose them?"): yield None
>        else:
>            for x in say("Hey, I'm new to this town, wanna go out
>            sometime?"): yield None
>
>        end_cutscene(self)
>
>        if not player.inventory.has_key("papers"):
>            spawn(give_item("papers"))
>
>        for x in wait(2.5): yield None
>
>Now in our in-house script language the above code would look very
>similar, only without the
>
>for x in <generator>: yield None
>
>constructs. Instead, subroutines with a wait_ prefix execute yield
>statements which are automatically propagated up the call stack all the
>way to the thread manager.
>Is there anything to be done about this in Python? I can see it
>implemented three ways:

Since you don't seem to be using the values yielded, how about doing
this instead:

    while True:
        yield wait_until_near(...)
        begin_cutscene(self)
        yield wait_face_each_other(player.po,self.po)
        ...

All you need to do is change your microthread scheduler so that when a
microthread yields a generator-iterator, you push the current
microthread onto a stack, and replace it with the yielded generator.
Whenever a generator raises StopIteration, you pop the stack it's
associated with and resume that generator.

This will produce the desired behavior without any language changes.
Your scheduler might look like:

    class Scheduler:

        def __init__(self):
            self.threads = []

        def spawn(self,thread):
            stack = [thread]
            threads.append(stack)

        def __iter__(self):
            while True:
                for thread in self.threads:
                    current = thread[-1]
                    try:
                        step = current.next()
                    except StopIteration:
                        # Current generator is finished, remove it
                        # and give the next thread a chance
                        thread.pop()
                        if not thread:
                            self.threads.remove(thread)
                        yield None
                        continue
                    try:
                        # Is the yielded result iterable?
                        new = iter(step)
                    except TypeError:
                        # No, skip it
                        yield None
                        continue
                    # Yes, push it on the thread's call stack
                    thread.append(new)

So, to use this, you would do, e.g:

    scheduler = Scheduler()
    runOnce = iter(scheduler).next

    scheduler.spawn( whatever.darlene_ai() )

    while True:
        runOnce()
        # do between-quanta activities

All this is untested, so use at your own risk.

From pje at telecommunity.com Tue Nov 18 19:46:51 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Nov 18 19:46:57 2003
Subject: [Python-Dev] generator/microthread syntax
In-Reply-To: <5.1.1.6.0.20031118192603.032e9cb0@telecommunity.com>
References: <3FBAB55D.5070807@treyarch.com>
Message-ID: <5.1.1.6.0.20031118194459.032ea3e0@telecommunity.com>

At 07:42 PM 11/18/03 -0500, Phillip J. Eby wrote:
>     def spawn(self,thread):
>         stack = [thread]
>         threads.append(stack)

Oops. That should've been 'self.threads.append(stack)'. Told you it was
untested. :)

There's one other bug, too. The 'while True' loop in the __iter__ method
really should be 'while self.threads', or else it'll go into an infinite
loop when all microthreads have terminated.
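For what it's worth, folding in the two corrections above, a runnable version of this trampoline pattern in modern Python might look like the following sketch. It is not Phillip's exact code: next(gen) replaces the old gen.next() spelling, a simple __next__ check stands in for the iter() probe, and the wait/task helpers are made up for the demo:

```python
class Scheduler:
    def __init__(self):
        self.threads = []

    def spawn(self, thread):
        # each microthread is a stack of generators; the top one is running
        self.threads.append([thread])

    def __iter__(self):
        while self.threads:                  # the corrected loop condition
            for thread in list(self.threads):
                current = thread[-1]
                try:
                    step = next(current)
                except StopIteration:
                    # generator finished: pop back to its caller, if any
                    thread.pop()
                    if not thread:
                        self.threads.remove(thread)
                    yield None
                    continue
                if hasattr(step, "__next__"):
                    # a generator was yielded: "call" it by pushing it
                    thread.append(step)
                else:
                    yield None

def wait(n):
    # a hypothetical subroutine that burns n scheduler quanta
    for _ in range(n):
        yield None

def task(log):
    log.append("start")
    yield wait(3)        # the scheduler runs wait() to completion first
    log.append("done")

log = []
scheduler = Scheduler()
scheduler.spawn(task(log))
for _ in scheduler:      # drive the scheduler until all threads finish
    pass
print(log)  # -> ['start', 'done']
```

The point of the trick is visible in the output: task() resumes only after the generator it yielded has been exhausted, which is exactly the "propagate yields up the call stack" behavior Wade asked for, with no language changes.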
From tismer at tismer.com Tue Nov 18 20:27:00 2003 From: tismer at tismer.com (Christian Tismer) Date: Tue Nov 18 20:27:06 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> Message-ID: <3FBAC6E4.2020202@tismer.com> Guido van Rossum wrote: ... > Here's an example of the difference: > > class C: > def f(s): pass > f.__repr__ = lambda: "42" > print C().f.__repr__() > > This prints "42". If you comment out the PyDescr_IsData() call, it > will print ">". > > I'm not entirely clear what goes wrong in your case. Well, in my case, I try to pickle a bound method, so I expect that C().f.__reduce__ gives me a reasonable object: A method of an instance of C that is able to do an __reduce__, that is, I need the bound f and try to get its __reduce__ in a bound way. If that's not the way to do it, which is it? thanks - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido at python.org Tue Nov 18 20:33:04 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 20:33:12 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Wed, 19 Nov 2003 02:27:00 +0100." 
<3FBAC6E4.2020202@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> Message-ID: <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> > > Here's an example of the difference: > > > > class C: > > def f(s): pass > > f.__repr__ = lambda: "42" > > print C().f.__repr__() > > > > This prints "42". If you comment out the PyDescr_IsData() call, it > > will print ">". > > > > I'm not entirely clear what goes wrong in your case. > > Well, in my case, I try to pickle a bound method, so Um, my brain just did a double-take. Standard Python doesn't let you do that, so you must be changing some internals. Which parts of Python are you trying to change and which parts are you trying to keep unchanged? If you were using a different metaclass you could just create a different implementation of instancemethod that does what you want, so apparently you're not going that route. (With new-style classes, instancemethod isn't that special any more -- it's just a currying construct with some extra baggage.) > I expect that C().f.__reduce__ gives me a reasonable > object: A method of an instance of C that is able to > do an __reduce__, that is, I need the bound f and try > to get its __reduce__ in a bound way. Try again. I don't think that C().f.__reduce__ should be a method of an instance of C. You want it to be a method of a bound method object, right? > If that's not the way to do it, which is it? I think what I suggested above -- forget about the existing instancemethod implementation. But I really don't understand the context in which you are doing this well enough to give you advice, and in any context that I understand the whole construct doesn't make sense. 
:-(

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tismer at tismer.com Tue Nov 18 20:34:17 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 20:34:21 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: 
References: <3FB94FD7.1030508@tismer.com>
Message-ID: <3FBAC899.8090206@tismer.com>

Martin v. Löwis wrote:

> Christian Tismer writes:
> 
>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?
> 
> I don't think it is impossible; see also python.org/sf/558238

will look into this.

> However, I would make pickling of bound methods "built-in", i.e. by
> pickle explicitly recognizing bound methods, or using copy_reg, as
> Konrad suggests.

I tried to avoid messing with pickle, since I think it should get a
complete, nonrecursive rewrite, ASAP. Not by me, btw. Or maybe... :-)

> If you really want to use __reduce__, you probably have to make sure
> it isn't delegated to the function object.

I'm quite tempted to special-case __reduce__ since this is very, very
simple. And I have already spent way too much time on pickling, because
I believe this is a Python feature, not a Stackless one.

If you have a nice and quick solution, please let me know. I'm not so
very keen on finding the best way possible. The fact is that I
implemented pickling, and now I hear people complaining about its
imperfectness. Gosh, I was so happy that it works at all.

So if there is anything I would like to get rid of (and to move it into
core Python), then it is pickling!

cheers - chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From python at rcn.com Tue Nov 18 20:44:13 2003
From: python at rcn.com (Raymond Hettinger)
Date: Tue Nov 18 20:46:39 2003
Subject: [Python-Dev] __reversed__ protocol
In-Reply-To: <200311182306.hAIN65t13220@c-24-5-183-134.client.comcast.net>
Message-ID: <001301c3ae3e$a41bda40$a4b82c81@oemcomputer>

> Note that the two HasAttrString calls can be quite a bit more
> expensive than the PyRange_Check and PyList_Check calls...

Right! So we need to keep those:

	if (PyRange_Check(seq) || PyList_Check(seq) ||
	    PyObject_HasAttrString(seq, "__reversed__") &&
	    PyObject_HasAttrString(seq, "__len__"))
		return PyObject_CallMethod(seq, "__reversed__", NULL);

Raymond

From tismer at tismer.com Tue Nov 18 20:50:07 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 20:50:39 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
In-Reply-To: <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net>
References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net>
Message-ID: <3FBACC4F.7090404@tismer.com>

Hi Guido,

...

> Um, my brain just did a double-take. Standard Python doesn't let you
> do that, so you must be changing some internals. Which parts of
> Python are you trying to change and which parts are you trying to keep
> unchanged? If you were using a different metaclass you could just
> create a different implementation of instancemethod that does what you
> want, so apparently you're not going that route. (With new-style
(With new-style > classes, instancemethod isn't that special any more -- it's just a > currying construct with some extra baggage.) No no no, I'm not fiddling around with any internals, here. I just want to use the machinary as it is, and to be able to pickle almost everything. So, if somebody did a v=C().x, I have that variable around. In order to pickle it, I ask for its __reduce__, or in other words, I don't ask for it, I try to supply it, so the pickling engine can find it. My expectation is that C().x.__reduce__ gives me the bound __reduce__ method of the bound x method of a C instance. ... > Try again. I don't think that C().f.__reduce__ should be a method of > an instance of C. You want it to be a method of a bound method > object, right? No, __reduce__ is a method of f, which is bound to an instance of C. Calling it will give me what I need to pickle the bound f method. This is all what I want. I think this is just natural. >>If that's not the way to do it, which is it? > > > I think what I suggested above -- forget about the existing > instancemethod implementation. But I really don't understand the > context in which you are doing this well enough to give you advice, > and in any context that I understand the whole construct doesn't make > sense. :-( Once again. What I try to achieve is complete thread pickling. That means, I need to supply pickling methods to all objects which don't have builtin support in cPickle or which don't provide __reduce__ already. I have done this for some 10 or more types, successfully. Bound PyCFunction objects are nice and don't give me a problem. Bound PyFunction objects do give me a problem, since they don't want to give me what they are bound to. My options are: - Do an ugly patch that special cases for __reduce__, which I did just now, in order to seet hings working. - get the master's voice about how to do this generally right, and do it generally right. 
I would of course prefer the latter, but I also try to save as much
time as I can while supporting my clients, since Stackless is almost no
longer sponsored, and I have money problems.

thanks so much -- chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

From tismer at tismer.com Tue Nov 18 21:02:24 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 21:02:29 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: 
References: <3FB94FD7.1030508@tismer.com>
Message-ID: <3FBACF30.7000201@tismer.com>

Martin v. Löwis wrote:

> Christian Tismer writes:
> 
>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?
> 
> I don't think it is impossible; see also python.org/sf/558238
> 
> However, I would make pickling of bound methods "built-in", i.e. by
> pickle explicitly recognizing bound methods, or using copy_reg, as
> Konrad suggests.

Eh, I see nothing at all from Konrad? Where did he post his messages?

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/

From tismer at tismer.com Tue Nov 18 21:09:46 2003
From: tismer at tismer.com (Christian Tismer)
Date: Tue Nov 18 21:09:49 2003
Subject: [Python-Dev] more on pickling
In-Reply-To: 
References: <3FB94FD7.1030508@tismer.com>
Message-ID: <3FBAD0EA.9080604@tismer.com>

Martin v. Löwis wrote:

> Christian Tismer writes:
> 
>>So I have the impression these methods lose their
>>relationship to their originating object.
>>Is this behavior by intent, i.e. is it impossible to write
>>a working __reduce__ method for a bound class method?
> 
> I don't think it is impossible; see also python.org/sf/558238
> 
> However, I would make pickling of bound methods "built-in", i.e. by
> pickle explicitly recognizing bound methods, or using copy_reg, as
> Konrad suggests.

Oh, I see. My strategy was to avoid copy_reg entirely, and to build
everything out of C constructs from the beginning. Maybe this was not
so efficient.

I agree (and have checked) that Konrad's solution works. Maybe I should
go that way. On the other hand, I don't agree that it should be
impossible with the __reduce__ protocol. There is possibly some
construct missing which would allow one to ask the object machinery the
right question.

ciao - chris

-- 
Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From eppstein at ics.uci.edu Tue Nov 18 22:23:59 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Tue Nov 18 22:24:10 2003 Subject: [Python-Dev] 2.2=>2.3 object.__setattr__(cls,attr,value) Message-ID: In 2.2 I was able to call object.__setattr__(cls,attr,value) where cls is a new-style type (first argument of a classmethod), and attr and value are the name and value of a class attribute I want to create programmatically. I just upgraded to 2.3 but now when I try it I get >>> class foo(object):pass ... >>> object.__setattr__(foo,'foo',None) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't apply this __setattr__ to type object Instead I apparently have to call >>> type(foo).__setattr__(foo,'foo',None) Anyway, my question: no harm done here because this was in undeployed code and I've found a workaround, but shouldn't this have at least been mentioned in "What's New in Python 2.3"? Or maybe this is one of the some-other-change-with-far-reaching-consequences things that was mentioned and I just don't see the connection? -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From jeremy at alum.mit.edu Tue Nov 18 23:07:22 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue Nov 18 23:09:53 2003 Subject: [Python-Dev] 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: References: Message-ID: <1069214841.6983.59.camel@localhost.localdomain> On Tue, 2003-11-18 at 22:23, David Eppstein wrote: > In 2.2 I was able to call object.__setattr__(cls,attr,value) > where cls is a new-style type (first argument of a classmethod), > and attr and value are the name and value of a class attribute I want to > create programmatically. I just upgraded to 2.3 but now when I try it I > get > > >>> class foo(object):pass > ... > >>> object.__setattr__(foo,'foo',None) > Traceback (most recent call last): > File "<stdin>", line 1, in ?
> TypeError: can't apply this __setattr__ to type object > > Instead I apparently have to call > >>> type(foo).__setattr__(foo,'foo',None) > > > Anyway, my question: no harm done here because this was in undeployed > code and I've found a workaround, but shouldn't this have at least been > mentioned in "What's New in Python 2.3"? Or maybe this is one of the > some-other-change-with-far-reaching-consequences things that was > mentioned and I just don't see the connection? The change was reported on python-dev, but apparently got left out of the NEWS file. Here are the details: http://mail.python.org/pipermail/python-dev/2003-April/034605.html I don't know that it does much good to change NEWS after the fact, but I don't think there's anything more that can be done. Jeremy From eppstein at ics.uci.edu Tue Nov 18 23:27:26 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Tue Nov 18 23:27:34 2003 Subject: [Python-Dev] Re: 2.2=>2.3 object.__setattr__(cls,attr,value) References: <1069214841.6983.59.camel@localhost.localdomain> Message-ID: In article <1069214841.6983.59.camel@localhost.localdomain>, Jeremy Hylton wrote: > The change was reported on python-dev, but apparently got left out of > the NEWS file. Here are the details: > http://mail.python.org/pipermail/python-dev/2003-April/034605.html Thanks! Now that you mention it, I vaguely remember something of that discussion. But the messages there seem to be mostly or entirely about preventing __setattr__ on built-in types (justifiably called "evil" in the thread) while the code I needed this for was to do it on my own types. Was there some other discussion about preventing object.__setattr__ on non-builtins or was this just an unintended consequence? Not that it matters much now, it's done... Of course, all of this has led me to realize that my code was unnecessarily obscure: I should have just used setattr(cls,...) -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From guido at python.org Tue Nov 18 23:50:33 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 23:50:41 2003 Subject: [Python-Dev] 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: Your message of "Tue, 18 Nov 2003 19:23:59 PST." References: Message-ID: <200311190450.hAJ4oXs13602@c-24-5-183-134.client.comcast.net> > In 2.2 I was able to call object.__setattr__(cls,attr,value) > where cls is a new-style type (first argument of a classmethod), > and attr and value are the name and value of a class attribute I want to > create programmatically. I just upgraded to 2.3 but now when I try it I > get > > >>> class foo(object):pass > ... > >>> object.__setattr__(foo,'foo',None) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: can't apply this __setattr__ to type object > > Instead I apparently have to call > >>> type(foo).__setattr__(foo,'foo',None) > > > Anyway, my question: no harm done here because this was in undeployed > code and I've found a workaround, but shouldn't this have at least been > mentioned in "What's New in Python 2.3"? Or maybe this is one of the > some-other-change-with-far-reaching-consequences things that was > mentioned and I just don't see the connection? I think this was a side effect of closing a hole that allowed using object.__setattr__ to set attributes on built-in classes. A quick look didn't reveal anything in NEWS, but the 2.3 NEWS file is truly huge, so it may be there. :-( Andrew Kuchling's "What's New" doesn't claim completeness... I think this was fixed in a later version of 2.2 too BTW. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Nov 18 23:57:51 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 18 23:57:59 2003 Subject: [Python-Dev] Re: 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: Your message of "Tue, 18 Nov 2003 20:27:26 PST."
References: <1069214841.6983.59.camel@localhost.localdomain> Message-ID: <200311190457.hAJ4vpg13650@c-24-5-183-134.client.comcast.net> > In article <1069214841.6983.59.camel@localhost.localdomain>, > Jeremy Hylton wrote: > > > The change was reported on python-dev, but apparently got left out of > > the NEWS file. Here are the details: > > http://mail.python.org/pipermail/python-dev/2003-April/034605.html [Good sleuthing, Jeremy!] > Thanks! Now that you mention it, I vaguely remember something of that > discussion. But the messages there seem to be mostly or entirely about > preventing __setattr__ on built-in types (justifiably called "evil" in > the thread) while the code I needed this for was to do it on my own > types. Was there some other discussion about preventing > object.__setattr__ on non-builtins or was this just an unintended > consequence? Not that it matters much now, it's done... Blame it on Carlo Verre. :-) The fix requires that whenever a built-in type derived from object overrides __setattr__, you cannot call object.__setattr__ directly, but must use the more derived built-in type's __setattr__. This is reasonable IMO, and is now enforced in 2.2.x as well. > Of course, all of this has led me to realize that my code was > unnecessarily obscure: I should have just used setattr(cls,...) 
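The restriction Guido describes still holds in today's CPython, and both workarounds from the thread are easy to verify (a minimal sketch, not the original 2.3 session):

```python
class Foo(object):
    pass

# object.__setattr__ refuses to act on a class, because type overrides
# __setattr__ with its own, more derived implementation (the check that
# closed the "Carlo Verre" hole).
try:
    object.__setattr__(Foo, "attr", 42)
    blocked = False
except TypeError:
    blocked = True

# Both workarounds from the thread work:
type(Foo).__setattr__(Foo, "attr", 42)
setattr(Foo, "other", 1)  # the plain spelling David settled on
```

The same call on an *instance* of Foo is fine; only classes (instances of type) trip the check.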
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Wed Nov 19 00:02:39 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Nov 19 00:06:46 2003 Subject: [Python-Dev] Re: 2.2=>2.3 object.__setattr__(cls,attr,value) In-Reply-To: <200311190457.hAJ4vpg13650@c-24-5-183-134.client.comcast.net> References: <1069214841.6983.59.camel@localhost.localdomain> <200311190457.hAJ4vpg13650@c-24-5-183-134.client.comcast.net> Message-ID: <1069218159.6983.83.camel@localhost.localdomain> On Tue, 2003-11-18 at 23:57, Guido van Rossum wrote: > > In article <1069214841.6983.59.camel@localhost.localdomain>, > > Jeremy Hylton wrote: > > > > > The change was reported on python-dev, but apparently got left out of > > > the NEWS file. Here are the details: > > > http://mail.python.org/pipermail/python-dev/2003-April/034605.html > > [Good sleuthing, Jeremy!] Tricks of the master sleuth revealed: http://www.google.com/search?q=object.__setattr__ Jeremy From guido at python.org Wed Nov 19 00:07:05 2003 From: guido at python.org (Guido van Rossum) Date: Wed Nov 19 00:07:14 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Wed, 19 Nov 2003 02:50:07 +0100." <3FBACC4F.7090404@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> Message-ID: <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> > Hi Guido, > > ... > > Um, my brain just did a double-take. Standard Python doesn't let you > > do that, so you must be changing some internals. Which parts of > > Python are you trying to change and which parts are you trying to keep > > unchanged? 
If you were using a different metaclass you could just > create a different implementation of instancemethod that does what you > want, so apparently you're not going that route. (With new-style > classes, instancemethod isn't that special any more -- it's just a > currying construct with some extra baggage.) > > No no no, I'm not fiddling around with any internals, here. > I just want to use the machinery as it is, and to be able to > pickle almost everything. > > So, if somebody did a v=C().x, I have that variable around. > In order to pickle it, I ask for its __reduce__, or in other > words, I don't ask for it, I try to supply it, so the pickling > engine can find it. But how, I wonder, are you providing it? You can't subclass instancemethod -- how do you manage to add a __reduce__ method to it without fiddling with any internals? > My expectation is that C().x.__reduce__ gives me the bound > __reduce__ method of the bound x method of a C instance. Yes, unfortunately you get the __reduce__ method of the unbound function instead. I think Martin is right: copy_reg may be your last hope. (Or subclassing pickle to special-case instancemethod.) The pickling machinery wasn't intended to pickle bound methods or functions etc., and doesn't particularly go out of its way to allow you to add that functionality. > ... > > > Try again. I don't think that C().f.__reduce__ should be a method of > > an instance of C. You want it to be a method of a bound method > > object, right? > > No, __reduce__ is a method of f, which is bound to an instance > of C. Calling it will give me what I need to pickle the bound > f method. This is all I want. I think this is just natural. And it would be except for the delegation of method attributes to function attributes. It is a similar aliasing problem as you see when you try to access the __getattr__ implementation for classes as C.__getattr__ -- you get the __getattr__ for C instances instead.
So you have to use type(C).__getattr__ instead. That would work for __reduce__ too I think: new.instancemethod.__reduce__(C().f). > >>If that's not the way to do it, which is it? > > > > > > I think what I suggested above -- forget about the existing > > instancemethod implementation. But I really don't understand the > > context in which you are doing this well enough to give you advice, > > and in any context that I understand the whole construct doesn't make > > sense. :-( > > Once again. > What I try to achieve is complete thread pickling. > That means, I need to supply pickling methods to > all objects which don't have builtin support in > cPickle or which don't provide __reduce__ already. > I have done this for some 10 or more types, successfully. > Bound PyCFunction objects are nice and don't give me a problem. > Bound PyFunction objects do give me a problem, since they > don't want to give me what they are bound to. OK, so you *are* messing with internals after all (== changing C code), right? Or else how do you accomplish this? > My options are: > - Do an ugly patch that special cases for __reduce__, which I did > just now, in order to see things working. > - get the master's voice about how to do this generally right, > and do it generally right. > > I would of course prefer the latter, but I also try to save > as much time as I can while supporting my clients, since > Stackless is almost no longer sponsored, and I have money problems. I have a real job too, that's why I have little time to help you. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Nov 19 02:13:33 2003 From: martin at v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: Wed Nov 19 02:16:05 2003 Subject: [Python-Dev] more on pickling In-Reply-To: <3FBAC899.8090206@tismer.com> References: <3FB94FD7.1030508@tismer.com> <3FBAC899.8090206@tismer.com> Message-ID: Christian Tismer writes: > If you have a nice and quick solution, please let me know. Install something in copy_reg. Nice and quick. Regards, Martin From tommy at ilm.com Wed Nov 19 20:19:53 2003 From: tommy at ilm.com (Tommy Burnette) Date: Wed Nov 19 20:20:03 2003 Subject: [Python-Dev] airspeed of an unladen swallow Message-ID: <16316.5817.601214.578299@evoke.lucasdigital.com> in case this hasn't been seen on the regular python list yet.... http://www.style.org/unladenswallow From tismer at tismer.com Wed Nov 19 22:18:46 2003 From: tismer at tismer.com (Christian Tismer) Date: Wed Nov 19 22:18:52 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> Message-ID: <3FBC3296.1090004@tismer.com> Hi again, Guido, >>No no no, I'm not fiddling around with any internals, here. >>I just want to use the machinery as it is, and to be able to >>pickle almost everything. Sorry, this was a lie. Sure I'm fiddling internally, but simply by installing some __reduce__ methods, hoping that they work. This worked most of the time, but I'm having problems with bound methods. > But how, I wonder, are you providing it? You can't subclass > instancemethod -- how do you manage to add a __reduce__ method to it > without fiddling with any internals? I added __reduce__ to the PyMethod type and tried to figure out why it didn't take it.
>>My expectation is that C().x.__reduce__ gives me the bound >>__reduce__ method of the bound x method of a C instance. > > > Yes, unfortunately you get the __reduce__ method of the unbound > function instead. > > I think Martin is right: copy_reg may be your last hope. (Or > subclassing pickle to special-case instancemethod.) Well, I see your point, but please let me explain mine, again: If there is a class C which has a method x, then C().x is a perfectly fine expression, yielding a bound method. If I now like to pickle this expression, I would use the __reduce__ protocol and ask C().x for its __reduce__ property. Now, please see that __reduce__ has no parameters, i.e. it has no other chance to do the right thing(TM) but by relying on being bound to the right thing. So, doesn't it make sense to have __reduce__ always be returned as a method of some bound anything? In other words, shouldn't things that are only useful as bound things, always be bound? > The pickling machinery wasn't intended to pickle bound methods or > functions etc., and doesn't particularly go out of its way to allow > you to add that functionality. The pickling machinery gives me a __reduce__ interface, and I'm expecting that this is able to pickle everything. ... > And it would be except for the delegation of method attributes to > function attributes. It is a similar aliasing problem as you see when > you try to access the __getattr__ implementation for classes as > C.__getattr__ -- you get the __getattr__ for C instances instead. So > you have to use type(C).__getattr__ instead. That would work for > __reduce__ too I think: new.instancemethod.__reduce__(C().f). I agree! But I can't do this in this context, using __reduce__ only. In other words, I'd have to add stuff to copyreg.py, which I tried to circumvent. ... > OK, so you *are* messing with internals after all (== changing C > code), right? Or else how do you accomplish this?
Yessir, I'm augmenting all things-to-be-pickled with __reduce__ methods. And this time is the first time that it doesn't work. ... > I have a real job too, that's why I have little time to help you. :-( I agree (and I didn't ask *you* in the first place), but still I'd like to ask the general question: Is this really the right way to handle bound objects? Is the is_data criterion correct? If I am asking for an attribute that makes *only* sense if it is bound, like in the parameter-less __reduce__ case, wouldn't it be the correct behavior to give me that bound object? I have the strong impression that there is some difference in methods which isn't dealt with, correctly, at the moment. If a method wants to be bound to something, it should get bound to something. Especially, if this method is useless without being bound. Please, swallow this idea a little bit, before rejecting it. I think that "is_data" is too rough and doesn't fit the requirements, all the time. sincerely -- chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at tismer.com Wed Nov 19 22:19:55 2003 From: tismer at tismer.com (Christian Tismer) Date: Wed Nov 19 22:19:58 2003 Subject: [Python-Dev] more on pickling In-Reply-To: References: <3FB94FD7.1030508@tismer.com> <3FBAC899.8090206@tismer.com> Message-ID: <3FBC32DB.2010607@tismer.com> Martin v. Löwis wrote: > Christian Tismer writes: > > >>If you have a nice and quick solution, please let me know. > > > Install something in copy_reg. Nice and quick. Gack! probably my only chance, without starting a major flame war. But I know it *is* wrong.
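For readers following along, Martin's copy_reg suggestion amounts to roughly the following sketch (written with the modern Python 3 `copyreg` spelling; the reducer name `_pickle_method` is invented here for illustration and is not from the thread):

```python
import copyreg
import pickle
import types

def _pickle_method(method):
    # Reduce a bound method to (getattr, (instance, name)): unpickling
    # calls getattr(instance, name), which re-binds the method.
    return getattr, (method.__self__, method.__func__.__name__)

# Register the reducer for all bound methods; pickle consults this
# dispatch table before falling back to the __reduce__ protocol.
copyreg.pickle(types.MethodType, _pickle_method)

class C:
    def x(self):
        return "bound"

m = pickle.loads(pickle.dumps(C().x))
```

Complete thread pickling needs far more than this, of course; the sketch only shows where copy_reg hooks in.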
ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From pje at telecommunity.com Wed Nov 19 23:23:42 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Nov 19 23:22:12 2003 Subject: [Python-Dev] more on pickling In-Reply-To: <3FBC32DB.2010607@tismer.com> References: <3FB94FD7.1030508@tismer.com> <3FBAC899.8090206@tismer.com> Message-ID: <5.1.0.14.0.20031119232053.025e78d0@mail.telecommunity.com> At 04:19 AM 11/20/03 +0100, Christian Tismer wrote: >Martin v. Löwis wrote: > >>Christian Tismer writes: >> >>>If you have a nice and quick solution, please let me know. >> >>Install something in copy_reg. Nice and quick. > >Gack! probably my only chance, without starting a major flame war. >But I know it *is* wrong. Not according to the documentation: """The copy_reg module provides support for the pickle and cPickle modules.... It provides configuration information about object constructors which are not classes.""" Hmm. Maybe that last bit should actually say "object types that do not support __reduce__ or other pickling protocols", now that everything's a class. Other than that, it seems dead on to what you're trying to do. From guido at python.org Thu Nov 20 01:18:45 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 20 01:18:58 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Thu, 20 Nov 2003 04:18:46 +0100."
<3FBC3296.1090004@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> Message-ID: <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> Summary: Christian is right after all. instancemethod_getattro should always prefer bound method attributes over function attributes. > >>No no no, I'm not fiddling around with any internals, here. > >>I just want to use the machinery as it is, and to be able to > >>pickle almost everything. > > Sorry, this was a lie. Sigh. OK, you're forgiven. > Sure I'm fiddling internally, but simply by > installing some __reduce__ methods, hoping that > they work. OK, so you *could* just make the change you want, but you are asking why it isn't like that in the first place. Good idea... > This worked most of the time, but I'm having problems > with bound methods. We've established that without a doubt, yes. :-) > > But how, I wonder, are you providing it? You can't subclass > > instancemethod -- how do you manage to add a __reduce__ method to it > > without fiddling with any internals? > > I added __reduce__ to the PyMethod type and tried to figure out > why it didn't take it. OK. Stating that upfront would have helped... > >>My expectation is that C().x.__reduce__ gives me the bound > >>__reduce__ method of the bound x method of a C instance. > > > > > > Yes, unfortunately you get the __reduce__ method of the unbound > > function instead. > > > > I think Martin is right: copy_reg may be your last hope. (Or > > subclassing pickle to special-case instancemethod.) > > Well, I see your point, but please let me explain mine, again: > If there is a class C which has a method x, then C().x is > a perfectly fine expression, yielding a bound method. Of course.
> If I now like to pickle this expression, I would use the > __reduce__ protocol and ask C().x for its __reduce__ property. Which unfortunately gets the __reduce__ property of the underlying *function* object (also named x) used to implement the method. This function can be accessed as C.__dict__['x']. (Not as C.x, that returns an unbound method object, which is the same kind of object as a bound method object but without an instance. :-) > Now, please see that __reduce__ has no parameters, i.e. it has > no other chance to do the right thing(TM) but by relying > on being bound to the right thing. > So, doesn't it make sense to have __reduce__ always be returned > as a method of some bound anything? > > In other words, shouldn't things that are only useful as bound > things, always be bound? This question doesn't address the real issue, which is the attribute delegation to the underlying function object. What *should* happen when the same attribute name exists on the function and on the bound method? In 2.1, when function attributes were first introduced, this was easy: a few attributes were special for the bound method (im_func, im_self, im_class) and for these the bound method attribute wins (if you set an attribute with one of those names on the function, you can't access it through the bound method). The *intention* was for the 2.2 version to have the same behavior: only im_func, im_self and im_class would be handled by the bound method, other attributes would be handled by the function object. This is what the IsData test is attempting to do -- the im_* attributes are represented by data descriptors now. The __class__ attribute is also a data descriptor, so that C().x.__class__ gives us <type 'instancemethod'> rather than <type 'function'>. But for anything else, including the various methods that all objects inherit from 'object' unless they override them, the choice was made to let the function attribute win.
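The delegation being described is easy to watch in action (shown here with present-day Python 3 spellings, where im_func became __func__; plain attribute reads on a bound method still fall through to the underlying function):

```python
class C:
    def x(self):
        pass

# Set an ordinary attribute on the underlying function object:
C.__dict__["x"].marker = "set on the function"

m = C().x
# Reads of ordinary attributes are delegated to the function:
found = m.marker
# ...while the special attributes belong to the method itself:
same_func = m.__func__ is C.__dict__["x"]
# Writes through the bound method are refused (methods carry no __dict__):
try:
    m.marker = "set on the method"
    refused = False
except AttributeError:
    refused = True
```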
But when we look at the attributes where both function and bound method provide a value, it seems that the bound method's offering is always more useful! You've already established this for __reduce__; the same is true for __call__ and __str__, and there I stopped. (Actually, I also looked at __setattr__, where delegation to the function also seems a mistake: C().x.foo = 42 is refused, but C().x.__setattr__('foo', 42) sets the attribute on the function, because this returns the (bound) method __setattr__ on functions.) > > The pickling machinery wasn't intended to pickle bound methods or > > functions etc., and doesn't particularly go out of its way to allow > > you to add that functionality. > > The pickling machinery gives me a __reduce__ interface, and I'm > expecting that this is able to pickle everything. I don't think you'd have a chance of pickling classes if you only relied on __reduce__. Fortunately there are other mechanisms. :-) (I wonder if the pickling code shouldn't try to call x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of these problems would have occurred... :-) > ... > > > And it would be except for the delegation of method attributes to > > function attributes. It is a similar aliasing problem as you see when > > you try to access the __getattr__ implementation for classes as > > C.__getattr__ -- you get the __getattr__ for C instances instead. So > > you have to use type(C).__getattr__ instead. That would work for > > __reduce__ too I think: new.instancemethod.__reduce__(C().f). > > I agree! > But I can't do this in this context, using __reduce__ only. > In other words, I'd have to add stuff to copyreg.py, which > I tried to circumvent. Or you could change the pickling system. Your choice of what to change and what not to change seems a bit arbitrary. :-) > ... > > > OK, so you *are* messing with internals after all (== changing C > > code), right? Or else how do you accomplish this?
> > Yessir, I'm augmenting all things-to-be-pickled with __reduce__ > methods. And this time is the first time that it doesn't work. But not necessarily the last time. :-) > ... > > > I have a real job too, that's why I have little time to help you. :-( > > I agree (and I didn't ask *you* in the first place), but still > I'd like to ask the general question: > Is this really the right way to handle bound objects? > Is the is_data criterion correct? > If I am asking for an attribute that makes *only* sense if it is > bound, like in the parameter-less __reduce__ case, wouldn't > it be the correct behavior to give me that bound object? > > I have the strong impression that there is some difference > in methods which isn't dealt with, correctly, at the moment. > If a method wants to be bound to something, it should > get bound to something. > Especially, if this method is useless without being bound. It's not that it isn't being bound. It's that the *wrong* attribute is being bound (the function's __reduce__ method, bound to the function object, is returned!). > Please, swallow this idea a little bit, before rejecting > it. I think that "is_data" is too rough and doesn't fit > the requirements, all the time. I agree. The bound method's attributes should always win, since bound methods only have a small, fixed number of attributes, and they are all special for bound methods. This *is* a change in functionality, even though there appear to be no unit tests for it, so I'm reluctant to fix it in 2.3. But I think in 2.4 it should definitely change.
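Guido's preferred behavior is what modern CPython implements: attributes found on the method type itself win over the function's, and as a consequence bound methods now pickle out of the box, which was Christian's whole goal (a quick Python 3 check, not part of the original thread):

```python
import pickle

class C:
    def f(self):
        return 42

m = C().f
# The method type's own __reduce__ (not the function's) is found first,
# so a bound method round-trips through pickle directly:
m2 = pickle.loads(pickle.dumps(m))
```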
--Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen at cwi.nl Thu Nov 20 06:52:11 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Nov 20 06:51:51 2003 Subject: [Python-Dev] Ripping out Macintosh support Message-ID: As you may have noticed if you follow the checkins mailing list I've enthusiastically started ripping out 90% of the work I did on Python the last 10 years (and quite a bit of really old code by Guido too:-): everything related to support for pre-Mac OS X macintoshes. Over the last year I've asked various times whether anyone was willing to even consider doing support for MacOS9 for 2.4, and I got absolutely no replies, not even the usual "I'd love to have it but I can't help":-). So out it goes! I'm trying to be careful that I don't break anything, and I make sure the selftests pass every time, but there's always the chance that I do get something wrong. So if things suddenly break inexplicably you're all free to blame me, initially, until I can point out that I have nothing whatsoever to do with the breakage:-) -- Jack Jansen http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From mwh at python.net Thu Nov 20 07:06:50 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 20 07:06:55 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib codeop.py, 1.7, 1.8 In-Reply-To: (doerwalter@users.sourceforge.net's message of "Wed, 19 Nov 2003 05:35:51 -0800") References: Message-ID: <2mn0arkzvp.fsf@starship.python.net> doerwalter@users.sourceforge.net writes: > Update of /cvsroot/python/python/dist/src/Lib > In directory sc8-pr-cvs1:/tmp/cvs-serv29941/Lib > > Modified Files: > codeop.py > Log Message: > Fix typos. Uh, no. > This module provides two interfaces, broadly similar to the builtin ^^^^^^^^^^^^^^ > ! function compile(), that take progam text, a filename and a 'mode' ^^^^ perhaps this should be which... Cheers, mwh -- 6. 
The code definitely is not portable - it will produce incorrect results if run from the surface of Mars. -- James Bonfield, http://www.ioccc.org/2000/rince.hint From walter at livinglogic.de Thu Nov 20 08:40:00 2003 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu Nov 20 08:40:14 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib codeop.py, 1.7, 1.8 In-Reply-To: <2mn0arkzvp.fsf@starship.python.net> References: <2mn0arkzvp.fsf@starship.python.net> Message-ID: <3FBCC430.4020709@livinglogic.de> Michael Hudson wrote: > doerwalter@users.sourceforge.net writes: > > >>Update of /cvsroot/python/python/dist/src/Lib >>In directory sc8-pr-cvs1:/tmp/cvs-serv29941/Lib >> >>Modified Files: >> codeop.py >>Log Message: >>Fix typos. > > Uh, no. > >> This module provides two interfaces, broadly similar to the builtin > ^^^^^^^^^^^^^^ > >>! function compile(), that take progam text, a filename and a 'mode' > ^^^^ > perhaps this should be which... This depends on whether "take program text..." refers to compile() or to "two interfaces". OK, I've fixed the fix. Bye, Walter Dörwald From mwh at python.net Thu Nov 20 08:53:02 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 20 08:55:48 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib codeop.py, 1.7, 1.8 In-Reply-To: <3FBCC430.4020709@livinglogic.de> References: <2mn0arkzvp.fsf@starship.python.net> <3FBCC430.4020709@livinglogic.de> Message-ID: Walter Dörwald writes: > Michael Hudson wrote: > > > doerwalter@users.sourceforge.net writes: > > > >>Update of /cvsroot/python/python/dist/src/Lib > >>In directory sc8-pr-cvs1:/tmp/cvs-serv29941/Lib > >> > >>Modified Files: > >> codeop.py Log Message: > >>Fix typos. > > Uh, no. > > > >> This module provides two interfaces, broadly similar to the builtin > > ^^^^^^^^^^^^^^ > > > >>! function compile(), that take progam text, a filename and a 'mode' > > ^^^^ > > perhaps this should be which...
> > This depends on whether "take program text..." refers to compile() or > to "two interfaces". Hmm, yes. Hadn't thought of reading it that way... > OK, I've fixed the fix. Thank you! Cheers, mwh -- Q: What are 1000 lawyers at the bottom of the ocean? A: A good start. (A lawyer told me this joke.) -- Michael Ströder, comp.lang.python From skip at pobox.com Thu Nov 20 10:04:07 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Nov 20 10:04:26 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: References: Message-ID: <16316.55271.205085.815371@montanaro.dyndns.org> Jack> Over the last year I've asked various times whether anyone was Jack> willing to even consider doing support for MacOS9 for 2.4, and I Jack> got absolutely no replies, not even the usual "I'd love to have it Jack> but I can't help":-). So out it goes! This is maybe too late to ask, but did you create something like a last-pre-macosx branch before making your changes? That would allow someone to easily come back later and do the work. Someone asked on c.l.py about running Python on OS6 (yes, Six) a few days ago and Python is maintained by interested individuals on other legacy platforms like OS/2 and the Amiga, maybe not at the latest and greatest release, but they're still there. There's probably someone on the planet who'd be willing to putter around with Python on MacOS9. That person just hasn't been found yet. Skip From martin at v.loewis.de Thu Nov 20 14:43:10 2003 From: martin at v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: Thu Nov 20 14:43:56 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: <16316.55271.205085.815371@montanaro.dyndns.org> References: <16316.55271.205085.815371@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > Someone asked on c.l.py about running Python on OS6 (yes, Six) a few days > ago and Python is maintained by interested individuals on other legacy > platforms like OS/2 and the Amiga, maybe not at the latest and greatest > release, but they're still there. There's probably someone on the planet > who'd be willing to putter around with Python on MacOS9. That person just > hasn't been found yet. I think they could easily start with Python 2.3, though. Regards, Martin From greg at cosc.canterbury.ac.nz Thu Nov 20 17:32:38 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Nov 20 17:32:44 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: Message-ID: <200311202232.hAKMWcY08939@oma.cosc.canterbury.ac.nz> Jack Jansen : > As you may have noticed if you follow the checkins mailing list I've >enthusiastically started ripping out 90% of the work I did on Python >the last 10 years What are you ripping out, exactly? I hope you're not getting rid of Carbon too soon, because I'm in the midst of doing a Mac version of my Python GUI using it! Mind you, the main reason I chose to use Carbon in the first place was so that there was some chance the same version would work on both 9 and X. But if there's never going to be a Python for MacOS 9 at all, ever again, maybe I should just give up now and re-do it all using PyObjC or something? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at comcast.net Thu Nov 20 18:30:15 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Nov 20 18:30:25 2003 Subject: [Python-Dev] Time for 2.3.3? Message-ID: Over the last week, I checked in fixes for two distinct broad causes of segfaults in code using weakrefs with callbacks. The bugs have been there since weakrefs were introduced, but for whatever reason nobody bumped into them (knowingly) until Jim Fulton and Thomas Heller happened to provoke both, independently, within a day of each other. It was especially easy under Thomas's scenario *not* to get a segfault in a release build, but to suffer random memory corruption instead (if the double-deallocation provoked pymalloc into handing out the same chunk of memory to two distinct objects alive at the same time -- and that is, alas, a likely outcome). I suspect these bugs hid for so long because it's taken Pythoneers a long time to discover why weakrefs can be so cool, and start to build serious apps on top of them. Casual programmers aren't likely to use weakrefs at all, but once you've built a cache based on weakrefs in a large app, weakrefs become critical to your code and your design. So I think either of these fixes is enough to justify a bugfix release, and having two of them makes a compelling case. What say we get 2.3.3 in motion? I did the weakref checkins already on the trunk and on release23-maint; Thomas Heller confirmed that his problems went away on release23-maint, and Jim Fulton confirmed that his Zope3 segfaults went away on the released 2.3.2 + a patch identical in all functional respects to what got checked in (the new test_weakref test cases, and some code comments, were different). If we get 2.3.3c1 out in early December, we could release 2.3.3 final before the end of the year, and start 2004 with a 100% bug-free codebase . 
From anthony at interlink.com.au Thu Nov 20 19:19:19 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Nov 20 19:19:49 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: <200311210019.hAL0JJjH011663@localhost.localdomain> I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the 15th of December, and a release around the 22nd? -- Anthony Baxter It's never too late to have a happy childhood. From tismer at tismer.com Thu Nov 20 21:45:25 2003 From: tismer at tismer.com (Christian Tismer) Date: Thu Nov 20 21:45:30 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> Message-ID: <3FBD7C45.3020607@tismer.com> Guido van Rossum wrote: > Summary: Christian is right after all. instancemethod_getattro should > always prefer bound method attributes over function attributes. Guido, I'm very happy with your decision, which is most probably a wise decision (without any relation to me). The point is, that I didn't know what's right or wrong, so basically I was asking for advice on a thing I felt unhappy with. So I asked you to re-think if the behavior is really what you intended, or if you just stopped, early. Thanks a lot! That's the summary and all about it, you can skip the rest if you like. ----------------------------------------------------------------------- ... >>Sure I'm fiddling internally, but simply by >>installing some __reduce__ methods, hoping that >>they work.
> > > OK, so you *could* just make the change you want, but you are asking > why it isn't like that in the first place. Good idea... I actually hacked a special case for __reduce__, to see whether it works at all, but then asked, of course. Most of my pickling stuff might be of general interest, and changing semantics is by no means what I ever would like to do without following the main path. ... >>I added __reduce__ to the PyMethod type and tried to figure out >>why it didn't take it. > > OK. Stating that upfront would have helped... Sorry about that. I worked too long on these issues already and had the perception that everybody knows that I'm patching __reduce__ into many objects like a bozo :-) ... >>In other words, shouldn't things that are only useful as bound >>things, always be bound? > > This question doesn't address the real issue, which is the attribute > delegation to the underlying function object. Correct, I misspelled things. Of course there is binding, but the chain back to the instance is lost. ... > The *intention* was for the 2.2 version to have the same behavior: > only im_func, im_self and im_class would be handled by the bound > method, other attributes would be handled by the function object. Ooh, I begin to understand! > This is what the IsData test is attempting to do -- the im_* > attributes are represented by data descriptors now. The __class__ > attribute is also a data descriptor, so that C().x.__class__ gives us > rather than . IsData is a test for having a write method, too, so we have the side effect here that im_* works like I expect, since they happen to be writable? Well, I didn't look into 2.3 for this, but in 2.2 I get >>> a().x.__class__=42 Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: __class__ must be set to new-style class, not 'int' object [9511 refs] >>> which says for sure that this is a writable property, while >>> a().x.im_class=42 Traceback (most recent call last): File "<stdin>", line 1, in ?
TypeError: readonly attribute [9511 refs] >>> seems to be handled differently. I only thought of IsData in terms of accessing the getter/setter wrappers. > But for anything else, including the various methods that all objects > inherit from 'object' unless they override them, the choice was made > to let the function attribute win. That's most probably right to do, since most defaults from object are probably just surrogates. > But when we look at the attributes where both function and bound > method provide a value, it seems that the bound method's offering is > always more useful! You've already established this for __reduce__; > the same is true for __call__ and __str__, and there I stopped. > (Actually, I also looked at __setattr__, where delegation to the > function also seems a mistake: C().x.foo = 42 is refused, but > C().x.__setattr__('foo', 42) sets the attribute on the function, > because this returns the (bound) method __setattr__ on functions.) Your examples are much better than mine. >>The pickling machinery gives me an __reduce__ interface, and I'm >>expecting that this is able to pickle everything. > > I don't think you'd have a chance of pickle classes if you only relied > on __reduce__. Fortunately there are other mechanisms. :-) I don't need to pickle classes, this works fine in most cases, and behavior can be modified by users. They can use copy_reg, and that's one of my reasons to avoid copy_reg. I want to have the basics built in, without having to import a Python module. > (I wonder if the pickling code shouldn't try to call > x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of > these problems would have occurred... :-) That sounds reasonable. Explicit would have been better than implicit (by hoping for the expected bound chain). __reduce__ as a class method would allow to explicitly spell that I want to reduce the instance x of class C. 
x.__class__.__reduce__(x) While, in contrast x.__class__.__reduce__(x.thing) would spell that I want to reduce the "thing" property of the x instance of C. While x.__class__.__reduce__(C.thing) # would be the same as C.__reduce__(C.thing) which would reduce the class method "thing" of C, or the class property of C, or whatsoever of class C. I could envision a small extension to the __reduce__ protocol, by providing an optional parameter, which would open these new ways, and all pickling questions could be solved, probably. This is so, since we can find out whether __reduce__ is a class method or not. If it is just an instance method (implicitly bound), it behaves as today. If it is a class method, it takes a parameter, and then it can find out whether to pickle a class, instance, class property or an instance property. Well, I hope. The above was said while being in bed with 39° Celsius, so don't put my words on the assay-balance. [trying to use __reduce__, only] > Or you could change the pickling system. Your choice of what to > change and what not to change seems a bit arbitrary. :-) Not really. I found __reduce__ very elegant. It gave me the chance to have almost all patches in a single file, since I didn't need to patch most of the implementation files. Just adding something to the type objects was sufficient, and this keeps my workload smaller when migrating to the next Python. Until now, I only had to change traceback.c and iterator.c, since these don't export enough of their structures to patch things from outside. If at some point somebody might decide that some of this support code makes sense for the main distribution, things should of course move to where they belong. Adding to copy_reg, well, I don't like to modify Python modules from C so much, and even less I like to add extra Python files to Stackless, if I can do without it.
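[Editorial note: for context, the copy_reg registration route being discussed looks roughly like this in modern spelling (the module is copyreg in Python 3; Tasklet here is a hypothetical stand-in, not Stackless code):]

```python
import copyreg
import pickle

class Tasklet:
    """Hypothetical stand-in for a type pickle wouldn't handle well."""
    def __init__(self, name):
        self.name = name

def reduce_tasklet(t):
    # The (callable, args) pair a __reduce__ method would return.
    return (Tasklet, (t.name,))

# Register the reduce function from Python code -- the extra Python-side
# registration Christian preferred to keep out of Stackless, versus
# installing __reduce__ directly on the type from C.
copyreg.pickle(Tasklet, reduce_tasklet)

clone = pickle.loads(pickle.dumps(Tasklet("worker")))
print(clone.name)  # worker
```

pickle consults the copyreg dispatch table before falling back to the type's own __reduce_ex__, so the registration takes effect without touching the class.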
Changing the pickling engine: Well, I'm hesitant, since it has been developed so much more between 2.2 and 2.3, and I didn't get my head into that machinery, now. What I want to do at some time is to change cPickle to use a non-recursive implementation. (Ironically, the Python pickle engine *is* non-recursive, if it is run under Stackless). So, if I would hack at cPickle at all, I would probably do the big big change, and that would be too much to get done in reasonable time. That's why I decided to stay small and just chime a few __reduce__ thingies in, for the time being. Maybe this was not the best way, I don't know. >>>OK, so you *are* messing with internals after all (== changing C >>>code), right? Or else how do you accomplish this? >> >>Yessir, I'm augmenting all things-to-be-pickled with __reduce__ >>methods. And this time is the first time that it doesn't work. > > > But not necessarily the last time. :-) Right. probably, I will get into trouble with pickling unbound class methods. Maybe I would just ignore this. Bound class methods do appear in my Tasklet system and need to get pickled. Unbound methods are much easier to avoid and probably not worth the effort. (Yes, tomorrow I will be told that it *is* :-) ... > I agree. The bound method's attributes should always win, since bound > methods only have a small, fixed number of attributes, and they are > all special for bound methods. > > This *is* a change in functionality, even though there appear to be no > unit tests for it, so I'm reluctant to fix it in 2.3. But I think in > 2.4 it should definitely change. That means, for Py 2.2 and 2.3, my current special case for __reduce__ is exactly the way to go, since it doesn't change any semantics but for __reduce__, and in 2.4 I just drop these three lines? Perfect! sincerely - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! 
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tim.one at comcast.net Fri Nov 21 00:43:08 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 00:43:14 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <200311210019.hAL0JJjH011663@localhost.localdomain> Message-ID: [Anthony Baxter] > I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the > 15th of December, and a release around the 22nd? That's good enough for me. I'd rather push the RC up a week earlier, though, to give more time for user testing. Many people take large blocks of time off around Christmas, and have major extra demands on their time the week before too (planning and shopping and endless bickering with family -- Christmas is great ). What else does 2.3.3 need? IIRC, the sre tests still fail on 2.3 maint, and that's a showstopper. I'd like to "do something" about the 2.3 changes to Python finalization that have provoked new problems, but don't have time. If nothing else, I'd at least like to comment out the second call to gc in Py_Finalize -- with hindsight, that wasn't ready for prime time, and the # of things that can go wrong when trying to execute Python code after modules (particularly sys) have been torn down appears boundless. The only bad thing I've seen come out of the first call to gc in Py_Finalize is nonsense errors complaining that Python hasn't been initialized (when a __del__ or weakref callback triggered then tries to import a new module). What else does 2.3.3 need? From anthony at interlink.com.au Fri Nov 21 00:51:23 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Nov 21 00:59:14 2003 Subject: [Python-Dev] Time for 2.3.3?
In-Reply-To: Message-ID: <200311210551.hAL5pOVd015765@localhost.localdomain> >>> "Tim Peters" wrote > What else does 2.3.3 need? IIRC, the sre tests still fail on 2.3 maint, and > that's a showstopper. I thought I'd fixed that. I have a bunch of compatibility fixes that I'd like to work on. I'm also considering switching to the newer version of autoconf. -- Anthony Baxter It's never too late to have a happy childhood. From tim.one at comcast.net Fri Nov 21 01:41:43 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 01:41:49 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <200311210551.hAL5pOVd015765@localhost.localdomain> Message-ID: >> What else does 2.3.3 need? IIRC, the sre tests still fail on 2.3 >> maint, and that's a showstopper. [Anthony Baxter] > I thought I'd fixed that. I don't know. How did they fail? This is how they fail for me today (Windows): C:\Code\23\PCbuild>python ../lib/test/test_re.py test_anyall (__main__.ReTests) ... ok test_basic_re_sub (__main__.ReTests) ... ok test_bigcharset (__main__.ReTests) ... ok test_bug_113254 (__main__.ReTests) ... ok test_bug_114660 (__main__.ReTests) ... ok test_bug_117612 (__main__.ReTests) ... ok test_bug_418626 (__main__.ReTests) ... ERROR test_bug_448951 (__main__.ReTests) ... ok test_bug_449000 (__main__.ReTests) ... ok test_bug_449964 (__main__.ReTests) ... ok test_bug_462270 (__main__.ReTests) ... ok test_bug_527371 (__main__.ReTests) ... ok test_bug_545855 (__main__.ReTests) ... ok test_bug_612074 (__main__.ReTests) ... ok test_bug_725106 (__main__.ReTests) ... ok test_bug_725149 (__main__.ReTests) ... ok test_bug_764548 (__main__.ReTests) ... ok test_category (__main__.ReTests) ... ok test_constants (__main__.ReTests) ... ok test_expand (__main__.ReTests) ... ok test_finditer (__main__.ReTests) ... ok test_flags (__main__.ReTests) ... ok test_getattr (__main__.ReTests) ... ok test_getlower (__main__.ReTests) ... ok test_groupdict (__main__.ReTests) ... 
ok test_ignore_case (__main__.ReTests) ... ok test_non_consuming (__main__.ReTests) ... ok test_not_literal (__main__.ReTests) ... ok test_pickling (__main__.ReTests) ... ok test_qualified_re_split (__main__.ReTests) ... ok test_qualified_re_sub (__main__.ReTests) ... ok test_re_escape (__main__.ReTests) ... ok test_re_findall (__main__.ReTests) ... ok test_re_groupref (__main__.ReTests) ... ok test_re_groupref_exists (__main__.ReTests) ... ok test_re_match (__main__.ReTests) ... ok test_re_split (__main__.ReTests) ... ok test_re_subn (__main__.ReTests) ... ok test_repeat_minmax (__main__.ReTests) ... ok test_scanner (__main__.ReTests) ... ok test_search_coverage (__main__.ReTests) ... ok test_search_star_plus (__main__.ReTests) ... ok test_special_escapes (__main__.ReTests) ... ok test_sre_character_literals (__main__.ReTests) ... ok test_stack_overflow (__main__.ReTests) ... ERROR test_symbolic_refs (__main__.ReTests) ... ok ====================================================================== ERROR: test_bug_418626 (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_re.py", line 410, in test_bug_418626 self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), 20001) File "C:\CODE\23\lib\sre.py", line 137, in search return _compile(pattern, flags).search(string) RuntimeError: maximum recursion limit exceeded ====================================================================== ERROR: test_stack_overflow (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_re.py", line 420, in test_stack_overflow self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') File "C:\CODE\23\lib\sre.py", line 132, in match return _compile(pattern, flags).match(string) RuntimeError: maximum recursion limit exceeded ---------------------------------------------------------------------- Ran 
46 tests in 0.550s FAILED (errors=2) > I have a bunch of compatibility fixes that I'd like to work on. I'm > also considering switching to the newer version of autoconf. A newer & buggier version, or a newer & better version ? From theller at python.net Fri Nov 21 04:59:11 2003 From: theller at python.net (Thomas Heller) Date: Fri Nov 21 04:59:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: (Tim Peters's message of "Fri, 21 Nov 2003 00:43:08 -0500") References: Message-ID: > [Anthony Baxter] >> I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the >> 15th of December, and a release around the 22nd? > [Tim] > That's good enough for me. I'd rather push the RC up a week earlier, > though, to give more time for user testing. Many people take large blocks > of time off around Christmas, and have major extra demands on their time the > week before too (planning and shopping and endless bickering with family -- > Christmas is great ). I'm among those people having extra demands on the time before Christmas (well, I've got wife and children), so I would prefer to do all this one week earlier: build the RC around the 8th, and the release around the 15th of december. Thomas From mwh at python.net Fri Nov 21 07:20:05 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 21 07:20:10 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: (Tim Peters's message of "Fri, 21 Nov 2003 00:43:08 -0500") References: Message-ID: <2mk75toqve.fsf@starship.python.net> "Tim Peters" writes: > [Anthony Baxter] >> I was planning on a just-before-Christmas 2.3.3. Maybe a RC around the >> 15th of December, and a release around the 22nd? > > That's good enough for me. I'm not expecting to be around much in between those dates, but could do some work for the RC with those dates. > What else does 2.3.3 need? There are a bunch of build problems which my brain, sadly but not surprisingly, has thoroughly paged out. We should give the new autoconf a go, at least. 
Cheers, mwh -- "Well, the old ones go Mmmmmbbbbzzzzttteeeeeep as they start up and the new ones go whupwhupwhupwhooopwhooooopwhooooooommmmmmmmmm." -- Graham Reed explains subway engines on asr From skip at pobox.com Fri Nov 21 08:55:40 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Nov 21 08:55:52 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <16318.6492.580944.89131@montanaro.dyndns.org> Tim> What say we get 2.3.3 in motion? As long as a primary motivator for a 2.3.3 release seems to be weakref-related, perhaps someone who's familiar enough with their usage could beef up the docs enough to get rid of this comment at the top of the module doc: XXX -- need to say more here! I was motivated to take a look at the weakref docs for the first time after Tim mentioned: Casual programmers aren't likely to use weakrefs at all, but once you've built a cache based on weakrefs in a large app, weakrefs become critical to your code and your design. Skip From tim.one at comcast.net Fri Nov 21 11:16:19 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 11:16:26 2003 Subject: [Python-Dev] test_re failures, Windows, 2.3 maint Message-ID: I sent test_re output from Windows on 2.3 maint yesterday. Two tests fail with "maximum recursion limit exceeded". Why do we expect them not to fail? 32-bit Windows may be unique in using this check: #if defined(USE_STACKCHECK) if (level % 10 == 0 && PyOS_CheckStack()) return SRE_ERROR_RECURSION_LIMIT; #endif PyOS_CheckStack() there isn't guessing, it's using Windows-specific facilities to check directly whether the C stack is about to overflow. In test_bug_418626, that check triggers twice, once at level = 15090 and again at level 15210. In test_stack_overflow, it triggers once at level 15210. The test comments appear to believe that sre shouldn't be recursing at all in these tests, but 15K+ levels is hard to sell as no recursion . 
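[Editorial note: the two failing tests Tim analyzes boil down to the calls below. On the modern non-recursive engine they succeed, so this sketch reproduces the inputs rather than the 2.3-era RuntimeError:]

```python
import re

# test_bug_418626: a lazy repeated group over a 20002-character string.
# Under 2.3's recursive SRE this recursed ~15K levels deep before the
# guard fired; the non-recursive engine just matches.
m = re.search('(a|b)*?c', 10000 * 'ab' + 'cd')
print(m.end(0))  # 20001

# test_stack_overflow: one capturing group repeated 50000 times.
m = re.match('(x)*', 50000 * 'x')
print(m.group(1))  # x
```

The expected values (20001 and 'x') are the same ones the test suite asserts.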
From niemeyer at conectiva.com Fri Nov 21 11:22:53 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 11:23:21 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <200311210551.hAL5pOVd015765@localhost.localdomain> Message-ID: <20031121162253.GA23299@burma.localdomain> > ====================================================================== > ERROR: test_bug_418626 (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "../lib/test/test_re.py", line 410, in test_bug_418626 > self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), 20001) > File "C:\CODE\23\lib\sre.py", line 137, in search > return _compile(pattern, flags).search(string) > RuntimeError: maximum recursion limit exceeded > > ====================================================================== > ERROR: test_stack_overflow (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "../lib/test/test_re.py", line 420, in test_stack_overflow > self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') > File "C:\CODE\23\lib\sre.py", line 132, in match > return _compile(pattern, flags).match(string) > RuntimeError: maximum recursion limit exceeded > > ---------------------------------------------------------------------- > Ran 46 tests in 0.550s > > FAILED (errors=2) It looks like someone has backported the changes done in test_re.py. These tests were expected to fail with the SRE from 2.3. -- Gustavo Niemeyer http://niemeyer.net From tim.one at comcast.net Fri Nov 21 11:48:27 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 11:48:34 2003 Subject: [Python-Dev] Time for 2.3.3?
In-Reply-To: <20031121162253.GA23299@burma.localdomain> Message-ID: >> ====================================================================== >> ERROR: test_bug_418626 (__main__.ReTests) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "../lib/test/test_re.py", line 410, in test_bug_418626 >> self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), >> 20001) File "C:\CODE\23\lib\sre.py", line 137, in search >> return _compile(pattern, flags).search(string) >> RuntimeError: maximum recursion limit exceeded >> >> ====================================================================== >> ERROR: test_stack_overflow (__main__.ReTests) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "../lib/test/test_re.py", line 420, in test_stack_overflow >> self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') >> File "C:\CODE\23\lib\sre.py", line 132, in match >> return _compile(pattern, flags).match(string) >> RuntimeError: maximum recursion limit exceeded >> >> ---------------------------------------------------------------------- >> Ran 46 tests in 0.550s >> >> FAILED (errors=2) [Gustavo Niemeyer] > It looks like someone have backported the changes done in test_re.py. > These tests were expected to fail with the SRE from 2.3. The tests are never expected to fail, so I think you mean that test_re in 2.3 should expect (and suppress) the RuntimeError in these cases. It looks like Anthony changed this most recently: test_re.py Revision 1.45.6.1 Tue Nov 4 14:11:01 2003 UTC (2 weeks, 3 days ago) by anthonybaxter Branch: release23-maint Changes since 1.45: +9 -7 lines get tests working again. partial backport of 1.46 - I fixed the recursive tests that used to fail, but left test_re_groupref_exists disabled, as it fails on the release23-maint branch. Maybe something else needs to be backported? 
We've got more than one problem here, then, because Barry reports that test_re on release23-maint, as it exists today, does *not* fail on a RedHat 9 build. So if Anthony reverted that change, test_re would pass again on Windows, but would start to fail on RH9. From niemeyer at conectiva.com Fri Nov 21 11:54:28 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 11:54:46 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <20031121162253.GA23299@burma.localdomain> Message-ID: <20031121165428.GA27853@burma.localdomain> > > It looks like someone have backported the changes done in test_re.py. > > These tests were expected to fail with the SRE from 2.3. > > The tests are never expected to fail, so I think you mean that test_re in > 2.3 should expect (and suppress) the RuntimeError in these cases. Yes, that's what I meant. Sorry for not being clear. > It looks like Anthony changed this most recently: > > test_re.py > Revision 1.45.6.1 > Tue Nov 4 14:11:01 2003 UTC (2 weeks, 3 days ago) by anthonybaxter > Branch: release23-maint > Changes since 1.45: +9 -7 lines > > get tests working again. partial backport of 1.46 - I fixed the > recursive tests that used to fail, but left test_re_groupref_exists > disabled, as it fails on the release23-maint branch. Maybe something > else needs to be backported? Yes, he seems to believe that the new SRE scheme was introduced in 2.3, but these tests should still expect RuntimeError in 2.3. > We've got more than one problem here, then, because Barry reports that > test_re on release23-maint, as it exists today, does *not* fail on a > RedHat 9 build. So if Anthony reverted that change, test_re would > pass again on Windows, but would start to fail on RH9. That's strange indeed. Either other changes were introduced in 2.3 which changed the number of recursions, which I don't believe to be the case, or the fixed recursion limit was raised on that platform.
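[Editorial note: a backport-friendly test in the spirit Gustavo describes would tolerate the old engine's RuntimeError instead of counting it as a failure. A sketch, not the actual test_re.py code:]

```python
import re
import unittest

class RecursionLimitTolerantTests(unittest.TestCase):
    def test_long_repeat(self):
        # On 2.3's recursive engine this raised RuntimeError at a
        # platform-dependent depth; on later engines it succeeds.
        # Either outcome is acceptable for a maintenance-branch test.
        try:
            result = re.match('(x)*', 50000 * 'x').group(1)
        except RuntimeError:
            return  # the old engine hit its fixed recursion limit
        self.assertEqual(result, 'x')
```

Run it with `python -m unittest` in the usual way; the point is only the try/except shape around the engine call.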
-- Gustavo Niemeyer http://niemeyer.net From mwh at python.net Fri Nov 21 12:09:41 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 21 12:10:41 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <20031121165428.GA27853@burma.localdomain> (Gustavo Niemeyer's message of "Fri, 21 Nov 2003 14:54:28 -0200") References: <20031121162253.GA23299@burma.localdomain> <20031121165428.GA27853@burma.localdomain> Message-ID: <2m7k1todgq.fsf@starship.python.net> Gustavo Niemeyer writes: > Yes, he seems to belive that the new SRE scheme was introduced in 2.3, > but these tests should still expect RuntimeError in 2.3. I was under the impression (and slightly alarmed) that the recursion removal gimmicks had been backported from the trunk to the release23-maint branch. Was that not the case? (If so, phew). If that *wasn't* the case, then why were the tests failing for Anthony before he made that checkin? Cheers, mwh -- Or here's an even simpler indicator of how much C++ sucks: Print out the C++ Public Review Document. Have someone hold it about three feet above your head and then drop it. Thus you will be enlightened. -- Thant Tessman From barry at python.org Fri Nov 21 12:22:22 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 12:23:01 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <2m7k1todgq.fsf@starship.python.net> References: <20031121162253.GA23299@burma.localdomain> <20031121165428.GA27853@burma.localdomain> <2m7k1todgq.fsf@starship.python.net> Message-ID: <1069435342.2383.69.camel@anthem> FWIW, I'm having much more problems with 2.3cvs on RH7.3. test_re.py core dumps for me for instance. I'm doing a fresh build --with-pydebug and will try to get more information. -Barry From barry at python.org Fri Nov 21 13:09:12 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 13:09:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069438152.2383.85.camel@anthem> Never mind. 
A fresh debug build, test -u all yields no problems with 2.3cvs on RH7.3 either. -Barry From tim.one at comcast.net Fri Nov 21 13:09:35 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 13:09:42 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069435342.2383.69.camel@anthem> Message-ID: [Barry] > FWIW, I'm having much more problems with 2.3cvs on RH7.3. test_re.py > core dumps for me for instance. For Guido too. > I'm doing a fresh build --with-pydebug and will try to get more > information. It's one of two things: USE_RECURSION_LIMIT isn't #define'd or USE_RECURSION_LIMIT is #define'd, but to a value too large for that box There's a maze of #ifdef'ery near the start of _sre.c setting USE_RECURSION_LIMIT differently for different platforms. Windows doesn't use USE_RECURSION_LIMIT -- it uses a different gimmick based on being able to test for C stack overflow directly on Windows. test_re.py *should*, at this time, fail in exactly the same ways I reported it failing on Windows. From tim.one at comcast.net Fri Nov 21 13:13:50 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 13:13:53 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069438152.2383.85.camel@anthem> Message-ID: [Barry Warsaw] > Never mind. A fresh debug build, test -u all yields no problems with > 2.3cvs on RH7.3 either. test_re.py isn't supposed to pass on 2.3 maint today. If it passed, it's broken, and will start to fail as soon as the breakage is repaired. Find out what USE_RECURSION_LIMIT is set to on that box. From barry at python.org Fri Nov 21 13:15:58 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 13:16:14 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069438558.2383.91.camel@anthem> On Fri, 2003-11-21 at 13:09, Tim Peters wrote: > test_re.py *should*, at this time, fail in exactly the same ways I reported > it failing on Windows. Then Something Else is going on. 
As reported in another message, it doesn't fail for me on either RH9 or RH7.3, and a fresh debug build on RH7.3 also doesn't crash for me either. -Barry From barry at python.org Fri Nov 21 13:43:52 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 13:44:03 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069440231.2383.95.camel@anthem> On Fri, 2003-11-21 at 13:13, Tim Peters wrote: > [Barry Warsaw] > > Never mind. A fresh debug build, test -u all yields no problems with > > 2.3cvs on RH7.3 either. > > test_re.py isn't supposed to pass on 2.3 maint today. If it passed, it's > broken, and will start to fail as soon as the breakage is repaired. Find > out what USE_RECURSION_LIMIT is set to on that box. Is it possible that USE_RECURSION_LIMIT isn't defined for my RH builds?! I added the attached little bit of (seemingly useful) code to _sre.c, recompiled and then... % ./python Python 2.3.3a0 (#4, Nov 21 2003, 13:39:39) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import _sre [24546 refs] >>> _sre.RECURSION_LIMIT [24546 refs] >>> [24546 refs] [7129 refs] Very odd. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: sre-patch.txt Type: text/x-patch Size: 682 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20031121/69c981e0/sre-patch.bin From tim.one at comcast.net Fri Nov 21 14:22:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 14:22:24 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069440231.2383.95.camel@anthem> Message-ID: [Barry Warsaw] > Is it possible that USE_RECURSION_LIMIT isn't defined for my RH > builds?! I can't see how: it's set by a giant maze of #ifdef's, which are almost as reliable as a giant maze of CVS branches . 
Because the #ifdef's nest 4 deep at one point, and the bodies aren't indented, it's damned hard to figure out what they're doing by eyeball. But I *think* this part: """ #else #define USE_RECURSION_LIMIT 10000 #endif #endif #endif """ which gives all the appearance of defining a default value (if nothing else triggers), is actually nested *inside* an #elif defined(__FreeBSD__) block (which is in turn nested in a !defined(USE_STACKCHECK) block, which is in turn nested in an ifndef SRE_RECURSIVE block). God only knows what the intent was. But I expect that, yes, USE_RECURSION_LIMIT isn't getting defined on anything other than FreeBSD and Win64. > I added the attached little bit of (seemingly useful) code > to _sre.c, recompiled and then... > > % ./python > Python 2.3.3a0 (#4, Nov 21 2003, 13:39:39) > [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import _sre > [24546 refs] > >>> _sre.RECURSION_LIMIT > [24546 refs] > >>> > [24546 refs] > [7129 refs] > > Very odd. OTOH, if you believe what it says, that leads directly to the cause . From barry at python.org Fri Nov 21 14:36:06 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 14:37:31 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069443361.2383.118.camel@anthem> On Fri, 2003-11-21 at 14:22, Tim Peters wrote: > block (which is in turn nested in a !defined(USE_STACKCHECK) block, which is > in turn nested in an ifndef SRE_RECURSIVE block). God only knows what the > intent was. But I expect that, yes, USE_RECURSION_LIMIT isn't getting > defined on anything other than FreeBSD and Win64. Yep, you're right. If I hack _sre.c with the patch below, I think I get something closer to what we expect to see. % ./python Python 2.3.3a0 (#5, Nov 21 2003, 14:26:25) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import _sre [24583 refs] >>> _sre.RECURSION_LIMIT 10000 [24585 refs] >>> [24585 refs] [7130 refs] ... ====================================================================== ERROR: test_bug_418626 (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 410, in test_bug_418626 self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), 20001) File "/home/barry/projects/python23/Lib/sre.py", line 137, in search return _compile(pattern, flags).search(string) RuntimeError: maximum recursion limit exceeded ====================================================================== ERROR: test_stack_overflow (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_re.py", line 420, in test_stack_overflow self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') File "/home/barry/projects/python23/Lib/sre.py", line 132, in match return _compile(pattern, flags).match(string) RuntimeError: maximum recursion limit exceeded I'll leave it to someone else to check in the proper fix. (But does anybody else like exposing RECURSION_LIMIT in the _sre module?) -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: sre-patch2.txt Type: text/x-patch Size: 825 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20031121/86e2a559/sre-patch2.bin From niemeyer at conectiva.com Fri Nov 21 14:50:58 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 14:51:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <1069440231.2383.95.camel@anthem> Message-ID: <20031121195057.GA24270@burma.localdomain> [...] 
> which gives all the appearance of defining a default value (if nothing else > triggers), is actually nested *inside* an > > #elif defined(__FreeBSD__) > > block (which is in turn nested in a !defined(USE_STACKCHECK) block, which is > in turn nested in an ifndef SRE_RECURSIVE block). God only knows what the > intent was. But I expect that, yes, USE_RECURSION_LIMIT isn't getting > defined on anything other than FreeBSD and Win64. It looks to be this patch's fault: ------------- From: loewis@users.sourceforge.net To: python-checkins@python.org Cc: Bcc: Subject: [Python-checkins] python/dist/src/Modules _sre.c,2.99,2.99.8.1 Reply-To: python-dev@python.org Update of /cvsroot/python/python/dist/src/Modules In directory sc8-pr-cvs1:/tmp/cvs-serv28127 Modified Files: Tag: release23-maint _sre.c Log Message: Patch #813391: Reduce limits for amd64 and sparc64. Index: _sre.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/_sre.c,v retrieving revision 2.99 retrieving revision 2.99.8.1 diff -C2 -d -r2.99 -r2.99.8.1 *** _sre.c 26 Jun 2003 14:41:08 -0000 2.99 --- _sre.c 20 Oct 2003 20:59:45 -0000 2.99.8.1 *************** *** 72,78 **** /* FIXME: maybe the limit should be 40000 / sizeof(void*) ? */ #define USE_RECURSION_LIMIT 7500 - #else ! #if defined(__GNUC__) && defined(WITH_THREAD) && defined(__FreeBSD__) /* the pthreads library on FreeBSD has a fixed 1MB stack size for the * initial (or "primary") thread, which is insufficient for the default --- 72,83 ---- /* FIXME: maybe the limit should be 40000 / sizeof(void*) ? */ #define USE_RECURSION_LIMIT 7500 ! #elif defined(__FreeBSD__) ! /* FreeBSD/amd64 and /sparc64 require even smaller limits */ ! #if defined(__amd64__) ! #define USE_RECURSION_LIMIT 6000 ! #elif defined(__sparc64__) ! #define USE_RECURSION_LIMIT 3000 ! 
#elif defined(__GNUC__) && defined(WITH_THREAD) /* the pthreads library on FreeBSD has a fixed 1MB stack size for the * initial (or "primary") thread, which is insufficient for the default _______________________________________________ Python-checkins mailing list Python-checkins@python.org http://mail.python.org/mailman/listinfo/python-checkins -- Gustavo Niemeyer http://niemeyer.net From tim.one at comcast.net Fri Nov 21 14:56:42 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 14:56:50 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069443361.2383.118.camel@anthem> Message-ID: [Barry] > Yep, you're right. If I hack _sre.c with the patch below, I think I > get something closer to what we expect to see. > > % ./python > Python 2.3.3a0 (#5, Nov 21 2003, 14:26:25) > [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import _sre > [24583 refs] > >>> _sre.RECURSION_LIMIT > 10000 > [24585 refs] > > ... 
> > ====================================================================== > ERROR: test_bug_418626 (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_re.py", line 410, in test_bug_418626 > self.assertEqual(re.search('(a|b)*?c', 10000*'ab'+'cd').end(0), > 20001) File "/home/barry/projects/python23/Lib/sre.py", line 137, > in search return _compile(pattern, flags).search(string) > RuntimeError: maximum recursion limit exceeded > > ====================================================================== > ERROR: test_stack_overflow (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_re.py", line 420, in test_stack_overflow > self.assertEqual(re.match('(x)*', 50000*'x').group(1), 'x') > File "/home/barry/projects/python23/Lib/sre.py", line 132, in match > return _compile(pattern, flags).match(string) > RuntimeError: maximum recursion limit exceeded Yup, that's how they fail on Windows today, and is how they're *expected* to fail everywhere today. > I'll leave it to someone else to check in the proper fix. I expect Anthony has the best shot at understanding why he did what he did before, so has the best shot at undoing it too without creating more new problems. > (But does anybody else like exposing RECURSION_LIMIT in the > _sre module?) For 2.3 maint it would be a new feature, so probably not. For 2.4, I believe all this code has become a mass of decoys (that is, it's still there, but is no longer used; I don't know why it hasn't been deleted) -- Gustavo reworked sre to stop using C-level recursion. BTW, Gustavo, we get a big pile of compiler warnings on the trunk (2.4 development) in _sre.c now, on Windows, and apparently under some-but-not-all gcc flavors. How about cleaning those up? 
See: http://mail.python.org/pipermail/python-dev/2003-October/039059.html From niemeyer at conectiva.com Fri Nov 21 15:00:56 2003 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Fri Nov 21 15:01:46 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: <1069443361.2383.118.camel@anthem> Message-ID: <20031121200056.GA24592@burma.localdomain> > For 2.4, I believe all this code has become a mass of decoys (that is, > it's still there, but is no longer used; I don't know why it hasn't > been deleted) -- Gustavo reworked sre to stop using C-level recursion. > > BTW, Gustavo, we get a big pile of compiler warnings on the trunk (2.4 > development) in _sre.c now, on Windows, and apparently under > some-but-not-all gcc flavors. How about cleaning those up? See: > > http://mail.python.org/pipermail/python-dev/2003-October/039059.html Thanks for pointing this out to me. I'll clean up these issues (the code and the warnings) ASAP. -- Gustavo Niemeyer http://niemeyer.net From barry at python.org Fri Nov 21 15:17:36 2003 From: barry at python.org (Barry Warsaw) Date: Fri Nov 21 15:17:49 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069445855.2383.122.camel@anthem> On Fri, 2003-11-21 at 14:56, Tim Peters wrote: > For 2.3 maint it would be a new feature, so probably not. > > For 2.4, I believe all this code has become a mass of decoys (that is, it's > still there, but is no longer used; I don't know why it hasn't been > deleted) -- Gustavo reworked sre to stop using C-level recursion. Works for me. or-did-ly y'rs, -Barry From tim.one at comcast.net Fri Nov 21 17:24:01 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 17:24:06 2003 Subject: [Python-Dev] Time for 2.3.3?
In-Reply-To: <16318.6492.580944.89131@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > As long as a primary motivator for a 2.3.3 release seems to be > weakref-related, perhaps someone who's familiar enough with their > usage could beef up the docs enough to get rid of this comment at the > top of the module doc: > > XXX -- need to say more here! I checked in more words (on the trunk and on 2.3 maint). Feel free to add even more . From anthony at interlink.com.au Fri Nov 21 21:23:07 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Nov 21 21:23:33 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: <200311220223.hAM2N8E7007850@localhost.localdomain> >>> "Tim Peters" wrote > I expect Anthony has the best shot at understanding why he did what he did > before, so has the best shot at undoing it too without creating more new > problems. Sorry - I (and a bunch of other folks, Alex included if I recall correctly) was seeing a bunch of test failures in test_re - I ported the "fixed" tests from the trunk, in the assumption that the relevant change had been made to the branch. I'll undo it, once someone's fixed _sre in the branch to be broken again Anthony -- Anthony Baxter It's never too late to have a happy childhood. From tim.one at comcast.net Fri Nov 21 22:58:02 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 21 22:58:13 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <200311220223.hAM2N8E7007850@localhost.localdomain> Message-ID: [Anthony Baxter] > Sorry - I (and a bunch of other folks, Alex included if I recall > correctly) was seeing a bunch of test failures in test_re - I ported > the "fixed" tests from the trunk, in the assumption that the relevant > change had been made to the branch. I'll undo it, once someone's > fixed _sre in the branch to be broken again I checked in all the changes I thought were necessary. But as the checkin comment says, This needs fresh testing on all non-Win32 platforms ... 
Running the standard test_re.py is an adequate test. So start testing, or (my recommendation) upgrade to Win32 . From jeremy at alum.mit.edu Fri Nov 21 23:46:29 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri Nov 21 23:49:17 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069476389.22019.0.camel@localhost.localdomain> On Fri, 2003-11-21 at 22:58, Tim Peters wrote: > I checked in all the changes I thought were necessary. But as the checkin > comment says, > > This needs fresh testing on all non-Win32 platforms ... > Running the standard test_re.py is an adequate test. > > So start testing, or (my recommendation) upgrade to Win32 . Did a cvs update about 30 minutes ago. make test reports no errors. Running again with "-u all -r" to see what happens. Jeremy From jeremy at alum.mit.edu Sat Nov 22 00:10:05 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Sat Nov 22 00:12:52 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069476389.22019.0.camel@localhost.localdomain> References: <1069476389.22019.0.camel@localhost.localdomain> Message-ID: <1069477805.22019.2.camel@localhost.localdomain> On Fri, 2003-11-21 at 23:46, Jeremy Hylton wrote: > On Fri, 2003-11-21 at 22:58, Tim Peters wrote: > > I checked in all the changes I thought were necessary. But as the checkin > > comment says, > > > > This needs fresh testing on all non-Win32 platforms ... > > Running the standard test_re.py is an adequate test. > > > > So start testing, or (my recommendation) upgrade to Win32 . > > Did a cvs update about 30 minutes ago. make test reports no errors. > Running again with "-u all -r" to see what happens. Also looks good. This was with a RH9 system. Jeremy From skip at pobox.com Sat Nov 22 00:13:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Nov 22 00:13:25 2003 Subject: [Python-Dev] Time for 2.3.3? 
In-Reply-To: References: <200311220223.hAM2N8E7007850@localhost.localdomain> Message-ID: <16318.61547.384955.115515@montanaro.dyndns.org> Tim> So ... upgrade to Win32 . I'll consider that after you've been in charge of software development at Microsoft for a couple years. Skip From skip at pobox.com Sat Nov 22 01:23:51 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Nov 22 01:23:59 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069476389.22019.0.camel@localhost.localdomain> References: <1069476389.22019.0.camel@localhost.localdomain> Message-ID: <16319.247.594634.98507@montanaro.dyndns.org> >> This needs fresh testing on all non-Win32 platforms ... >> Running the standard test_re.py is an adequate test. >> >> So start testing, or (my recommendation) upgrade to Win32 . Jeremy> Did a cvs update about 30 minutes ago. make test reports no Jeremy> errors. Running again with "-u all -r" to see what happens. "regrtest.py -u all -r" worked for me on Mac OS X. Skip From raymond.hettinger at verizon.net Sat Nov 22 01:47:12 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Nov 22 01:47:44 2003 Subject: [Python-Dev] copy() and deepcopy() Message-ID: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> I would like to confirm my understanding of copying and its implications. A shallow copy builds only a new outer shell and leaves the inner references unchanged. If the outer object is immutable, then a copy might as well be the original object. So, in the copy module, the copy function for tuples should just return the original object (the function looks like it does more but actually does return itself). And, since a frozenset is immutable, its copy function should also just return self. The point of a deepcopy is to replace each sub-component (at every nesting level) that could possibly change. 
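[Editor's note: the shallow-copy claims above are easy to check interactively. The sketch below is an illustration against current CPython, not code from the thread; the copy module's dispatch code has been reshuffled since 2.3, but its behavior for immutable containers is the same.]

```python
import copy

# Shallow-copying an immutable container might as well return the original,
# and the copy module does exactly that.
t = (1, 2, 3)
assert copy.copy(t) is t

fs = frozenset([1, 2, 3])
assert copy.copy(fs) is fs

# A mutable set, by contrast, really is duplicated: equal, but distinct.
s = {1, 2, 3}
s2 = copy.copy(s)
assert s2 == s and s2 is not s
```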
Since sets can only contain hashable objects which in turn can only contain hashable objects, I surmise that a shallowcopy of a set would also suffice as its deepcopy. IOW: For frozensets, shallowcopy == deepcopy == self For sets, shallowcopy == deepcopy == set(list(self)) # done with PyDict_Copy() Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031122/3de21c58/attachment.html From anthony at interlink.com.au Sat Nov 22 02:39:35 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Nov 22 02:40:00 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <2mk75toqve.fsf@starship.python.net> Message-ID: <200311220739.hAM7dZ7n016749@localhost.localdomain> >>> Michael Hudson wrote > We should give the new autoconf a go, at least. I would strongly prefer to do this sooner than later, so I was thinking of doing the upgrade sometime this week. Does anyone have/know any reasons to not upgrade to the newer autoconf? It should fix a bunch of build annoyances (and I can get rid of aclocal.m4) Anthony -- Anthony Baxter It's never too late to have a happy childhood. From martin at v.loewis.de Sat Nov 22 06:48:48 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sat Nov 22 06:49:28 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <20031121195057.GA24270@burma.localdomain> References: <1069440231.2383.95.camel@anthem> <20031121195057.GA24270@burma.localdomain> Message-ID: Gustavo Niemeyer writes: > It looks to be this patch's fault: [...] > Patch #813391: Reduce limits for amd64 and sparc64. Sorry for causing so much confusion, and thanks to Tim for fixing it. Regards, Martin From barry at python.org Sat Nov 22 07:59:56 2003 From: barry at python.org (Barry Warsaw) Date: Sat Nov 22 08:00:12 2003 Subject: [Python-Dev] Time for 2.3.3? 
In-Reply-To: <1069477805.22019.2.camel@localhost.localdomain> References: <1069476389.22019.0.camel@localhost.localdomain> <1069477805.22019.2.camel@localhost.localdomain> Message-ID: <1069505993.2383.172.camel@anthem> On Sat, 2003-11-22 at 00:10, Jeremy Hylton wrote: > > Did a cvs update about 30 minutes ago. make test reports no errors. > > Running again with "-u all -r" to see what happens. > > Also looks good. This was with a RH9 system. Unfortunately, no so for me: test_mimetypes test test_mimetypes failed -- Traceback (most recent call last): File "/home/barry/projects/python23/Lib/test/test_mimetypes.py", line 52, in test_guess_all_types eq(all, ['.bat', '.c', '.h', '.ksh', '.pl', '.txt']) File "/home/barry/projects/python23/Lib/unittest.py", line 302, in failUnlessEqual raise self.failureException, \ AssertionError: ['.asc', '.bat', '.c', '.h', '.ksh', '.pl', '.txt'] != ['.bat', '.c', '.h', '.ksh', '.pl', '.txt'] But we've seen these before, right? Doesn't some test interfere with globals in a way that screws mimetypes occasionally? -Barry From aahz at pythoncraft.com Sat Nov 22 09:21:19 2003 From: aahz at pythoncraft.com (Aahz) Date: Sat Nov 22 10:31:20 2003 Subject: [Python-Dev] copy() and deepcopy() In-Reply-To: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> References: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> Message-ID: <20031122142119.GA23946@panix.com> On Sat, Nov 22, 2003, Raymond Hettinger wrote: > > The point of a deepcopy is to replace each sub-component (at every > nesting level) that could possibly change. Since sets can only contain > hashable objects which in turn can only contain hashable objects, I > surmise that a shallowcopy of a set would also suffice as its deepcopy. Thing is, it *is* possible to have a mutable and hashable object. The hashable part needs to be immutable, but not the rest. 
Consider dicts in the generic sense: the key needs to be immutable, but the value need not, and it certainly can be useful to combine key/value into a single object. Now, I'm still not sure that your analysis is wrong, but I wanted to be very, very clear that hashability is not the same thing as immutability. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization. From akma43 at umkc.edu Sat Nov 22 13:22:54 2003 From: akma43 at umkc.edu (Avneet Mathur) Date: Sat Nov 22 13:20:59 2003 Subject: [Python-Dev] Help Message-ID: <000601c3b125$a6e90060$5502a8c0@zeratec> Hi group, I have been given a problem and as I am novice in Python, I am asking for the help of you experts. I am supposed to read in a file, search for in the opened file an expression like this from a list of similar expressions and print out Hello world. {(inp:han) } Thus the expression in <> after (inp:han) has to be printed out. Please help! Is there any way to output this to the browser! Thanks a lot in advance. Avneet Mathur -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031122/e3ecfcb2/attachment.html From andymac at bullseye.apana.org.au Fri Nov 21 16:39:34 2003 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Sat Nov 22 13:27:20 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069435342.2383.69.camel@anthem> References: <20031121162253.GA23299@burma.localdomain> <20031121165428.GA27853@burma.localdomain> <2m7k1todgq.fsf@starship.python.net> <1069435342.2383.69.camel@anthem> Message-ID: <20031122081535.W77270@bullseye.apana.org.au> On Fri, 21 Nov 2003, Barry Warsaw wrote: > FWIW, I'm having much more problems with 2.3cvs on RH7.3. test_re.py > core dumps for me for instance. 
I'm doing a fresh build --with-pydebug > and will try to get more information. sre in 2.3x is compiler sensitive - the stack frame size becomes critical in how many sre recursions are supported, and a core dump is certain if the sre recursion limit is more than the available stack space allows. Threads support may be mixed in with this, as the size of the stack for the primary or initial thread is what gets exercised by test_re. On FreeBSD 4.x the stack size for this thread is fixed at 1MB (pthreads implementation limitation, not OS limit). gcc versions < 3.0 don't cause problems with the default sre recursion limit of 10000, but later versions do. So I'd suggest trying a lower sre recursion limit to see whether this helps. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From tim.one at comcast.net Sat Nov 22 14:31:14 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 22 14:32:14 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069505993.2383.172.camel@anthem> Message-ID: >>> Did a cvs update about 30 minutes ago. make test reports no errors. >>> Running again with "-u all -r" to see what happens. >> Also looks good. This was with a RH9 system. [Barry Warsaw] > Unfortunately, no so for me: > > test_mimetypes > test test_mimetypes failed -- Traceback (most recent call last): > File "/home/barry/projects/python23/Lib/test/test_mimetypes.py", > line 52, in test_guess_all_types > eq(all, ['.bat', '.c', '.h', '.ksh', '.pl', '.txt']) > File "/home/barry/projects/python23/Lib/unittest.py", line 302, > in failUnlessEqual > raise self.failureException, \ > AssertionError: ['.asc', '.bat', '.c', '.h', '.ksh', '.pl', '.txt'] > != ['.bat', '.c', '.h', '.ksh', '.pl', '.txt'] > > But we've seen these before, right? Doesn't some test interfere with > globals in a way that screws mimetypes occasionally? 
googling on test_guess_all_types nails it: http://mail.python.org/pipermail/python-dev/2003-September/038264.html Jeff Epler reported there, in a reply to you about the same thing in 2.3.1, that test_urllib2 interferes with test_mimetypes (when run in that order), and included a patch claimed to fix it. Of course, since he didn't put the patch on SF, it just got lost. From martin at v.loewis.de Sat Nov 22 16:38:04 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Sat Nov 22 16:38:31 2003 Subject: [Python-Dev] Help In-Reply-To: <000601c3b125$a6e90060$5502a8c0@zeratec> References: <000601c3b125$a6e90060$5502a8c0@zeratec> Message-ID: "Avneet Mathur" writes: > I have been given a problem and as I am novice in Python, I am asking > for the help of you experts. Dear Avneet Mathur, Please post your question to python-list@python.org, or any other Python "users" lists. python-dev is for the development of Python. Regards, Martin From guido at python.org Sat Nov 22 17:48:22 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 22 17:46:55 2003 Subject: [Python-Dev] copy() and deepcopy() In-Reply-To: Your message of "Sat, 22 Nov 2003 01:47:12 EST." <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> References: <000c01c3b0c4$772fb640$8fbb958d@oemcomputer> Message-ID: <200311222248.hAMMmMm02546@c-24-5-183-134.client.comcast.net> > I would like to confirm my understanding of copying and its > implications. > > A shallow copy builds only a new outer shell and leaves the inner > references unchanged. If the outer object is immutable, then a copy > might as well be the original object. So, in the copy module, the copy > function for tuples should just return the original object (the function > looks like it does more but actually does return itself). And, since a > frozenset is immutable, its copy function should also just return self. Right. 
(I have no idea why _copy_tuple(x) doesn't return x; it feels like superstition or copy-paste from _copy_list().) > The point of a deepcopy is to replace each sub-component (at every > nesting level) that could possibly change. Since sets can only contain > hashable objects which in turn can only contain hashable objects, I > surmise that a shallowcopy of a set would also suffice as its deepcopy. No. Look at what _deepcopy_tuple() does. There could be an object that implements __hash__ but has some instance variable that could be mutated but isn't part of the hash. > IOW: > For frozensets, shallowcopy == deepcopy == self > For sets, shallowcopy == deepcopy == set(list(self)) # done with > PyDict_Copy() No. For frozensets, shallow copy should return self; for sets, shallow copy should return set(self). In both cases, deepcopy() should do something like _deepcopy_list() and _deepcopy_tuple(), respectively. That is, deepcopying a set is pretty straightforward, but must store self in the memo first, so that (circular!) references to self are correctly deepcopied. Deepcopying a frozenset will be a little harder, because there can still be circular references! _deepcopy_tuple() shows how to do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From bac at OCF.Berkeley.EDU Sat Nov 22 17:48:58 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Nov 22 17:49:10 2003 Subject: [Python-Dev] How Python is Developed essay (final rough draft) Message-ID: <3FBFE7DA.2030601@ocf.berkeley.edu> OK, since I want to have this thing finished and online (plus I need this finished for submitting to PyCon) I am making this the final rough draft. This means this is last call on corrections and changes before it hopefully makes its public debut (up to pydotorg and whether anyone objects to me putting it up on python.org/dev/ ). Respond with any comment, corrections, etc. And the sooner the better since I am hoping to get it up some time next week. 
---------------------------- Guido, Some Guys, and a Mailing List: How Python is Developed ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ by Brett Cannon (brett at python.org) Introduction ============ Software does not make itself. Code does not spontaneously come from the ether of the universe. Python_ is no exception to this rule. Since Python made its public debut back in 1991, many people beyond the BDFL (Benevolent Dictator For Life, `Guido van Rossum`_) have contributed time and energy to making Python what it is today: a powerful, simple programming language available to all. But it has not been a random process of people doing whatever they wanted to Python. Over the years, a process for developing Python has emerged within the group that heads Python's growth and maintenance: `python-dev`_. This document is an attempt to write this process down, in hopes of lowering any barriers that might prevent people from contributing to the development of Python. .. _Python: http://www.python.org/ .. _Guido van Rossum: http://www.python.org/~guido/ .. _python-dev: http://mail.python.org/mailman/listinfo/python-dev Tools Used ========== To facilitate the development of Python, certain tools are used. Beyond the obvious ones such as a text editor and email client, two tools are very pervasive in the development process. SourceForge_ is used by python-dev to keep track of feature requests, reported bugs, and contributed patches. A detailed explanation of how to use SourceForge is given later in `General SourceForge Guidelines`_. CVS_ is a networked file versioning system that stores all of the files that make up Python; the repository is currently hosted on SourceForge. It gives the developers a single repository for the files and keeps track of any and all changes to every file. The basic commands and uses can be found in the `dev FAQ`_, along with a multitude of tutorials spread across the web. ..
_SourceForge: http://sourceforge.net/projects/python/ .. _CVS: http://www.cvshome.org/ .. _dev FAQ: http://www.python.org/dev/devfaq.html Communicating ============= Python development is not just programming. It requires a great deal of communication between people. This communication is not just between the members of python-dev; communication within the greater Python community also helps with development. Several mailing lists and newsgroups are used to help organize all of these discussions. In terms of Python development, the primary location for communication is the `python-dev`_ mailing list. This is where the members of python-dev hash out ideas and iron out issues. It is an open list; anyone can subscribe to the mailing list. While the discussion can get quite technical, it is not at all out of reach for even a novice and thus should not discourage anyone from joining the list. Please realize, though, that this list is meant for the discussion of the development of Python; all other questions should be directed somewhere else, such as `python-list`_. Along with this, a level of etiquette is expected to be maintained. A lack of manners will not be tolerated. When the greater Python community is involved in a discussion, it always ends up on `python-list`_. This mailing list is a gateway to the newsgroup `comp.lang.python`_. This is also a good place to go when you have a question about Python that does not pertain to the actual development of the language. Using CVS_ allows the development team to know who made a change to a file and when they made their change. But unless one wants to continuously update their local checkout of the repository, the best way to stay on top of changes to the repository is to subscribe to `Python-checkins`_. This list sends out an email for each and every change to a file in Python. This list can generate a large amount of traffic since even changing a typo in some text will trigger an email to be sent out.
But if you wish to be kept abreast of all changes to Python then this is a good way to do so.

The Patches_ mailing list sends out an email for all changes to patch items on SourceForge_. This list, just like Python-checkins, can generate a large amount of email traffic. It is generally useful to people who wish to help out with the development of Python by knowing about all newly submitted patches as well as any new developments on preexisting ones.

`Python-bugs-list`_ functions much like the Patches mailing list except that it is for bug items on SourceForge. If you find yourself wanting to help close and remove bugs in Python, this is the right list to subscribe to if you can handle the volume of email.

.. _python-list: http://mail.python.org/mailman/listinfo/python-list
.. _comp.lang.python: http://groups.google.com/groups?q=comp.lang.python
.. _Python-checkins: http://mail.python.org/mailman/listinfo/python-checkins
.. _Patches: http://mail.python.org/mailman/listinfo/patches
.. _Python-bugs-list: http://mail.python.org/mailman/listinfo/python-bugs-list

The Actual Development
======================

Developing Python is not all just conversations about neat new language features (although those neat conversations do come up and there is a process to them). Developing Python also involves maintaining it by eliminating discovered bugs, adding and changing features, and various other jobs that are not necessarily glamorous but are just as important to the language as anything else.

General SourceForge Guidelines
------------------------------

Since a good amount of Python development involves using SourceForge_, it is important to follow some guidelines when handling a tracker item (bug, patch, etc.). Probably one of the most important things you can do is make sure to set the various options in a new tracker item properly. The submitter should make sure that the Data Type, Category, and Group are all set to reasonable values.
The remaining values (Assigned To, Status, and Resolution) should in general be left to Python developers to set. The exception to this rule is when you want to retract a patch; then "close" the patch by setting Status to "closed" and Resolution to whatever is appropriate.

Do a cursory check to make sure whatever you are submitting has not previously been submitted by someone else. Duplication just uses up valuable time. And **please** do not post feature requests, bug reports, or patches to the python-dev mailing list. If you do, you will be instructed to create an appropriate SourceForge tracker item. When in doubt as to whether you should bring something to python-dev's attention, you can always ask on `comp.lang.python`_; Python developers actively participate there and will move the conversation over if it is deemed reasonable.

Feature Requests
----------------

`Feature requests`_ are for features that you wish Python had but that you have no plans to actually implement by writing a patch. On occasion people do go through the feature requests (also called RFEs on SourceForge) to see if there is anything there that they think should be implemented, and actually do the implementation. But in general do not expect something put here to be implemented without some participation on your part.

The best way to get something implemented is to campaign for it in the greater Python community. `comp.lang.python`_ is the best place to accomplish this. Post to the newsgroup with your idea and see if you can either get support or convince someone to implement it. It might even end up being added to `PEP 42`_ so that the idea does not get lost in the noise as time passes.

.. _feature requests: http://sourceforge.net/tracker/?group_id=5470&atid=355470
.. _PEP 42: http://www.python.org/peps/pep-0042.html

Bug Reports
-----------

Think you found a bug? Then submit a `bug report`_ on SourceForge.
Make sure you clearly specify what version of Python you are using, what OS, and under what conditions the bug was triggered. The more information you can give, the faster the bug can be fixed, since time will not be wasted requesting more information from you.

.. _bug report: http://sourceforge.net/tracker/?group_id=5470&atid=105470

Patches
-------

Create a patch_ tracker item on SourceForge for any code you think should be applied to the Python CVS tree. For practically any change to Python's functionality the documentation and testing suite will need to be changed as well. Including those changes with your patch from the start speeds things up considerably.

Please make sure your patch is against the CVS repository. If you don't know how to use it (the basics are covered in the `dev FAQ`_), then make sure you specify what version of Python you made your patch against.

In terms of coding standards, `PEP 8`_ covers Python code while `PEP 7`_ covers C code. Always try to maximize your code reuse; it makes maintenance much easier.

For C code make sure to limit yourself to ANSI C as much as possible. If you must use non-ANSI C code then see if what you need is checked for by looking in pyconfig.h. You can also look in Include/pyport.h for more helpful C code. If what you need is still not there but is generally available, then add a check in configure.in for it (don't forget to run autoreconf to make the changes take effect). And if that *still* doesn't fit your needs then code up a solution yourself. The reason for all of this is to limit the dependence on external code that might not be available for all OSs that Python runs on.

Be aware of intellectual property when handling patches. Any code with no copyright will fall under the copyright of the `Python Software Foundation`_. If you have no qualms with that, wonderful; this is the best solution for Python.
But if you feel the need to include a copyright then make sure that it is compatible with the copyright used on Python (i.e., BSD-style). The best solution, though, is to sign the copyright over to the Python Software Foundation.

.. _patch: http://sourceforge.net/tracker/?group_id=5470&atid=305470
.. _dev FAQ: http://www.python.org/dev/devfaq.html
.. _PEP 7: http://www.python.org/peps/pep-0007.html
.. _PEP 8: http://www.python.org/peps/pep-0008.html
.. _Python Software Foundation: http://www.python.org/psf/

Changing the Language
=====================

You understand how to file a patch. You think you have a great idea on how Python should change. You are ready to write code for your change. Great, but you need to realize that certain things must be done for a change to be accepted. Changes fall into two categories: changes to the standard library (referred to as the "stdlib") and changes to the language proper.

Changes to the stdlib
---------------------

Changes to the stdlib can consist of adding functionality or changing existing functionality. Adding minor functionality (such as a new function or method) requires convincing a member of python-dev that the extra code required to implement the feature is worth it. A big addition such as a module tends to require more support than just a single member of python-dev. As always, getting community support for your addition is a good idea. With all additions, make sure to write up documentation for your new functionality. Also make sure that proper tests are added to the testing suite.

If you want to add a module, be prepared to be called upon for any bug fixes or feature requests for that module. Getting a module added to the stdlib makes you its maintainer by default.
If you cannot take on that level of responsibility and commitment, and cannot get someone else to take it on for you, then your battle will be very difficult. When code has no specific maintainer, python-dev takes responsibility for it, and thus your code must be useful to them or else they will reject the module. There is also the possibility of having to write a PEP_ (read about PEPs in `Changing the Language Proper`_).

Changing existing functionality can be difficult to do if it breaks backwards-compatibility. If your code will break existing code, you must provide a legitimate reason why making the code act in a non-compatible way is better than the status quo. This requires python-dev as a whole to agree to the change.

Changing the Language Proper
----------------------------

Changing Python the language is taken **very** seriously. Python is often heralded for its simplicity and cleanliness. Any additions to the language must continue this tradition and view. Thus any changes must go through a long process.

First, you must write a PEP_ (Python Enhancement Proposal). This is basically just a document that explains what you want, why you want it, what could be bad about the change, and how you plan on implementing the change. It is best to get feedback on PEPs from `comp.lang.python`_ and python-dev. Once you feel the document is ready, you can request a PEP number and have it added to the official list of PEPs in `PEP 0`_.

Once you have a PEP, you must then convince python-dev and the BDFL that your change is worth it. Expect to be bombarded with questions and counter-arguments. It can drag on for over a month, easy. If you are not up for that level of discussion then do not bother with trying to get your change in. If you manage to convince a majority of python-dev and the BDFL (or most of python-dev; that can lead to the BDFL changing his mind) then your change can be applied.
As with all new code, make sure you also have appropriate documentation patches along with tests for the new functionality.

.. _PEP: http://www.python.org/peps/pep-0001.html
.. _PEP 0: http://www.python.org/peps/pep-0000.html

Helping Out
===========

Many people say they wish they could help out with the development of Python but feel they are not up to writing code. There are plenty of things one can do, though, that do not require you to write code. Regardless of your coding abilities, there is something for everyone to help with.

For feature requests, adding a comment with your opinion of the request is helpful. State whether or not you would like to see the feature. You can also volunteer to write the code to implement the feature if you feel up to it.

For bugs, stating whether or not you can reproduce the bug yourself can be extremely helpful. If you can write a fix for the bug, that is very helpful as well; start a patch item and reference it in a comment in the bug item.

For patches, apply the patch and run the testing suite. You can do a code review on the patch to make sure that it is good, clean code. If the patch adds a new feature, comment on whether you think it is worth adding. If it changes functionality then comment on whether you think it might break code; if it does, say whether you think it is worth the cost of breaking existing code. Help add to the patch if it is missing documentation patches or needed regression tests.

A special mention about adding a file to a tracker item: only official developers and the creator of the tracker item can add a file. This means that if you want to add a file and you are neither of the types of people just mentioned, you have to do an extra step or two. One thing you can do is post the file you want added somewhere else online and reference the URL in a comment. You can also create a new patch item if you feel the change is thorough enough and cross-reference between both patches in the comments.
Be wary of this last option, though, since it might come off as if you think their code is bad and yours is better, which can offend some people. The best solution of all is to work with the original poster if they are receptive to help. But if they do not respond or are not friendly then do go ahead and follow one of the other two suggestions.

For language changes, make your voice heard. Comment on any PEPs on `comp.lang.python`_ so that the general opinion of the community can be assessed.

If there is nothing specific you find you want to work on but you still feel like contributing nonetheless, there are several things you can do. The documentation can always use fleshing out. Adding more tests to the testing suite is always useful. Contribute to discussions on python-dev, `comp.lang.python`_, or one of the `SIGs`_ (Special Interest Groups). Just helping out in the community by spreading the word about Python or helping someone with a question is helpful.

If you really want to get knee-deep in all of this, join python-dev. Once you have been actively participating for a while and are generally known on python-dev, you can request checkin rights on the CVS tree. It is a great way to learn how to work in a large, distributed group, along with how to write great code.

And if all else fails, give money; the `Python Software Foundation`_ is a non-profit organization that accepts donations that are tax-deductible in the United States. The funds are used for various things, from lawyers for handling the intellectual property of Python to funding PyCon_. But the PSF could do a lot more if it had the funds. One goal is to have enough money to fund having Guido work on Python full-time for a full year; this would bring about Python 3. Every dollar does help, so please contribute if you can.

.. _SIGs: http://www.python.org/sigs/
..
_PyCon: http://www.python.org/pycon/

Conclusion
==========

If you get any message from this document, it should be that *anyone* can help with the development of Python. All help is greatly appreciated and keeps the language the wonderful piece of software that it is.

From bac at OCF.Berkeley.EDU Sat Nov 22 17:56:07 2003
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Sat Nov 22 17:56:17 2003
Subject: [Python-Dev] Thesis ideas list
Message-ID: <3FBFE987.2050203@ocf.berkeley.edu>

As requested, here is an annotated list of the ideas that I received for my master's thesis. I tried to do a decent job of referencing where info is and summarizing the emails I received. I also did **very** rough commenting on each of them in trying to understand them and to see if I thought they would make a good thesis for me.

If you have something to contribute to the list, please do so. Don't bother with spelling and grammatical fixes, though, since this is just for my personal use and for anyone interested in the ideas; it will not see the light of day on python.org or anything unless someone else decides to put the effort into that.

I am planning to go in and talk with my thesis advisor after Thanksgiving break (next week) to try to narrow down the list so I can have a thesis topic chosen by Jan 1.

----------------------------------

=====
Misc.
=====

Annotations
-----------
from Martin: http://mail.python.org/pipermail/python-dev/2003-October/039768.html and http://mail.python.org/pipermail/python-dev/2003-October/039809.html

Similar to attributes as done in .NET. Michael's ``func()[]`` syntax might pull off what Martin wants. For an overview of attributes in C#, see http://www.ondotnet.com/pub/a/dotnet/excerpt/prog_csharp_ch18/index.html?page=1 . They appear to be a way to connect data with code. Using reflection you can find out what attributes are attached to an object. You can control what types of objects an attribute can be bound to along with specifying arguments.
It seems like the compiler has some built-in attributes that it uses to process the code when available. So an attribute not only attaches info to an object; it is also used by the language to modify the code. It seems like the func()[] syntax along with Python's dynamic attribute creation covers this, just without the built-in syntax.

Could be interesting to come up with a variant on descriptors for assigning to something like __metadict__. If it is a data descriptor it is just attached as info to the object. If it is a non-data descriptor, though, it gets the code object passed to it. The only perk of this is a way to have info attached in a more abstracted way than just sticking the info into __dict__ and making sure you don't overwrite the value (in other words __metadict__ would not come into play for name resolution). Basically just a standard place to store metadata.

Martin's suggestion was having an attribute that would automatically create an XML-RPC interface for a code object. That might be doable as a metaclass, but that could get complicated and messy. If you could do something like::

    def meth() [xml-rpcbuilder]:
        pass

and have 'meth' automatically get an ``annotation(meth, 'xml-rpc')`` ability that returns a wrapper implementing an XML-RPC interface, that might be cool. You could do this now with a function that takes something, creates a wrapper, and then stores it on the object so that it does not have to be recreated every time. But that becomes an issue of overwriting values in the object. Having a place for metadata would lessen that problem somewhat. All in all it *might* be a good thing, but with Python's dynamicism I don't see a use-case for it at this moment.

Work on PyPy
------------
from `Holger `__

Vague statement that one could work on PyPy. OSCON 2003 paper at http://codespeak.net/pypy/index.cgi?doc/oscon2003-paper.html .
PyPy's EU funding proposal has interesting parts at http://codespeak.net/pypy/index.cgi?doc/funding/B1.0 and http://codespeak.net/pypy/index.cgi?doc/funding/B6.0 .

Multiple dispatch
-----------------
from `Neil `__

Look at Dylan_ and Goo_ for inspiration. An explanation of how Dylan does multiple dispatch can be seen at http://www.gwydiondylan.org/gdref/tutorial/multiple-dispatch.html .

Multiple dispatch is basically a mechanism of registering a group of methods under one generic method that then calls the registered methods based on whether the parameter lists can match the arguments being passed. If there is more than one match then they are ordered in terms of how "specific" they are; a method whose parameter accepts a superclass is less specific than one requiring the actual class. Methods can then call a special method that will call the next method in the calculated order.

The issue with this in terms of Python is how to handle comparing the arguments given to a method when a parameter list is just so damn vague. If you have the parameter lists ``def A(arg1, arg2, *args)`` and ``def B(*args)``, which one is more specific? The other issue is that since Python has no parameter type-checking beyond argument counts, you can't base whether a method is more specific on the types of the arguments. In order for this to be built into the language one would have to add type-checking first. Otherwise one would need to have all of this be external to the language. It should be doable in terms of Python code now. Building it into the language might be nice, but without type checking I don't know how useful it would be.

.. _Dylan: http://www.gwydiondylan.org/drm/drm_1.htm
.. _Goo: http://www.ai.mit.edu/~jrb/goo/manual/goomanual.html

Static analysis of Python C code
--------------------------------
from Neal (private email)

Look at the work done by `Dawson Engler`_. Could check for missing DECREF/INCREFs, null pointer dereferences, threading issues, etc.
Appears the research was developing a system to check that basic rules were met for code (returned values were checked, disabled interrupts get re-enabled, etc.).

.. _Dawson Engler: http://www.stanford.edu/~engler/

======
Memory
======

Mark-and-sweep GC
-----------------
from `Neil `__

Only really worth it in terms of code complexity (does C code become easier? How hard is it to move existing extension modules over?) and to measure the performance difference.

Chicken GC
----------
from `Neil `__ with more ideas from `Samuele `__ and `Phillip Eby `__

Chicken_ has its GC covered in a paper entitled "`Cheney on the M.T.A.`_". Seems to be the one Neil likes the most. Interestingly, Chicken (which is a Scheme-to-C compiler) does all memory allocation on the stack.

.. _Chicken: http://www.call-with-current-continuation.org/chicken.html
.. _Cheney on the M.T.A.: http://citeseer.nj.nec.com/baker94cons.html

Boehm-Demers-Weiser collector
-----------------------------
from `Jeremy `__

The collector can be found at http://www.hpl.hp.com/personal/Hans_Boehm/gc/index.html . It is a generic mark-and-sweep collector that has been designed to be portable and easy to use.

Analyze memory usage
--------------------
from `Jeremy `__

Apparently `some guys`_ claim that a high-performance, general memory allocator works better than a bunch of custom allocators (Python has a bunch of the latter).

.. _some guys: http://citeseer.nj.nec.com/berger01reconsidering.html

=========
Threading
=========

Provide free threading efficiently
----------------------------------
from `Martin `__

`In the free threading model, a client app may call any object method ... from any thread at any time. The object must serialize access to all of its methods to whatever extent it requires to keep incoming calls from conflicting, providing the maximum performance and flexibility. `__. In other words, you shouldn't have to do any locking to do a method call.
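The contract in that quote (the object, not the caller, does the serialization) can be sketched in present-day Python with an internal lock. This is just a hypothetical illustration of the programming model; the `Counter` class is made up and says nothing about how an interpreter would implement free threading::

```python
import threading

class Counter:
    # A sketch of the "free threading" contract: callers may invoke
    # methods from any thread at any time with no external locking;
    # the object serializes access to its own state internally.
    # (Hypothetical example, not CPython internals.)

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def bump(self):
        # The object does its own locking; the caller never sees it.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value

# Any number of threads may call bump() concurrently without
# coordinating with each other:
c = Counter()
threads = [threading.Thread(target=lambda: [c.bump() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.value())  # 4000
```

The research question is how to provide this guarantee for every object efficiently, without each method call paying the cost of a lock as this sketch does.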
MP threading
------------
from `Dennis Allison `__

Try to eliminate the serialization of Python code execution caused by the GIL. Look at research by `Maurice Herlihy`_ and `Kourosh Gharachorloo`_.

.. _Maurice Herlihy: http://www.cs.brown.edu/people/mph/home.html
.. _Kourosh Gharachorloo: http://research.compaq.com/wrl/people/kourosh/bio.html

=========
Compiling
=========

Python to C
-----------
from `Fernando Perez `__

`Pat Miller`_ presented a paper on this for scientific work at SciPy 2003. Can look to Squeak_ for inspiration.

.. _Pat Miller: http://www.llnl.gov/CASC/people/pmiller/
.. _Squeak: http://www.squeak.org/

Finish AST branch
-----------------
from `Neil `__

No research left, but could lead to macros_.

Macros
------
from `Jeremy `__

Once access to an AST is available, macros are doable. Lisp's macros work so well because of quasiquotation_. In order for this to work in Python, though, you need some other way to handle it; either through the AST as in Maya_ or the CST as in JSE_. Something else to look at is Polyglot_ (what Jeremy wishes the compiler package had become).

.. _quasiquotation: http://citeseer.nj.nec.com/bawden99quasiquotation.html
.. _Maya: http://citeseer.nj.nec.com/baker02maya.html
.. _JSE: http://citeseer.nj.nec.com/context/1821961/0
.. _Polyglot: http://www.cs.cornell.edu/Projects/polyglot/

Refactoring code editor for AST
-------------------------------
from `Neil `__

Integrating XML and SQL into the language
-----------------------------------------
from `Jeremy `__

Seems to be to make XML and SQL first-class citizens in Python. Based on the work of `Erik Meijer`_. Paper at http://www.research.microsoft.com/~emeijer/Papers/XML2003/xml2003.html with his main research page at http://research.microsoft.com/~emeijer/ .

.. _Erik Meijer: http://blogs.gotdotnet.com/emeijer/

Optional type checking
----------------------
from me, but with support from Guido (private email)

Guido thinks it is "one tough problem".
He suggested looking at the `types-sig archive`_ for ideas. Guido would love to have someone sanctioned to tackle this problem. Might be much easier to do if limited to only parameter lists. Doing that minimal amount would allow for a better multiple dispatch implementation. It would also allow for a rudimentary form of polymorphism based on parameter signatures.

.. _types-sig archive: http://www.python.org/pipermail/types-sig/

Type inferencing
----------------
from `Martin `__

Either run-time or compile-time. "Overlap with the specializing compilers".

Register-based VM
-----------------
from Neal (private email)

Should get a nice performance improvement. Look at Skip and Neil's rattler VM. Would be a step towards hooking Python into GCC for assembly code generation.

Lower-level bytecode
--------------------
from Neal (private email)

Supposedly Java's bytecode is fairly low-level. Would make the transition to a register-based VM easier. Also would make compiling to machine code or JIT compilation simpler. An IBM developerWorks article on Java bytecode is available at http://www-106.ibm.com/developerworks/ibm/library/it-haggar_bytecode/ . Could look at assembly languages (RISC and CISC) and other VMs for ideas on bytecodes.

=========
Execution
=========

Portable floating point
-----------------------
from Martin: http://mail.python.org/pipermail/python-dev/2003-October/039768.html and http://mail.python.org/pipermail/python-dev/2003-October/039809.html

Come up with code on a per-platform basis to make up for problems in that platform's FPU implementation. Compare to how Python just provides the CPU's implementation while Java guarantees a specific semantic behavior by providing the needed code to make it the same on all platforms. Martin suggested looking at Java's strictfp mode (which was added after Java 1.0). See http://developer.java.sun.com/developer/JDCTechTips/2001/tt0410.html#using on its usage.
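To get a feel for what such guaranteed semantics would have to pin down: the raw IEEE 754 bit patterns of doubles can already be compared byte-for-byte via the struct module. This is a sketch for probing platform behavior, not part of Martin's proposal, and it assumes an IEEE 754 platform::

```python
import struct

def double_bits(x):
    # Return the 64-bit IEEE 754 representation of a Python float
    # as a big-endian hex string, so results can be compared
    # byte-for-byte across platforms.
    return struct.pack('>d', x).hex()

# On any IEEE 754 platform the *stored* bit patterns are identical:
print(double_bits(1.0))   # 3ff0000000000000
print(double_bits(0.1))   # 3fb999999999999a
```

Representations agree everywhere; it is *computed* results (e.g. transcendental functions, x86 extended-precision intermediates) that can differ between platforms, which is exactly the gap a strictfp-style mode would close.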
Save interpreter state to disk
------------------------------
from `Martin `__

Similar to Smalltalk's images. Would be nice since it would provide a fail-safe mechanism for long-running processes. Could also help with debugging by being able to pass around the state of a program just before an error occurs.

Deterministic Finalization
--------------------------
from Martin: http://mail.python.org/pipermail/python-dev/2003-October/039768.html and http://mail.python.org/pipermail/python-dev/2003-October/039809.html

Having objects implicitly destroyed at certain points. An example is threaded code (in Python)::

    def bump_counter(self):
        self.mutex.acquire()
        try:
            self.counter = self.counter+1
            more_actions()
        finally:
            self.mutex.release()

In C++, you do::

    void bump_counter() {
        MutexAcquisition acquire(this);
        this->counter += 1;
        more_actions();
    }

which is nice since you don't have to explicitly release the lock.

Optimize global namespace access
--------------------------------
from `Neil `__ and `Jeremy `__

Look at `PEP 267`_ and Jeremy's `Faster Namespace`_ slides from the 10th Python conference. Neil pointed out that "If we can disallow inter-module shadowing of names the job becomes easier" (e.g., making ``import Foo; Foo.len = 42`` illegal).

.. _PEP 267: http://www.python.org/peps/pep-0267.html
.. _Faster Namespace: http://www.python.org/~jeremy/talks/spam10/PEP-267-1.html

Restricted execution
--------------------
from Andrew Bennett (private email)

See the python-dev archives and Summaries for more painful details.

Tail Recursion
--------------
from Me (my brain)

Have proper tail recursion in Python. Would require identifying where a direct function call is returned (could keep it simple and just do it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have to deal with exception catching, since that requires the frame to stay alive to handle the exception. But getting it to work well could help with memory and performance.
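A toy illustration of the kind of call that would benefit (hypothetical code; ``countdown`` is made up for the example)::

```python
import sys

def countdown(n):
    # The recursive call's result is returned directly -- a call
    # immediately followed by a return -- so the caller's frame is
    # dead weight that proper tail recursion could discard.
    if n == 0:
        return "done"
    return countdown(n - 1)

# Without tail-call elimination every call keeps a frame alive, so a
# deep but perfectly tail-recursive call blows the recursion limit:
try:
    countdown(sys.getrecursionlimit() * 2)
except RuntimeError:
    print("recursion limit hit")
```

With tail-call elimination this would run in constant stack space, like a loop.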
Don't know if it has been done for a language that had exception handling.

From python at rcn.com Sat Nov 22 18:02:38 2003
From: python at rcn.com (Raymond Hettinger)
Date: Sat Nov 22 18:03:10 2003
Subject: [Python-Dev] copy() and deepcopy()
In-Reply-To: <200311222248.hAMMmMm02546@c-24-5-183-134.client.comcast.net>
Message-ID: <005201c3b14c$b9c608a0$6523a044@oemcomputer>

[Aahz]
> Thing is, it *is* possible to have a mutable and hashable object. The
> hashable part needs to be immutable, but not the rest. Consider dicts in
> the generic sense: the key needs to be immutable, but the value need not,
> and it certainly can be useful to combine key/value into a single object.
> Now, I'm still not sure that your analysis is wrong, but I wanted to be
> very, very clear that hashability is not the same thing as immutability.

[Guido]
> For frozensets, shallow copy should return self; for sets, shallow
> copy should return set(self).
>
> In both cases, deepcopy() should do something like _deepcopy_list()
> and _deepcopy_tuple(), respectively. That is, deepcopying a set is
> pretty straightforward, but must store self in the memo first, so that
> (circular!) references to self are correctly deepcopied. Deepcopying
> a frozenset will be a little harder, because there can still be
> circular references! _deepcopy_tuple() shows how to do it.

Thanks guys. It's all clear now.

The good news is that nothing special has to be done to implement deepcopying. The copy.deepcopy() function is already smart enough to do the right thing when the type provides a __reduce__() method for pickling.

Raymond

From mfb at lotusland.dyndns.org Sat Nov 22 18:27:24 2003
From: mfb at lotusland.dyndns.org (Matthew F.
Barnes)
Date: Sat Nov 22 18:27:30 2003
Subject: [Python-Dev] Extending struct.unpack to produce nested tuples
Message-ID: <33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org>

I posted this to c.l.py the other day but didn't get any replies, so I thought I might see how it fares on python-dev. It's just an idea I had earlier this week. I'll attempt a patch if the response is positive.

---

I was wondering if there would be any interest in extending the struct.unpack format notation to be able to express groups of data with parentheses.

For example:

>>> data = struct.pack('iiii', 1, 2, 3, 4)
>>> struct.unpack('i(ii)i', data)   # Note the parentheses
(1, (2, 3), 4)

Use Case: I have a program written in C that contains a bunch of aggregate data structures (arrays of structs, structs containing arrays, etc.) and I'm transmitting these structures over a socket connection to a Python program that then unpacks the data using the struct module. The problem is that I have to unpack the incoming data as a flat sequence of data elements and then repartition the sequence into nested sequences to better reflect how the data is structured in the C program. It would be more convenient to express these groupings as I'm unpacking the raw data.

I'm sure there are plenty of other use cases for such a feature.

Matthew Barnes

From guido at python.org Sat Nov 22 18:38:49 2003
From: guido at python.org (Guido van Rossum)
Date: Sat Nov 22 18:37:26 2003
Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong
In-Reply-To: Your message of "Fri, 21 Nov 2003 03:45:25 +0100."
<3FBD7C45.3020607@tismer.com>
References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> <3FBD7C45.3020607@tismer.com>
Message-ID: <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net>

> Guido van Rossum wrote:
> > Summary: Christian is right after all. instancemethod_getattro should
> > always prefer bound method attributes over function attributes.

[Christian]
> Guido, I'm very happy with your decision, which is most
> probably a wise decision (without any relation to me).
>
> The point is, that I didn't know what's right or wrong,
> so basically I was asking for advice on a thing I felt
> unhappy with. So I asked you to re-think whether the behavior
> is really what you intended, or if you just stopped early.
>
> Thanks a lot!
>
> That's the summary and all about it, you can skip the rest if you like.

Note to python-dev folks: I will make the change in 2.4. I won't backport to 2.3 unless someone can really make a case for it; it *does* change behavior.

[...]

> > The *intention* was for the 2.2 version to have the same behavior:
> > only im_func, im_self and im_class would be handled by the bound
> > method, other attributes would be handled by the function object.
>
> Ooh, I begin to understand!
>
> > This is what the IsData test is attempting to do -- the im_*
> > attributes are represented by data descriptors now. The __class__
> > attribute is also a data descriptor, so that C().x.__class__ gives us
> > rather than .
>
> IsData is a test for having a write method, too, so we have
> the side effect here that im_* works like I expect, since
> they happen to be writable?
> Well, I didn't look into 2.3 for this, but in 2.2 I get > > >>> a().x.__class__=42 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: __class__ must be set to new-style class, not 'int' object > [9511 refs] > >>> > > which says for sure that this is a writable property, while > > >>> a().x.im_class=42 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: readonly attribute > [9511 refs] > >>> > > seems to be handled differently. > > I only thought of IsData in terms of accessing the > getter/setter wrappers. It's all rather complicated. IsData only checks for the presence of a tp_descr_set method in the type struct. im_* happen to be implemented by a generic approach for defining data attributes which uses a descriptor type that has a tp_descr_set method, but its implementation looks for a "READONLY" flag. This is intentional -- in fact, having a tp_descr_set (or __set__) method that raises an error is the right way to create a read-only data attribute (at least for classes whose instances have a __dict__). [...] > I don't need to pickle classes, this works fine in most cases, > and behavior can be modified by users. Right. When you are pickling classes, you're really pickling code, not data, and that's usually not what pickling is used for. (Except in Zope 3, which can store code in the database and hence must pickle classes. But it's a lot of work, as Jeremy can testify. :-) > > (I wonder if the pickling code shouldn't try to call > > x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of > > these problems would have occurred... :-) > > That sounds reasonable. Explicit would have been better than > implicit (by hoping for the expected bound chain). Especially since *internally* most new-style classes do this for all of the built-in operations (operations for which there is a function pointer slot in the type struct or one of its extensions).
This is different from old-style classes: a classic *instance* can overload (nearly) any special method by having an instance attribute, e.g. __add__; but this is not supported for new-style instances. > __reduce__ as a class method would allow to explicitly spell > that I want to reduce the instance x of class C. > > x.__class__.__reduce__(x) > > While, in contrast > > x.__class__.__reduce__(x.thing) > > would spell that I want to reduce the "thing" property of the > x instance of C. > > While > > x.__class__.__reduce__(C.thing) # would be the same as > C.__reduce__(C.thing) > > which would reduce the class method "thing" of C, or the class > property of C, or whatsoever of class C. You've lost me here. How does x.__class__.__reduce__ (i.e., C.__reduce__) tell the difference between x and x.thing and C.thing??? > I could envision a small extension to the __reduce__ protocol, > by providing an optional parameter, which would open these > new ways, and all pickling questions could be solved, probably. > This is so, since we can find out whether __reduce__ is a class > method or not. > If it is just an instance method (implicitly bound), it behaves as > today. > If it is a class method, it takes a parameter, and then it can find > out whether to pickle a class, instance, class property or an instance > property. > > Well, I hope. The above was said while being in bed with 39° Celsius, > so don't put my words on the assay-balance. I sure don't understand it. If you really want this, please sit down without a fever and explain it with more examples and a clarification of what you want to change, and how. [...] > Until now, I only had to change traceback.c and iterator.c, since > these don't export enough of their structures to patch things > from outside. If at some point somebody might decide that some of > this support code makes sense for the main distribution, things > should of course move to where they belong.
Do you realize that (in C code) you can always get at a type object if you can create an instance of it, and then you can patch the type object just fine? [...] > What I want to do at some time is to change cPickle to use > a non-recursive implementation. (Ironically, the Python pickle > engine *is* non-recursive, if it is run under Stackless). > So, if I would hack at cPickle at all, I would probably do the > big big change, and that would be too much to get done in > reasonable time. That's why I decided to stay small and just > chime a few __reduce__ thingies in, for the time being. > Maybe this was not the best way, I don't know. What's the reason for wanting to make cPickle non-recursive? [...] > Right. probably, I will get into trouble with pickling > unbound class methods. > Maybe I would just ignore this. Bound class methods do > appear in my Tasklet system and need to get pickled. > Unbound methods are much easier to avoid and probably > not worth the effort. (Yes, tomorrow I will be told > that it *is* :-) Unbound methods have the same implementation as bound methods -- they have the same type, but im_self is None (NULL at the C level). So you should be able to handle this easily. (Unbound methods are not quite the same as bare functions; the latter of course are pickled by reference, like classes.) [...] > That means, for Py 2.2 and 2.3, my current special case for > __reduce__ is exactly the way to go, since it doesn't change any > semantics but for __reduce__, and in 2.4 I just drop these > three lines? Perfect! Right. (I'm not quite ready for the 2.4 checkin, watch the checkins list though.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Nov 22 18:41:03 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 22 18:39:35 2003 Subject: [Python-Dev] Extending struct.unpack to produce nested tuples In-Reply-To: Your message of "Sat, 22 Nov 2003 17:27:24 CST." 
<33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org> References: <33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org> Message-ID: <200311222341.hAMNf4u03532@c-24-5-183-134.client.comcast.net> > I was wondering if there would be any interest in extending the > struct.unpack format notation to be able to express groups of data > with parentheses. > > For example: > > >>> data = struct.pack('iiii', 1, 2, 3, 4) > >>> struct.unpack('i(ii)i', data) # Note the parentheses > (1, (2, 3), 4) > > Use Case: I have a program written in C that contains a bunch of > aggregate data structures (arrays of structs, structs containing > arrays, etc.) and I'm transmitting these structures over a socket > connection to a Python program that then unpacks the data using the > struct module. Problem is that I have to unpack the incoming data as > a flat sequence of data elements, and then repartition the sequence > into nested sequences to better reflect how the data is structured in > the C program. It would be more convenient to express these groupings > as I'm unpacking the raw data. > > I'm sure there are plenty of other use cases for such a feature. This is a reasonable suggestion. You should also be able to write things like '4(ii)' which would be equivalent to '(ii)(ii)(ii)(ii)'. Please use SourceForge to upload a patch. Without a patch nobody is going to be interested though, I suspect, so don't wait for someone else to implement this. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at comcast.net Sat Nov 22 21:34:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 22 21:34:38 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: [martin@v.loewis.de] > Sorry for causing so much confusion, and thanks to Tim for fixing it. It's OK, Martin!
It was a wonderful example of a simple mistake getting misdiagnosed and so leading to further mistakes, until the whole was much more confusing than the sum of its parts. And, as always, the root cause was trying to cover up Unix bugs with C's preprocessor. From jeremy at alum.mit.edu Sat Nov 22 23:10:09 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Sat Nov 22 23:13:03 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: <1069505993.2383.172.camel@anthem> References: <1069476389.22019.0.camel@localhost.localdomain> <1069477805.22019.2.camel@localhost.localdomain> <1069505993.2383.172.camel@anthem> Message-ID: <1069560608.22019.8.camel@localhost.localdomain> On Sat, 2003-11-22 at 07:59, Barry Warsaw wrote: > On Sat, 2003-11-22 at 00:10, Jeremy Hylton wrote: > > > > Did a cvs update about 30 minutes ago. make test reports no errors. > > > Running again with "-u all -r" to see what happens. > > > > Also looks good. This was with a RH9 system. > > Unfortunately, not so for me: > > test_mimetypes > test test_mimetypes failed -- Traceback (most recent call last): > File "/home/barry/projects/python23/Lib/test/test_mimetypes.py", line 52, in test_guess_all_types > eq(all, ['.bat', '.c', '.h', '.ksh', '.pl', '.txt']) > File "/home/barry/projects/python23/Lib/unittest.py", line 302, in failUnlessEqual > raise self.failureException, \ > AssertionError: ['.asc', '.bat', '.c', '.h', '.ksh', '.pl', '.txt'] != ['.bat', '.c', '.h', '.ksh', '.pl', '.txt'] > > But we've seen these before, right? Doesn't some test interfere with > globals in a way that screws mimetypes occasionally? Yes and yes. Use of mimetypes causes the module's init() function to be run on a set of known files. test_mimetypes calls init() after zapping the list of knownfiles. init() does not clear out existing global state before re-initializing, which is why the test fails if mimetypes has been used before test_mimetypes.
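[Editor's note: the failure mode Jeremy describes can be sketched in miniature. This is a toy model, not the real mimetypes source; `init`, `types_map`, and the dictionaries are simplified stand-ins for the module's actual globals and file handling.]

```python
# Toy model of the test_mimetypes interference described above -- NOT the
# real mimetypes source.  The point: init() updates module-level state but
# never clears it, so its result depends on whether init() ran earlier.

types_map = {}  # module-level state, shared across init() calls

def init(files=None):
    defaults = {'.txt': 'text/plain', '.c': 'text/x-csrc'}
    from_knownfiles = {'.asc': 'text/plain', '.pl': 'text/x-perl'}
    types_map.update(defaults)          # note: no types_map.clear() first
    if files is None:                   # read the default "knownfiles"
        types_map.update(from_knownfiles)

# A test that zaps knownfiles (passes files=[]) on a fresh module:
init(files=[])
fresh = sorted(types_map)               # only the defaults

# The same test after some earlier test already used mimetypes normally:
types_map.clear()                       # simulate a fresh import...
init()                                  # ...followed by normal use elsewhere
init(files=[])                          # now the zapped-knownfiles test runs
stale = sorted(types_map)               # '.asc' and '.pl' have leaked in

assert fresh != stale
```

This mirrors the AssertionError above: the expected list lacks '.asc' exactly because the test assumed no earlier init() had populated the shared map.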
Jeremy From tismer at tismer.com Sun Nov 23 00:33:48 2003 From: tismer at tismer.com (Christian Tismer) Date: Sun Nov 23 00:33:50 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> <3FBD7C45.3020607@tismer.com> <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net> Message-ID: <3FC046BC.3030500@tismer.com> Hi Guido, >>Guido van Rossum wrote: >> >>>Summary: Christian is right after all. instancemethod_getattro should >>>always prefer bound method attributes over function attributes. ... > Note to python-dev folks: I will make the change in 2.4. I won't > backport to 2.3 unless someone can really make a case for it; it > *does* change behavior. ... >>I only thought of IsData in terms of accessing the >>getter/setter wrappers. > > It's all rather complicated. IsData only checks for the presence of a > tp_descr_set method in the type struct. im_* happen to be implemented > by a generic approach for defining data attributes which uses a > descriptor type that has a tp_descr_set method, but its implementation > looks for a "READONLY" flag. This is intentional -- in fact, having a > tp_descr_set (or __set__) method that raises an error is the right way > to create a read-only data attribute (at least for classes whose > instances have a __dict__). Arghh! This is in fact harder than I was aware of. You *have* a setter, for its existence, although it won't set, for the readonly flag.
Without criticism, you are for sure not finally happy with the solution, which sounds more like a working proof of concept than a real concept which you are happy to spread on the world. I'm better off to keep my hands off and not touch it now. > [...] > >>I don't need to pickle classes, this works fine in most cases, >>and behavior can be modified by users. > > Right. When you are pickling classes, you're really pickling code, > not data, and that's usually not what pickling is used for. (Except > in Zope 3, which can store code in the database and hence must pickle > classes. But it's a lot of work, as Jeremy can testify. :-) Heh! :-) You have not seen me pickling code, while pickling frames? All kind of frames (since Stackless has many more frame types), with running code attached, together with iterators, generators, the whole catastrophe.... >>>(I wonder if the pickling code shouldn't try to call >>>x.__class__.__reduce__(x) rather than x.__reduce__() -- then none of >>>these problems would have occurred... :-) >> >>That sounds reasonable. Explicit would have been better than >>implicit (by hoping for the expected bound chain). Having that said, without understanding what you meant. See below. > Especially since *internally* most new-style classes do this for all > of the built-in operations (operations for which there is a function > pointer slot in the type struct or one of its extensions). This is > different from old-style classes: a classic *instance* can overload > (nearly) any special method by having an instance attribute, > e.g. __add__; but this is not supported for new-style instances. > >>__reduce__ as a class method would allow to explicitly spell >>that I want to reduce the instance x of class C. >> >>x.__class__.__reduce__(x) >> >>While, in contrast >> >>x.__class__.__reduce__(x.thing) crap. crappedi crap. *I* was lost! ... > You've lost me here. 
How does x.__class__.__reduce__ (i.e., > C.__reduce__) tell the difference between x and x.thing and C.thing??? Nonsense. >>I could envision a small extension to the __reduce__ protocol, ... Nonsense. With 39° Celsius. > I sure don't understand it. If you really want this, please sit down > without a fever and explain it with more examples and a clarification > of what you want to change, and how. Reset() Revert() I got an email from Armin Rigo today, which clearly said what to do, and I did it. It works perfectly. I patched pickle.py and cPickle.c to do essentially what Armin said: """ So I'm just saying that pickle.py is wrong in just one place: reduce = getattr(obj, "__reduce__", None) if reduce: rv = reduce() should be: reduce = getattr(type(obj), "__reduce__", None) if reduce: rv = reduce(obj) """ An almost trivial change, although I also had to change copy.py, and overall I was unhappy since this extends my patch set to more than replacing python2x.dll, but I hope this will become an official patch and back-patch. [moo moo about patching almost all from outside, but iterators and tracebacks] > Do you realize that (in C code) you can always get at a type object if > you can create an instance of it, and then you can patch the type > object just fine? Sure I know that. What I hate is if I have to duplicate or change data structure declarations, if I can't access them, directly. For tracebacks, I had to add a field (one reason for the non-recursive wish, below). For iterobject.c, it was clumsy, since I had to extend the existing! method table, so I had to touch the source file, anyway. (Meanwhile, I see a different way to do it, but well, it is written...) ... > What's the reason for wanting to make cPickle non-recursive? Several reasons. For one, the same reason why I started arguing about deeply recursive destruction code, and implemented the initial elevator destructor, you remember. (trashcan) Same reason. When __del__ crashes, cPickle will crash as well.
Now that I *can* pickle tracebacks and very deep recursions, I don't want them to crash. Several people asked on the main list, how to pickle deeply nested structures without crashing pickle. Well, my general answer was to rewrite pickle in a non-recursive manner. On the other hand, my implementation for tracebacks and tasklets (with large chains of frames attached) was different: In order to avoid cPickle's shortcomings of recursion, I made the tasklets produce a *list* of all related frames, instead of having them refer to each other via f_back. I did the same for tracebacks, by making the leading traceback object special, to produce a *list* of all other traceback objects in the chain. Armin once said, "rewrite the pickle code", which I'd happily do, but I do think, the above layout changes are not that bad, anyway. While frame chains and traceback chains are looking somewhat recursive, they aren't really. I think, they are lists/tuples by nature, and pickling them as that not only makes the result of __reduce__ more readable and usable, but the pickle is also a bit shorter than that of a deeply nested structure. >>Right. probably, I will get into trouble with pickling >>unbound class methods. I'm Wrong! It worked, immediately, after I understood how. > Unbound methods have the same implementation as bound methods -- they > have the same type, but im_self is None (NULL at the C level). So you > should be able to handle this easily. (Unbound methods are not quite > the same as bare functions; the latter of course are pickled by > reference, like classes.)
Yes, here we go: It was a cake walk: static PyObject * method_reduce(PyObject * m) { PyObject *tup, *name, *self_or_class; name = PyObject_GetAttrString(m, "__name__"); if (name == NULL) return NULL; self_or_class = PyMethod_GET_SELF(m); if (self_or_class == NULL) self_or_class = PyMethod_GET_CLASS(m); if (self_or_class == NULL) self_or_class = Py_None; tup = Py_BuildValue("(O(OS))", &PyMethod_Type, self_or_class, name); Py_DECREF(name); return tup; } Works perfectly. The unpickler code later does nothing at all but let the existing lookup machinery do the work. Here's an excerpt: if (!PyArg_ParseTuple (args, "OS", &inst, &methname)) return NULL; /* let the lookup machinery do all the work */ ret = PyObject_GetAttr(inst, methname); Perfect, whether inst is a class or an instance, it works. >>That means, for Py 2.2 and 2.3, my current special case for >>__reduce__ is exactly the way to go, since it doesn't change any >>semantics but for __reduce__, and in 2.4 I just drop these >>three lines? Perfect! Dropped it, dropped it! Yay! > Right. (I'm not quite ready for the 2.4 checkin, watch the checkins > list though.) Well, after Armin's input, I dropped my special case, and instead I will submit a patch for 2.2 and 2.3, which uses your proposed way to use __reduce__ from pickle and copy. This is completely compatible and does what we want! ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today?
http://www.stackless.com/ From pf_moore at yahoo.co.uk Sun Nov 23 07:24:44 2003 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Sun Nov 23 07:24:39 2003 Subject: [Python-Dev] Re: Thesis ideas list References: <3FBFE987.2050203@ocf.berkeley.edu> Message-ID: "Brett C." writes: > Deterministic Finalization > -------------------------- FWIW, the Parrot developers are (or have been) struggling with this issue. Specifically, how to do deterministic finalization in the presence of full (non-refcounting) GC. If you're interested in this, the parrot dev archives may be worth a look... Paul. -- This signature intentionally left blank From skip at manatee.mojam.com Sun Nov 23 08:00:47 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 23 08:01:01 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311231300.hAND0l7Z005619@manatee.mojam.com> Bug/Patch Summary ----------------- 574 open / 4359 total bugs (+43) 193 open / 2459 total patches (+11) New Bugs -------- inconsistent popen[2-4]() docs (2003-11-16) http://python.org/sf/843293 help(obj) should use __doc__ when available (2003-11-16) http://python.org/sf/843385 tkFileDialog.Open is broken (2003-11-17) http://python.org/sf/843999 "up" instead of "down" in turtle module documentation (2003-11-17) http://python.org/sf/844123 urllib2 fails its builtin test (2003-11-18) http://python.org/sf/844336 codecs.open().readlines(sizehint) bug (2003-11-18) http://python.org/sf/844561 PackageManager: deselect show hidden: indexerror (2003-11-18) http://python.org/sf/844676 os.exec* and first 'arg' (2003-11-19) http://python.org/sf/845342 imaplib: traceback from _checkquote with empty string (2003-11-19) http://python.org/sf/845560 Python crashes when __init__.py is a directory. 
(2003-11-20) http://python.org/sf/845802 os.chmod does not work with a unicode filename (2003-11-20) http://python.org/sf/846133 error in python's grammar (2003-11-21) http://python.org/sf/846521 "and" operator tests the first argument twice (2003-11-21) http://python.org/sf/846564 control-c is being sent to child thread rather than main (2003-11-21) http://python.org/sf/846817 email.Parser.Parser doesn't check for valid Content-Type (2003-11-21) http://python.org/sf/846938 datetime.datetime initialization needs more strict checking (2003-11-21) http://python.org/sf/847019 NotImplemented return value misinterpreted in new classes (2003-11-21) http://python.org/sf/847024 textwrap ignoring fix_sentence_endings for single lines (2003-11-22) http://python.org/sf/847346 New Patches ----------- socketmodule.c: fix for platforms w/o IPV6 (i.e.Solaris 5.7) (2003-11-19) http://python.org/sf/845306 Check for signals during regular expression matches (2003-11-20) http://python.org/sf/846388 fix for bug #812325 (tarfile violates bufsize) (2003-11-21) http://python.org/sf/846659 Closed Bugs ----------- IDE Preferences (2002-09-11) http://python.org/sf/607816 Support RFC 2111 in email package (2002-10-21) http://python.org/sf/626452 RFC 2112 in email package (2002-11-06) http://python.org/sf/634412 elisp: IM-python menu and newline in function defs (2003-03-21) http://python.org/sf/707707 Problem With email.MIMEText Package (2003-05-12) http://python.org/sf/736407 test zipimport fails (2003-07-03) http://python.org/sf/765456 IDE defaults to Mac linefeeds (2003-08-04) http://python.org/sf/782686 email bug with message/rfc822 (2003-08-24) http://python.org/sf/794458 email.Message param parsing problem II (2003-08-25) http://python.org/sf/795081 plat-mac/applesingle.py needs cosmetic changes (2003-09-09) http://python.org/sf/803498 _tkinter compilation fails (2003-09-12) http://python.org/sf/805200 RedHat 9 blows up at dlclose of pyexpat.so (2003-09-29) http://python.org/sf/814726 
bug with ill-formed rfc822 attachments (2003-09-30) http://python.org/sf/815563 Missing import in email example (2003-10-01) http://python.org/sf/816344 exception with Message.get_filename() (2003-10-15) http://python.org/sf/824417 bad value of INSTSONAME in Makefile (2003-10-15) http://python.org/sf/824565 email/Generator.py: Incorrect header output (2003-10-20) http://python.org/sf/826756 httplib hardcodes Accept-Encoding (2003-10-28) http://python.org/sf/831747 email generator can give bad output (2003-11-04) http://python.org/sf/836293 Bug in type's GC handling causes segfaults (2003-11-10) http://python.org/sf/839548 weakref callbacks and gc corrupt memory (2003-11-12) http://python.org/sf/840829 Windows mis-installs to network drive (2003-11-14) http://python.org/sf/842629 Closed Patches -------------- 755617: better docs for os.chmod (2003-06-16) http://python.org/sf/755677 startup file compiler flags (2003-08-24) http://python.org/sf/794400 Build changes for AIX (2003-11-05) http://python.org/sf/836434 One more patch for --enable-shared (2003-11-13) http://python.org/sf/841807 NameError in the example of sets module (2003-11-15) http://python.org/sf/842994 doc fixes builtin super and string.replace (2003-11-16) http://python.org/sf/843088 From barry at python.org Sun Nov 23 11:20:18 2003 From: barry at python.org (Barry Warsaw) Date: Sun Nov 23 11:20:29 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: References: Message-ID: <1069604417.28025.9.camel@anthem> On Sat, 2003-11-22 at 14:31, Tim Peters wrote: > googling on test_guess_all_types nails it: > > http://mail.python.org/pipermail/python-dev/2003-September/038264.html > > Jeff Epler reported there, in a reply to you about the same thing in 2.3.1, > that test_urllib2 interferes with test_mimetypes (when run in that order), > and included a patch claimed to fix it. Of course, since he didn't put the > patch on SF, it just got lost. Ah yes, thanks for the memory jog. 
I applied (essentially) the set suggestion to the test. -Barry From bac at OCF.Berkeley.EDU Sun Nov 23 17:40:03 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Nov 23 17:40:13 2003 Subject: [Python-Dev] PEP for removal of string module? Message-ID: <3FC13743.2070209@ocf.berkeley.edu> As I was writing the Summary, I noticed that the discussion of how to handle the removal of the string module got a little complicated thanks to how to deal with stuff that is different between str and unicode. There was no explicit (i.e., patch) resolution to the whole thing. Does this warrant a PEP to work out the details? Now I am not explicitly volunteering to write one since I am no Unicode or locale expert and that seems to be the sticking point. But if one is needed and no one steps forward I guess I could (will have to wait until after generator expressions get implemented, though, since I am already committed to working on that). -Brett From bac at OCF.Berkeley.EDU Sun Nov 23 18:48:18 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Nov 23 18:48:30 2003 Subject: [Python-Dev] python-dev Summary for 10-16-2003 through 11-15-2003 [draft] Message-ID: <3FC14742.50201@ocf.berkeley.edu> Thanks to school I didn't get to the latter half of October summary until I needed to start worrying about the first summary for November. So I just combined them. I am hoping to send this summary out Wednesday or Thursday so as to not worry about it beyond Thanksgiving morning. So please try to get your corrections and comments in by then. Thanks. ------------------------------- python-dev Summary for 2003-10-16 through 2003-11-15 ++++++++++++++++++++++++++++++++++++++++++++++++++++ This is a summary of traffic on the `python-dev mailing list`_ from October 16, 2003 through November 15, 2003. It is intended to inform the wider Python community of on-going developments on the list. 
To comment on anything mentioned here, just post to `comp.lang.python`_ (or email python-list@python.org which is a gateway to the newsgroup) with a subject line mentioning what you are discussing. All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join `python-dev`_! This is the twenty-eighth and twenty-ninth summaries written by Brett Cannon (does anyone even read this?). All summaries are archived at http://www.python.org/dev/summary/ . Please note that this summary is written using reStructuredText_ which can be found at http://docutils.sf.net/rst.html . Any unfamiliar punctuation is probably markup for reST_ (otherwise it is probably regular expression syntax or a typo =); you can safely ignore it, although I suggest learning reST; it's simple and is accepted for `PEP markup`_ and gives some perks for the HTML output. Also, because of the wonders of programs that like to reformat text, I cannot guarantee you will be able to run the text version of this summary through Docutils_ as-is unless it is from the original text file. .. _PEP Markup: http://www.python.org/peps/pep-0012.html The in-development version of the documentation for Python can be found at http://www.python.org/dev/doc/devel/ and should be used when looking up any documentation on something mentioned here. PEPs (Python Enhancement Proposals) are located at http://www.python.org/peps/ . To view files in the Python CVS online, go to http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/ . Reported bugs and suggested patches can be found at the SourceForge_ project page. .. _python-dev: http://www.python.org/dev/ .. _SourceForge: http://sourceforge.net/tracker/?group_id=5470 .. _python-dev mailing list: http://mail.python.org/mailman/listinfo/python-dev .. _comp.lang.python: http://groups.google.com/groups?q=comp.lang.python .. 
_Docutils: http://docutils.sf.net/ .. _reST: .. _reStructuredText: http://docutils.sf.net/rst.html .. contents:: .. _last summary: http://www.python.org/dev/summary/2003-09-01_2003-09-15.html ===================== Summary Announcements ===================== Thanks to midterms and projects my time got eaten up by school. That postponed when I could work on the twenty-eighth summary so much that the twenty-ninth was in need of being written. So they are combined into one to just get the stuff out the door. The second half of October had some major discussions happen. Guido and Alex Martelli talking equals pain for me. =) There was a large discussion on scoping and accessing specific namespaces. Jeremy Hylton is working on a PEP on the subject so I am not going to stress myself over summarizing the topic. A big discussion on the first half of November was about weakrefs and shutdown. Tim Peters figured out the problem (had to do with weakrefs referencing things already gc'ed and thus throwing a fit when trying to gc them later or keeping an object alive because of the weakref). It was long and complicated, but the problem was solved. If you have ever wanted to see linked lists used in Python in a rather elegant way, take a look at Guido's implementation of itertools.tee at http://mail.python.org/pipermail/python-dev/2003-October/039593.html . Europython is going to be held from June 7-9, 2004 in Sweden. See http://mail.python.org/pipermail/europython/2003-November/003634.html for more details. PyCon is slowly moving along. The registration site is being put through QA and the paper submission system is being worked on. The Call for Proposals (CFP) is still on-going; details at http://www.python.org/pycon/dc2004/cfp.html . Keep an eye out for when we announce when the registration and paper submission systems go live.
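[Editor's note: the linked-list trick behind itertools.tee mentioned in the announcements above can be sketched roughly as follows. This is a simplified reconstruction, not Guido's actual posted code (see the linked message for that); names like `walk` and `shared_cell` are invented for illustration.]

```python
# Simplified reconstruction of the linked-list idea behind itertools.tee
# (NOT Guido's actual code).  Each cell of the shared list is
# [value, next_cell]; every returned iterator walks the chain at its own
# pace, so only the items between the slowest and the fastest consumer
# stay buffered.

def tee(iterable, n=2):
    iterator = iter(iterable)
    shared_cell = [None, None]          # a None "next" slot means "not fetched yet"

    def walk(cell):
        while True:
            if cell[1] is None:         # this cell has not been filled in yet
                try:
                    cell[0] = next(iterator)   # fetch exactly one new item
                except StopIteration:
                    return              # underlying iterator is exhausted
                cell[1] = [None, None]  # append a fresh empty cell to the chain
            value, cell = cell          # read the value, advance along the chain
            yield value

    return tuple(walk(shared_cell) for _ in range(n))

a, b = tee([1, 2, 3])
assert next(a) == 1 and next(a) == 2    # a runs ahead; 2 stays buffered for b
assert list(b) == [1, 2, 3]             # b replays the buffered chain
assert list(a) == [3]
```

The elegance is that no explicit queue management is needed: once every iterator has moved past a cell, nothing references it anymore and it is garbage-collected.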
========= Summaries ========= ------------------------------------------ How to help with the development of Python ------------------------------------------ In an attempt to make it as easy as possible for people to find out how they can help contribute to Python's development, I wrote an essay on the topic (mentioned last month, but some revisions have been done). It covers how Python is developed and how **anyone** can contribute to the development process. The latest version can be found at http://mail.python.org/pipermail/python-dev/2003-October/039473.html . Any comments on the essay are appreciated. Contributing threads: - `Draft of an essay on Python development `__ - `2nd draft of "How Py is Developed" essay `__ ------------------------------------------------ Generator Expressions: list comp's older brother ------------------------------------------------ If you ever wanted to have the power of list comprehensions but without the overhead of generating the entire list, you have Peter Norvig initially, and then what seems like the rest of the world, to thank for generator expressions. `PEP 289`_ covers all the details, but here is a quick intro. You can think of generator expressions as list comprehensions that return an iterator over the items instead of a list of items. The syntax is practically the same as list comprehensions as well; just substitute parentheses for square brackets (most of the time; generator expressions just need parentheses around them, so being the only argument to a method takes care of the parentheses requirement). A quick example is:: (x for x in range(10) if x%2) returns an iterator that yields the odd numbers below 10. This makes list comprehensions just syntactic sugar for passing a generator expression to list() (note how extra parentheses are not needed):: list(x for x in range(10) if x%2) Having list comprehensions defined this way also takes away the dangling item variable for the 'for' loop.
Using that dangling variable is now discouraged and will be made illegal at some point. For complete details, read the PEP. .. _PEP 289: http://www.python.org/peps/pep-0289.html Contributing threads: - `decorate-sort-undecorate `__ - `accumulator display syntax `__ - `listcomps vs. for loops `__ - `PEP 289: Generator Expressions (second draft) `__ --------------------- list.sorted() is born --------------------- After the addition of the 'key' argument to list.sort(), people began to clamor for list.sort() to return self. Guido refused to give in, so a compromise was reached. 'list' now has a class method named 'sorted'. Pass it a list and it will return a *copy* of that list, sorted. Contributing threads: - `decorate-sort-undecorate `__ - `inline sort option `__ - `sort() return value `__ - `copysort patch `__ ------------------------------------ Recursion limit in re is now history ------------------------------------ Thanks to Gustavo Niemeyer the recursion limit in the re module has now been removed! Contributing threads: - `SRE recursion `__ - `SRE recursion removed `__ ----------------------------------- Copying iterators one day at a time ----------------------------------- Reiteration for iterators came up as part of the immense discussion on generator expressions. The difficulty of doing it generally came up. This led to Alex Martelli proposing magic method support for __copy__ in iterators that want to allow copies of themselves. This was written down as `PEP 323`_. As an interim solution, itertools grew a new function: tee. It takes in an iterable and returns two iterators which independently iterate over the iterable. .. _PEP 323: http://www.python.org/peps/pep-0323.html Contributing threads: - `Reiterability `__ - `cloning iterators again `__ - `... python/nondist/peps pep-0323.txt, NONE ... `__ - `Guido's Magic Code `__ ------------------------------------------------------ Returning Py_(None, True, False) now easier than ever!
------------------------------------------------------ Py_RETURN_NONE, Py_RETURN_TRUE, and Py_RETURN_FALSE have been added to Python 2.4. They are macros for returning the singleton mentioned in the name. Documentation has yet to be written (my fault). Contributing threads: - `How to spell Py_return_None and friends `__ - `python/dist/src/Include object.h, 2.121, ... `__ ------------------------------------------------------------------------- 'String substitutions'/'dict interpolation'/'replace %(blah)s with $blah' ------------------------------------------------------------------------- The idea of introducing string substitutions using '$' came up. Guido said that if this was made a built-in feature it would have to wait until Python 3. He was receptive to moving the functionality to a module, though. Barry Warsaw pasted code into http://mail.python.org/pipermail/python-dev/2003-October/039369.html that handles string substitutions. Contributing threads: - `Can we please have a better dict interpolation syntax? `__ ------------------------------------------ "reduce() just doesn't get enough mileage" ------------------------------------------ That quote comes from Guido during the discussion over whether 'product' should be added as an accumulator function built-in like 'sum'. The idea was shot down and conversation quickly turned to whether 'reduce' should stay in the language (the consensus was "no" since the function does not read well and its functionality can easily be done with a 'for' loop). A larger discussion on what built-ins should eventually disappear will be covered in the next Summary. Contributing threads: - `product() `__ ----------- PyPy update ----------- The PyPy_ development group sent an update on their happenings to the list. Turns out they are trying to get funding from the European Union. 
They are also fairly close to getting a working version (albeit with some bootstrapping from CPython, but it will still be damn cool what they have pulled off even with this caveat). They also announced a sprint they are holding in Amsterdam from Dec. 14-21. More info can be found at http://codespeak.net/moin/pypy/moin.cgi/AmsterdamSprint . .. _PyPy: http://codespeak.net/pypy/ Contributing threads: - `PyPy: sprint and news `__ ---------------------------- Never say Python is finished ---------------------------- I asked python-dev for master's thesis ideas. A great number of possibilities were put forward. If anyone out there is curious to see what some people would like to see done for Python in terms of a large project, check the thread out. Contributing threads: - `Looking for master thesis ideas involving Python `__ --------------------------------- Rough draft of Decimal object PEP --------------------------------- Facundo Batista has posted a rough draft of a PEP for a decimal object that is being worked on in the sandbox. Comment on it on `comp.lang.python`_ if this interests you. Contributing threads: - `prePEP: Decimal data type `__ ---------------------------------------------------------- Relations of basestring and bye-bye operator.isMappingType ---------------------------------------------------------- The idea of introducing relatives of basestring for numbers came from Alex Martelli. That idea was shot down for not being needed once the merger of int and long occurs. The point that operator.isMappingType is kind of broken also came up. Both Alex and Raymond Hettinger would not mind seeing it disappear. No one objected. It is still in CVS at the moment, but I would not count on it necessarily sticking around.
Contributing threads: - `reflections on basestring -- and other abstract basetypes `__ - `operator.isMappingType `__ --------------------------------------------------------- Why one checks into the trunk before a maintenance branch --------------------------------------------------------- The question of whether checking a change into a maintenance branch before applying it to the main trunk was acceptable came up. The short answer is "no". Basically the trunk gets more testing than the maintenance branches and thus a patch should have to prove its stability there first. Only then should it go into a maintenance branch. The same goes for changes to code that will eventually disappear from the trunk. Someone might be planning on removing some code, but if that person falls off the face of the earth the code will still be there. That means applying the patch to the code that is scheduled to disappear is still a good idea. Contributing threads: - `check-in policy, trunk vs maintenance branch `__ ----------------------- New reversed() built-in ----------------------- There was a new built-in named reversed(), and all rejoiced. Straight from the function's doc string: "reverse iterator over values of the sequence". `PEP 322`_ has the relevant details on this toy. .. _PEP 322: http://www.python.org/peps/pep-0322.html Contributing threads: - `PEP 322: Reverse Iteration `__ --------------------------- Cleaning the built-in house --------------------------- Guido asked what built-ins should be considered for deprecation. Instantly intern, coerce, and apply came up. apply already had a PendingDeprecationWarning and that will stay for the next release or two. intern and coerce, though, did not have any major champions (intern had some fans, but just for the functionality). Guido did state that none of these built-ins will be removed any time soon. If they do get deprecated it does not lead to immediate removal.
Python 3, though, takes the gloves off, and there they may just completely disappear. Contributing threads: - `Deprecating obsolete builtins `__ ---------------------------------------- Passing arguments to str.(encode|decode) ---------------------------------------- The idea of allowing keyword arguments to be passed to any specified encoder/decoder was brought up by Raymond Hettinger. It seemed like an idea that was supported. The idea of specifying the encoder or decoder based on the actual object instead of the current way of specifying a string that is passed to the registered codec search functions was suggested. Nothing has been finalized on this idea as of now. Contributing threads: - `Optional arguments for str.encode /.decode `__ ------------------------------------------------------ Where, oh where, to move the good stuff out of string? ------------------------------------------------------ It looks like ascii_* and possibly maketrans from the string module will be tacked on to the str type so that the string module can finally be removed from the language. It has not been pronounced upon, but it looks like that is what the BDFL is leaning towards. Issues of using the methods of str as unbound methods did come up. As it stands you cannot pass a unicode object to str.upper and thus there is no one uppercasing function as there is in the string module. This issue brought up the problem of Unicode's ties to locale and to collation (how to sort things). Contributing threads: - `other "magic strings" issues `__ ----------------------------------------- Supported versions of Sleepycat for bsddb ----------------------------------------- The basic answer is that 3.2 - 4.2 should work when you compile from source. Contributing threads: - `which sleepycat versions do we support in 2.3.* ? `__ ----------------------------- Sets now at blazing C speeds! ----------------------------- Raymond Hettinger implemented the sets API in C!
The new built-ins are set (which replaces sets.Set) and frozenset (which replaces sets.ImmutableSet). The APIs are the same as the sets module sans the name change from ImmutableSet to frozenset. Contributing threads: - `set() and frozenset() `__ From greg at cosc.canterbury.ac.nz Sun Nov 23 18:56:41 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Sun Nov 23 18:56:47 2003 Subject: [Python-Dev] Extending struct.unpack to produce nested tuples In-Reply-To: <33671.192.168.1.101.1069543644.squirrel@server.lotusland.dyndns.org> Message-ID: <200311232356.hANNufP24780@oma.cosc.canterbury.ac.nz> > Use Case: I have a program written in C that contains a bunch of > aggregate data structures (arrays of structs, structs containing > arrays, etc.) and I'm transmitting these structures over a socket > connection to a Python program that then unpacks the data using the > struct module. An alternative would be to teach the C program to write the data in pickle or marshal format... :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From hunterp at fastmail.fm Sun Nov 23 19:27:20 2003 From: hunterp at fastmail.fm (Hunter Peress) Date: Sun Nov 23 19:27:27 2003 Subject: [Python-Dev] quick patch for better debugging Message-ID: <20031124002720.360534252A@server1.messagingengine.com> Both IndexError and KeyError don't report which object the retrieval failed on. Having this data would save lots of typing and annoyance. E.g.: KeyError: 'jio' could look like KeyError: "Dictionary(some_name) has no key 'jio'" IndexError: list index out of range could look like: IndexError: list(some_name) index(some_value) out of range If this is ok, I'll make a patch!
----- PS: And for those of you that think even more debugging info is needed, think no more, because I prodded enough a few months ago such that textmode cgitb is now in the 2.3 tree. Try: import cgitb; cgitb.enable(format='text') and then make an error. -- Hunter Peress hunterp@fastmail.fm From python at rcn.com Sun Nov 23 19:27:29 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 23 19:28:06 2003 Subject: [Python-Dev] python-dev Summary for 10-16-2003 through 11-15-2003[draft] In-Reply-To: <3FC14742.50201@ocf.berkeley.edu> Message-ID: <005c01c3b221$bf2d7c80$edb02c81@oemcomputer> > If you ever wanted to have the power of list comprehensions but without > the overhead of generating the entire list you have Peter Norvig > initially and then what seems like the rest of the world for generator > expressions. [possibly mangled sentence doesn't make sense] > After the addition of the 'key' argument to list.sort(), people began to > clamor for list.sort() to return self. Guido refused to do give in, so > a compromise was reached. 'list' now has a class method named 'sorted'. > Pass it a list and it will return a *copy* of that list sorted. [Add] What makes a class method so attractive is that the argument need not be a list, any iterable will do. The return value *is* of course a list. By returning a list instead of None, list.sorted() can be used as an expression instead of a statement. This makes it possible to use it as an argument in a function call or as the iterable in a for-loop:: # iterate over a dictionary sorted by key for key, value in list.sorted(mydict.iteritems()): > As an interim solution, itertools grew a new function: tee. It takes in > an iterable and returns two iterators which independently iterate over > the iterable. [replace] two [with] two or more > The point that operator.isMappingType is kind of broken came up. Both > Alex and Raymond Hettinger would not mind seeing it disappear. No one > objected.
It is still in CVS at the moment, but I would not count on it > necessarily sticking around. ["It's not quite dead yet" ;-) Actually, there may be a way to partially fix it so that it won't be totally useless]. > There was a new built-in named reversed(), and all rejoiced. [And much flogging of the person who proposed it] > Straight from the function's doc string: "reverse iterator over values > of the sequence". `PEP 322`_ has the relevant details on this toy. [Replace] toy [With] major technological innovation of the first order [Or just] builtin. > Sets now at blazing C speeds! [Looks like a certain parroteer will soon be eating pie!] Another fine summary. Thanks for the good work. Raymond From greg at cosc.canterbury.ac.nz Sun Nov 23 20:05:50 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Sun Nov 23 20:06:20 2003 Subject: [Python-Dev] quick patch for better debugging In-Reply-To: <20031124002720.360534252A@server1.messagingengine.com> Message-ID: <200311240105.hAO15oD25037@oma.cosc.canterbury.ac.nz> Hunter Peress : > KeyError: 'jio' could look like KeyError: "Dictionary(some_name) ^^^^^^^^^ > has no key 'jio'" > IndexError: list(some_name) index(some_value) out of range ^^^^^^^^^ Where do you propose to get these names from? Lists and dictionaries don't have names... I agree with the general idea of providing some sort of identifying information, but in these cases I can't think what sort of information would be useful short of displaying the entire repr() of the object, which would be too much for a backtrace message, I think. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido at python.org Sun Nov 23 20:26:09 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 20:24:39 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Sun, 23 Nov 2003 06:33:48 +0100." <3FC046BC.3030500@tismer.com> References: <3FB99A6E.5070000@tismer.com> <200311180604.hAI64Ku02457@c-24-5-183-134.client.comcast.net> <3FBAC6E4.2020202@tismer.com> <200311190133.hAJ1X4J13394@c-24-5-183-134.client.comcast.net> <3FBACC4F.7090404@tismer.com> <200311190507.hAJ575213691@c-24-5-183-134.client.comcast.net> <3FBC3296.1090004@tismer.com> <200311200618.hAK6Ikv23729@c-24-5-183-134.client.comcast.net> <3FBD7C45.3020607@tismer.com> <200311222338.hAMNcnG03504@c-24-5-183-134.client.comcast.net> <3FC046BC.3030500@tismer.com> Message-ID: <200311240126.hAO1Q9I01704@c-24-5-183-134.client.comcast.net> > Arghh! This is in fact harder than I was aware of. You *have* > a setter, for its existance, although it won't set, for > the readonly flag. > Without criticism, you are for sure not finally happy with the > solution, which sounds more like a working proof of concept > than a real concept which you are happy to spread on the world. > I'm better off to keep my hands off and not touch it now. Actually, I like it fine. There really are four categories: 0) not a descriptor 1) overridable descriptor (used for methods) 2a) read-only non-overridable descriptor (used for read-only data) 2b) writable non-overridable descriptor (used for writable data) Case (0) is recognized by not having __get__ at all. Case (1) has __get__ but not __set__. Cases (2a) and (2b) have __get__ and __set__; case (2a) has a __set__ that raises an exception. There are other (older) examples of __setattr__ implementations that always raise an exception. 
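Guido's four categories can be demonstrated with minimal descriptors. This is a hypothetical sketch to illustrate the protocol, not code from the thread:

```python
class NotADescriptor:            # case 0: no __get__ at all
    pass

class Overridable:               # case 1: __get__ but no __set__
    def __get__(self, obj, objtype=None):
        return "from descriptor"

class ReadOnlyData:              # case 2a: __set__ exists but raises
    def __get__(self, obj, objtype=None):
        return "constant"
    def __set__(self, obj, value):
        raise AttributeError("read-only attribute")

class WritableData:              # case 2b: working __get__ and __set__
    def __get__(self, obj, objtype=None):
        return obj.__dict__.get("_x")
    def __set__(self, obj, value):
        obj.__dict__["_x"] = value

class C:
    meth = Overridable()
    ro = ReadOnlyData()
    rw = WritableData()

c = C()
c.__dict__["meth"] = "shadowed"  # case 1: instance dict overrides it
assert c.meth == "shadowed"
assert c.ro == "constant"        # case 2a: instance dict cannot shadow it
try:
    c.ro = 1                     # ...and writing raises
except AttributeError:
    pass
c.rw = 42                        # case 2b: behaves like ordinary data
assert c.rw == 42
```

The shadowing behavior is exactly the distinction Guido draws: a descriptor with only `__get__` is overridable per instance, while one that also defines `__set__` always wins over the instance dict.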
> I patches pickle.py and cPickle.c to do essentially what Armin said: > """ > So I'm just saying that pickle.py in wrong in just one place: > > reduce = getattr(obj, "__reduce__", None) > if reduce: > rv = reduce() > > should be: > > reduce = getattr(type(obj), "__reduce__", None) > if reduce: > rv = reduce(obj) > """ Right. (That's what I was trying to say, too. :-) > An almost trivial change, although I also had to change copy.py, > and overall I was unhappy since this extends my patch set to more > than replacing python2x.dll, but I hope this will become an > official patch and back-patch. Give it to me baby. (On SF. :-) > > What's the reason for wanting to make cPickle non-recursive? > > Several reasons. > For one, the same reason why I started arguing about deeply > recursive destruction code, and implemented the initial > elevator destructor, you remember. (trashcan) Yeah. Maybe I should get out of the serious language implementation business, because I still liked it better before. It may work, but it is incredibly ugly, and also had bugs for the longest time (and those bugs were a lot harder to track down than the bug it was trying to fix). With Tim's version I can live with it -- but I just don't like this kind of complexification of the implementation, even if it works better. > Same reason. When __del__ crashes, cPickle will crash as well. Please don't call it __del__. __del__ is a user-level finalization callback with very specific properties and problems. You were referring to tp_dealloc, which has different issues. > Now that I *can* pickle tracebacks and very deep recursions, > I don't want them to crash. > > Several people asked on the main list, how to pickle deeply > nested structures without crashing pickle. Well, my general > answer was to rewrite pickle in a non-recursive manner. I guess it's my anti-Scheme attitude. I just think the problem is in the deeply nested structures. 
There usually is a less nested data structure that doesn't have the problem. But I'll shut up, because this rant is not productive. :-( > On the other hand, my implementation for tracebacks and > tasklets (with large chains of frames attached) was different: > In order to avoid cPickle's shortcomings of recursion, I made > the tasklets produce a *list* of all related frames, instead of > having them refer to each other via f_back. > I did the same for tracebacks, by making the leading traceback > object special, to produce a *list* of all other traceback > objects in the chain. Hey, just what I said. :-) > Armin once said, "rewrite the pickle code", which I'd happily do, > but I do think, the above layout changes are not that bad, > anyway. While frame chains and traceback chains are looking > somewhat recursive, they aren't really. I think, they are > lists/tuples by nature, and pickling them as that not only makes > the result of __reduce__ more readable and usable, but the pickle > is also a bit shorter than that of a deeply nested structure. Well, unclear. Frame chains make sense as chains because they are reference-counted individually. > Well, after Armin's input, I dropped my special case, and instead > I will submit a patch for 2.2 and 2.3, which uses your proposed > way to use __reduce__ from pickle and copy. > This is completely compatible and does what we want! Wonderful! Please send me the SF issue, I don't subscribe to SF any more. (I've done my checkin in case you wondered.) --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Sun Nov 23 20:32:31 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Sun Nov 23 20:32:41 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311240126.hAO1Q9I01704@c-24-5-183-134.client.comcast.net> Message-ID: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> Guido says: > I guess it's my anti-Scheme attitude.
I just think the problem is in > the deeply nested structures. There usually is a less nested data > structure that doesn't have the problem. and then he says: > Well, unclear. Frame chains make sense as chains because they are > reference-counted individually. which surely goes to show that sometimes it *does* make sense to use a deeply nested structure? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tismer at tismer.com Sun Nov 23 21:05:55 2003 From: tismer at tismer.com (Christian Tismer) Date: Sun Nov 23 21:06:03 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> References: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> Message-ID: <3FC16783.6070304@tismer.com> Greg Ewing wrote: > Guido says: > > >>I guess it's my anti-Scheme attitude. I just think the problem is in >>the deeply nested structures. There usually is a less nested data >>structure that doesn't have the problem. > > > and then he says: > > >>Well, unclear. Frame chains make sense as chains because they are >>reference-counted individually. > > > which surely goes to show that sometimes it *does* make > sense to use a deeply nested structure? You might interpret him this way. But I don't think he had my implementation of frame chain pickling in mind, because he doesn't know it, and nobody but me probably has a working one. I'm pickling disjoint frame chains, and in my case, these are linked in both directions, via f_back, and via f_callee, for other reasons. There is no reason for nested pickling, just because of the caller/callee relationship. I agree there might be useful situations for deeply nested structures, but not this one. Instead, it would be asking for problems. 
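The layout change under discussion — pickling a chain as a flat list instead of a nest of back-references — can be sketched on a toy linked node type. This is a hypothetical illustration of the `__reduce__` technique, not the Stackless frame/traceback code:

```python
import pickle

class Node:
    """A singly linked node; default pickling would recurse per link."""
    def __init__(self, value, prev=None):
        self.value = value
        self.prev = prev

    def __reduce__(self):
        # Flatten the chain into a plain list of values, so pickle
        # iterates over a list instead of recursing through each
        # prev reference.
        chain = []
        node = self
        while node is not None:
            chain.append(node.value)
            node = node.prev
        return (_rebuild_chain, (chain,))

def _rebuild_chain(values):
    """Rebuild the linked chain from the flattened value list."""
    node = None
    for value in reversed(values):
        node = Node(value, node)
    return node

# Build a chain far deeper than the default recursion limit.
head = None
for i in range(10000):
    head = Node(i, head)

restored = pickle.loads(pickle.dumps(head))
assert restored.value == 9999
assert restored.prev.value == 9998
```

Without the `__reduce__` flattening, pickling the 10000-deep chain would blow the recursion limit; with it, the pickle contains one list and one reconstruction call.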
ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido at python.org Sun Nov 23 22:03:07 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 22:01:37 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Your message of "Mon, 24 Nov 2003 14:32:31 +1300." <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> References: <200311240132.hAO1WVL25240@oma.cosc.canterbury.ac.nz> Message-ID: <200311240303.hAO337X01787@c-24-5-183-134.client.comcast.net> > Guido says: > > > I guess it's my anti-Scheme attitude. I just think the problem is in > > the deeply nested structures. There usually is a less nested data > > structure that doesn't have the problem. > > and then he says: > > > Well, unclear. Frame chains make sense as chains because they are > > reference-counted individually. > > which surely goes to show that sometimes it *does* make > sense to use a deeply nested structure?
Well, without deeply nested data structures the stack wouldn't be that deep, would it? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From anthony at interlink.com.au Sun Nov 23 22:42:53 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Nov 23 22:43:24 2003 Subject: [Python-Dev] Time for 2.3.3? In-Reply-To: Message-ID: <200311240342.hAO3gtwQ015914@localhost.localdomain> > I checked in all the changes I thought were necessary. But as the checkin > comment says, > > This needs fresh testing on all non-Win32 platforms ... > Running the standard test_re.py is an adequate test. > > So start testing, or (my recommendation) upgrade to Win32 . Works with GCC 3.3.2 and GCC 3.2.3 compiled versions of Python on Fedora Core 1. From edloper at gradient.cis.upenn.edu Mon Nov 24 00:27:38 2003 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Sun Nov 23 23:26:06 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties Message-ID: <3FC196CA.3000007@gradient.cis.upenn.edu> I was wondering if there would be any interest in adding a special case for properties with string-valued fget/fset/fdel: - if fget is a string, then the getter returns the value of the member variable with the given name. - if fset is a string, then the setter sets the value of the member variable with the given name. - if fdel is a string, then the deleter deletes the member variable with the given name. I.e., the following groups would be functionally equivalent: property(fget='_foo') property(fget=lambda self: self._foo) property(fget=lambda self: getattr(self, '_foo')) property(fset='_foo') property(fset=lambda self, value: setattr(self, '_foo', value)) property(fdel='_foo') property(fdel=lambda self: delattr(self, '_foo')) This change has 2 advantages: 1. It's easier to read. (In my opinion, anyway; what do other people think?) 2.
It's faster: for properties whose fget/fset/fdel are strings, we can avoid a function call (since the changes are implemented in c). Preliminary tests indicate that this results in approximately a 3x speedup for a tight loop of attribute lookups. (It's unclear how much of a speed increase you'd get in actual code, though.) and one disadvantage (that I can think of): - It's one more special case to document/know. This change shouldn't break any existing code, because there's currently no reason to use string-valued fget/fset/fdel. Does this change seem useful to other people? Do the advantages outweigh the disadvantage? Or are there other disadvantage that I neglected to notice? If this seems like a useful addition, I'd be happy to work on making a patch that includes test cases & doc changes. -Edward From guido at python.org Sun Nov 23 23:34:03 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 23:32:32 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs Message-ID: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> There's a bunch of FutureWarnings e.g. about 0xffffffff<<1 that promise they will disappear in Python 2.4. If anyone has time to fix these, I'd appreciate it. (It's not just a matter of removing the FutureWarnings -- you actually have to implement the promised future behavior. :-) I may get to these myself, but they're not exactly rocket science, so they might be a good thing for a beginning developer (use SF please if you'd like someone to review the changes first). Another -- much bigger -- TODO is to implement generator expressions (PEP 289). Raymond asked for help but I don't think he got any, unless it was offered through private email. Anyone interested? (Of course, I don't want any of this to interfere with the work to get 2.3.3 out in December.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Nov 23 23:52:51 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 23 23:51:20 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: Your message of "Sun, 23 Nov 2003 23:27:38 CST." <3FC196CA.3000007@gradient.cis.upenn.edu> References: <3FC196CA.3000007@gradient.cis.upenn.edu> Message-ID: <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> > I was wondering if there would be any interest for adding a special > case for properties with string-valued fget/fset/fdel: > > - if fget is a string, then the getter returns the value of the > member variable with the given name. > - if fset is a string, then the setter sets the value of the > member variable with the given name. > - if fdel is a string, then the deleter deletes the member variable > with the given name. > > I.e., the following groups would be functionally equivalant: > > property(fget='_foo') > property(fget=lambda self: self._foo) > property(fget=lambda self: getattr(self, '_foo')) Why bother with the getattr() example? > property(fset='_foo') > property(fset=lambda self, value: setattr(self, '_foo', value)) Also of course (and IMO more readable): def _set_foo(self, value): self._foo = value property(fset=_set_foo) > property(fdel='_foo') > property(fdel=lambda self: delattr(self, '_foo')) (And similar here.) > This change has 2 advantages: > > 1. It's easier to read. (In my opinion, anyway; what do other > people think?) Only if you're used to the new syntax. Otherwise it could mean a costly excursion into the docs. > 2. It's faster: for properties whose fget/fset/fdel are strings, > we can avoid a function call (since the changes are implemented > in c). Preliminary tests indicate that this results in > approximately a 3x speedup for a tight loop of attribute > lookups. (It's unclear how much of a speed increase you'd get > in actual code, though.) 
Which makes me wonder if this argument has much value. > and one disadvantage (that I can think of): > > - It's one more special case to document/know. Right. It feels like a hack. > This change shouldn't break any existing code, because there's > currently no reason to use string-valued fget/fset/fdel. Correct. > Does this change seem useful to other people? Do the advantages > outweigh the disadvantage? Or are there other disadvantage that I > neglected to notice? If this seems like a useful addition, I'd be > happy to work on making a patch that includes test cases & doc > changes. It feels somewhat un-Pythonic to me: a special case that just happens to be useful to some folks. I want to be very careful in adding too many of those to the language, because it makes it harder to learn and makes it feel full of surprises for casual users. (I'm trying hard to avoid using the word "Perl" here. :-) I'm curious about the use case that makes you feel the need for speed. I would expect most properties not to simply redirect to another attribute, but to add at least *some* checking or other calculation. I'd be more in favor if you used a separate "renamed" property: foo = renamed("_foo") being a shortcut for def _get_foo(self): return self._foo def _set_foo(self, value): self._foo = value def _del_foo(self): del self._foo foo = property(_get_foo, _set_foo, _del_foo) but I've got a suspicion you want to combine some string argument (most likely for fget) with some function argument. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Mon Nov 24 00:28:07 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 00:28:14 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> Message-ID: <200311240528.hAO5S7V26159@oma.cosc.canterbury.ac.nz> Guido: > I'm curious about the use case that makes you feel the need for speed. 
> I would expect most properties not to simply redirect to another > attribute, but to add at least *some* checking or other calculation. I suspect he's thinking of cases where you only want to wrap special behaviour around *some* of the accessors, e.g. you want writing to a property to be mediated by a function, but reading it can just be a normal attribute access. Currently, you're forced to pay the price of a function call for both reading and writing in this case. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From edloper at gradient.cis.upenn.edu Mon Nov 24 04:35:33 2003 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Mon Nov 24 03:34:05 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> References: <3FC196CA.3000007@gradient.cis.upenn.edu> <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> Message-ID: <3FC1D0E5.5000907@gradient.cis.upenn.edu> Guido van Rossum wrote: >> 1. It's easier to read. (In my opinion, anyway; what do other >> people think?) > > Only if you're used to the new syntax. Otherwise it could mean a > costly excursion into the docs. > [...] >> - It's one more special case to document/know. > > Right. It feels like a hack. To me it seems like the "obvious" behavior for a string fget/fset/fdel, but if it's not universally obvious then you're probably right that it's a bad idea to add it. > but I've got a suspicion you want to combine some string argument > (most likely for fget) with some function argument. Yes, the idea was that some properties only redirect on read, or only on write; and that the syntax could be made "cleaner" for those cases.
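The mixed string/function case described here can already be approximated with a property subclass; the following is a minimal illustrative sketch (the strproperty name and the Point example are invented for illustration, not part of any patch discussed in this thread):

```python
# Hypothetical sketch, not the proposed patch: a property subclass that
# accepts either a callable or an attribute-name string for each of
# fget/fset/fdel, resolving strings via getattr/setattr/delattr.
class strproperty(property):
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        if isinstance(fget, str):
            gname = fget
            fget = lambda self: getattr(self, gname)
        if isinstance(fset, str):
            sname = fset
            fset = lambda self, value: setattr(self, sname, value)
        if isinstance(fdel, str):
            dname = fdel
            fdel = lambda self: delattr(self, dname)
        property.__init__(self, fget, fset, fdel, doc)

class Point(object):
    def __init__(self, x):
        self._x = x

    def _set_x(self, value):
        # writes are mediated by a checking function...
        if value < 0:
            raise ValueError("x must be non-negative")
        self._x = value

    # ...while reads are a plain redirect to the _x attribute
    x = strproperty('_x', _set_x)
```

With this, reading x is a plain attribute redirect while writing still goes through a checking function, which is exactly the "redirect only on read" case being discussed.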
> I'm curious about the use case that makes you feel the need for speed. > I would expect most properties not to simply redirect to another > attribute, but to add at least *some* checking or other calculation. The primary motivation was actually to make the code "easier to read"; the speed boost was an added bonus. (Though not a trivial one -- I do have a good number of fairly tight loops that access properties.) The use case that inspired the idea is defining read-only properties for immutable objects. But I guess I would be better off going with wrapper functions that create the read-only properties for me (like ). Thanks for the feedback! -Edward From fincher.8 at osu.edu Mon Nov 24 06:12:36 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Mon Nov 24 05:14:52 2003 Subject: [Python-Dev] quick patch for better debugging In-Reply-To: <200311240105.hAO15oD25037@oma.cosc.canterbury.ac.nz> References: <200311240105.hAO15oD25037@oma.cosc.canterbury.ac.nz> Message-ID: <200311240612.36215.fincher.8@osu.edu> On Sunday 23 November 2003 08:05 pm, Greg Ewing wrote: > I agree with the general idea of providing some sort of > identifying information, but in these cases I can't think > what sort of information would be useful short of displaying > the entire repr() of the object, which would be too much for > a backtrace message, I think. You don't have to include the offending index/key in the __str__ of the exception itself. Even if it was just available in the exception's args tuple, or even as an attribute on the exception object, it'd still be highly useful as a debugging tool. I've wished for this myself on several occasions. Jeremy From mwh at python.net Mon Nov 24 05:53:42 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 24 05:53:49 2003 Subject: [Python-Dev] Time for 2.3.3? 
In-Reply-To: <200311220739.hAM7dZ7n016749@localhost.localdomain> (Anthony Baxter's message of "Sat, 22 Nov 2003 18:39:35 +1100") References: <200311220739.hAM7dZ7n016749@localhost.localdomain> Message-ID: <2mvfpam409.fsf@starship.python.net> Anthony Baxter writes: >>>> Michael Hudson wrote >> We should give the new autoconf a go, at least. > > I would strongly prefer to do this sooner than later, so I was thinking > of doing the upgrade sometime this week. Does anyone have/know any > reasons to not upgrade to the newer autoconf? Well, there was an almost instantaneous brown-paper-bag 2.59 release, but I don't know of any problems with 2.59. Not sure I would if there were, mind. > It should fix a bunch of build annoyances (and I can get rid of > aclocal.m4) That's the motivation :-) Cheers, mwh -- Make this IDLE version 0.8. (We have to skip 0.7 because that was a CNRI release in a corner of the basement of a government building on a planet circling Aldebaran.) -- Guido Van Rossum, in a checkin comment From mwh at python.net Mon Nov 24 07:04:19 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 24 07:04:23 2003 Subject: [Python-Dev] Thesis ideas list In-Reply-To: <3FBFE987.2050203@ocf.berkeley.edu> (Brett C.'s message of "Sat, 22 Nov 2003 14:56:07 -0800") References: <3FBFE987.2050203@ocf.berkeley.edu> Message-ID: <2m65ham0qk.fsf@starship.python.net> "Brett C." writes: > Restricted execution > -------------------- > from Andrew Bennett (private email) > > See the python-dev archives and Summaries for more painful details. I think this would be a good choice, actually. Probably fairly hard... > Tail Recursion > -------------- > from Me (my brain) > > Have proper tail recursion in Python. Would require identifying where > a direct function call is returned (could keep it simple and just do > it where CALL_FUNCTION and RETURN bytecodes are in a row).
Also have > to deal with exception catching since that requires the frame to stay > alive to handle the exception. > > But getting it to work well could help with memory and > performance. Don't know if it has been done for a language that had > exception handling. How is this different from stackless? Cheers, mwh -- QNX... the OS that walks like a duck, quacks like a duck, but is, in fact, a platypus. ... the adventures of porting duck software to the platypus were avoidable this time. -- Chris Klein, alt.sysadmin.recovery From guido at python.org Mon Nov 24 10:32:34 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 10:32:49 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Your message of "Mon, 24 Nov 2003 12:04:19 GMT." <2m65ham0qk.fsf@starship.python.net> References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> Message-ID: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> > > Tail Recursion > > -------------- > > from Me (my brain) > > > > Have proper tail recursion in Python. Would require identifying where > > a direct function call is returned (could keep it simple and just do > > it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have > > to deal with exception catching since that requires the frame to stay > > alive to handle the exception. > > > > But getting it to work well could help with memory and > > performance. Don't know if it has been done for a language that had > > exception handling. > > How is this different from stackless? AFAIK Stackless only curtails the *C* stack, not the chain of Python frames on the heap. But I have a problem with tail recursion. It's generally requested by new converts from the Scheme/Lisp or functional programming world, and it usually means they haven't figured out yet how to write code without using recursion for everything.
IOW I'm doubtful on how much of a difference it would make for real Python programs (which, simplifying a bit, tend to use loops instead of recursion). And also note that even if an exception is not caught, you'd like to see all stack frames listed when the traceback is printed or when the debugger is invoked. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Nov 24 10:37:03 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 10:37:13 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/idlelib NEWS.txt, 1.25, 1.26 idlever.py, 1.15, 1.16 In-Reply-To: Your message of "Mon, 24 Nov 2003 07:29:25 EST." <20031124122925.GA13677@rogue.amk.ca> References: <20031124122925.GA13677@rogue.amk.ca> Message-ID: <200311241537.hAOFb3w09106@c-24-5-183-134.client.comcast.net> > On Sun, Nov 23, 2003 at 07:23:18PM -0800, kbk@users.sourceforge.net wrote: > > + - IDLE now does not fail to save the file anymore if the Tk buffer is not a > > + Unicode string, yet eol_convention is. Python Bugs 774680, 788378 > > The above sentence is unfinished. Not if you assume that the omitted part is "a Unicode string", which I think was intended. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh at python.net Mon Nov 24 10:55:49 2003 From: mwh at python.net (Michael Hudson) Date: Mon Nov 24 10:55:53 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> (Guido van Rossum's message of "Mon, 24 Nov 2003 07:32:34 -0800") References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> Message-ID: <2mr7zxlq0q.fsf@starship.python.net> Guido van Rossum writes: >> > Tail Recursion >> > -------------- >> > from Me (my brain) >> > >> > Have proper tail recursion in Python. 
Would require identifying where >> > a direct function call is returned (could keep it simple and just do >> > it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have >> > to deal with exception catching since that requires the frame to stay >> > alive to handle the exception. >> > >> > But getting it to work well could help with memory and >> > performance. Don't know if it has been done for a language that had >> > exception handling. >> >> How is this different from stackless? > > AFAIK Stackless only curtails the *C* stack, not the chain of Python > frames on the heap. Oh, I see. Yes. > But I have a problem with tail recursion. It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. Well, this was why I assumed you didn't really want to do the full-on tail-call-elimination thing :-) Cheers, mwh -- Need to Know is usually an interesting UK digest of things that happened last week or might happen next week. [...] This week, nothing happened, and we don't care. -- NTK Now, 2000-12-29, http://www.ntk.net/ From tismer at tismer.com Mon Nov 24 11:31:44 2003 From: tismer at tismer.com (Christian Tismer) Date: Mon Nov 24 11:31:54 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> Message-ID: <3FC23270.6090506@tismer.com> Guido van Rossum wrote: >>>Tail Recursion ... 
> AFAIK Stackless only curtails the *C* stack, not the chain of Python > frames on the heap. Yup. > But I have a problem with tail recursion. It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. Same here. I'm not for automatic tail recursion detection. A very simple approach, also pretty easy to implement, would be a "jump" property, which would be added to a function. It would simply allow running a different (or the same) function than the current one without returning.

    def sort3(a, b, c):
        if a>b:
            return sort3.jump(b, a, c)
        if b>c:
            return sort3.jump(a, c, b)
        return a, b, c

-- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tdelaney at avaya.com Mon Nov 24 15:54:15 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Mon Nov 24 15:54:26 2003 Subject: [Python-Dev] Tail recursion Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEEEC2C8@au3010avexu1.global.avaya.com> > From: Guido van Rossum > > But I have a problem with tail recursion.
It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. However, that doesn't preclude it from being a thesis subject - in some ways it's actually a bonus as it encourages exploration as it's a direction that is *not* going to be explored by the language designer. It's possible that we could see some truly unexpected benefits come out of this - or there could be no benefits to Python whatsoever. However, from a purely academic point of view, I think it would be a quite reasonable thesis. It allows applying a well-explored field of research to a new arena. Besides ... sometimes a recursive solution is truly beautiful. Although I think in many (most?) cases a loop on a generator is probably the most appropriate and elegant approach. Tim Delaney From martin at v.loewis.de Mon Nov 24 17:52:41 2003 From: martin at v.loewis.de (Martin v. Löwis) Date: Mon Nov 24 18:29:43 2003 Subject: [Python-Dev] PEP for removal of string module? In-Reply-To: <3FC13743.2070209@ocf.berkeley.edu> References: <3FC13743.2070209@ocf.berkeley.edu> Message-ID: "Brett C." writes: > As I was writing the Summary, I noticed that the discussion of how to > handle the removal of the string module got a little complicated > thanks to how to deal with stuff that is different between str and > unicode. There was no explicit (i.e., patch) resolution to the whole > thing. > > Does this warrant a PEP to work out the details? IMO, no. I'm personally not convinced that the removal of the string module is desirable.
I doubt a PEP could change this attitude. Regards, Martin From nicodemus at esss.com.br Mon Nov 24 19:40:00 2003 From: nicodemus at esss.com.br (Nicodemus) Date: Mon Nov 24 18:40:29 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <3FC1D0E5.5000907@gradient.cis.upenn.edu> References: <3FC196CA.3000007@gradient.cis.upenn.edu> <200311240452.hAO4qps07024@c-24-5-183-134.client.comcast.net> <3FC1D0E5.5000907@gradient.cis.upenn.edu> Message-ID: <3FC2A4E0.4090707@esss.com.br> Hi everyone. My first post to the list, even though I have been reading it for a long time now. 8) Edward Loper wrote: > Guido van Rossum wrote: > > I'm curious about the use case that makes you feel the need for speed. > > I would expect most properties not to simply redirect to another > > attribute, but to add at least *some* checking or other calculation. > > The primary motivation was actually to make the code "easier to read"; > the speed boost was an added bonus. (Though not a trivial one -- I do > have a good number of fairly tight loops that access properties.) The > use case that inspired the idea is defining read-only properties for > immutable objects. But I guess I would be better off going with > wrapper functions that create the read-only properties for me (like > ). Actually, this would introduce a nice feature: allow to easily subclass the functions that are part of the property, without the need to re-create the property in the subclass.

    class C(object):

        def get_foo(self):
            return 'C.foo'

        c = property('get_foo')

    class D(C):

        def get_foo(self):
            return 'D.foo'

In the current behaviour, D would have to recreate the property, which can be cumbersome if you're only interested in overwriting one of the property's methods (which is the common case in my experience). But I don't agree with Edward that property should accept strings.
I think they should just accept functions as of now, but don't store the actual function object, just its name, and delay the name lookup until it is actually needed. What do you guys think? Regards, Nicodemus. From greg at cosc.canterbury.ac.nz Mon Nov 24 19:05:16 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:05:57 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <3FC2A4E0.4090707@esss.com.br> Message-ID: <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> Nicodemus : > Actually, this would introduce a nice feature: allow to easily subclass > the functions that are part of the property, without the need to > re-create the property in the subclass. > > class C(object): > > def get_foo(self): > return 'C.foo' > > c = property('get_foo') Now *that* would be useful (it's slightly different from the original proposal, as I understood it). I wrote a function recently to create properties that work like that, and I'm finding it very useful. It would be great to have it as a standard feature, either as a part of the existing 'property' object, or an alternative one. > But I don't agree with Edward that property should accept strings. I > think they should just accept functions as of now, but don't store the > actual function object, just it's name, and delay the name lookup until > it is actually needed. No! If all that's being used is the name, then just pass the name. Anything else would be pointless and confusing. Plus it would allow the new behaviour to coexist with the current one: if it's a function, call it, and if it's a string, use it as a method name to look up. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Nov 24 19:12:41 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:12:48 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <3FC2A4E0.4090707@esss.com.br> Message-ID: <200311250012.hAP0CfE03877@oma.cosc.canterbury.ac.nz> I just thought of another small benefit - the property definition can precede the definitions of the methods which implement it, e.g. class C(object): c = property('get_foo') def get_foo(self): ... which is a more natural order to write things in. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido at python.org Mon Nov 24 19:20:22 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 19:21:36 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: Your message of "Tue, 25 Nov 2003 13:05:16 +1300." <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> References: <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> Message-ID: <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> > Nicodemus : > > > Actually, this would introduce a nice feature: allow to easily subclass > > the functions that are part of the property, without the need to > > re-create the property in the subclass. > > > > class C(object): > > > > def get_foo(self): > > return 'C.foo' > > > > c = property('get_foo') [Greg Ewing] > Now *that* would be useful (it's slightly different from the > original proposal, as I understood it). > > I wrote a function recently to create properties that work > like that, and I'm finding it very useful. 
It would be > great to have it as a standard feature, either as a part > of the existing 'property' object, or an alternative one. > > > But I don't agree with Edward that property should accept > > strings. I think they should just accept functions as of now, but > > don't store the actual function object, just it's name, and delay > > the name lookup until it is actually needed. > > No! If all that's being used is the name, then just pass > the name. Anything else would be pointless and confusing. > > Plus it would allow the new behaviour to coexist with the > current one: if it's a function, call it, and if it's a > string, use it as a method name to look up. This alternate possibility is yet another argument against Edward's proposal. :-) But I think it can be done without using string literals: a metaclass could scan a class definition for new methods that override functions used by properties defined in base classes, and automatically create a new property. If you only want this behavior for selected properties, you can use a different class instead of 'property'. You could then also do away with the metaclass, but you'd be back at Nicodemus's proposal, and that seems to incur too much overhead (we could use heavy caching, but it would be a bit hairy). Anyway, all of this can be implemented easily by subclassing property or by defining your own descriptor class -- there's no magic, just define __get__ and __set__ (and __delete__ and __doc__, to be complete). So maybe somebody should implement this for themselves and find out how often they really use it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Nov 24 19:22:57 2003 From: guido at python.org (Guido van Rossum) Date: Mon Nov 24 19:23:06 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: Your message of "Tue, 25 Nov 2003 13:12:41 +1300."
<200311250012.hAP0CfE03877@oma.cosc.canterbury.ac.nz> References: <200311250012.hAP0CfE03877@oma.cosc.canterbury.ac.nz> Message-ID: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> > I just thought of another small benefit - the property > definition can precede the definitions of the methods > which implement it, e.g. > > class C(object): > > c = property('get_foo') > > def get_foo(self): > ... > > which is a more natural order to write things in. OTOH I hate seeing name references inside string quotes, because it complicates reference checking by tools like PyChecker (which would have to be told about the meaning of the arguments to property to check this kind of forward references). --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Mon Nov 24 19:36:06 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:36:20 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> Message-ID: <200311250036.hAP0a6D03921@oma.cosc.canterbury.ac.nz> Guido: > So maybe somebody should implement this for themselves and find out > how often they really use it. As I said, I have implemented something very similar to this and I'm making extensive use of it in my current project, which is a re-working of my Python GUI library. The world will get a chance to see it soon, I hope... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Nov 24 19:38:21 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Nov 24 19:38:29 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> Message-ID: <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz> Guido: > OTOH I hate seeing name references inside string quotes, because it > complicates reference checking by tools like PyChecker Oh, dear... you're going to like some of the other tricks I'm pulling in PyGUI even less, then... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From nicodemus at esss.com.br Mon Nov 24 21:39:39 2003 From: nicodemus at esss.com.br (Nicodemus) Date: Mon Nov 24 20:39:48 2003 Subject: [Python-Dev] string-valued fget/fset/fdel for properties In-Reply-To: <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> References: <200311250005.hAP05G203869@oma.cosc.canterbury.ac.nz> <200311250020.hAP0KMC09974@c-24-5-183-134.client.comcast.net> Message-ID: <3FC2C0EB.2050104@esss.com.br> Guido van Rossum wrote: >You could then >also do away with the metaclass, but you'd be back at Nicodemus's >proposal, and that seems to incur too much overhead (we could use >heavy caching, but it would be a bit hairy). > > I think the overhead is very small, unless I'm overlooking something. The only extra overhead that I see is the extra lookup every time the property is accessed, which is the same as calling a method. But I agree that this difference could be significant for some applications. 
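The delayed-lookup property Nicodemus describes (store only the accessor names, resolve them with getattr at access time) can be sketched as a small descriptor; the lateproperty name and the example classes below are illustrative, not his actual implementation:

```python
# Illustrative sketch of the idea described above: a property-like
# descriptor that stores only the *names* of its accessor functions
# and resolves them with getattr() on every access, so a subclass can
# override the methods without re-creating the property.
class lateproperty(object):
    def __init__(self, fget=None, fset=None):
        # keep names, not function objects
        self.fget = fget.__name__ if fget is not None else None
        self.fset = fset.__name__ if fset is not None else None

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        # late lookup: honors overrides in subclasses
        return getattr(obj, self.fget)()

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        getattr(obj, self.fset)(value)

class C(object):
    def get_foo(self):
        return 'C.foo'
    foo = lateproperty(get_foo)

class D(C):
    # no need to redeclare the property: the name lookup finds this
    def get_foo(self):
        return 'D.foo'
```

Because the lookup happens on every access, D overrides get_foo without touching the property; the price is the extra getattr per access, which is the overhead being weighed in this exchange.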
>Anyway, all of this can be implemented easily by subclassign property >or by defining your own descriptor class -- there's no magic, just >define __get__ and __set__ (and __delete__ and __doc__, to be >complete). > >So maybe somebody should implement this for themselves and find out >how often they really use it. > > Actually, I already did it. 8) The class accepts functions just like property does, but keeps only the names of the functions, and uses getattr in __get__ and __set__ to access the actual functions (nothing magical, as you pointed it out). I use it quite often, and the biggest advantage is that when you *do* need to overwrite one of the property's methods, you don't have to change anything in the base class: you just overwrite the method in the derived class and that's it. So as a rule, I always use this property instead of the built-in, but that's for other reasons besides easy subclassing. Regards, Nicodemus. From eppstein at ics.uci.edu Mon Nov 24 21:12:41 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Mon Nov 24 21:12:45 2003 Subject: [Python-Dev] Re: string-valued fget/fset/fdel for properties References: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz> Message-ID: In article <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz>, Greg Ewing wrote: > > OTOH I hate seeing name references inside string quotes, because it > > complicates reference checking by tools like PyChecker > > Oh, dear... you're going to like some of the other tricks > I'm pulling in PyGUI even less, then... Name references inside string quotes are also a standard part of PyObjC (used to represent an objective-C "selector" i.e. a method name that has not yet been bound to an object type). -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From raymond.hettinger at verizon.net Tue Nov 25 01:24:18 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Tue Nov 25 01:24:55 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() Message-ID: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> After re-reading previous posts on the subject, I had an idea. Let's isolate these functions in the documentation into a separate section following the rest of the builtins. The cost of having these builtins is not that they take up a few entries in the directory listing. Also, it's no real burden to leave them in the code base. The real cost is that when learning the language, after reading the tutorial, the next step is to make sure you know what all the builtins do before moving on to study the library offerings. The problem with buffer() and intern() is not that they are totally useless. The problem is that it is darned difficult for an everyday user to invent productive use cases. Here on python-dev, one defender arose for each and said that they once had a use for them. So, let's leave the functionality intact and just move it off the list of things you need to know.
Getting them out of the critical path for learning python will make the language even easier to master. Some are highly resistant to deprecation because it makes their lives more difficult. However, I think even they would like a list of "things you just don't need to know anymore". In other words, you don't have to wait for Py3.0 for a clean house, just push all the clutter in a corner and walk around it. 'nuff said, Raymond Hettinger -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20031125/852c8ed0/attachment.html From guido at python.org Tue Nov 25 01:45:08 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 25 01:45:35 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Tue, 25 Nov 2003 01:24:18 EST." <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> Message-ID: <200311250645.hAP6j8C10327@c-24-5-183-134.client.comcast.net> > After re-reading previous posts on the subject, I had an idea. Let's > isolate these functions in the documentation into a separate section > following the rest of the builtins. > > The cost of having these builtins is not that they take up a few entries > in the directory listing. Also, it's no real burden to leave them in > the code base. The real cost is that when learning the language, after > reading the tutorial, the next step is to make sure you know what all > the builtins do before moving on to study the library offerings. > > The problem with buffer() and intern() is not that they are totally > useless. The problem is that it that it is darned difficult an everyday > user to invent productive use cases. Here on python-dev, one defender > arose for each and said that they once had a use for them. So, let's > leave the functionality intact and just move it off the list of things > you need to know. 
In both cases, it would have saved me some hours > spent trying to figure out what they were good for - I wish someone had > just said, "you can ignore these two". These functions are just > distractors in a person's mental concept space. > > There's really nothing wrong with have apply() and coerce() being > supported for old code. The problem with them is why bother even > knowing that they exist - they just don't figure into modern python > code. Any time spent learning them now is time that could have been > spent learning about the copy or pickle modules or some such. > > Moving these functions to a separate section sends a clear message to > trainers and book writers that it is okay to skip these topics. Getting > them out of the critical path for learning python will make the language > even easier to master. > > Some are highly resistant to deprecation because it makes their lives > more difficult. However, I think even they would like a list of "things > you just don't need to know anymore". In other words, you don't have to > wait for Py3.0 for a clean house, just push all the clutter in a corner > and walk around it. Sounds like a good idea. --Guido van Rossum (home page: http://www.python.org/~guido/) From oussoren at cistron.nl Tue Nov 25 03:39:10 2003 From: oussoren at cistron.nl (Ronald Oussoren) Date: Tue Nov 25 03:39:10 2003 Subject: [Python-Dev] Re: string-valued fget/fset/fdel for properties In-Reply-To: References: <200311250022.hAP0Mvl09993@c-24-5-183-134.client.comcast.net> <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz> Message-ID: On 25 nov 2003, at 3:12, David Eppstein wrote: > In article <200311250038.hAP0cLc03924@oma.cosc.canterbury.ac.nz>, > Greg Ewing wrote: > >>> OTOH I hate seeing name references inside string quotes, because it >>> complicates reference checking by tools like PyChecker >> >> Oh, dear... you're going to like some of the other tricks >> I'm pulling in PyGUI even less, then... 
> > Name references inside string quotes are also a standard part of PyObjC > (used to represent an objective-C "selector" i.e. a method name that > has > not yet been bound to an object type). That's an implementation detail, and the name references are references to *Objective-C* identifiers which are not always valid Python identifiers (it's highly unlikely that 'foo:bar:' will ever be a valid Python identifier, while it is a valid Objective-C method name) Ronald From pedronis at bluewin.ch Tue Nov 25 10:32:52 2003 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Tue Nov 25 10:29:59 2003 Subject: type inference project Re: [Python-Dev] Thesis ideas list In-Reply-To: <3FBFE987.2050203@ocf.berkeley.edu> Message-ID: <5.2.1.1.0.20031125162750.027eb6a0@pop.bluewin.ch> >Type inferencing >---------------- > >from `Martin >`__ > >Either run-time or compile-time. "Overlap with the specializing compilers". does somebody know anything about this project, what's happened of it http://www.ai.mit.edu/projects/dynlangs/Talks/star-killer.htm http://web.mit.edu/msalib/www/urop/ http://web.mit.edu/msalib/www/urop/presentation-2001-august-10/html-png/ Samuele. From tim.one at comcast.net Tue Nov 25 15:22:51 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Nov 25 15:22:57 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: <200311240126.hAO1Q9I01704@c-24-5-183-134.client.comcast.net> Message-ID: [Christian] >> Several people asked on the main list, how to pickle deeply >> nested structures without crashing pickle. Well, my general >> answer was to rewrite pickle in a non-recursive manner. [Guido] > I guess it's my anti-Scheme attitude. I just think the problem is in > the deeply nested structures. There usually is a less nested data > structure that doesn't have the problem. But I'll shut up, because > this rant is not productive. 
:-( Ya, but it *used* to be -- in the early days, many people learned a lot about writing better programs by avoiding constructs Python penalized (nested functions, cyclic references, deep recursion, very long reference chains, massively incestuous multiple inheritance). Learning to design with flatter data structures and flatter code was highly educational, and rewarding, at least for those who played along. I suppose that's gone for good now. An irony specific to pickle is that cPickle coding was driven mostly by Zope's needs, and multi-gigabyte Zope databases live happily with its recursive design -- most data ends up in BTrees, and those hardly ever go deeper than 3 levels. I don't think it's coincidence that, needing to find a scalable container type with demanding size and speed constraints, Jim ended up with a "shallow" BTree design. The lack of need for deep C recursion was then a consequence of needing to avoid (for search speed) long paths from root to data. Oh well. The next generation will learn the hard way . looking-forward-to-death-ly y'rs - tim From kalle at lysator.liu.se Tue Nov 25 15:32:58 2003 From: kalle at lysator.liu.se (Kalle Svensson) Date: Tue Nov 25 15:33:06 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> References: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> Message-ID: <20031125203258.GA29814@i92.ryd.student.liu.se> [Guido van Rossum] > There's a bunch of FutureWarnings e.g. about 0xffffffff<<1 that > promise they will disappear in Python 2.4. If anyone has time to > fix these, I'd appreciate it. (It's not just a matter of removing > the FutureWarnings -- you actually have to implement the promised > future behavior. :-) I may get to these myself, but they're not > exactly rocket science, so they might be a good thing for a > beginning developer (use SF please if you'd like someone to review > the changes first). 
I've submitted a patch (http://python.org/sf/849227). And yes, somebody should probably take a good look at it before applying. The (modified) test suite does pass on my machine, but that's all. I may well have forgotten to add tests for new special cases, and I'm not the most experienced C programmer on the block either. As a side note, I think that line 233 in Lib/test/test_format.py if sys.maxint == 2**32-1: should be if sys.maxint == 2**31-1: but I didn't include that in the patch or submit a bug report. Should I? Peace, Kalle -- Kalle Svensson, http://www.juckapan.org/~kalle/ Student, root and saint in the Church of Emacs. From guido at python.org Tue Nov 25 15:50:32 2003 From: guido at python.org (Guido van Rossum) Date: Tue Nov 25 15:50:39 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: Your message of "Tue, 25 Nov 2003 21:32:58 +0100." <20031125203258.GA29814@i92.ryd.student.liu.se> References: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> <20031125203258.GA29814@i92.ryd.student.liu.se> Message-ID: <200311252050.hAPKoW912502@c-24-5-183-134.client.comcast.net> > I've submitted a patch (http://python.org/sf/849227). And yes, > somebody should probably take a good look at it before applying. The > (modified) test suite does pass on my machine, but that's all. I may > well have forgotten to add tests for new special cases, and I'm not > the most experienced C programmer on the block either. Thanks! > As a side note, I think that line 233 in Lib/test/test_format.py > > if sys.maxint == 2**32-1: > > should be > > if sys.maxint == 2**31-1: > > but I didn't include that in the patch or submit a bug report. > Should I? This definitely smells like a bug (I've never seen a machine with 33-bit ints :-) so feel free to submit a separate patch to SF. 
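The off-by-one Kalle spotted is easy to verify with plain arithmetic: on a 32-bit build, sys.maxint is the largest *signed* C long (31 value bits plus a sign bit), so the `2**32-1` comparison could never be true on any build. A quick sanity check:

```python
# Largest signed 32-bit integer -- what sys.maxint reports on a
# 32-bit Python build (31 value bits plus a sign bit).
INT32_MAX = 2**31 - 1
print(INT32_MAX)        # 2147483647

# 2**32 - 1 is the largest *unsigned* 32-bit value; no build's
# sys.maxint ever equals it, which is why the original test in
# test_format.py was effectively dead code.
UINT32_MAX = 2**32 - 1
print(UINT32_MAX)       # 4294967295
```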
--Guido van Rossum (home page: http://www.python.org/~guido/) From tdelaney at avaya.com Tue Nov 25 15:52:12 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Tue Nov 25 15:52:21 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEEEC545@au3010avexu1.global.avaya.com> > From: python-dev-bounces+tdelaney=avaya.com@python.org > > After re-reading previous posts on the subject, I had an idea. Let's > isolate these functions in the documentation into a separate section > following the rest of the builtins. Sounds like a good idea to me. I was the person that had a use case for intern(), but would be quite happy for it to be in a less prominent position in the docs - though more prominent than apply ... Cheers. Tim Delaney From fincher.8 at osu.edu Tue Nov 25 17:11:22 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Tue Nov 25 16:13:30 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> Message-ID: <200311251711.23260.fincher.8@osu.edu> On Tuesday 25 November 2003 01:24 am, Raymond Hettinger wrote: > Some are highly resistant to deprecation because it makes their lives > more difficult. However, I think even they would like a list of "things > you just don't need to know anymore". In other words, you don't have to > wait for Py3.0 for a clean house, just push all the clutter in a corner > and walk around it. I think it's a great idea. Jeremy From Jack.Jansen at cwi.nl Tue Nov 25 16:45:09 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Tue Nov 25 16:46:14 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: References: <16316.55271.205085.815371@montanaro.dyndns.org> Message-ID: On 20-nov-03, at 20:43, Martin v. 
Löwis wrote: > Skip Montanaro writes: > >> Someone asked on c.l.py about running Python on OS6 (yes, Six) a few >> days >> ago and Python is maintained by interested individuals on other legacy >> platforms like OS/2 and the Amiga, maybe not at the latest and >> greatest >> release, but they're still there. There's probably someone on the >> planet >> who'd be willing to putter around with Python on MacOS9. That person >> just >> hasn't been found yet. > > I think they could easily start with Python 2.3, though. That was my thinking too. I've never tried to compile 2.4 on OS9, and I don't have the intention to do so. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From Jack.Jansen at cwi.nl Tue Nov 25 16:48:07 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Tue Nov 25 16:48:10 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: <200311202232.hAKMWcY08939@oma.cosc.canterbury.ac.nz> References: <200311202232.hAKMWcY08939@oma.cosc.canterbury.ac.nz> Message-ID: <0DEA22DF-1F91-11D8-9B5A-000A27B19B96@cwi.nl> On 20-nov-03, at 23:32, Greg Ewing wrote: > Jack Jansen : > >> As you may have noticed if you follow the checkins mailing list I've >> enthusiastically started ripping out 90% of the work I did on Python >> the last 10 years > > What are you ripping out, exactly? I hope you're not getting rid of > Carbon too soon, because I'm in the midst of doing a Mac version of my > Python GUI using it! Don't worry, Carbon is going to be around for a long time, probably as long as Apple continues to support it (which is probably going to be forever). 
Some things will change, such as QuickTime and CoreFoundation moving out of the Carbon package where they didn't really belong in the first place, but for 2.4 I guess we'll have indirection modules in the Carbon package that print a warning and then import the real thing, just as we did when moving all the Mac modules from toplevel modules to being inside the Carbon package. Also, as long as time permits I'll continue to maintain the 2.3.X releases for MacOS9. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From skip at pobox.com Tue Nov 25 17:56:02 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Nov 25 17:56:15 2003 Subject: [Python-Dev] Ripping out Macintosh support In-Reply-To: References: <16316.55271.205085.815371@montanaro.dyndns.org> Message-ID: <16323.56834.218973.642584@montanaro.dyndns.org> >> I think they could easily start with Python 2.3, though. Jack> That was my thinking too. I've never tried to compile 2.4 on OS9, Jack> and I don't have the intention to do so. Chicken. ;-) Skip From greg at cosc.canterbury.ac.nz Tue Nov 25 18:47:25 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue Nov 25 18:47:34 2003 Subject: [Python-Dev] instancemethod_getattro seems to be partially wrong In-Reply-To: Message-ID: <200311252347.hAPNlPH13568@oma.cosc.canterbury.ac.nz> [Guido] > I guess it's my anti-Scheme attitude. I just think the problem is in > the deeply nested structures. There usually is a less nested data > structure that doesn't have the problem. A couple more thoughts: There's a difference between nested data structures and recursion. Use of one doesn't necessarily imply the other. Also, whether a given data structure is "nested" or not can depend on your point of view. Most people wouldn't consider a linked list to be nested -- it may be "wide", but it's not usually thought of as "deep". 
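Greg's wide-vs-deep distinction is easy to make concrete: a recursive walk of a linked list consumes one stack frame per node, so a list that is merely "wide" exhausts the recursion limit, while an iterative walk stays flat. A minimal sketch (the Node class here is made up for illustration):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def depth_recursive(node):
    # One stack frame per node -- blows up on long ("wide") lists,
    # which is exactly the problem with a recursive pickler.
    return 0 if node is None else 1 + depth_recursive(node.next)

def depth_iterative(node):
    # Flat traversal: constant stack usage, works for any length.
    n = 0
    while node is not None:
        node = node.next
        n += 1
    return n

head = None
for i in range(10000):
    head = Node(i, head)

print(depth_iterative(head))   # 10000, no problem
# depth_recursive(head) would exceed the default recursion limit
# (around 1000 frames) and raise an exception.
```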
I don't think it's unreasonable to ask for a pickle that doesn't use up a recursion level for each unit of width in such a structure. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From hunterp at fastmail.fm Wed Nov 26 01:44:16 2003 From: hunterp at fastmail.fm (Hunter Peress) Date: Wed Nov 26 01:44:21 2003 Subject: [Python-Dev] less quick patch for better debugging. Message-ID: <20031126064416.DDA0741547@server1.messagingengine.com> Ah. There's clearly interest in the idea. I guess it's as simple as adding a field to Py_Object that would record the last namespace name used for a given object (remember any object could have many names...) (not sure about Threads btw here). This would allow for all error lookup-type error messages to be much cleaner. 
The impetus for the above idea being an index error on the following line: a[1] + b[2] + c[3]...currently gives an error message that doesn't say which variable the list index error occurs in or at which index it occurs (helpful if they were all the same object on the same line). Same issues go for dicts and even any object attributes as well. PS. maybe in the interest of runtime speed, the assigning to this new field could only occur when there actually is an error. From bac at OCF.Berkeley.EDU Wed Nov 26 01:48:53 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Wed Nov 26 01:49:36 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> References: <3FBFE987.2050203@ocf.berkeley.edu> <2m65ham0qk.fsf@starship.python.net> <200311241532.hAOFWYV09067@c-24-5-183-134.client.comcast.net> Message-ID: <3FC44CD5.2070009@ocf.berkeley.edu> Guido van Rossum wrote: >>>Tail Recursion >>>-------------- >>>from Me (my brain) >>> >>>Have proper tail recursion in Python. Would require identifying where >>>a direct function call is returned (could keep it simple and just do >>>it where CALL_FUNCTION and RETURN bytecodes are in a row). Also have >>>to deal with exception catching since that requires the frame to stay >>>alive to handle the exception. >>> >>>But getting it to work well could help with memory and >>>performance. Don't know if it has been done for a language that had >>>exception handling. >> >>How is this different from stackless? > > > AFAIK Stackless only curtails the *C* stack, not the chain of Python > frames on the heap. > > But I have a problem with tail recursion. It's generally requested by > new converts from the Scheme/Lisp or functional programming world, and > it usually means they haven't figured out yet how to write code > without using recursion for everything yet. 
IOW I'm doubtful on how > much of a difference it would make for real Python programs (which, > simplifying a bit, tend to use loops instead of recursion). And also > note that even if an exception is not caught, you'd like to see all > stack frames listed when the traceback is printed or when the debugger > is invoked. > I mostly agree with everything Guido has said. It probably should only be used when -OO is switched on. And yes, iterative solutions tend to be easier to grasp. I have to admit that I partially come from a Scheme world (learned it *very* shortly after I started the process of learning Python). So I have always had a slight soft spot for elegant recursive solutions. I will file this idea in the "not that popular" pile. =) -Brett From mwh at python.net Wed Nov 26 10:59:48 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 26 10:59:53 2003 Subject: [Python-Dev] IRC Channels Message-ID: <2m8ym3ruh7.fsf@starship.python.net> Given that PEP 101 mentions the #python-dev IRC channel by name, I thought it might be prudent to register it on freenode. If anyone wants privileges, email me your nick. Also, wasn't someone rewriting the release PEPs to talk about roles instead of names? Cheers, mwh -- : exploding like a turd Never had that happen to me, I have to admit. They do that often in your world? -- Eric The Read & Dave Brown, asr From mwh at python.net Wed Nov 26 11:02:45 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 26 11:02:49 2003 Subject: [Python-Dev] IRC Channels In-Reply-To: <2m8ym3ruh7.fsf@starship.python.net> (Michael Hudson's message of "Wed, 26 Nov 2003 15:59:48 +0000") References: <2m8ym3ruh7.fsf@starship.python.net> Message-ID: <2m4qwrruca.fsf@starship.python.net> Michael Hudson writes: > Given that PEP 101 mentions the #python-dev IRC channel by name, I > thought it might be prudent to register it on freenode. If anyone > wants privileges, email me your nick. 
And I meant to say, I registered #pydotorg and #starship while I was at it. Cheers, mwh -- Ya, ya, ya, except ... if I were built out of KSR chips, I'd be running at 25 or 50 MHz, and would be wrong about ALMOST EVERYTHING almost ALL THE TIME just due to being a computer! -- Tim Peters, 30 Apr 97 From fincher.8 at osu.edu Wed Nov 26 12:55:54 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Wed Nov 26 11:58:05 2003 Subject: [Python-Dev] IRC Channels In-Reply-To: <2m4qwrruca.fsf@starship.python.net> References: <2m8ym3ruh7.fsf@starship.python.net> <2m4qwrruca.fsf@starship.python.net> Message-ID: <200311261255.54797.fincher.8@osu.edu> On Wednesday 26 November 2003 11:02 am, Michael Hudson wrote: > Michael Hudson writes: > > Given that PEP 101 mentions the #python-dev IRC channel by name, I > > thought it might be prudent to register it on freenode. If anyone > > wants privileges, email me your nick. > > And I meant to say, I registered #pydotorg and #starship while I was > at it. If these channels won't regularly be occupied (i.e., they'll only be used when a release is looming) you should probably setup a default topic or a notice on join that will notify users of this, so their vacancy doesn't confuse/ dismay users. Jeremy From mwh at python.net Wed Nov 26 12:07:16 2003 From: mwh at python.net (Michael Hudson) Date: Wed Nov 26 12:07:22 2003 Subject: [Python-Dev] IRC Channels In-Reply-To: <200311261255.54797.fincher.8@osu.edu> (Jeremy Fincher's message of "Wed, 26 Nov 2003 12:55:54 -0500") References: <2m8ym3ruh7.fsf@starship.python.net> <2m4qwrruca.fsf@starship.python.net> <200311261255.54797.fincher.8@osu.edu> Message-ID: <2mznejqcsb.fsf@starship.python.net> Jeremy Fincher writes: > On Wednesday 26 November 2003 11:02 am, Michael Hudson wrote: >> Michael Hudson writes: >> > Given that PEP 101 mentions the #python-dev IRC channel by name, I >> > thought it might be prudent to register it on freenode. If anyone >> > wants privileges, email me your nick. 
>> >> And I meant to say, I registered #pydotorg and #starship while I was >> at it. > > If these channels won't regularly be occupied (i.e., they'll only be > used when a release is looming) you should probably setup a default > topic or a notice on join that will notify users of this, so their > vacancy doesn't confuse/ dismay users. I seem to have now done this :-) (at least for #python-dev) Suggestions for better wording would be welcome. Cheers, mwh -- No. In fact, my eyeballs fell out just from reading this question, so it's a good thing I can touch-type. -- John Baez, sci.physics.research From fdrake at acm.org Wed Nov 26 13:40:31 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed Nov 26 13:40:47 2003 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/whatsnew whatsnew24.tex, 1.13, 1.14 In-Reply-To: References: Message-ID: <16324.62367.318560.119195@grendel.zope.com> rhettinger@users.sourceforge.net writes: > Nits from a review of the documentation update. These too have been quietly pushed to the website. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From raymond.hettinger at verizon.net Wed Nov 26 15:56:05 2003 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Nov 26 15:56:41 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary Message-ID: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> I'm adding a section to the tutorial with a brief sampling of library offerings and some short examples of how to use them. My first draft included: copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib Guido's thoughts: - copy tends to be overused by beginners - the shelve module has pitfalls for new users - cmath is rarely needed and some folks are scared of complex numbers - urllib2 is a better choice than urllib I'm interested to know what your experiences have been with teaching python. 
Which modules are necessary to start doing real work (like pickle and os), which are most easily grasped (like glob or random), which have impressive examples only a few lines long (i.e. urllib), and which might just be fun (turtle would be a candidate if it didn't have a Tk dependency). Note, re was included because everyone should know it's there and everyone should get advice to not use it when string methods will suffice. I'm especially interested in thoughts on whether shelve should be included. When I first started out, I was very impressed with shelves because they were the simplest way to add a form of persistence and because they could be dropped in place of a dictionary in scripts that were already built. Also, it was trivially easy to learn based on existing knowledge of dictionaries. OTOH, that existing knowledge is what makes the pitfalls so surprising. Likewise, I was impressed with the substitutability of line lists, text splits, file.readlines(), and urlopen(). While I think of copy() and deepcopy() as builtins that got tucked away in a module, Guido is right about their rarity in well-crafted code. Some other candidates (let's pick just two or three): - csv (basic tool for sharing data with other applications) - datetime (comes up frequently in real apps and admin tasks) - ftplib (because the examples are so brief) - getopt or optparse (because the task is common) - operator (because otherwise, the functionals can be a PITA) - pprint (because beauty counts) - struct (because fixed record layouts are common) - threading/Queue (because without direction people grab thread and mutexes) - timeit (because it answers most performance questions in a jiffy) - unittest (because TDD folks like myself live by it) I've avoided XML because it is a can of worms and because short examples don't do it justice. OTOH, it *is* the hot topic of the day and seems to be taking over the world one angle bracket at a time. 
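The shelve pitfall alluded to above can be shown in a few lines: a shelf looks like a dict, but mutating a stored value in place only changes a transient unpickled copy. A small sketch (the file path is just a scratch location for illustration):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

db = shelve.open(path)
db["langs"] = ["python"]     # assignment is persisted
db["langs"].append("c")      # mutates a fresh unpickled copy -- silently lost!
print(db["langs"])           # ['python']
db.close()

# writeback=True caches accessed entries and flushes them on close(),
# avoiding the surprise at some cost in memory and close() time.
db = shelve.open(path, writeback=True)
db["langs"].append("c")
db.close()

db = shelve.open(path)
print(db["langs"])           # ['python', 'c']
db.close()
```

Explicit re-assignment (`tmp = db[k]; tmp.append(...); db[k] = tmp`) is the other way around the pitfall, without the writeback overhead.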
Ideally, the new section should be relatively short but leave a reader with a reasonable foundation for crafting non-toy scripts. A secondary goal is to show-off the included batteries -- I think it is common for someone to download several languages and choose between them based on their tutorial experiences (so, a little flash and sizzle might be warranted). Raymond From fdrake at acm.org Wed Nov 26 18:49:11 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed Nov 26 18:49:30 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: <16325.15351.885795.87694@grendel.fdrake.net> Raymond Hettinger writes: > I'm adding section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. Cool! > I've avoided XML because it is a can of worms and because short examples > don't do it justice. OTOH, it *is* the hot topic of the day and seems > to be taking over the world one angle bracket at a time. Actually, they usually travel in pairs. ;-) I would stay away from XML for this; there's too much there and how to pick one thing over another isn't always obvious even when someone explains it. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From greg at cosc.canterbury.ac.nz Wed Nov 26 19:26:08 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Nov 26 19:26:20 2003 Subject: [Python-Dev] less quick patch for better debugging. 
In-Reply-To: <20031126064416.DDA0741547@server1.messagingengine.com> Message-ID: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> Hunter Peress : > a[1] + b[2] + c[3]...currently gives an error message that doesnt say > which variable the list index error occurs in or at which index it occurs > at This would be considerably improved if the error message could just point out the position in the line instead of just the line number. Especially when a statement spans more than one line -- currently you can't even tell which line of a multi-line statement was the culprit! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From aleax at aleax.it Thu Nov 27 05:13:05 2003 From: aleax at aleax.it (Alex Martelli) Date: Thu Nov 27 05:13:10 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: <200311271113.05578.aleax@aleax.it> On Wednesday 26 November 2003 09:56 pm, Raymond Hettinger wrote: > I'm adding section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. Great idea. > copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib ... > I'm interested to know what your experiences have been with teaching > python. Which modules are necessary to start doing real work (like I would add: sys -- "real programs" want to access their command-line arguments (sys.argv), want to terminate (sys.exit), want to write to sys.stderr. fileinput -- users are VERY likely to want to "rewrite textfiles in-place" (as well as wanting to read a bunch of textfiles) and fileinput is just the ticket for that. 
Users coming from perl particularly need fileinput desperately as it affords close translation of the "while(<>)" idiom. cStringIO -- I've noticed most newbies find it more natural to "write to a cStringIO.StringIO pseudofile as they go" then getvalue, rather than append'ing to a list of strings then ''.join . time, datetime, calendar -- many real programs want to deal with dates and times array -- many newbies try to use lists to do things that are perfect for array.array's > pickle and os), which are most easily grasped (like glob or random), > which have impressive examples only a few lines long (i.e. urllib), and I think zipfile and gzip are easily grasped AND impressive for people who've ever needed to read/write compressed files in other languages. xmlrpclib and SimpleXMLRPCServer are also eye-poppers (and despite their names you don't need to get into XML at all to show them off:-). CGIHTTPServer, while of course not all that suitable for "real programs", has also contributed more than its share in making instant converts to Python, in my experience -- "instant gratification". > I'm especially interested in thoughts on whether shelve should be > included. When I first started out, I was very impressed with shelves > because they were the simplest way to add a form of persistence and > because they could be dropped in place of a dictionary in scripts that > were already built. Also, it was trivially easy to learn based on > existing knowledge of dictionaries. OTOH, that existing knowledge is > what makes the pitfalls so surprising. Hmmm, yes, but, with writeback=True, you do work around the most surprising pitfalls (at a price in performance, of course). I dunno -- with so many other impressive modules to show off, maybe shelve might be avoided. > - threading/Queue (because without direction people grab thread and > mutexes) True, they do. But I don't know if the tutorial is the right time to indoctrinate people about proper Python threading architectures. 
> - timeit (because it answers most performance questions in a jiffy) > - unittest (because TDD folks like myself live by it) Absolute agreement here. And doctest is SO easy to use, that for the limited space of the tutorial it might also be quite appropriate -- it also encourages abundant use of docstrings, a neat thing in itself. Alex From Kepes.Krisztian at peto.hu Thu Nov 27 05:16:40 2003 From: Kepes.Krisztian at peto.hu (Kepes Krisztian) Date: Thu Nov 27 05:16:35 2003 Subject: [Python-Dev] list and string - method wishlist Message-ID: <879726515.20031127111640@peto.hu> Hi ! A.) The string object have a method named "index", and have a method named "find". It is good, because many times we need to find anything, and it is very long to write this: try: i=s.index('a') except: i=-1 if i<>-1: pass and not this: if (s.find('a')<>-1): pass Why don't exists same method in the list object ? It is very ugly thing (sorry, but I must say that). I must write in every times: l=[1,2,3,4] try: i=l.index(5) except: i=-1 if i<>-1: pass and not this: if (l.find(5)<>-1): pass B.) Same thing is the deleting. I think, this method is missing from strings, and lists. Example: I must write this: s='abcdef' l=[1,2,5,3,4,5] print s s=s[:2]+s[3:] print s print l l[2]=None l.remove(None) print l and not this: s='abcdef' l=[1,2,5,3,4,5] s=s.delete(2) l.delete(2) and delete more: s.delete() # s='' l.delete() # l=[] s.delete(2,2) # s='abef' l.delete(2,2) # l=[1,2,4,5] So: some functions/methods are neeeded to Python-like programming (less write, more effectivity). KK From aleax at aleax.it Thu Nov 27 05:39:10 2003 From: aleax at aleax.it (Alex Martelli) Date: Thu Nov 27 05:39:15 2003 Subject: [Python-Dev] list and string - method wishlist In-Reply-To: <879726515.20031127111640@peto.hu> References: <879726515.20031127111640@peto.hu> Message-ID: <200311271139.10505.aleax@aleax.it> On Thursday 27 November 2003 11:16 am, Kepes Krisztian wrote: ... 
> try: > i=s.index('a') > except: > i=-1 > if i<>-1: pass > > and not this: > > if (s.find('a')<>-1): pass Why don't you use the clearer, faster, more readable, AND more concise idiom if 'a' in s: pass instead? > Why don't exists same method in the list object ? The 'in' operator works just fine for lists, too. Perhaps if you studied Python's present capabilities a bit better, before requesting changes and additions to Python, you might achieve better results faster. > Same thing is the deleting. > > I think, this method is missing from strings, and lists. Look at the 'del' keyword (and slice assignments) -- for lists only: > print l > l[2]=None > l.remove(None) del l[2] or equivalently l[2:3] = [] > and delete more: > s.delete() # s='' Python strings are immutable and will always remain immutable. There is NO way to change an existing string object and there will never be. > l.delete() # l=[] l[:] = [] or equivalently del l[:] > s.delete(2,2) # s='abef' Ditto. > l.delete(2,2) # l=[1,2,4,5] l[2:4] = [] or equivalently del l[2:4] > So: some functions/methods are neeeded to Python-like programming > (less write, more effectivity). This is quite possible, but I have seen almost none listed in your wishlist. I.e., the only task you've listed that is not performed with easy, popular and widespread Python idioms would seem to be a string method roughly equivalent to the function: def delete(s, start, upto=None): if upto is None: upto = start + 1 return s[:start] + s[upto:] returning "a copy of s except for this slice". However, the addition of more functions and methods that might (perhaps) save typing a few characters, allowing a hypothetical z = s.delete(a, b) in lieu of z = s[:a] + s[b:] must overcome a serious general objection: as your very request shows, people ALREADY fail to notice and learn a lot of what Python offers today.
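The idioms from this exchange can be exercised directly; here is a small runnable sketch covering the same ground. The string helper follows the delete() function sketched in the reply, except that its first parameter is named `start` here, since `from` is a reserved word in Python and could not actually be used as a parameter name:

```python
# The membership and in-place deletion idioms from the reply.
l = [1, 2, 5, 3, 4, 5]
assert 5 in l                  # instead of a hypothetical l.find(5) != -1
del l[2]                       # delete a single element in place
assert l == [1, 2, 3, 4, 5]
l[2:4] = []                    # delete a slice in place (same as del l[2:4])
assert l == [1, 2, 5]

# Strings are immutable, so "deletion" must build a new string.
# Same logic as the delete() sketch; `start` avoids the reserved word `from`.
def delete(s, start, upto=None):
    if upto is None:
        upto = start + 1
    return s[:start] + s[upto:]

assert delete('abcdef', 2) == 'abdef'
assert delete('abcdef', 2, 4) == 'abef'
```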
Adding more and more marginally-useful functions and methods might therefore more likely just cause people to fail to notice and learn a larger fraction of Python's capabilities, rather than supply any burningly needed usefulness. Alex From Kepes.Krisztian at peto.hu Thu Nov 27 06:55:44 2003 From: Kepes.Krisztian at peto.hu (Kepes Krisztian) Date: Thu Nov 27 06:55:44 2003 Subject: [Python-Dev] Java final vs Py __del__ Message-ID: <19315670392.20031127125544@peto.hu> Hi ! I very wonder, when I get exp. in java with GC. I'm Delphi programmer, so I get used to destructorin objects. In Java the final method is not same, but is like to destructor (I has been think...). And then I try with some examples, I see, that the Java GC is sometimes not call this method of objects, only exit from program. So: the java programs sometimes end before the GC is use the final methods on objects. This mean that in Java the critical operations MUST do correctly by the programmmers, or some data losing happened. If it is open a file, then must write the critical modifications, and must use the flush, and close to be sure to the datas are saved. In the Py the __del__ is same java's final, or it is to be called in every way by GC ? I build this method as safe method: if the programmer don't do any closing/freeing thing, I do that ? simple example: class a: def __init__(self,filename): self.__filename=filename self.__data=[] self.__file=None def open(self): self.__file=open(self.__filename,"w") def write(self,data): self.__data.append(data) def close(self): self.__file.writelines(self.__data) self.__file.close() self.__file=None def __del__(self): if self.__file<>None: self.close() # like destructor: we do the things are forgotten by programmer Thanx for infos: KK From mwh at python.net Thu Nov 27 07:02:52 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 27 07:02:57 2003 Subject: [Python-Dev] less quick patch for better debugging. 
In-Reply-To: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Thu, 27 Nov 2003 13:26:08 +1300 (NZDT)") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> Message-ID: <2mn0aiqas3.fsf@starship.python.net> Greg Ewing writes: > Hunter Peress : > >> a[1] + b[2] + c[3]...currently gives an error message that doesnt say >> which variable the list index error occurs in or at which index it occurs >> at > > This would be considerably improved if the error message could > just point out the position in the line instead of just the line > number. Any ideas how to do that? I guess you could obfuscate c_lnotab even more... > Especially when a statement spans more than one line -- currently > you can't even tell which line of a multi-line statement was the > culprit! This is occasionally very annoying, and is probably fixable -- would require pretty serious compiler hackery, though. Cheers, mwh -- 3. Syntactic sugar causes cancer of the semicolon. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From aleax at aleax.it Thu Nov 27 07:54:41 2003 From: aleax at aleax.it (Alex Martelli) Date: Thu Nov 27 07:54:48 2003 Subject: [Python-Dev] Java final vs Py __del__ In-Reply-To: <19315670392.20031127125544@peto.hu> References: <19315670392.20031127125544@peto.hu> Message-ID: <200311271354.41367.aleax@aleax.it> On Thursday 27 November 2003 12:55 pm, Kepes Krisztian wrote: > Hi ! Hi Kepes. These questions are improper to pose here on Python-Dev, which is a mailing list about the development OF Python; for questions that are just related to Python programming, please send them to the general list, python-list@python.org, or help@python.org instead. I'm answering them this time, but please don't use this list again in the future unless it is for issues related to the development OF Python, thanks. 
> In Java the final method is not same, but is like to destructor (I has You're confusing final (which is a Java keyword indicating a method that cannot be overridden in subclasses) with finalize -- there is no connection at all between these two concepts in Java. The Python _language_ gives just as few guarantees about calling finalizers (__del__ in Python) as Java (otherwise, it would not be possible to implement Python on top of a Java Virtual Machine, yet Jython, the Python implementation running on a JVM, works quite productively). Some specific implementation (such as a given release of "classic Python") may happen to do a bit more, but for reliability you will want to use try/finally in Python just as you would in Java. Alex From ajs at optonline.net Thu Nov 27 08:52:51 2003 From: ajs at optonline.net (Arthur) Date: Thu Nov 27 09:30:19 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary Message-ID: <000601c3b4ed$bfea2d70$1c02a8c0@BasementDell> >I'm interested to know what your experiences have been >with teaching.python. How about one's experience in learning Python? It is clear to me - in retrospect - that the absence of copy() as a _built-in__ worked negatively. Learning isn't a linear process, and I can't make a simple linear argument as to why this is so. But it has more to do with getting a handle on assignment, than the direct use or lack of use of copy() itself. I tried to open up discussion of this issue on edu-sig, and was asked by Guido to take a hike. I had apparently chosen an inappropriate forum. I understand that a move of copy() to built-ins is not in the cards. Number #1 on a list of library modules in a tutorial may well be a better solution. I strongly encourage you to stick with your instincts and intuition here. But by all means including Guido's koan about the overuse of copy by novices as part of that presentation. 
Art From pinard at iro.umontreal.ca Thu Nov 27 10:26:49 2003 From: pinard at iro.umontreal.ca (=?iso-8859-1?Q?Fran=E7ois?= Pinard) Date: Thu Nov 27 10:40:36 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <200311271113.05578.aleax@aleax.it> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> <200311271113.05578.aleax@aleax.it> Message-ID: <20031127152649.GA4044@titan.progiciels-bpi.ca> [Alex Martelli] > cStringIO -- I've noticed most newbies find it more natural to "write to > a cStringIO.StringIO pseudofile as they go" then getvalue, rather than > append'ing to a list of strings then ''.join . I do not doubt that cStringIO is useful to know, and a tutorial could throw a short glimpse here about why the `c' prefix and speed issues. For a newcomer, here might be a good opportunity for illustrating one surprising capability of Python for those coming from other languages, which is using bound methods as "first-class" objects. Like: fragments = [] write = fragments.append ... ... result = ''.join(fragments) I think this approach is not much more difficult than `StringIO', not so bad efficiency-wise, but likely more fruitful about developing Python useful understanding and abilities. A tutorial might also show that the said `write' could be given and received in functions, which do not have to "know" if they are writing to a file, or in-memory fragments. -- François Pinard http://www.iro.umontreal.ca/~pinard From ark-mlist at att.net Thu Nov 27 11:25:58 2003 From: ark-mlist at att.net (Andrew Koenig) Date: Thu Nov 27 11:25:50 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <3FC44CD5.2070009@ocf.berkeley.edu> Message-ID: <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> > I mostly agree with everything Guido has said. It probably should only > be used when -OO is switched on. And yes, iterative solutions tend to > be easier to grasp. Not always.
For example, suppose you want to find out how many (decimal) digits are in a (non-negative) integer. Yes, you could convert it to a string and see how long the string is, but suppose you want to do it directly. Then it is easy to solve the problem recursively by making use of two facts: 1) Non-negative integers less than 10 have one digit. 2) If x > 10, x//10 has one fewer digit than x. These two facts yield the following recursive solution: def numdigits(n): assert n >= 0 and n%1 == 0 if n < 10: return 1 return 1 + numdigits(n//10) An iterative version of this function might look like this: def numdigits(n): assert n >= 0 and n%1 == 0 length = 1 while n >= 10: length += 1 n //= 10 return length Although these two functions are pretty clearly equivalent, I find the recursive one much easier to understand. Moreover, I don't know how to write an iterative version that is as easy to understand as the recursive version. Think, for example, how you might go about proving the iterative version correct. From tim.one at comcast.net Thu Nov 27 12:10:56 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Nov 27 12:11:03 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> Message-ID: [Brett] >> I mostly agree with everything Guido has said. It probably should >> only be used when -OO is switched on. And yes, iterative solutions >> tend to be easier to grasp. [Andrew Koenig] > Not always. > > For example, suppose you want to find out how many (decimal) digits > are in a (non-negative) integer. Yes, you could convert it to a > string and see how long the string is, but suppose you want to do it > directly. Then it is easy to solve the problem recursively by making > use of two facts: > > 1) Non-negative integers less than 10 have one digit. > > 2) If x > 10, x//10 has one fewer digit than x.
> > These two facts yield the following recursive solution: > > def numdigits(n): > assert n >= 0 and n%1 == 0 > if n < 10: > return 1 > return 1 + numdigits(n//10) Easy to understand, but it's not tail-recursive, so isn't an example of what was suggested for Brett to investigate. I think a tail-recursive version is more obscure than your iterative one: def numdigits(n): def inner(n, lensofar): if n < 10: return lensofar else: return inner(n//10, lensofar+1) return inner(n, 1) > An iterative version of this function might look like this: > > def numdigits(n): > assert n >= 0 and n%1 == 0 > length = 1 > while n >= 10: > length += 1 > n //= 10 > return length > > Although these two functions are pretty clearly equivalent, I find the > recursive one much easier to understand. Moreover, I don't know how > to write an iterative version that is as easy to understand as the > recursive version. Think, for example, how you might go about > proving the iterative version correct. Exactly the same way as proving the tail-recursive version is correct. A different approach makes iteration much more natural: the number of digits in n (>= 0) is the least i >= 1 such that 10**i > n. Then iterative code is an obvious search loop: i = 1 while 10**i <= n: i += 1 return i Play strength-reduction tricks to get exponentiation out of the loop, and it's (just) a teensy bit less obvious. From devin at whitebread.org Thu Nov 27 14:19:04 2003 From: devin at whitebread.org (Devin) Date: Thu Nov 27 12:12:23 2003 Subject: [Python-Dev] Tail recursion Message-ID: On Thu, 27 Nov 2003, Andrew Koenig wrote: > --snip-- > Moreover, I don't know how to write an iterative version that is as > easy to understand as the recursive version.
::Lurk mode off:: import math def numdigits(n): assert (n >= 0) and ((n % 1) == 0) if n < 10: return 1 return int(math.log10(n)) + 1 (not iterative, but it'll do :) ::Lurk mode on:: -- Devin devin@whitebread.org http://www.whitebread.org/ From tim.one at comcast.net Thu Nov 27 12:21:02 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Nov 27 12:21:06 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Message-ID: [Devin] > import math > > def numdigits(n): > assert (n >= 0) and ((n % 1) == 0) > if n < 10: > return 1 > return int(math.log10(n)) + 1 > > (not iterative, but it'll do :) Nope, integers in Python are unbounded, and this will deliver wrong answers for "big enough" integers. Depending on the vagaries of your platform C's log10 implementation, it may even deliver a wrong answer for small n near an exact power of 10. From guido at python.org Thu Nov 27 12:30:33 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 12:32:47 2003 Subject: [Python-Dev] "groupby" iterator Message-ID: <200311271730.hARHUXg15777@c-24-5-183-134.client.comcast.net> In the shower (really!) I was thinking about the old problem of going through a list of items that are supposed to be grouped by some key, and doing something extra at the end of each group. I usually end up doing something ugly like this: oldkey = None for item in sequence: newkey = item.key # this could be any function of item if newkey != oldkey and oldkey is not None: ...do group processing... oldkey = newkey ...do item processing... ...do group processing... # for final group This is ugly because the group processing code has to be written twice (or turned into a mini-subroutine); it also doesn't handle empty sequences correctly. Solutions based on using an explicit index and peeking ahead are similarly cumbersome and hard to get right for all end cases. So I realized this is easy to do with a generator, assuming we can handle keeping a list of all items in a group. 
Here's the generator: def groupby(key, iterable): it = iter(iterable) value = it.next() # If there are no items, this takes an early exit oldkey = key(value) group = [value] for value in it: newkey = key(value) if newkey != oldkey: yield group group = [] oldkey = newkey group.append(value) yield group Here's the usage ("item.key" is just an example): for group in groupby(lambda item: item.key, sequence): for item in group: ...item processing... ...group processing... The only caveat is that if a group is very large, this accumulates all its items in a large list. I expect the generator can be reworked to return an iterator instead, but getting the details worked out seems too much work for a summy Thanskgiving morning. :-) Example: # Print lines of /etc/passwd, sorted, grouped by first letter lines = open("/etc/passwd").readlines() lines.sort() for group in groupby(lambda s: s[0], lines): print "-"*10 for line in group: print line, print "-"*10 Maybe Raymond can add this to the itertools module? Or is there a more elegant approach than my original code that I've missed all these years? --Guido van Rossum (home page: http://www.python.org/~guido/) From gerrit at nl.linux.org Thu Nov 27 12:37:01 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Thu Nov 27 12:37:22 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <200311271113.05578.aleax@aleax.it> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> <200311271113.05578.aleax@aleax.it> Message-ID: <20031127173701.GA4140@nl.linux.org> [I'm Gerrit Holl (18) and I've been using Python for 3-4 years] Alex Martelli wrote: > time, datetime, calendar -- many real programs want to deal with > dates and times In my opinion, we should not include all three in the tutorial. I think only datetime should be included. datetime has largely the same niche as time, with the difference that datetime is object oriented and time is not. 
In my opinion, this makes datetime superior to time. Further, I think calendar isn't used a lot... calendar, format3c, format3cstring, month, monthcalendar, prcal, prmonth, prweek, week, weekheader Those mostly copy the unix cal utility. They probably can be useful, but I'm not sure when. Don't most GUI's provide tools for selecting a date from a window? isleap, leapdays Useful functions. Never used them, though. firstweekday, setfirstweekday Don't really know when/why to use them timegm Doesn't belong here I think the calendar module does not contain enough functionality in order to justify it to be included in the tutorial. I think datetime does belong in the tutorial, while time and calendar do not. yours, Gerrit. -- 242. If any one hire oxen for a year, he shall pay four gur of corn for plow-oxen. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From ark-mlist at att.net Thu Nov 27 12:40:11 2003 From: ark-mlist at att.net (Andrew Koenig) Date: Thu Nov 27 12:40:37 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Message-ID: <015601c3b50d$824973c0$6402a8c0@arkdesktop> > Easy to understand, but it's not tail-recursive, so isn't an example of > what > was suggested for Brett to investigate. I think a tail-recursive version > is > more obscure than your iterative one: > > def numdigits(n): > def inner(n, lensofar): > if n < 10: > return lensofar > else: > return inner(n//10, lensofar+1) > return inner(n, 1) Ah. I will agree with you that wholly tail-recursive programs are usually no easier to understand than their iterative counterparts. On the other hand, there are partially tail-recursive functions that I find easier to understand, such as def traverse(t, f): if nonempty(t): traverse(t.left, f) traverse(t.right, f) Here, the second call to traverse is tail-recursive; the first isn't.
Of course it could be rewritten this way def traverse(t, f): while nonempty(t): traverse(t.left, f) t = t.right but I think that this rewrite makes the code harder to follow and would prefer that the compiler do it for me. > A different approach makes iteration much more natural: the number of > digits in n (>= 0) is the least i >= 1 such that 10**i > n. Then > iterative > code is an obvious search loop: > > i = 1 > while 10**i <= n: > i += 1 > return i > > Play strength-reduction tricks to get exponentiation out of the loop, and > it's (just) a teensy bit less obvious. This code relies on 10**i being exact. Is that guaranteed? From guido at python.org Thu Nov 27 12:41:29 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 12:41:37 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: Your message of "Thu, 27 Nov 2003 12:02:52 GMT." <2mn0aiqas3.fsf@starship.python.net> References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> Message-ID: <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> > >> a[1] + b[2] + c[3]...currently gives an error message that doesn't > >> say which variable the list index error occurs in or at which > >> index it occurs at I would like to point out that one solution suggested here (store the most recently used name in the object itself) cannot work -- in an expression like x[i][j], if it is the [j] part that fails, the object name displayed might be some local variable in an earlier scope that briefly referenced x[i], and that would be just plain confusing. This apart from the significant memory and CPU time overhead (which I expect whoever requested the feature doesn't care about, until they have code that runs too slow, and then they will request a Python-to-C compiler, and be indignant when they are asked to write it themselves :-).
> > This would be considerably improved if the error message could > > just point out the position in the line instead of just the line > > number. > > Any ideas how to do that? I guess you could obfuscate c_lnotab even > more... Probably not worth it. (I should mention that I have a possible use case for messing with the lnotab to contain line numbers in a different file than the Python source code. :-) > > Especially when a statement spans more than one line -- currently > > you can't even tell which line of a multi-line statement was the > > culprit! > > This is occasionally very annoying, and is probably fixable -- would > require pretty serious compiler hackery, though. BTW, for the special case of multi-line argument lists, it is already fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From gerrit at nl.linux.org Thu Nov 27 12:44:30 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Thu Nov 27 12:44:51 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Libary In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: <20031127174430.GB4140@nl.linux.org> Raymond Hettinger wrote: > I'm adding section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. I think it's a great idea. > My first draft included: > copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib > - csv (basic tool for sharing data with other applications) > - datetime (comes up frequently in real apps and admin tasks) > - ftplib (because the examples are so brief) > - getopt or optparse (because the task is common) If one of those is chosen, I'd go for the latter, because it can do more and it's more OO. 
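For readers who have not met it, the optparse style being recommended over getopt looks roughly like this (a minimal sketch; the option names here are invented for illustration):

```python
from optparse import OptionParser

# Declarative option definitions; optparse generates --help for free.
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write output to FILE", metavar="FILE")
parser.add_option("-q", "--quiet", action="store_false",
                  dest="verbose", default=True,
                  help="don't print status messages to stdout")

# Parse an explicit argument list here instead of sys.argv, for the demo.
options, args = parser.parse_args(["-f", "out.txt", "--quiet", "extra"])
assert options.filename == "out.txt"
assert options.verbose is False
assert args == ["extra"]
```

The object-oriented flavor shows in the result: parsed values come back as attributes on an options object rather than as a list of (flag, value) pairs to be dispatched on by hand.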
> - operator (because otherwise, the functionals can be a PITA) > - pprint (because beauty counts) > - struct (because fixed record layouts are common) > - threading/Queue (because without direction people grab thread and > mutexes) Hm, not sure whether this should be in the tutorial. > - timeit (because it answers most performance questions in a jiffy) > - unittest (because TDD folks like myself live by it) - email (because it's impressive and common) - textwrap (because I love it :) and it's useful) But of course, it should stay a tutorial, and not become a reference. Users are intelligent enough to skim through the standard library looking for libraries. We should make a selection. Maybe some of them should only be pointed to, without going into detail about how to use it? yours, Gerrit. -- 135. If a man be taken prisoner in war and there be no sustenance in his house and his wife go to another house and bear children; and if later her husband return and come to his home: then this wife shall return to her husband, but the children follow their father. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From guido at python.org Thu Nov 27 12:45:12 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 12:45:39 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Your message of "Thu, 27 Nov 2003 11:25:58 EST." <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> References: <00fd01c3b503$23dbe9d0$6402a8c0@arkdesktop> Message-ID: <200311271745.hARHjCN15844@c-24-5-183-134.client.comcast.net> > For example, suppose you want to find out how many (decimal) digits are in a > (non-negative) integer. Yes, you could convert it to a string and see how > long the string is, but suppose you want to do it directly. Then it is easy > to solve the problem recursively by making use of two facts: > > 1) Non-negative integers less than 10 have one digit. 
> > 2) If x > 10, x//10 has one fewer digit than x. > > These two facts yield the following recursive solution: > > def numdigits(n): > assert n >= 0 and n%1 == 0 > if n < 10: > return 1 > return 1 + numdigits(n//10) > > An iterative version of this function might look like this: > > def numdigits(n): > assert n >= 0 and n%1 == 0: > length = 1 > while n >= 10: > length += 1 > n //= 10 > return length > > Although these two functions are pretty clearly equivalent, I find the > recursive one much easier to understand. Moreover, I don't know how to > write an interative version that is as easy to understand as the recursive > version. Think, for example, how you might go about proving the iterative > version correct. Hm. The iterative version looks totally fine to me. I wonder if it all depends on the (recursive) definition with which you started. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Thu Nov 27 12:50:39 2003 From: aahz at pythoncraft.com (Aahz) Date: Thu Nov 27 12:51:49 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <015601c3b50d$824973c0$6402a8c0@arkdesktop> References: <015601c3b50d$824973c0$6402a8c0@arkdesktop> Message-ID: <20031127175039.GA13922@panix.com> On Thu, Nov 27, 2003, Andrew Koenig wrote: >Tim Peters: >> >> A different approach makes iteration much more natural: the number of >> digits in n (>= 0) is the least i >= 1 such that 10**i > n. Then >> iterative >> code is an obvious search loop: >> >> i = 1 >> while 10**i <= n: >> i += 1 >> return i > > This code relies on 10**i being exact. Is that guaranteed? For Python ints, yes. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization. 
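Tim's strength-reduction remark, together with Aahz's point that integer powers are exact, gives an iterative version that carries the power of ten along instead of recomputing 10**i on every trip (a sketch, not code from the thread):

```python
def numdigits(n):
    # Least i >= 1 such that 10**i > n, with the running power kept in
    # a variable rather than recomputed each iteration -- exact because
    # Python integers are unbounded.
    assert n >= 0 and n % 1 == 0
    i, power = 1, 10
    while power <= n:
        i += 1
        power *= 10
    return i

assert numdigits(0) == 1
assert numdigits(9) == 1
assert numdigits(10) == 2
assert numdigits(10**100) == 101
```

Unlike the math.log10 version earlier in the thread, this stays correct for arbitrarily large integers, since no floating point is involved.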
From mwh at python.net Thu Nov 27 12:52:47 2003 From: mwh at python.net (Michael Hudson) Date: Thu Nov 27 12:52:51 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> (Guido van Rossum's message of "Thu, 27 Nov 2003 09:41:29 -0800") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> Message-ID: <2misl5r95c.fsf@starship.python.net> Guido van Rossum writes: >> > This would be considerably improved if the error message could >> > just point out the position in the line instead of just the line >> > number. >> >> Any ideas how to do that? I guess you could obfuscate c_lnotab even >> more... > > Probably not worth it. (I should mention that I have a possible use > case for messing with the lnotab to contain line numbers in a > different file than the Python source code. :-) That's not c_lnotab, is it? More likely co_firstlineno & co_filename. But anyway, eek! >> > Especially when a statement spans more than one line -- currently >> > you can't even tell which line of a multi-line statement was the >> > culprit! >> >> This is occasionally very annoying, and is probably fixable -- would >> require pretty serious compiler hackery, though. > > BTW, for the special case of multi-line argument lists, it is already > fixed. So it is. I guess the other situations that are worth fixing are long container -- list, tuple, dict -- literals. My brain is a bit too fried to think if a more general solution is feasible, but I will point out that since SET_LINENO went away, inserting superfluous calls to com_set_lineno doesn't result in superfluous bytecodes, so perhaps that could just be added to com_node or something. Although IIRC, in {k:v} v is evaluated before k, which could make life entertaining. Cheers, mwh -- ARTHUR: Don't ask me how it works or I'll start to whimper. 
-- The Hitch-Hikers Guide to the Galaxy, Episode 11 From bac at OCF.Berkeley.EDU Thu Nov 27 14:21:37 2003 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Nov 27 14:21:44 2003 Subject: [Python-Dev] python-dev Summary for 10-16-2003 through 11-15-2003[draft] In-Reply-To: <005c01c3b221$bf2d7c80$edb02c81@oemcomputer> References: <005c01c3b221$bf2d7c80$edb02c81@oemcomputer> Message-ID: <3FC64EC1.1000306@ocf.berkeley.edu> Raymond Hettinger wrote: >>If you ever wanted to have the power of list comprehensions but > > without > >>the overhead of generating the entire list you have Peter Norvig >>initially and then what seems like the rest of the world for generator >>expressions. > > > [possibly mangled sentence doesn't make sense] > Or me not typing as fast as my brain is working. There is a critical "to thank" missing from that sentence. > > > >>After the addition of the 'key' argument to list.sort(), people began > > to > >>clamor for list.sort() to return self. Guido refused to do give in, > > so > >>a compromise was reached. 'list' now has a class method named > > 'sorted'. > >> Pass it a list and it will return a *copy* of that list sorted. > > > > [Add] > What makes a class method so attractive is that the argument need not be > a list, any iterable will do. The return value *is* of course a list. > > By returning a list instead of None, list.sorted() can be used as an > expression instead of a statement. This makes it possible to use it as > an argument in a function call or as the iterable in a for-loop:: > > # iterate over a dictionary sorted by key > for key, value in list.sorted(mydict.iteritems()): > Changed it to state that it takes an iterable. Didn't add the full-on tutorial on use, though. Chances are people who read the Summary know Python well enough to realize the method's use. > > > >>As an interim solution, itertools grew a new function: tee. It takes > > in > >>an iterable and returns two iterators which independently iterate over >>the iterable. 
> > [replace] two > [with] two or more > > Done. > > >>The point that operator.isMappingType is kind of broken came up. Both >>Alex and Raymond Hettinger would not mind seeing it disappear. No one >>objected. It is still in CVS at the moment, but I would not count on > > it > >>necessarily sticking around. > > > ["It's not quite dead yet" ;-) Actually, there may be a way to > partially fix-it so that it won't be totally useless]. > > Fixed. > > >>There was a new built-in named reversed(), and all rejoiced. > > > [And much flogging of the person who proposed it] > > Fixed. =) > > >>Straight from the function's doc string: "reverse iterator over values >>of the sequence". `PEP 322`_ has the relevant details on this toy. > > > [Replace] toy > [With] major technological innovation of the first order > [Or just] builtin. > > I went with the latter since I need to keep some journalistic integrity and thus not be too biased. =) > > > >>Sets now at blazing C speeds! > > > [Looks like a certain parroteer will soon be eating pie!] > > > > Another fine summary. > Thanks for the good work. > You're quite welcome. Happy Thanksgiving, Raymond (and everyone else out there). -Brett From theller at python.net Thu Nov 27 14:50:59 2003 From: theller at python.net (Thomas Heller) Date: Thu Nov 27 14:51:09 2003 Subject: [Python-Dev] Patch to distutils.msvccompiler, 2.3 branch Message-ID: Several people on this list (IIRC Jim, Guido, Jeremy) have been bitten by the problem that distutils couldn't build extensions with MSVC6, complaining that the compiler isn't installed although in fact it was installed. The problem always seemed to be that MSVC6 only writes the complete registry entries which distutils requires after the GUI has been run at least one time. I have uploaded a patch to the latest bug report from Jim, http://www.python.org/sf/848614, which tries to detect these incomplete registry entries.
It works for me (having removed and installed MSVC several times, with and without running the gui). It would be great if someone else would try it out and report if it works correctly - IMO it should be committed to the 2.3 maintenance branch before 2.3.3 goes out. (Suggestions for better wording would be accepted ;-) The effect of this patch would be the following outputs from a 'setup.py build_ext' command, depending on the compiler installation state:

Not installed:

  error: Python was built with version 6 of Visual Studio, and extensions need to be built with the same version of the compiler, but it isn't installed.

Installed, but the GUI has never been run:

  warning: It seems you have Visual Studio 6 installed, but the expected registry settings are not present. You must at least run the Visual Studio GUI once so that these entries are created.

  error: Python was built with version 6 of Visual Studio, and extensions need to be built with the same version of the compiler, but it isn't installed.

Installed, and GUI has been run: the extension should build normally.

Thanks, Thomas From pje at telecommunity.com Thu Nov 27 15:34:45 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Nov 27 15:33:36 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> Message-ID: <5.1.0.14.0.20031127153121.03d42ce0@mail.telecommunity.com> At 09:41 AM 11/27/03 -0800, Guido van Rossum wrote: >Probably not worth it. (I should mention that I have a possible use >case for messing with the lnotab to contain line numbers in a >different file than the Python source code. :-) DTML, perhaps? ;-) Yes, if the format is changed to add columns, it would be nice to make it capable of having the code in one code block actually come from more than one file, or from non-contiguous lines in one file.
Tools that use Python as an output format, or that preprocess Python (e.g. the dozen or so templating libraries out there), could really use something like a #line directive. From gball at cfa.harvard.edu Thu Nov 27 15:58:47 2003 From: gball at cfa.harvard.edu (Greg Ball) Date: Thu Nov 27 15:58:51 2003 Subject: [Python-Dev] "groupby" iterator Message-ID: Here's a reworking which returns iterators. I had to decide what to do if the user tries to access things out of order; I raise an exception. Anything else would complicate the code quite a lot I think.

def groupby(key, iterable):
    it = iter(iterable)
    value = it.next()  # If there are no items, this takes an early exit
    oldkey = [key(value)]
    cache = [value]
    lock = []
    def grouper():
        yield cache.pop()
        for value in it:
            newkey = key(value)
            if newkey == oldkey[0]:
                yield value
            else:
                oldkey[0] = newkey
                cache.append(value)
                break
        del lock[0]
    while 1:
        if lock:
            raise LookupError, "groups accessed out of order"
        if not cache:
            break
        lock.append(1)
        yield grouper()

--Greg Ball From Jack.Jansen at cwi.nl Thu Nov 27 16:16:26 2003 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Nov 27 16:16:32 2003 Subject: [Python-Dev] Tutorial: Brief Introduction to the Standard Library In-Reply-To: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> References: <000901c3b45f$b600bba0$5ab0958d@oemcomputer> Message-ID: On 26-nov-03, at 21:56, Raymond Hettinger wrote: > I'm adding a section to the tutorial with a brief sampling of library > offerings and some short examples of how to use them. > > My first draft included: > copy, glob, shelve, pickle, os, re, math/cmath, urllib, smtplib My 2 cents (and actually what I plan to do for MacPython, Some Day:-): pick a small number of tutorials where you solve toy versions of real world problems from different domains.
For example you could do a "publish spreadsheet to website" where you showcase csv and urllib (or maybe the reverse, "turn html table into csv", so you can show htmllib too); "analyse some sort of logfile" where you could probably show datetime, re and maybe glob and optparse; "something scientific" could probably show cmath and random and a few others; "form mailer" could show cgi, pprint and email. I think the advantage of examples from real world problem domains is that people will pick the one that they can relate to, and hence not only will they understand what the problem is all about (i.e. people won't look at a complex number example if they haven't a clue what a complex number is), but also the functionality demonstrated should produce the "aha!" that we're after. -- Jack Jansen, http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From guido at python.org Thu Nov 27 16:49:40 2003 From: guido at python.org (Guido van Rossum) Date: Thu Nov 27 16:49:52 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Thu, 27 Nov 2003 15:58:47 EST." References: Message-ID: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> > Here's a reworking which returns iterators. I had to decide what to do if > the user tries to access things out of order; I raise an exception. > Anything else would complicate the code quite a lot I think.
>
> def groupby(key, iterable):
>     it = iter(iterable)
>     value = it.next()  # If there are no items, this takes an early exit
>     oldkey = [key(value)]
>     cache = [value]
>     lock = []
>     def grouper():
>         yield cache.pop()
>         for value in it:
>             newkey = key(value)
>             if newkey == oldkey[0]:
>                 yield value
>             else:
>                 oldkey[0] = newkey
>                 cache.append(value)
>                 break
>         del lock[0]
>     while 1:
>         if lock:
>             raise LookupError, "groups accessed out of order"
>         if not cache:
>             break
>         lock.append(1)
>         yield grouper()

Thanks!
Here's a class version of the same, which strikes me as slightly easier to understand (though probably slower due to all the instance variable access). It may serve as an easier model for a C implementation. I decided not to deal explicitly with out-of-order access; if the caller doesn't play by the rules, some of their groups will be split and jumbled, but each split group will still have matching keys.

class GroupBy(object):

    def __init__(self, key, iterable):
        self.key = key
        self.it = iter(iterable)
        self.todo = []

    def __iter__(self):
        return self

    def next(self):
        if self.todo:
            value, oldkey = self.todo.pop()
        else:
            value = self.it.next()  # Exit if this raises StopIteration
            oldkey = self.key(value)
        return self._grouper(value, oldkey)

    def _grouper(self, value, oldkey):
        yield value
        for value in self.it:
            newkey = self.key(value)
            if newkey != oldkey:
                self.todo.append((value, newkey))
                break
            yield value

This is an example of what's so cool about iterators and generators: You can code a particular idiom or mini-pattern (in this case grouping list items) once and apply it to lots of situations. That's of course what all subroutines do, but iterators and generators open up lots of places where previously it wasn't convenient to use a subroutine (you'd have to use lots of lambdas -- or you'd have to have a language supporting anonymous code blocks, which provide a lot of the same power in a different way). --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Nov 27 17:08:35 2003 From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: Thu Nov 27 17:09:25 2003 Subject: [Python-Dev] Patch to distutils.msvccompiler, 2.3 branch In-Reply-To: References: Message-ID: Thomas Heller writes: > It would be great if > someone else would try it out and report if it works correctly - IMO it > should be committed to the 2.3 maintenance branch before 2.3.3 goes out.
> (Suggestions for better wording would be accepted ;-) I'm willing to trust you that you get this right. Most of us probably aren't even aware that VS6 has different registry settings depending on whether it was ever invoked after being installed. Regards, Martin From tdelaney at avaya.com Thu Nov 27 17:19:45 2003 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Thu Nov 27 17:19:52 2003 Subject: [Python-Dev] Patch to distutils.msvccompiler, 2.3 branch Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DEEECA9C@au3010avexu1.global.avaya.com> > From: Martin v. Löwis > > Thomas Heller writes: > > > It would be great if > > someone else would try it out and report if it works > correctly - IMO it > > should be committed to the 2.3 maintenance branch before > 2.3.3 goes out. > > (Suggestions for better wording would be accepted ;-) > > I'm willing to trust you that you get this right. > > Most of us probably aren't even aware that VS6 has different registry > settings depending on whether it was ever invoked after being > installed. This is something I come across all the time with Microsoft products - in particular, we have a product which is a plugin to Microsoft Visio. Our installer currently has to detect that Visio is installed, but hasn't been run, and tell them to run it. I intend to rewrite the installer soon (from WISE to NSIS) and hopefully I'll be able to improve this behaviour ... Tim Delaney From greg at cosc.canterbury.ac.nz Thu Nov 27 18:08:14 2003 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Nov 27 18:08:21 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <2mn0aiqas3.fsf@starship.python.net> Message-ID: <200311272308.hARN8EE02347@oma.cosc.canterbury.ac.nz> Michael Hudson : > > This would be considerably improved if the error message could > > just point out the position in the line instead of just the line > > number. > > Any ideas how to do that? I guess you could obfuscate c_lnotab even > more...
It would need to contain a lot more information, one way or another. I don't know whether it would be worth going to heroic lengths to compress it, though. Maybe it would be better to invest the effort in making the lineno tables lazily loaded instead -- leave them in the .pyc file until they're needed. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tjreedy at udel.edu Thu Nov 27 21:09:33 2003 From: tjreedy at udel.edu (Terry Reedy) Date: Thu Nov 27 21:09:41 2003 Subject: [Python-Dev] Re: Patch to distutils.msvccompiler, 2.3 branch References: Message-ID: "Thomas Heller" wrote in message news:brqx615o.fsf@python.net... > Several people on this list (IIRC Jim, Guido, Jeremy) have been bitten > by the problem that distutils couldn't build extensions with MSVC6, > complaining that the compiler isn't installed although in fact it was > installed. In my view, distutils is correct. VC6 has been loaded but installation is not finished. > The problem always seemed to be that MSVC6 only writes the complete > registry entries which distutils requires after the GUI has been run at > least one time. I have seen other programs (some games, in particular, that I remember) do this sort of thing -- performing the final installation phase on first execution. Sometimes there is a menu option to repeat this phase without reloading. > Not installed: I would call this 'Not loaded' > > error: Python was built with version 6 of Visual Studio, and > extensions need to be built with the same version of the compiler, but > it isn't installed. > > Installed, but the GUI has never been run: and this 'Loaded, but installation incomplete' > Installed, and GUI has been run: the extension should build normally. and this 'Fully installed' Terry J.
Reedy From tim.one at comcast.net Fri Nov 28 00:56:40 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Nov 28 00:56:42 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <015601c3b50d$824973c0$6402a8c0@arkdesktop> Message-ID: [Andrew Koenig] > Ah. I will agree with you that wholly tail-recursive programs are > usually no easier to understand than their iterative counterparts. Good! That's why I've never been keen to "do something" about tail recursion in Python -- the "one obvious way" to write a loop in Python is with a loop. > On the other hand, there are partially tail-recursive functions that > I find easier to understand, such as
>
> def traverse(t, f):
>     if nonempty(t):
>         traverse(t.left, f)
>         traverse(t.right, f)
>
> Here, the second call to traverse is tail-recursive; the first isn't. > Of course it could be rewritten this way
>
> def traverse(t, f):
>     while nonempty(t):
>         traverse(t.left, f)
>         t = t.right
>
> but I think that this rewrite makes the code harder to follow I agree. Worse still is writing it iteratively with an explicit stack. Note that PEP 255 has both spellings for a tree-walking generator, and the fully iterative spelling is much harder to understand. > would prefer that the compiler do it for me. I don't in Python: if I coded a call, I want Python to make a call. WYSIWYG contributes greatly to the debuggability of large Python programs in practice.
>> i = 1
>> while 10**i <= n:
>>     i += 1
>> return i
> This code relies on 10**i being exact. Also on + being exact, and the other code in this thread depended on // being exact. > Is that guaranteed? + - * // % ** pow and divmod on integers in Python will either deliver an exact result or raise an exception (like MemoryError if malloc() can't find enough space to hold an intermediate result). From mwh at python.net Fri Nov 28 09:51:28 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 28 09:51:32 2003 Subject: [Python-Dev] less quick patch for better debugging.
In-Reply-To: <2misl5r95c.fsf@starship.python.net> (Michael Hudson's message of "Thu, 27 Nov 2003 17:52:47 +0000") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> <2misl5r95c.fsf@starship.python.net> Message-ID: <2mekvsr1fz.fsf@starship.python.net> Michael Hudson writes: > Although IIRC, in {k:v} v is evaluated before k, which could make > life entertaining. Another situation where (more unavoidably) execution "goes backwards": r = [i for i in somelist] Cheers, mwh -- (Of course SML does have its weaknesses, but by comparison, a discussion of C++'s strengths and flaws always sounds like an argument about whether one should face north or east when one is sacrificing one's goat to the rain god.) -- Thant Tessman From mwh at python.net Fri Nov 28 10:58:58 2003 From: mwh at python.net (Michael Hudson) Date: Fri Nov 28 10:59:02 2003 Subject: [Python-Dev] less quick patch for better debugging. In-Reply-To: <2misl5r95c.fsf@starship.python.net> (Michael Hudson's message of "Thu, 27 Nov 2003 17:52:47 +0000") References: <200311270026.hAR0Q8623420@oma.cosc.canterbury.ac.nz> <2mn0aiqas3.fsf@starship.python.net> <200311271741.hARHfUq15815@c-24-5-183-134.client.comcast.net> <2misl5r95c.fsf@starship.python.net> Message-ID: <2m7k1kqybh.fsf@starship.python.net> Michael Hudson writes: > Guido van Rossum writes: >> BTW, for the special case of multi-line argument lists, it is already >> fixed. > > So it is. I guess the other situations that are worth fixing are long > container -- list, tuple, dict -- literals. My brain is a bit too > fried to think if a more general solution is feasible, but I will > point out that since SET_LINENO went away, inserting superfluous calls > to com_set_lineno doesn't result in superfluous bytecodes, so perhaps > that could just be added to com_node or something. 
Brain still fried, so someone else will have to tell me what's wrong with: http://python.org/sf/850789 which as sketched above calls com_set_lineno in every invocation of com_node and removes all the other calls. Cheers, mh -- Ability to type on a computer terminal is no guarantee of sanity, intelligence, or common sense. -- Gene Spafford's Axiom #2 of Usenet From pinard at iro.umontreal.ca Fri Nov 28 11:44:12 2003 From: pinard at iro.umontreal.ca (=?iso-8859-1?Q?Fran=E7ois?= Pinard) Date: Fri Nov 28 11:57:09 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: References: <015601c3b50d$824973c0$6402a8c0@arkdesktop> Message-ID: <20031128164412.GA3028@titan.progiciels-bpi.ca> [Tim Peters] > [Andrew Koenig] > > Ah. I will agree with you that wholly tail-recursive programs are > > usually no easier to understand than their iterative counterparts. > Good! That's why I've never been keen to "do something" about tail > recursion in Python -- the "one obvious way" to write a loop in Python is > with a loop. Just a tiny remark on that topic. In my experience, it is rather unusual that I need to use tail recursion in a way that would not easily express itself with a simple loop, and more clearly that way. However, there are a few rare cases in which algorithms use tail recursion at various places and paths in a single function, in such a way that untangling these into a single loop would not be easy. But such situations (let's call them [1]) are uncommon in practice. Moreover, tail recursion is an optimisation matter, and situations in which speed is excruciatingly important (let's call them [2]) are far less frequent, still in practice, than some people tend to believe. Since [1] and [2] are kind of independent, we could consider that it is extremely uncommon that we meet [1] and [2] simultaneously. So, in practice, it might be that Python does not really need tail recursion.
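The distinction being drawn here — tail calls that collapse mechanically into a simple loop versus those that don't — is easiest to see on a function where the rewrite is purely mechanical. A small illustrative pair (editorial example, not code from the thread):

```python
def gcd_recursive(a, b):
    # Tail-recursive form: the recursive call is the very last thing
    # the function does, so no work remains after it returns.
    if b == 0:
        return a
    return gcd_recursive(b, a % b)

def gcd_loop(a, b):
    # The same function with the tail call turned into a loop -- the
    # transformation a tail-call-optimising compiler would perform.
    while b != 0:
        a, b = b, a % b
    return a
```

Both return the same results; the interesting cases in this thread are functions like traverse() above, where only one of several recursive calls is in tail position and the rewrite is no longer this clean.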
> > On the other hand, there are partially tail-recursive functions that > > I find easier to understand, such as [...] Yes, of course, if an algorithm expresses itself more clearly using a notation which happens to be tail recursive, do not hesitate to express it that way, especially given that _on average_, one may safely assert that the algorithm is not speed-critical. Rare exceptions exist and can be used to build counter-examples, but these should not be seen as really compelling. On the other hand, if Guido feels like accepting tail-recursion in Python for the sake of an intellectual exercise or for the pleasure of its elegance, let's go for it. It cannot really hurt that much :-). -- François Pinard http://www.iro.umontreal.ca/~pinard From guido at python.org Fri Nov 28 13:00:12 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 13:00:37 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: Your message of "Fri, 28 Nov 2003 00:56:40 EST." References: Message-ID: <200311281800.hASI0CW17161@c-24-5-183-134.client.comcast.net> > + - * // % ** pow and divmod on integers in Python will either deliver an > exact result or raise an exception (like MemoryError if malloc() can't find > enough space to hold an intermediate result). Except for ** if the exponent is negative. --Guido van Rossum (home page: http://www.python.org/~guido/) From gerrit at nl.linux.org Fri Nov 28 14:49:59 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Fri Nov 28 14:50:29 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> Message-ID: <20031128194959.GA4886@nl.linux.org> Raymond Hettinger wrote: > Date: Tue, 25 Nov 2003 07:26:15 +0100 > After re-reading previous posts on the subject, I had an idea. Let's > isolate these functions in the documentation into a separate section > following the rest of the builtins.
I would like to nominate input() also. It is often misused by beginners. A better choice is almost always raw_input(). In the standard library, fpformat.py seems to be the only one using it. Further, I see Demo/classes/Dbm.py uses it, but that seems to be all. How about banishing input() too? yours, Gerrit. -- 59. If any man, without the knowledge of the owner of a garden, fell a tree in a garden he shall pay half a mina in money. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From perky at i18n.org Fri Nov 28 14:54:20 2003 From: perky at i18n.org (Hye-Shik Chang) Date: Fri Nov 28 14:54:31 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> References: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> Message-ID: <20031128195420.GA63319@i18n.org> On Thu, Nov 27, 2003 at 01:49:40PM -0800, Guido van Rossum wrote: > > Thanks! Here's a class version of the same, which strikes me as > slightly easier to understand (though probably slower due to all the > instance variable access). It may serve as an easier model for a C > implementation. I decided not to deal explicitly with out-of-order > access; if the caller doesn't play by the rules, some of their groups > will be split and jumbled, but each split group will still have > matching keys. Here's yet another implementation for itertoolsmodule.c. (see attachment) I wrote it after the shower (really!) 
:) Regards, Hye-Shik -------------- next part -------------- Index: Modules/itertoolsmodule.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/itertoolsmodule.c,v retrieving revision 1.26 diff -u -r1.26 itertoolsmodule.c --- Modules/itertoolsmodule.c 12 Nov 2003 14:32:26 -0000 1.26 +++ Modules/itertoolsmodule.c 28 Nov 2003 19:46:43 -0000 @@ -2081,6 +2081,272 @@ }; +/* groupby object ***********************************************************/ + +typedef struct { + PyObject_HEAD + PyObject *it; + PyObject *key; + PyObject *oldvalue; + PyObject *oldkey; +} groupbyobject; + +static PyTypeObject groupby_type; +static PyObject *_itergroup_create(groupbyobject *); + +static PyObject * +groupby_new(PyTypeObject *type, PyObject *args, PyObject *kwds) +{ + groupbyobject *gbo; + PyObject *it, *key; + + if (!PyArg_ParseTuple(args, "OO:groupby", &key, &it)) + return NULL; + + if (!PyCallable_Check(key)) { + PyErr_SetString(PyExc_ValueError, + "Key argument must be a callable object."); + return NULL; + } + + gbo = (groupbyobject *)type->tp_alloc(type, 0); + if (gbo == NULL) + return NULL; + gbo->oldvalue = NULL; + gbo->oldkey = NULL; + gbo->key = key; + Py_INCREF(key); + gbo->it = PyObject_GetIter(it); + if (it == NULL) { + Py_DECREF(gbo); + return NULL; + } + return (PyObject *)gbo; +} + +static void +groupby_dealloc(groupbyobject *gbo) +{ + PyObject_GC_UnTrack(gbo); + Py_XDECREF(gbo->it); + Py_XDECREF(gbo->key); + Py_XDECREF(gbo->oldvalue); + Py_XDECREF(gbo->oldkey); + gbo->ob_type->tp_free(gbo); +} + +static int +groupby_traverse(groupbyobject *gbo, visitproc visit, void *arg) +{ + int err; + + if (gbo->it) { + err = visit(gbo->it, arg); + if (err) + return err; + } + + if (gbo->key) { + err = visit(gbo->key, arg); + if (err) + return err; + } + + if (gbo->oldvalue) { + err = visit(gbo->oldvalue, arg); + if (err) + return err; + } + + if (gbo->oldkey) { + err = visit(gbo->oldkey, arg); + if (err) + return 
err; + } + + return 0; +} + +static PyObject * +groupby_next(groupbyobject *gbo) +{ + if (gbo->oldvalue == NULL) { + gbo->oldvalue = PyIter_Next(gbo->it); + if (gbo->oldvalue == NULL) + return NULL; + } + + return _itergroup_create(gbo); +} + +PyDoc_STRVAR(groupby_doc, +"groupby(key, iterable) -> create an iterator which returns sub-iterators\n\ +grouped by key(value).\n"); + +static PyTypeObject groupby_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools.groupby", /* tp_name */ + sizeof(groupbyobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)groupby_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | + Py_TPFLAGS_BASETYPE, /* tp_flags */ + groupby_doc, /* tp_doc */ + (traverseproc)groupby_traverse, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)groupby_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + groupby_new, /* tp_new */ + PyObject_GC_Del, /* tp_free */ +}; + + +/* _itergroup object (internal) **********************************************/ + +typedef struct { + PyObject_HEAD + PyObject *parent; +} _itergroupobject; + +static PyTypeObject _itergroup_type; + +static PyObject * +_itergroup_create(groupbyobject *parent) +{ + _itergroupobject *igo; + + igo = PyObject_New(_itergroupobject, &_itergroup_type); + if (igo == NULL) + return PyErr_NoMemory(); + igo->parent = (PyObject *)parent; + 
Py_INCREF(parent); + + return (PyObject *)igo; +} + +static void +_itergroup_dealloc(_itergroupobject *igo) +{ + Py_XDECREF(igo->parent); + PyObject_Del(igo); +} + +static PyObject * +_itergroup_next(_itergroupobject *igo) +{ + groupbyobject *gbo = (groupbyobject *)igo->parent; + PyObject *value, *newkey; + int rcmp; + + if (gbo->oldvalue != NULL) { + value = gbo->oldvalue; + gbo->oldvalue = NULL; + } else { + value = PyIter_Next(gbo->it); + if (value == NULL) + return NULL; + } + + newkey = PyObject_CallFunctionObjArgs(gbo->key, value, NULL); + if (newkey == NULL) { + /* throw the value away because it may fail on next iteration + * trial again. */ + Py_DECREF(value); + return NULL; + } + + if (gbo->oldkey == NULL) { + gbo->oldkey = newkey; + return value; + } else if (PyObject_Cmp(gbo->oldkey, newkey, &rcmp) == -1) { + Py_DECREF(newkey); + return NULL; + } + + if (rcmp == 0) { + Py_DECREF(newkey); + return value; + } else { + Py_DECREF(gbo->oldkey); + gbo->oldkey = newkey; + gbo->oldvalue = value; + return NULL; + } +} + +static PyTypeObject _itergroup_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools._itergroup", /* tp_name */ + sizeof(_itergroupobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)_itergroup_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + 0, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)_itergroup_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 
0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + 0, /* tp_new */ + _PyObject_Del, /* tp_free */ +}; + + /* module level code ********************************************************/ PyDoc_STRVAR(module_doc, @@ -2103,6 +2369,7 @@ chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ... \n\ takewhile(pred, seq) --> seq[0], seq[1], until pred fails\n\ dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails\n\ +groupby(key, iterable) --> iterates iterators by group\n\ "); @@ -2130,6 +2397,7 @@ &count_type, &izip_type, &repeat_type, + &groupby_type, NULL }; From python at rcn.com Fri Nov 28 16:14:40 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 28 16:15:22 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <20031128194959.GA4886@nl.linux.org> Message-ID: <001801c3b5f4$a45cd1e0$e841fea9@oemcomputer> > I would like to nominate input() also. It is often misused by beginners. > A better choice is almost always raw_input(). In the standard library, > fpformat.py seems to be the only one using it. Further, I see > Demo/classes/Dbm.py uses it, but that seems to be all. How about > banishing input() too? I won't name names, but input() has a very important friend who happens to be a dictator, the author of the tutorial, and the creator of a well thought out programming language. The risks are clearly documented. So no one can say they weren't warned. Also, it does have its uses and is friendly to beginning programmers who don't enjoy having to coerce strings back to the data type they actually wanted. Also, it is somewhat nice to be able to enter expressions in personal, interactive scripts.
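The behaviour being weighed here is that Python 2's input() evaluates whatever the user types, where raw_input() returns it as a string. A simplified model of that behaviour (an editorial sketch, not CPython's actual implementation):

```python
def py2_style_input(typed_line):
    # Rough model of Python 2's input(): evaluate the typed text as an
    # expression.  This is what spares beginners a manual int() call --
    # and also what makes input() risky, since any expression the user
    # types gets executed.
    return eval(typed_line)
```

So where raw_input() would hand back the string "2 + 3", this model yields the integer 5 — exactly the coercion convenience (and the hazard) the thread is debating.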
all-builtins-have-at-least-one-friend, Raymond Hettinger From guido at python.org Fri Nov 28 16:42:04 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 16:42:37 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Fri, 28 Nov 2003 20:49:59 +0100." <20031128194959.GA4886@nl.linux.org> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> Message-ID: <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> > I would like to nominate input() also. It is often misused by beginners. I've seen many programming texts for real beginners that use it -- it's handy to be able to read numbers before you have explained strings or how to parse them. So I say let's be kind on input(). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Nov 28 16:46:53 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 16:47:01 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sat, 29 Nov 2003 04:54:20 +0900." <20031128195420.GA63319@i18n.org> References: <200311272149.hARLneU16201@c-24-5-183-134.client.comcast.net> <20031128195420.GA63319@i18n.org> Message-ID: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> > Here's yet another implementation for itertoolsmodule.c. (see > attachment) I wrote it after the shower (really!) :) Wow! Thanks. Let's all remember to take or showers and maybe Python will become the cleanest programming language. :) Raymond, what do you think? I would make one change: after looking at another use case, I'd like to change the outer iterator to produce (key, grouper) tuples. 
This way, you can write things like totals = {} for key, group in sequence: totals[key] = sum(group) --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Fri Nov 28 18:24:30 2003 From: python at rcn.com (Raymond Hettinger) Date: Fri Nov 28 18:25:14 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> Message-ID: <002701c3b606$c61304a0$e841fea9@oemcomputer> > > Here's yet another implementation for itertoolsmodule.c. (see > > attachment) I wrote it after the shower (really!) :) > > Wow! Thanks. Let's all remember to take or showers and maybe Python > will become the cleanest programming language. :) > > Raymond, what do you think? Yes. I recommend taking showers on a regular basis ;-) I'll experiment with groupby() for a few more days and see how it feels. The first impression is that it meets all the criteria for becoming an itertool (iters in, iters out; no unexpected memory use; works well with other tools; not readily constructed from existing tools). At first, the tool seems more special purpose than general purpose. OTOH, it is an excellent solution to a specific class of problems and it makes code much cleaner by avoiding the repeated code block in the non-iterator version. > I would make one change: after looking at another use case, I'd like > to change the outer iterator to produce (key, grouper) tuples. This > way, you can write things like > > totals = {} > for key, group in sequence: > totals[key] = sum(group) This is a much stronger formulation than the original. It is clear, succinct, expressive, and less error prone. The implementation would be more complex than the original. 
If the group is ignored, the outer iterator needs to be smart enough to read through the input iterator until the next group is encountered: >>> names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'] >>> firstname = lambda n: n.split()[0] >>> names.sort() >>> unique_first_names = [first for first, _ in groupby(firstname, names)] ['Barry', 'Jack', 'Tim'] In experimenting with groupby(), I am starting to see a need for a high speed data extractor function. This need is common to several tools that take function arguments (like list.sort(key=)). While extractor functions can be arbitrarily complex, many only fetch a specific attribute or element number. Alex's high-speed curry suggests that it is possible to create a function maker for fast lookups: students.sort(key=extract('grade')) # key=lambda r:r.grade students.sort(key=extract(2)) # key=lambda r:r[2] Raymond From guido at python.org Fri Nov 28 18:41:58 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 18:42:09 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Fri, 28 Nov 2003 18:24:30 EST." <002701c3b606$c61304a0$e841fea9@oemcomputer> References: <002701c3b606$c61304a0$e841fea9@oemcomputer> Message-ID: <200311282341.hASNfwE17612@c-24-5-183-134.client.comcast.net> > Yes. I recommend taking showers on a regular basis ;-) Jack Jansen wants me to add: especially right after riding your bicycle to work. And my boss will agree. (Enough for in-jokes that no-one will get. :-) > I'll experiment with groupby() for a few more days and see how it > feels. The first impression is that it meets all the criteria for > becoming an itertool (iters in, iters out; no unexpected memory use; > works well with other tools; not readily constructed from existing > tools). Right. > At first, the tool seems more special purpose than general purpose.
> OTOH, it is an excellent solution to a specific class of problems and it > makes code much cleaner by avoiding the repeated code block in the > non-iterator version. > > > > I would make one change: after looking at another use case, I'd like > > to change the outer iterator to produce (key, grouper) tuples. This > > way, you can write things like > > > > totals = {} > > for key, group in sequence: > > totals[key] = sum(group) Oops, there's a mistake. I meant to say: totals = {} for key, group in groupby(keyfunc, sequence): totals[key] = sum(group) > This is a much stronger formulation than the original. It is clear, > succinct, expressive, and less error prone. I'm not sure to what extent this praise was inspired by my mistake of leaving out the groupby() call. > The implementation would be more complex than the original. To the contrary. It was a microscopic change to either of the Python versions I posted, because the key to be returned is always available at exactly the right time. > If the > group is ignored, the outer iterator needs to be smart enough to read > through the input iterator until the next group is encountered: > > >>> names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'] > >>> firstname = lambda n: n.split()[0] > >>> names.sort() > >>> unique_first_names = [first for first, _ in groupby(firstname, > names)] > ['Barry' , 'Jack', 'Tim'] I don't think those semantics should be implemented. You should be required to iterate through each group. I was just thinking that returning the key might save the caller cumbersome logic if the key is needed but the inner iterator is also needed. The sum-by-group example would become much uglier: totals = {} for group in groupby(keyfunc, sequence): first = group.next() key = keyfunc(first) totals[key] = first + sum(group, 0) > In experimenting with groupby(), I am starting to see a need for a high > speed data extractor function. 
This need is common to several tools > that take function arguments (like list.sort(key=)). Exactly: it was definitely inspired by list.sort(key=). > While extractor > functions can be arbitrarily complex, many only fetch a specific > attribute or element number. Alex's high-speed curry suggests that it > is possible to create a function maker for fast lookups: > > students.sort(key=extract('grade')) # key=lambda r:r.grade > students.sort(key=extract(2)) # key=lambda r:[2] Perhaps we could do this by changing list.sort() and groupby() to take a string or int as first argument to mean exactly this. For the string case I had thought of this already (in my second shower today :-); the int case makes sense too. (Though it may weaken my objection against property('foo') in a different thread. :-) But I recommend holding off on this -- the "pure" groupby() has enough merit without speed hacks, and I find the clarity it provides more important than possible speed gains. I expect that the original, ugly code is usually faster, but in the cases where I've needed this I don't care: either the sequence isn't all that long, or the program doesn't run all that frequently, or it does so much other stuff that the speed gain would be drowned in the noise. --Guido van Rossum (home page: http://www.python.org/~guido/) From anthony at ekit-inc.com Fri Nov 28 22:44:17 2003 From: anthony at ekit-inc.com (Anthony Baxter) Date: Fri Nov 28 22:44:36 2003 Subject: [Python-Dev] minor interruption to service. Message-ID: <200311290344.hAT3iHF5013034@maxim.off.ekorp.com> I'm going to be pretty much offline for a week or so - we got burgled the other night while we were asleep and my laptop was stolen. The data's backed up, but it'll be a few days til the replacement laptop arrives. In the meantime, if someone wants to take on the "upgrade to autoconf 2.59" task, I'd appreciate it very much. 
thanks, Anthony From anthony at ekit-inc.com Fri Nov 28 23:25:18 2003 From: anthony at ekit-inc.com (Anthony Baxter) Date: Fri Nov 28 23:25:36 2003 Subject: [Python-Dev] test_mimetools failure when hostname unknown. Message-ID: <200311290425.hAT4PIrO013499@maxim.off.ekorp.com> If you have a machine whose local hostname is just something you've set, and there's no matching entry in /etc/hosts, test_mimetools fails with test test_mimetools failed -- Traceback (most recent call last): File "/home/anthony/src/py/23maint/Lib/test/test_mimetools.py", line 30, in test_boundary nb = mimetools.choose_boundary() File "/home/anthony/src/py/23maint/Lib/mimetools.py", line 130, in choose_boundary hostid = socket.gethostbyname(socket.gethostname()) gaierror: (-2, 'Name or service not known') This seems, to me, to be a bit bogus - should we just, in this case, have some sensible default (maybe just use the hostname, or 127.0.0.1)? And yes, I know this is not strictly a python bug, but it just popped up while I was building a new system up. Anthony From guido at python.org Fri Nov 28 23:32:13 2003 From: guido at python.org (Guido van Rossum) Date: Fri Nov 28 23:32:26 2003 Subject: [Python-Dev] test_mimetools failure when hostname unknown. In-Reply-To: Your message of "Sat, 29 Nov 2003 15:25:18 +1100."
<200311290425.hAT4PIrO013499@maxim.off.ekorp.com> References: <200311290425.hAT4PIrO013499@maxim.off.ekorp.com> Message-ID: <200311290432.hAT4WEF17810@c-24-5-183-134.client.comcast.net> > If you have a machine whose local hostname is just something you've > set, and there's no matching entry in /etc/hosts, test_mimetools fails > with > test test_mimetools failed -- Traceback (most recent call last): > File "/home/anthony/src/py/23maint/Lib/test/test_mimetools.py", line 30, in test_boundary > nb = mimetools.choose_boundary() > File "/home/anthony/src/py/23maint/Lib/mimetools.py", line 130, in choose_boundary > hostid = socket.gethostbyname(socket.gethostname()) > gaierror: (-2, 'Name or service not known') > > This seems, to me, to be a bit bogus - should we just, in this case, > have some sensible default (maybe just use the hostname, or 127.0.0.1)? > > And yes, I know this is not strictly a python bug, but it just popped > up while I was building a new system up. Yeah, in general all tests that use gethostname() are subject to various kinds of errors like this. It really shouldn't be used in the test suite at all. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at comcast.net Sat Nov 29 00:34:01 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 29 00:34:05 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311271730.hARHUXg15777@c-24-5-183-134.client.comcast.net> Message-ID: [Guido, on grouping elements of a sequence by key] > ... > Or is there a more elegant approach than my original code that I've > missed all these years? I've always done it like:

d = {}
for x in sequence:
    d.setdefault(key(x), []).append(x)
# Now d has partitioned sequence by key.  The keys are
# available as d.keys(), the associated groups as d.values().
# So, e.g.,
for key, group in d.iteritems():
    d[key] = sum(group)

There's no code duplication, or warts for an empty sequence, which are the ugly parts of the non-dict approach.
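(Tim's setdefault idiom, as a runnable sketch in present-day Python: items() stands in for the 2.x iteritems(), and the word list with its first-letter key function is invented purely for illustration.)

```python
# Partition a sequence by key without sorting, then reduce each group.
# The word list and first-letter key below are illustrative only.
def partition(sequence, key):
    d = {}
    for x in sequence:
        d.setdefault(key(x), []).append(x)
    return d  # key -> list of items with that key, input order preserved

words = ["apple", "avocado", "banana", "cherry", "cantaloupe"]
groups = partition(words, key=lambda w: w[0])
counts = {k: len(g) for k, g in groups.items()}
print(counts)  # {'a': 2, 'b': 1, 'c': 2}
```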
It doesn't matter here whether the elements originally appear with equal keys all adjacent, and input often isn't sorted that way. When it isn't, not needing to sort first can be a major time savings if the sequence is big. Against it, a dict is a large data structure. I don't think it's ever been a real problem that it requires keys to be hashable. groupby() looks very nice when it applies. > ... > totals = {} > for key, group in groupby(keyfunc, sequence): > totals[key] = sum(group) Or totals = dict((key, sum(group)) for key, group in groupby(keyfunc, sequence)) exploiting generator expressions too. [after Raymond wonders about cases where the consumer doesn't iterate over the group generators ] > I don't think those semantics should be implemented. You should be > required to iterate through each group. Brrrr. Sounds error-prone (hard to explain, and impossible to enforce unless the implementation does almost all the work it would need to allow groups to get skipped -- if the implementation can detect that a group hasn't been fully iterated, then it could almost as easily go on to skip over remaining equal keys itself instead of whining about it; but if the implementation can't detect it, accidental violations of the requirement will be hard to track down). You're a security guy now. You've got a log with line records of the form month day hhmmss severity_level threat_id It's sorted ascending by month then descending by severity_level. You want a report of the top 10 threats seen each month.

for month, lines in groupby(lambda s: s.split()[0], input_file):
    print month
    print itertools.islice(lines, 10)

Like array[:10], islice() does the right thing if there are fewer than 10 lines in a month. It's just not natural to require that an iterator be run to exhaustion (if it *were* natural, this wouldn't be the first context ever to require it ).
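(Tim's report sketch runs essentially unchanged with the groupby() that later shipped in itertools, though its signature ended up as groupby(iterable, key=...), with the callable second rather than first as discussed in this thread. The log lines below are made up for illustration.)

```python
from itertools import groupby, islice

# Made-up log lines: month day hhmmss severity_level threat_id,
# sorted ascending by month, then descending by severity.
log = [
    "Jan 03 120000 9 worm",
    "Jan 07 130000 7 portscan",
    "Jan 11 140000 3 spam",
    "Feb 01 090000 8 worm",
    "Feb 02 100000 5 phish",
]

# Top-2 threats per month; like array[:2], islice() does the right
# thing when a month has fewer than 2 lines.
for month, lines in groupby(log, key=lambda s: s.split()[0]):
    top = [line.split()[4] for line in islice(lines, 2)]
    print(month, top)
```

Note that itertools.groupby itself skips past any unconsumed lines of a group when the outer iterator advances, which is exactly the behavior debated here.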
From tim.one at comcast.net Sat Nov 29 00:41:13 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 29 00:41:17 2003 Subject: [Python-Dev] Tail recursion In-Reply-To: <200311281800.hASI0CW17161@c-24-5-183-134.client.comcast.net> Message-ID: [Tim] >> + - * // % ** pow and divmod on integers in Python will either >> deliver an exact result or raise an exception (like MemoryError if >> malloc() can't find enough space to hold an intermediate result). [Guido] > Except for ** if the exponent is negative. Yup, and I do keep forgetting that -- it's just an accident due to that we're still using floats to approximate rationals. From guido at python.org Sat Nov 29 00:50:56 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 00:51:08 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sat, 29 Nov 2003 00:34:01 EST." References: Message-ID: <200311290550.hAT5ouK17896@c-24-5-183-134.client.comcast.net> > I've always done it like: > > d = {} > for x in sequence: > d.setdefault(key(x), []).append(x) > # Now d has partitioned sequence by key. The keys are > # available as d.keys(), the associated groups as d.values(). > # So, e.g., > for key, group in d.iteritems(): > d[key] = sum(group) > > There's no code duplication, or warts for an empty sequence, which are the > ugly parts of the non-dict approach. It doesn't matter here whether the > elements originally appear with equal keys all adjacent, and input often > isn't sorted that way. When it isn't, not needing to sort first can be a > major time savings if the sequence is big. Against it, a dict is a large > data structure. I don't think it's ever been a real problem that it > requires keys to be hashable. The major downside of this is that this keeps everything in memory. When that's acceptable, it's a great approach (especially because it doesn't require sorting). But often you really want to be able to handle input of arbitrary size.
For example, suppose you are given a file with some kind of records, timestamped and maintained in chronological order (e.g. a log file -- perfect example of data that won't fit in memory and is already sorted). You're supposed to output this for printing, while inserting a header at the start of each day and a footer at the end of each day with various counts or totals per day. > groupby() looks very nice when it applies. Right. :-) > > ... > > totals = {} > > for key, group in groupby(keyfunc, sequence): > > totals[key] = sum(group) > > Or > > totals = dict((key, sum(group)) > for key, group in groupby(keyfunc, sequence)) > > exploiting generator expressions too. Nice. When can we get these? :-) > [after Raymond wonders about cases where the consumer doesn't > iterate over the group generators > ] > > > I don't think those semantics should be implemented. You should be > > required to iterate through each group. > > Brrrr. Sounds error-prone (hard to explain, and impossible to enforce > unless the implementation does almost all the work it would need to allow > groups to get skipped -- if the implementation can detect that a group > hasn't been fully iterated, then it could almost as easily go on to skip > over remaining equal keys itself instead of whining about it; but if the > implementation can't detect it, accidental violations of the requirement > will be hard to track down). I take it back after seeing Raymond's implementation -- it's simple enough to make sure that each group is exhausted before starting the next group, and this is clearly the "natural" semantics. 
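(The day-header/footer report Guido describes earlier in this message can be sketched with the groupby() that eventually shipped in itertools — signature groupby(iterable, key=...). The records and the day() key below are invented; only one day's group is live at a time, so arbitrarily large sorted input works.)

```python
from itertools import groupby

# Invented chronological records: (date, amount), already sorted by date.
records = [
    ("2003-11-28", 10),
    ("2003-11-28", 5),
    ("2003-11-29", 7),
]

def day(record):
    return record[0]

# Single pass; each day's group is consumed before the next is requested.
for date, group in groupby(records, key=day):
    print("===", date, "===")        # per-day header
    total = 0
    for _, amount in group:
        print(amount)
        total += amount
    print("--- total:", total)       # per-day footer
```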
--Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Sat Nov 29 01:12:34 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 01:13:12 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> Message-ID: <000101c3b63f$c7fc4720$e841fea9@oemcomputer> [Guido] > I would make one change: after looking at another use case, I'd like > to change the outer iterator to produce (key, grouper) tuples. This > way, you can write things like > > totals = {} > for key, group in sequence: > totals[key] = sum(group) Here is an implementation that translates readily into C. It uses Guido's syntax and meets my requirement that bad things don't happen when someone runs the outer iterator independently of the inner iterator.

class groupby(object):
    __slots__ = ('keyfunc', 'it', 'tgtkey', 'currkey', 'currvalue')
    def __init__(self, key, iterable):
        NULL = 1+909.9j         # In C, use the real NULL
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = NULL
        self.currkey = NULL
        self.currvalue = NULL
    def __iter__(self):
        return self
    def next(self):
        while self.currkey == self.tgtkey:
            self.currvalue = self.it.next()     # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.currkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            self.currvalue = self.it.next()     # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)

import unittest
from sets import Set as set

class TestBasicOps(unittest.TestCase):
    def test_groupby(self):
        # Check zero length input
        self.assertEqual([], list(groupby(lambda r:r[0], [])))
        # Check normal input
        s = [(0, 10, 20), (0, 11,21), (0,12,21), (1,13,21), (1,14,22),
             (2,15,22), (3,16,23), (3,17,23)]
        dup = []
        for k, g in groupby(lambda r:r[0], s):
            for elem in g:
                self.assertEqual(k, elem[0])
                dup.append(elem)
        self.assertEqual(s, dup)
        # Check nested case
        dup = []
        for k, g in groupby(lambda r:r[0], s):
            for ik, ig in groupby(lambda r:r[2], g):
                for elem in ig:
                    self.assertEqual(k, elem[0])
                    self.assertEqual(ik, elem[2])
                    dup.append(elem)
        self.assertEqual(s, dup)
        # Check case where inner iterator is not used
        keys = []
        for k, g in groupby(lambda r:r[0], s):
            keys.append(k)
        expectedkeys = set([r[0] for r in s])
        self.assertEqual(set(keys), expectedkeys)
        self.assertEqual(len(keys), len(expectedkeys))

suite = unittest.TestSuite()
suite.addTest(unittest.makeSuite(TestBasicOps))
unittest.TextTestRunner(verbosity=2).run(suite)

Raymond From python at rcn.com Sat Nov 29 01:31:45 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 01:32:22 2003 Subject: [Python-Dev] genexps Was: "groupby" iterator In-Reply-To: <200311290550.hAT5ouK17896@c-24-5-183-134.client.comcast.net> Message-ID: <000201c3b642$75b07060$e841fea9@oemcomputer> > > totals = dict((key, sum(group)) > > for key, group in groupby(keyfunc, sequence)) > > > > exploiting generator expressions too. > > Nice. When can we get these? :-) Unless someone in the know volunteers, it will need to wait until Christmas vacation. Currently, the implementation is beyond my skill level. It will take a while to raise my skills to cover adding new syntax and what to do in the compiler. Raymond From aleax at aleax.it Sat Nov 29 02:03:22 2003 From: aleax at aleax.it (Alex Martelli) Date: Sat Nov 29 02:03:28 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311282341.hASNfwE17612@c-24-5-183-134.client.comcast.net> References: <002701c3b606$c61304a0$e841fea9@oemcomputer> <200311282341.hASNfwE17612@c-24-5-183-134.client.comcast.net> Message-ID: <200311290803.22730.aleax@aleax.it> On Saturday 29 November 2003 12:41 am, Guido van Rossum wrote: ... > > > totals = {} > > > for key, group in sequence: > > > totals[key] = sum(group) > > Oops, there's a mistake.
I meant to say: > > totals = {} > for key, group in groupby(keyfunc, sequence): > totals[key] = sum(group) > > > This is a much stronger formulation than the original. It is clear, > > succinct, expressive, and less error prone. > > I'm not sure to what extent this praise was inspired by my mistake of > leaving out the groupby() call. Can't answer for RH, but, to me, the groupby call looks just fine. However, one cosmetic suggestion: for analogy with list.sorted, why not let the call be spelled as groupby(sequence, key=keyfunc) ? I realize most itertools take a callable _first_, while, to be able to name the key-extractor this way, it would have to go second. I still think it would be nicer, partly because while sequence could not possibly default, key _could_ -- and its one obvious default is to an identity (lambda x: x). This would let elimination and/or counting of adjacent duplicates be expressed smoothly (for counting, it would help to have an ilen that gives the length of a finite iterable argument, but worst case one can substitute def ilen(it): for i, _ in enumerate(it): pass return i+1 or its inline equivalent). Naming the function 'grouped' rather than 'groupby' would probably be better if the callable was the second arg rather than the first. > > >>> names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'] > > >>> firstname = lambda n: n.split()[0] > > >>> names.sort() > > >>> unique_first_names = [first for first, _ in groupby(firstname, > > names)] > > ['Barry' , 'Jack', 'Tim'] > > I don't think those semantics should be implemented. You should be > required to iterate through each group. I was just thinking that Right, so basically it would have to be nested like: ufn = [ f for g in groupby(firstname, names) for f, _ in g ] > > In experimenting with groupby(), I am starting to see a need for a high > > speed data extractor function. This need is common to several tools > > that take function arguments (like list.sort(key=)). 
> Exactly: it was definitely inspired by list.sort(key=). That's part of why I'd love to be able to spell key= for this iterator too. > > While extractor > > functions can be arbitrarily complex, many only fetch a specific > > attribute or element number. Alex's high-speed curry suggests that it > > is possible to create a function maker for fast lookups: > > > > students.sort(key=extract('grade')) # key=lambda r:r.grade > > students.sort(key=extract(2)) # key=lambda r:r[2] > > Perhaps we could do this by changing list.sort() and groupby() to take > a string or int as first argument to mean exactly this. For the It seems to me that this would be special-casing things while an extract function might help in other contexts as well. E.g., itertools has several other iterators that take a callable and might use this. > But I recommend holding off on this -- the "pure" groupby() has enough > merit without speed hacks, and I find the clarity it provides more > important than possible speed gains. I expect that the original, ugly I agree that the case for extract is separate from that for groupby (although the latter does increase the attractiveness of the former). Alex From skumar at datec-systems.com Sat Nov 29 02:47:24 2003 From: skumar at datec-systems.com (sa) Date: Sat Nov 29 02:47:26 2003 Subject: [Python-Dev] Telnet server Message-ID: <001a01c3b64d$07660790$1501a8c0@datec21> Hi all, I want to develop a thin telnet server using the curses library. First of all, is this possible? The box on which I want to develop this does not provide a shell as it runs customized embedded linux, so I want to write a telnet server on the box which presents the telnet client a curses kinda interface after he gets past the login prompt authentication. I can execute my python scripts on this box without any problems. Any pointers? Thanks, -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20031129/f53e14f8/attachment-0001.html From python at rcn.com Sat Nov 29 03:26:38 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 03:27:24 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311290803.22730.aleax@aleax.it> Message-ID: <001801c3b652$8534a820$e841fea9@oemcomputer> [Alex] > However, one cosmetic suggestion: for analogy with list.sorted, why > not let the call be spelled as > groupby(sequence, key=keyfunc) > ? > > I realize most itertools take a callable _first_, while, to be able to > name the key-extractor this way, it would have to go second. I still > think it would be nicer, partly because while sequence could not > possibly default, key _could_ -- and its one obvious default is to an > identity (lambda x: x). This would let elimination and/or counting of > adjacent duplicates be expressed smoothly (for counting, it would > help to have an ilen that gives the length of a finite iterable argument, > but worst case one can substitute > def ilen(it): > for i, _ in enumerate(it): pass > return i+1 > or its inline equivalent). Though the argument order makes my stomach churn, the identity function default is quite nice:

>>> s = 'abracadabra'
>>> # sort s | uniq
>>> [k for k, g in groupby(list.sorted(s))]
['a', 'b', 'c', 'd', 'r']
>>> # sort s | uniq -d
>>> [k for k, g in groupby(list.sorted('abracadabra')) if ilen(g)>1]
['a', 'b', 'r']
>>> # sort s | uniq -c
>>> [(ilen(g), k) for k, g in groupby(list.sorted(s))]
[(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]
>>> # sort s | uniq -c | sort -rn | head -3
>>> list.sorted([(ilen(g), k) for k, g in groupby(list.sorted(s))], reverse=True)[:3]
[(5, 'a'), (2, 'r'), (2, 'b')]

> > > While extractor > > > functions can be arbitrarily complex, many only fetch a specific > > > attribute or element number.
Alex's high-speed curry suggests that it > > > is possible to create a function maker for fast lookups: > > > > > > students.sort(key=extract('grade')) # key=lambda r:r.grade > > > students.sort(key=extract(2)) # key=lambda r:r[2] > > > > Perhaps we could do this by changing list.sort() and groupby() to take > > a string or int as first argument to mean exactly this. For the > > It seems to me that this would be special-casing things while an extract > function might help in other contexts as well. E.g., itertools has > several > other iterators that take a callable and might use this. > > > But I recommend holding off on this -- the "pure" groupby() has enough > > merit without speed hacks, and I find the clarity it provides more > > important than possible speed gains. I expect that the original, ugly > > I agree that the case for extract is separate from that for groupby > (although > the latter does increase the attractiveness of the former). Yes, it's clearly a separate issue (and icing on the cake). I was thinking extract() would be a nice addition to the operator module where everything is basically a lambda-evading speed hack for accessing intrinsic operations: operator.add = lambda x,y: x+y Raymond From barry at barrys-emacs.org Sat Nov 29 09:50:38 2003 From: barry at barrys-emacs.org (Barry Scott) Date: Sat Nov 29 09:50:42 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <20031128194959.GA4886@nl.linux.org> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> Message-ID: <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> At 28-11-2003 19:49, you wrote: >Raymond Hettinger wrote: > > Date: Tue, 25 Nov 2003 07:26:15 +0100 > > > After re-reading previous posts on the subject, I had an idea. Let's > > isolate these functions in the documentation into a separate section > > following the rest of the builtins. Is the `expr` worth banishing?
I've never used it myself because of the chance of misreading `expr` vs. 'expr'. Isn't it a hard-to-read str()? Note: I tried to find it in the language reference and it's not in the index but then neither is %. Barry From gerrit at nl.linux.org Sat Nov 29 10:32:50 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Sat Nov 29 10:33:18 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> Message-ID: <20031129153250.GA8274@nl.linux.org> Barry Scott wrote: > Is the `expr` worth banishing? I've never used it myself > because of the chance of misreading `expr` vs. 'expr'. > Isn't it a hard-to-read str()? It's a hard-to-read repr(), actually. Guido once published a list of Python regrets, which can be found at: http://www.python.org/doc/essays/ppt/regrets/PythonRegrets.pdf At page 5, it suggests to drop `...` for repr(...), so unless Guido changed his mind (I don't think so), this is a deprecation-candidate as well: as is callable() and input(), by the way. yours, Gerrit. -- 147. If she have not borne him children, then her mistress may sell her for money. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From gerrit at nl.linux.org Sat Nov 29 10:37:12 2003 From: gerrit at nl.linux.org (Gerrit Holl) Date: Sat Nov 29 10:37:36 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> Message-ID: <20031129153712.GB8274@nl.linux.org> [Gerrit] > > I would like to nominate input() also.
It is often misused by beginners. [Guido van Rossum] > So I say let's be kind on input(). Fine with me :) But... at [0], raw_input() and input() are mentioned as minor regrets, as functions which should actually not have been builtins. Have you now changed your mind, or did I misinterpret [0], or is it something else? [0] http://www.python.org/doc/essays/ppt/regrets/PythonRegrets.pdf yours, Gerrit. -- 134. If any one be captured in war and there is not sustenance in his house, if then his wife go to another house this woman shall be held blameless. -- 1780 BC, Hammurabi, Code of Law -- Asperger's Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/english/ From guido at python.org Sat Nov 29 13:10:58 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 13:11:06 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Sat, 29 Nov 2003 14:50:38 GMT." <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <6.0.1.1.2.20031129144050.02304ef8@torment.chelsea.private> Message-ID: <200311291810.hATIAwc18636@c-24-5-183-134.client.comcast.net> > Is the `expr` worth banishing? I've never used it myself > because of the chance of misreading `expr` vs. 'expr'. > Isn't it a hard-to-read str()? Yes, backticks will be gone in 3.0. But I expect there's no hope of getting rid of them earlier -- they've been used too much. I suspect that even putting in a deprecation warning would be too much. (Maybe a silent deprecation could work.) So maybe these could be added to the list of language features moved to a "doomed" section. > Note: I tried to find it in the language reference and it's not in the index > but then neither is %. I think none of the operators are in the index of the reference manual. I don't know how to resolve this; indexing non-alphanumeric characters may not be easy in LaTeX, I don't know.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Nov 29 13:17:00 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 13:17:39 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Sat, 29 Nov 2003 16:37:12 +0100." <20031129153712.GB8274@nl.linux.org> References: <004d01c3b31c$c1d7afe0$e804a044@oemcomputer> <20031128194959.GA4886@nl.linux.org> <200311282142.hASLg4p17337@c-24-5-183-134.client.comcast.net> <20031129153712.GB8274@nl.linux.org> Message-ID: <200311291817.hATIH0n18684@c-24-5-183-134.client.comcast.net> > But... at [0], raw_input() and input() are mentioned as minor regrets, > as functions which should actually not have been builtins. Have you now > changed your mind, or did I misinterpret [0], or is it something else? > > [0] http://www.python.org/doc/essays/ppt/regrets/PythonRegrets.pdf Note that the regrets were minor. :-) The problem is that these are almost never used in real programs; real programs use sys.stdin.readline() so they can properly handle EOF. But their main use, teaching Python to beginners without having to expose the whole language first, requires either that they are built in or that the teacher sets up a special environment for their students. For the latter, a PYTHONSTARTUP variable pointing to a file with teachers' additions does nicely, but requires a level of control over the student's environment that's not always realistic. (Especially not when the student is teaching herself. :-) Perhaps a special module of teacher's helpers could be devised, and a special Python invocation to include that automatically? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Nov 29 13:18:58 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 13:19:09 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sat, 29 Nov 2003 03:26:38 EST." 
<001801c3b652$8534a820$e841fea9@oemcomputer> References: <001801c3b652$8534a820$e841fea9@oemcomputer> Message-ID: <200311291818.hATIIwo18695@c-24-5-183-134.client.comcast.net> Way to go, Raymond. One suggestion: instead of ilen(), I would suggest count(). (Yes, I've been using more SQL lately. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas-python at python.ca Sat Nov 29 14:52:35 2003 From: nas-python at python.ca (Neil Schemenauer) Date: Sat Nov 29 14:45:53 2003 Subject: [Python-Dev] genexps Was: "groupby" iterator In-Reply-To: <000201c3b642$75b07060$e841fea9@oemcomputer> References: <200311290550.hAT5ouK17896@c-24-5-183-134.client.comcast.net> <000201c3b642$75b07060$e841fea9@oemcomputer> Message-ID: <20031129195235.GA695@mems-exchange.org> On Sat, Nov 29, 2003 at 01:31:45AM -0500, Raymond Hettinger wrote: > Unless someone in the know volunteers, it will need to wait until > Christmas vacation. Currently, the implementation is beyond my skill > level. It will take a while raise my skills to cover adding new syntax > and what to do in the compiler. I wonder if we should try to finish the new compiler first. Neil From eppstein at ics.uci.edu Sat Nov 29 15:14:14 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Sat Nov 29 15:14:17 2003 Subject: [Python-Dev] Re: "groupby" iterator References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> Message-ID: In article <000101c3b63f$c7fc4720$e841fea9@oemcomputer>, "Raymond Hettinger" wrote: > Here is an implementation that translates readily into C. It uses > Guido's syntax and meets my requirement that bad things don't happen > when someone runs the outer iterator independently of the inner > iterator. If I understand your code correctly, running the outer iterator skips over any uniterated values from the inner iterator. 
I'd be happier with behavior like tee: the inner groups always return
the same sequences of items, whether or not the inner iteration happens
before the next outer iteration, but the memory cost is only small if
you iterate through them in the expected order.  E.g., see the "out of
order" unit test in the code below.

def identity(x): return x

def groupby(iterable, key=identity):
    it = iter(iterable)
    first = it.next()
    while 1:
        group = bygroup(it, first, key)
        yield key(first), group
        first = group.nextgroup()

class bygroup:
    """Iterator of items in a single group."""

    def __init__(self, iterable, first, key=identity):
        """Instance variables:
        - self.lookahead: reversed list of items still to be output
        - self.groupid: group identity
        - self.key: func to turn iterated items into group ids
        - self.it: iterator, or None once we reach another group
        - self.postfinal: None (only valid once self.it is None)
        """
        self.key = key
        self.it = iter(iterable)
        self.lookahead = [first]
        self.groupid = self.key(first)

    def __iter__(self):
        return self

    def group(self):
        return self.groupid

    def next(self):
        if self.lookahead:
            return self.lookahead.pop()
        if self.it is None:
            raise StopIteration
        x = self.it.next()
        if self.key(x) == self.groupid:
            return x
        self.postfinal = x
        self.it = None
        raise StopIteration

    def nextgroup(self):
        """Return first item of next group.
        Raises StopIteration if there is no next group."""
        if self.it is not None:
            L = list(self)
            L.reverse()
            self.lookahead = L
        if self.it is not None:
            raise StopIteration
        return self.postfinal

import unittest
from sets import Set as set

class TestBasicOps(unittest.TestCase):
    def test_groupby(self):
        # Check zero length input
        self.assertEqual([], list(groupby([], lambda r:r[0])))

        # Check normal input
        s = [(0, 10, 20), (0, 11, 21), (0, 12, 21), (1, 13, 21),
             (1, 14, 22), (2, 15, 22), (3, 16, 23), (3, 17, 23)]
        dup = []
        for k, g in groupby(s, lambda r:r[0]):
            for elem in g:
                self.assertEqual(k, elem[0])
                dup.append(elem)
        self.assertEqual(s, dup)

        # Check case where groups are iterated out of order
        nest1 = []
        for k, g in groupby(s, lambda r:r[0]):
            nest1.append(list(g))
        nest2 = []
        for k, g in groupby(s, lambda r:r[0]):
            nest2.append(g)
        nest2 = [list(g) for g in nest2]
        self.assertEqual(nest1, nest2)

        # Check nested case
        dup = []
        for k, g in groupby(s, lambda r:r[0]):
            for ik, ig in groupby(g, lambda r:r[2]):
                for elem in ig:
                    self.assertEqual(k, elem[0])
                    self.assertEqual(ik, elem[2])
                    dup.append(elem)
        self.assertEqual(s, dup)

        # Check case where inner iterator is not used
        keys = []
        for k, g in groupby(s, lambda r:r[0]):
            keys.append(k)
        expectedkeys = set([r[0] for r in s])
        self.assertEqual(set(keys), expectedkeys)
        self.assertEqual(len(keys), len(expectedkeys))

suite = unittest.TestSuite()
suite.addTest(unittest.makeSuite(TestBasicOps))
unittest.TextTestRunner(verbosity=2).run(suite)

-- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ.
of California, Irvine, School of Information & Computer Science From perky at i18n.org Sat Nov 29 17:32:20 2003 From: perky at i18n.org (Hye-Shik Chang) Date: Sat Nov 29 17:32:30 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <000101c3b63f$c7fc4720$e841fea9@oemcomputer> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> Message-ID: <20031129223220.GA90372@i18n.org> On Sat, Nov 29, 2003 at 01:12:34AM -0500, Raymond Hettinger wrote: > [Guido] > > I would make one change: after looking at another use case, I'd like > > to change the outer iterator to produce (key, grouper) tuples. This > > way, you can write things like > > > > totals = {} > > for key, group in groupby(sequence): > > totals[key] = sum(group) Heh. I love that! > > Here is an implementation that translates readily into C. It uses > Guido's syntax and meets my requirement that bad things don't happen > when someone runs the outer iterator independently of the inner > iterator. > I updated my implementation according to your guideline. Please see attachments. Docstrings are still insufficient due to my english shortage. :) Thanks! 
Regards, Hye-Shik -------------- next part -------------- Index: Modules/itertoolsmodule.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/itertoolsmodule.c,v retrieving revision 1.26 diff -u -u -r1.26 itertoolsmodule.c --- Modules/itertoolsmodule.c 12 Nov 2003 14:32:26 -0000 1.26 +++ Modules/itertoolsmodule.c 29 Nov 2003 22:25:18 -0000 @@ -2081,6 +2081,332 @@ }; +/* groupby object ***********************************************************/ + +typedef struct { + PyObject_HEAD + PyObject *it; + PyObject *keyfunc; + PyObject *tgtkey; + PyObject *currkey; + PyObject *currvalue; +} groupbyobject; + +static PyTypeObject groupby_type; +static PyObject *_grouper_create(groupbyobject *, PyObject *); + +static PyObject * +groupby_new(PyTypeObject *type, PyObject *args, PyObject *kwds) +{ + groupbyobject *gbo; + PyObject *it, *keyfunc; + + if (!PyArg_UnpackTuple(args, "groupby", 2, 2, &keyfunc, &it)) + return NULL; + + if (keyfunc != Py_None && !PyCallable_Check(keyfunc)) { + PyErr_SetString(PyExc_ValueError, + "Key argument must be a callable object or None."); + return NULL; + } + + gbo = (groupbyobject *)type->tp_alloc(type, 0); + if (gbo == NULL) + return NULL; + gbo->tgtkey = NULL; + gbo->currkey = NULL; + gbo->currvalue = NULL; + gbo->keyfunc = keyfunc; + Py_INCREF(keyfunc); + gbo->it = PyObject_GetIter(it); + if (gbo->it == NULL) { + Py_DECREF(gbo); + return NULL; + } + return (PyObject *)gbo; +} + +static void +groupby_dealloc(groupbyobject *gbo) +{ + PyObject_GC_UnTrack(gbo); + Py_XDECREF(gbo->it); + Py_XDECREF(gbo->keyfunc); + Py_XDECREF(gbo->tgtkey); + Py_XDECREF(gbo->currkey); + Py_XDECREF(gbo->currvalue); + gbo->ob_type->tp_free(gbo); +} + +static int +groupby_traverse(groupbyobject *gbo, visitproc visit, void *arg) +{ + int err; + + if (gbo->it) { + err = visit(gbo->it, arg); + if (err) + return err; + } + + if (gbo->keyfunc) { + err = visit(gbo->keyfunc, arg); + if (err) + return err; + } + 
+ if (gbo->tgtkey) { + err = visit(gbo->tgtkey, arg); + if (err) + return err; + } + + if (gbo->currkey) { + err = visit(gbo->currkey, arg); + if (err) + return err; + } + + if (gbo->currvalue) { + err = visit(gbo->currvalue, arg); + if (err) + return err; + } + + return 0; +} + +static PyObject * +groupby_next(groupbyobject *gbo) +{ + PyObject *newvalue, *newkey, *r, *grouper; + int rcmp; + + /* skip to next iteration group */ + for (;;) { + if (gbo->currkey == NULL) + rcmp = 0; + else if (gbo->tgtkey == NULL) + break; + else if (PyObject_Cmp(gbo->tgtkey, gbo->currkey, &rcmp) == -1) + return NULL; + + if (rcmp != 0) + break; + + newvalue = PyIter_Next(gbo->it); + if (newvalue == NULL) + return NULL; + + if (gbo->keyfunc == Py_None) { + newkey = newvalue; + Py_INCREF(newvalue); + } else { + newkey = PyObject_CallFunctionObjArgs(gbo->keyfunc, + newvalue, NULL); + if (newkey == NULL) { + Py_DECREF(newvalue); + return NULL; + } + } + + Py_XDECREF(gbo->currkey); + gbo->currkey = newkey; + Py_XDECREF(gbo->currvalue); + gbo->currvalue = newvalue; + } + + Py_XDECREF(gbo->tgtkey); + gbo->tgtkey = gbo->currkey; + Py_INCREF(gbo->currkey); + + grouper = _grouper_create(gbo, gbo->tgtkey); + if (grouper == NULL) + return NULL; + + r = PyTuple_New(2); + if (r == NULL) + return NULL; + PyTuple_SET_ITEM(r, 0, gbo->tgtkey); + Py_INCREF(gbo->tgtkey); + PyTuple_SET_ITEM(r, 1, grouper); + + return r; +} + +PyDoc_STRVAR(groupby_doc, +"groupby(keyfunc, iterable) -> create an iterator which returns\n\ +(key, sub-iterator) grouped by each value of key(value).\n"); + +static PyTypeObject groupby_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools.groupby", /* tp_name */ + sizeof(groupbyobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)groupby_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping 
*/ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | + Py_TPFLAGS_BASETYPE, /* tp_flags */ + groupby_doc, /* tp_doc */ + (traverseproc)groupby_traverse, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)groupby_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + groupby_new, /* tp_new */ + PyObject_GC_Del, /* tp_free */ +}; + + +/* _grouper object (internal) ************************************************/ + +typedef struct { + PyObject_HEAD + PyObject *parent; + PyObject *tgtkey; +} _grouperobject; + +static PyTypeObject _grouper_type; + +static PyObject * +_grouper_create(groupbyobject *parent, PyObject *tgtkey) +{ + _grouperobject *igo; + + igo = PyObject_New(_grouperobject, &_grouper_type); + if (igo == NULL) + return PyErr_NoMemory(); + igo->parent = (PyObject *)parent; + Py_INCREF(parent); + igo->tgtkey = tgtkey; + Py_INCREF(tgtkey); + + return (PyObject *)igo; +} + +static void +_grouper_dealloc(_grouperobject *igo) +{ + Py_DECREF(igo->parent); + Py_DECREF(igo->tgtkey); + PyObject_Del(igo); +} + +static PyObject * +_grouper_next(_grouperobject *igo) +{ + groupbyobject *gbo = (groupbyobject *)igo->parent; + PyObject *newvalue, *newkey, *r; + int rcmp; + + if (gbo->currvalue == NULL) { + newvalue = PyIter_Next(gbo->it); + if (newvalue == NULL) + return NULL; + + if (gbo->keyfunc == Py_None) { + newkey = newvalue; + Py_INCREF(newvalue); + } else { + newkey = PyObject_CallFunctionObjArgs(gbo->keyfunc, + newvalue, NULL); + if (newkey == NULL) { + Py_DECREF(newvalue); + return NULL; + } + } + + assert(gbo->currkey == NULL); + gbo->currkey = 
newkey; + gbo->currvalue = newvalue; + } + + assert(gbo->currkey != NULL); + if (PyObject_Cmp(igo->tgtkey, gbo->currkey, &rcmp) == -1) + return NULL; + + if (rcmp != 0) + return NULL; + + r = gbo->currvalue; + gbo->currvalue = NULL; + Py_DECREF(gbo->currkey); + gbo->currkey = NULL; + + return r; +} + +static PyTypeObject _grouper_type = { + PyObject_HEAD_INIT(NULL) + 0, /* ob_size */ + "itertools._grouper", /* tp_name */ + sizeof(_grouperobject), /* tp_basicsize */ + 0, /* tp_itemsize */ + /* methods */ + (destructor)_grouper_dealloc, /* tp_dealloc */ + 0, /* tp_print */ + 0, /* tp_getattr */ + 0, /* tp_setattr */ + 0, /* tp_compare */ + 0, /* tp_repr */ + 0, /* tp_as_number */ + 0, /* tp_as_sequence */ + 0, /* tp_as_mapping */ + 0, /* tp_hash */ + 0, /* tp_call */ + 0, /* tp_str */ + PyObject_GenericGetAttr, /* tp_getattro */ + 0, /* tp_setattro */ + 0, /* tp_as_buffer */ + Py_TPFLAGS_DEFAULT, /* tp_flags */ + 0, /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + PyObject_SelfIter, /* tp_iter */ + (iternextfunc)_grouper_next, /* tp_iternext */ + 0, /* tp_methods */ + 0, /* tp_members */ + 0, /* tp_getset */ + 0, /* tp_base */ + 0, /* tp_dict */ + 0, /* tp_descr_get */ + 0, /* tp_descr_set */ + 0, /* tp_dictoffset */ + 0, /* tp_init */ + 0, /* tp_alloc */ + 0, /* tp_new */ + _PyObject_Del, /* tp_free */ +}; + + /* module level code ********************************************************/ PyDoc_STRVAR(module_doc, @@ -2103,6 +2429,7 @@ chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ... 
\n\ takewhile(pred, seq) --> seq[0], seq[1], until pred fails\n\ dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails\n\ +groupby(keyfunc, iterable) --> sub-iteraters grouped by value of keyfunc(v)\n\ "); @@ -2130,6 +2457,7 @@ &count_type, &izip_type, &repeat_type, + &groupby_type, NULL }; @@ -2148,5 +2476,6 @@ return; if (PyType_Ready(&tee_type) < 0) return; - + if (PyType_Ready(&_grouper_type) < 0) + return; } -------------- next part -------------- import unittest from itertools import groupby class TestBasicOps(unittest.TestCase): def test_groupby(self): # Check zero length input self.assertEqual([], list(groupby(lambda r:r[0], []))) # Check normal input s = [(0, 10, 20), (0, 11,21), (0,12,21), (1,13,21), (1,14,22), (2,15,22), (3,16,23), (3,17,23)] dup = [] for k, g in groupby(lambda r:r[0], s): for elem in g: self.assertEqual(k, elem[0]) dup.append(elem) self.assertEqual(s, dup) # Check nested case dup = [] for k, g in groupby(lambda r:r[0], s): for ik, ig in groupby(lambda r:r[2], g): for elem in ig: self.assertEqual(k, elem[0]) self.assertEqual(ik, elem[2]) dup.append(elem) self.assertEqual(s, dup) # Check case where inner iterator is not used keys = [k for k, g in groupby(lambda r:r[0], s)] expectedkeys = set([r[0] for r in s]) self.assertEqual(set(keys), expectedkeys) self.assertEqual(len(keys), len(expectedkeys)) # Check case where key is None word = 'abracadabra' keys = [k for k, g in groupby(None, list.sorted(word))] expectedkeys = set(word) self.assertEqual(set(keys), expectedkeys) self.assertEqual(len(keys), len(expectedkeys)) # Exercise pipes and filters style s = 'abracadabra' ilen = lambda it: len(list(it)) # sort s | uniq r = [k for k, g in groupby(None, list.sorted(s))] self.assertEqual(r, ['a', 'b', 'c', 'd', 'r']) # sort s | uniq -d r = [k for k, g in groupby(None, list.sorted(s)) if ilen(g)>1] self.assertEqual(r, ['a', 'b', 'r']) # sort s | uniq -c r = [(ilen(g), k) for k, g in groupby(None, list.sorted(s))] self.assertEqual(r, 
[(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]) # sort s | uniq -c | sort -rn | head -3 r = list.sorted([(ilen(g), k) for k, g in groupby(None, list.sorted(s))], reverse=True)[:3] self.assertEqual(r, [(5, 'a'), (2, 'r'), (2, 'b')]) # Uniteratable argument self.assertRaises(TypeError, groupby, None, None) # iter.next failure class ExpectedError(Exception): pass def delayed_raise(n=0): for i in range(n): yield 'yo' raise ExpectedError def gulp(key, iterable, func=list): return [func(g) for k, g in groupby(key, iterable)] # iter.next failure on outer object self.assertRaises(ExpectedError, gulp, None, delayed_raise(0)) # iter.next failure on inner object self.assertRaises(ExpectedError, gulp, None, delayed_raise(1)) # __cmp__ failure class DummyCmp: def __cmp__(self, dst): raise ExpectedError s = [DummyCmp(), DummyCmp(), None] # __cmp__ failure on outer object self.assertRaises(ExpectedError, gulp, None, s, id) # __cmp__ failure on inner object self.assertRaises(ExpectedError, gulp, None, s) # keyfunc failure def keyfunc(obj): if keyfunc.skip > 0: keyfunc.skip -= 1 return obj else: raise ExpectedError # keyfunc failure on outer object keyfunc.skip = 0 self.assertRaises(ExpectedError, gulp, keyfunc, [None]) keyfunc.skip = 1 self.assertRaises(ExpectedError, gulp, keyfunc, [None, None]) suite = unittest.TestSuite() suite.addTest(unittest.makeSuite(TestBasicOps)) unittest.TextTestRunner(verbosity=2).run(suite) From tjreedy at udel.edu Sat Nov 29 18:14:04 2003 From: tjreedy at udel.edu (Terry Reedy) Date: Sat Nov 29 18:14:09 2003 Subject: [Python-Dev] Re: Telnet server References: <001a01c3b64d$07660790$1501a8c0@datec21> Message-ID: cc'ed " I want to develop a thin telnet server using the curses library .First of all is this possible ? 
The box on which I want to develop this, does not provide a shell as it runs customized embedded linux so i want to write a telnet server on the box which presents the telnet client, a curses kinda intrface after he gets past the login prompt authentication. I can execute my python scripts on this box without any problems . Any pointers ? " 1. Post plain text instead of html. 2. Ask usage questions (like the above) on the main python list or comp.lang.python. Py-dev is for discussion of future-release development issues. TJR From guido at python.org Sat Nov 29 19:06:21 2003 From: guido at python.org (Guido van Rossum) Date: Sat Nov 29 19:06:31 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: Your message of "Tue, 25 Nov 2003 21:32:58 +0100." <20031125203258.GA29814@i92.ryd.student.liu.se> References: <200311240434.hAO4Y4L06979@c-24-5-183-134.client.comcast.net> <20031125203258.GA29814@i92.ryd.student.liu.se> Message-ID: <200311300006.hAU06Lp19846@c-24-5-183-134.client.comcast.net> > [Guido van Rossum] > > There's a bunch of FutureWarnings e.g. about 0xffffffff<<1 that > > promise they will disappear in Python 2.4. If anyone has time to > > fix these, I'd appreciate it. (It's not just a matter of removing > > the FutureWarnings -- you actually have to implement the promised > > future behavior. :-) I may get to these myself, but they're not > > exactly rocket science, so they might be a good thing for a > > beginning developer (use SF please if you'd like someone to review > > the changes first). [Kalle Svensson] > I've submitted a patch (http://python.org/sf/849227). And yes, > somebody should probably take a good look at it before applying. The > (modified) test suite does pass on my machine, but that's all. I may > well have forgotten to add tests for new special cases, and I'm not > the most experienced C programmer on the block either. Well, it looks like you got everything right. Congratulations! I've checked your code into CVS. 
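The unified semantics PEP 237 promises can be illustrated with a few worked cases, shown here in the modern spelling where the change has fully landed (on older 32-bit builds these expressions warned or wrapped):

```python
# Unified PEP 237 semantics: hex literals are plain non-negative
# integers, and left shifts widen instead of wrapping or warning.
assert 0xffffffff == 4294967295        # no longer -1 on 32-bit builds
assert 0xffffffff << 1 == 0x1fffffffe  # no bits dropped by the shift
assert (1 << 100) >> 100 == 1          # small and large ints behave alike
```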
There are now two pieces of PEP 237 unimplemented (apart from the complete and total eradication of long literals, which won't happen until 3.0). (1) PEP 237 promises that after the new semantics are introduced for hex/oct literals and conversions, and left shifts, operations that cause a different result than before will produce a warning that is on by default. Given the pain we've suffered through the warnings in 2.3 about this stuff, I propose to forget about these warnings. The new semantics are clear and consistent, warnings would just cause more distress, and code first ported to 2.3 will already have silenced the warnings. (2) PEP 237 promises that repr() of a long should no longer show a trailing 'L'. This is not yet implemented (i.e., repr() of a long still has a trailing 'L'). First, past experience suggests that quite a bit of end user code will break, and it may easily break silently: there used to be code that did str(x)[:-1] (knowing x was a long) to strip the 'L', which broke when str() of a long no longer returned a trailing 'L'. Apparently some of this code was "fixed" by changing str() into repr(), and this code will now break again. Second, I *like* seeing a trailing L on longs, especially when there's no reason for it to be a long: if some expression returns 1L, I know something fishy may have gone on. Any comments on these? Should I update PEP 237 to reflect this? > As a side note, I think that line 233 in Lib/test/test_format.py > > if sys.maxint == 2**32-1: > > should be > > if sys.maxint == 2**31-1: > > but I didn't include that in the patch or submit a bug report. > Should I? Fixed that too. But somebody might want to backport it to 2.3.3. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Sat Nov 29 19:38:25 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 19:39:05 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311300006.hAU06Lp19846@c-24-5-183-134.client.comcast.net> Message-ID: <001601c3b6da$44065fa0$e841fea9@oemcomputer> > (1) PEP 237 promises that after the new semantics are introduced for > hex/oct literals and conversions, and left shifts, operations that > cause a different result than before will produce a warning that > is on by default. Given the pain we've suffered through the > warnings in 2.3 about this stuff, I propose to forget about these > warnings. The new semantics are clear and consistent, warnings > would just cause more distress, and code first ported to 2.3 will > already have silenced the warnings. +1, The warnings cause more pain than they save. Part of the purpose of a warning is to leave you feeling unsettled -- I don't think that is a worthy goal when the code is going to work fine anyway. Let PyChecker or some such warn about prior version compatibility issues like that. > (2) PEP 237 promises that repr() of a long should no longer show a > trailing 'L'. This is not yet implemented (i.e., repr() of a long > still has a trailing 'L'). First, past experience suggests that > quite a bit of end user code will break, and it may easily break > silently: there used to be code that did str(x)[:-1] (knowing x > was a long) to strip the 'L', which broke when str() of a long no > longer returned a trailing 'L'. Apparently some of this code was > "fixed" by changing str() into repr(), and this code will now > break again. Second, I *like* seeing a trailing L on longs, > especially when there's no reason for it to be a long: if some > expression returns 1L, I know something fishy may have gone on. -0, The reasons are good but this one has been promised for several years. 
It's time for an L-free Python -- one less thing to have to learn.

If there is transition difficulty, let it be a prompt to consider
applying the forthcoming Decimal module.

If necessary, we could add a debug mode switch for L's to be on or off.
By putting it in the debug build, we keep people from using it in
production code.  The purpose is to allow code to be run twice to see if
different results are obtained.

Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep
for indicators like [:-1] on the same line as long() or repr().

> Should I update PEP 237 to reflect this?

Yes, that's better than surprising people later.

Raymond

From guido at python.org Sat Nov 29 19:56:58 2003
From: guido at python.org (Guido van Rossum)
Date: Sat Nov 29 19:57:03 2003
Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs
In-Reply-To: Your message of "Sat, 29 Nov 2003 19:38:25 EST." <001601c3b6da$44065fa0$e841fea9@oemcomputer>
References: <001601c3b6da$44065fa0$e841fea9@oemcomputer>
Message-ID: <200311300056.hAU0uwA19969@c-24-5-183-134.client.comcast.net>

> > (2) PEP 237 promises that repr() of a long should no longer show a
> > trailing 'L'.  This is not yet implemented (i.e., repr() of a long
> > still has a trailing 'L').  First, past experience suggests that
> > quite a bit of end user code will break, and it may easily break
> > silently: there used to be code that did str(x)[:-1] (knowing x
> > was a long) to strip the 'L', which broke when str() of a long no
> > longer returned a trailing 'L'.  Apparently some of this code was
> > "fixed" by changing str() into repr(), and this code will now
> > break again.  Second, I *like* seeing a trailing L on longs,
> > especially when there's no reason for it to be a long: if some
> > expression returns 1L, I know something fishy may have gone on.
>
> -0, The reasons are good but this one has been promised for several
> years.  It's time for an L-free Python -- one less thing to have to
> learn.
Yes, but people using type() or isinstance() or __class__ will still have to remember that there are two types of integers: int and long. And both built-ins will be with us for years, and they aren't quite aliases for each other (long('12') returns a long, but int('12') an int). > If there is transition difficultly, let it be a prompt to consider > applying the forthcoming Decimal module. This I don't understand. > If necessary, we could add a debug mode switch for L's to be on or off. > By putting it the debug build, we keep people from using it in > production code. The purpose is to allow code to be run twice to see if > different results are obtained. But making a debug build is far from trivial (especially on Windows). Perhaps it should be a switch on the regular build but also produce a warning, to annoy. :-) > Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep > for indicators like [:-1] on the same line as long() or repr(). Can you take care of that? > > Should I update PEP 237 to reflect this? > > Yes, that's better than surprising people later. I'll do that (in due time). --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Sat Nov 29 20:09:25 2003 From: python at rcn.com (Raymond Hettinger) Date: Sat Nov 29 20:10:05 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311300056.hAU0uwA19969@c-24-5-183-134.client.comcast.net> Message-ID: <001e01c3b6de$98bca280$e841fea9@oemcomputer> > > If necessary, we could add a debug mode switch for L's to be on or off. > > By putting it the debug build, we keep people from using it in > > production code. The purpose is to allow code to be run twice to see if > > different results are obtained. > > But making a debug build is far from trivial (especially on Windows). > Perhaps it should be a switch on the regular build but also produce a > warning, to annoy. :-) That would work. 
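The grep-able indicator mentioned above can be approximated in a few lines; this is a rough sketch against hypothetical sample lines, not a pattern tuned for the real standard library:

```python
import re

# Flag lines where long() or repr() appears together with a [:-1]
# slice -- the idiom most likely to break when the trailing 'L' goes.
pattern = re.compile(r'(long|repr)\(.*\[:-1\]')
source = ['a = repr(x)[:-1]', 'b = long(s)', 'c = repr(y)']
hits = [(i + 1, line) for i, line in enumerate(source)
        if pattern.search(line)]
assert hits == [(1, 'a = repr(x)[:-1]')]
```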
> > Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep
> > for indicators like [:-1] on the same line as long() or repr().
>
> Can you take care of that?

Yes, when the time comes.

Raymond

From anthony at ekit-inc.com Sat Nov 29 20:27:59 2003
From: anthony at ekit-inc.com (Anthony Baxter)
Date: Sat Nov 29 20:28:20 2003
Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern()
In-Reply-To: Message from Guido van Rossum of "Sat, 29 Nov 2003 10:10:58 -0800." <200311291810.hATIAwc18636@c-24-5-183-134.client.comcast.net>
Message-ID: <200311300128.hAU1S0cE031343@maxim.off.ekorp.com>

>>> Guido van Rossum wrote
> Yes, backticks will be gone in 3.0.  But I expect there's no hope of
> getting rid of them earlier -- they've been used too much.  I suspect

Then let's kill all use of backticks in the standard library.  There's
a lot of them.

Anthony

--
Anthony Baxter
It's never too late to have a happy childhood.

From kajiyama at grad.sccs.chukyo-u.ac.jp Sat Nov 29 20:24:01 2003
From: kajiyama at grad.sccs.chukyo-u.ac.jp (Tamito KAJIYAMA)
Date: Sat Nov 29 20:36:34 2003
Subject: [Python-Dev] possible backward incompatibility in test.regrtest
Message-ID: <200311300124.hAU1O1N21082@grad.sccs.chukyo-u.ac.jp>

Hi developers,

It seems that the test.regrtest module has a possible backward
incompatibility with regard to pre-Python 2.3 releases.

I have a test suite implemented using the test.regrtest module.  In
this test suite, my own tests are invoked by a script like this:

  import os
  from test import regrtest
  regrtest.STDTESTS = []
  regrtest.main(testdir=os.getcwd())

This script runs fine with 2.2 but does not with 2.3, since regrtest.py
in Python 2.3 has the following lines in runtest() (introduced in
Revision 1.87.2.1.  See [1]):

    if test.startswith('test.'):
        abstest = test
    else:
        # Always import it from the test package
        abstest = 'test.' + test
    the_package = __import__(abstest, globals(), locals(), [])

That is, tests must be in a package named "test".
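The effect of the quoted runtest() lines reduces to a tiny sketch (the helper name here is hypothetical, mirroring the 2.3 logic):

```python
# Mirror of the name handling in 2.3's regrtest.runtest(): any test
# name without a 'test.' prefix is forced into the stdlib's "test"
# package, so tests living anywhere else are never found.
def abstest_name(test):
    if test.startswith('test.'):
        return test
    # Always import it from the test package
    return 'test.' + test

assert abstest_name('test_mymodule') == 'test.test_mymodule'
assert abstest_name('test.test_os') == 'test.test_os'
```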
However, this package name is already used by the standard library, and AFAIK multiple packages with the same package name cannot exist. In other words, any additional tests (i.e. my own tests) have to be put into the test package in the standard library. Otherwise, the additional tests won't be found. IMHO, this change in 2.3 is not reasonable. Unless I miss something trivial (I hope so), I'd have to give up using the test.regrtest module. I appreciate any comment. Thanks, -- KAJIYAMA, Tamito [1] http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Lib/test/regrtest.py?r1=1.87&r2=1.87.2.1 From tim.one at comcast.net Sat Nov 29 22:24:21 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Nov 29 22:24:25 2003 Subject: [Python-Dev] Int FutureWarnings and other 2.4 TODOs In-Reply-To: <200311300006.hAU06Lp19846@c-24-5-183-134.client.comcast.net> Message-ID: [Guido] > ... > (1) PEP 237 promises that after the new semantics are introduced for > hex/oct literals and conversions, and left shifts, operations that > cause a different result than before will produce a warning that > is on by default. Given the pain we've suffered through the > warnings in 2.3 about this stuff, I propose to forget about these > warnings. The new semantics are clear and consistent, warnings > would just cause more distress, and code first ported to 2.3 will > already have silenced the warnings. +1, and especially since it looks like 2.3 is going to become the next 1.5.2 (i.e., the version everyone flocks to, and then badgers you about for the next 20 years ). > (2) PEP 237 promises that repr() of a long should no longer show a > trailing 'L'. This is not yet implemented (i.e., repr() of a long > still has a trailing 'L'). 
First, past experience suggests that
> quite a bit of end user code will break, and it may easily break
> silently: there used to be code that did str(x)[:-1] (knowing x
> was a long) to strip the 'L', which broke when str() of a long no
> longer returned a trailing 'L'.  Apparently some of this code was
> "fixed" by changing str() into repr(), and this code will now
> break again.  Second, I *like* seeing a trailing L on longs,
> especially when there's no reason for it to be a long: if some
> expression returns 1L, I know something fishy may have gone on.

+1.  Changing string representations is always traumatic (lots of
programs rely on parsing them), and I have a hard time imagining what
positive good could come from stripping the 'L'.  Making that change
for str(long) seemed like pure loss from my POV (broke stuff and
helped nothing).

> Any comments on these?  Should I update PEP 237 to reflect this?

The PEP should reflect The Plan, sure.

From tim.one at comcast.net Sat Nov 29 22:31:46 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Nov 29 22:31:49 2003
Subject: [Python-Dev] genexps Was: "groupby" iterator
In-Reply-To: <20031129195235.GA695@mems-exchange.org>
Message-ID:

[Raymond Hettinger]
>> Unless someone in the know volunteers, it will need to wait until
>> Christmas vacation.  Currently, the implementation is beyond my skill
>> level.  It will take a while to raise my skills to cover adding new
>> syntax and what to do in the compiler.

[Neil Schemenauer]
> I wonder if we should try to finish the new compiler first.

That's the rub -- if I have time to move 2.4 along, I'll first give it
to advancing the AST branch.  Teaching the current front end new
parsing tricks would be an exercise in obsolescence.
From guido at python.org Sat Nov 29 23:26:59 2003
From: guido at python.org (Guido van Rossum)
Date: Sat Nov 29 23:27:44 2003
Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern()
In-Reply-To: Your message of "Sun, 30 Nov 2003 12:27:59 +1100." <200311300128.hAU1S0cE031343@maxim.off.ekorp.com>
References: <200311300128.hAU1S0cE031343@maxim.off.ekorp.com>
Message-ID: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net>

> Then let's kill all use of backticks in the standard library.  There's
> a lot of them.

That's one reason why we have to support them for a long time; the
standard library has widely been used as sample code, so there's likely
to be a lot of them elsewhere.

As always, be careful with doing peephole changes to the standard
library -- historically, we've seen a 1-5% error rate in these change
sets that persists for months or years afterwards.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From oren-py-d at hishome.net Sun Nov 30 02:31:09 2003
From: oren-py-d at hishome.net (Oren Tirosh)
Date: Sun Nov 30 02:31:12 2003
Subject: [Python-Dev] "groupby" iterator
In-Reply-To: <002701c3b606$c61304a0$e841fea9@oemcomputer>
References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer>
Message-ID: <20031130073109.GA1560@hishome.net>

On Fri, Nov 28, 2003 at 06:24:30PM -0500, Raymond Hettinger wrote:
...
> students.sort(key=extract('grade'))   # key=lambda r:r.grade
> students.sort(key=extract(2))         # key=lambda r:r[2]

Why should the extract function interpret a string argument as getattr
and an int argument as getitem?
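For concreteness, the dual-dispatch extract() being questioned, next to the explicit spellings, might be sketched like this (hypothetical pure-Python versions; the operator-module names are the proposal, not yet an existing API at this point):

```python
# extract() guesses: a string argument means attribute access,
# an int argument means indexing -- the dual behavior at issue.
def extract(field):
    if isinstance(field, str):
        return lambda obj: getattr(obj, field)
    return lambda obj: obj[field]

# The explicit alternatives leave no room for guessing.
def attrgetter(name):
    return lambda obj: getattr(obj, name)

def itemgetter(item):
    return lambda obj: obj[item]

rows = [(2, 'b'), (1, 'a')]
assert sorted(rows, key=itemgetter(0)) == [(1, 'a'), (2, 'b')]
assert extract(1)(['a', 'b']) == 'b'       # int -> getitem
assert extract('real')(3.5) == 3.5         # string -> getattr
assert itemgetter('k')({'k': 7}) == 7      # explicit, works for dict keys too
```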
I find the explicit version more readable:

students.sort(key=attrgetter('grade'))   # key=lambda r: r.grade
students.sort(key=itemgetter(2))         # key=lambda r: r[2]
students.sort(key=itemgetter('grade'))   # key=lambda r: r['grade']

Oren From python at rcn.com Sun Nov 30 03:26:16 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 30 03:26:56 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> Message-ID: <000201c3b71b$9facae40$e841fea9@oemcomputer> [Anthony] > > Then let's kill all use of backticks in the standard library. There's > > a lot of them. [Guido] > As always, be careful with doing peephole changes to the standard > library -- historically, we've seen a 1-5% error rate in these change > sets that persists for months or years afterwards. FWIW, Walter and I did a bunch of these for Py2.3 and had excellent success because of a good process. Some ideas are:

* start it now (don't wait until a beta release).
* skip the packages like email which are maintained separately.
* think out ways it could go wrong (operator precedence, double backticks, escaped backticks, backticks inside strings or comments, etc.).
* do it manually (not brainlessly), then do it with automation to compare the results.
* make sure every affected module still imports.
* run the whole unittest suite in debug mode with -u all.
* self-review the diff file.
* get a second person to do a 100% review of the diff (Walter or I would be a good choice).
* put on an asbestos suit because the flames will come even if no mistakes are made.

IMO, this change is much easier to get right than the ones that were done before.
Good luck, Raymond From skip at manatee.mojam.com Sun Nov 30 08:01:04 2003 From: skip at manatee.mojam.com (Skip Montanaro) Date: Sun Nov 30 08:06:15 2003 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200311301301.hAUD14wF006134@manatee.mojam.com>

Bug/Patch Summary
-----------------
590 open / 4387 total bugs (+63)
207 open / 2476 total patches (+28)

New Bugs
--------
XMLGenerator.startElementNS dies on EMPTY_NAMESPACE attribut (2003-11-23) http://python.org/sf/847665
Keyword similar to "global" for nested scopes wanted (2003-11-23) http://python.org/sf/847778
64 bit solaris versus /usr/local/lib (2003-11-23) http://python.org/sf/847812
4.2.6 (re) Examples: float regexp exponential on failure (2003-11-24) http://python.org/sf/848556
couple of new list.sort bugs (2003-11-25) http://python.org/sf/848856
Windows installer halts (2003-11-25) http://python.org/sf/848871
pydoc crash on MacOS X (2003-11-25) http://python.org/sf/848907
gzip.GzipFile is slow (2003-11-25) http://python.org/sf/849046
Request: getpos() for sgmllib (2003-11-25) http://python.org/sf/849097
ZipInfo shows incorrect file size for large files (2003-11-25) http://python.org/sf/849218
reading shelves is really slow (2003-11-26) http://python.org/sf/849662
unclear documentation/missing command? (2003-11-27) http://python.org/sf/850238
Typo in Popen3 description (2003-11-28) http://python.org/sf/850818
Doc/README has broken link (2003-11-28) http://python.org/sf/850823
optparse: OptionParser.__init__'s "prog" argument ignored (2003-11-28) http://python.org/sf/850964
test_poll fails in 2.3.2 on MacOSX(Panther) (2003-11-28) http://python.org/sf/850981
mbcs encoding ignores errors (2003-11-28) http://python.org/sf/850997
building on Fedora Core 1 (2003-11-28) http://python.org/sf/851020
winreg can segfault (2003-11-28) http://python.org/sf/851056
shutil.copy destroys hard links (2003-11-29) http://python.org/sf/851123
Item out of order on builtin function page (2003-11-29) http://python.org/sf/851152
Bug tracker page asks for login even when logged in (2003-11-29) http://python.org/sf/851156
New-style classes with __eq__ but not __hash__ are hashable (2003-11-29) http://python.org/sf/851449

New Patches
-----------
Port tests to unittest (Part 2) (2003-05-13) http://python.org/sf/736962
SimpleHTTPServer reports wrong content-length for text files (2003-11-10) http://python.org/sf/839496
Extend struct.unpack to produce nested tuples (2003-11-23) http://python.org/sf/847857
Cookie.py: One step closer to RFC 2109 (2003-11-23) http://python.org/sf/848017
Flakey urllib2.parse_http_list (2003-11-25) http://python.org/sf/848870
Small error in test_format (2003-11-25) http://python.org/sf/849252
832799 proposed changes (2003-11-25) http://python.org/sf/849262
improve embeddability of python (2003-11-25) http://python.org/sf/849278
urllib reporthook could be more informative (2003-11-25) http://python.org/sf/849407
Enhance frame handing in warnings.warn() (2003-11-27) http://python.org/sf/850482
Semaphore.acquire() timeout parameter (2003-11-28) http://python.org/sf/850728
call com_set_lineno more often (2003-11-28) http://python.org/sf/850789
Modify Setup.py to Detect Tcl/Tk on BSD (2003-11-28) http://python.org/sf/850977
Argument passing from /usr/bin/idle2.3 to idle.py (2003-11-29) http://python.org/sf/851459

Closed Bugs
-----------
Dialogs too tight on OSX (2002-10-29) http://python.org/sf/630818
MacPython for Panther additions includes IDLE (2003-11-08) http://python.org/sf/838616
SimpleHTTPServer reports wrong content-length for text files (2003-11-10) http://python.org/sf/839496
PackMan database for panther misses devtools dep (2003-11-14) http://python.org/sf/842116
PackageManager: deselect show hidden: indexerror (2003-11-18) http://python.org/sf/844676
error in python's grammar (2003-11-21) http://python.org/sf/846521
"and" operator tests the first argument twice (2003-11-21) http://python.org/sf/846564

Closed Patches
--------------

From neal at metaslash.com Sun Nov 30 11:02:31 2003 From: neal at metaslash.com (Neal Norwitz) Date: Sun Nov 30 11:02:37 2003 Subject: [Python-Dev] Use of Python Versions Message-ID: <20031130160230.GO13300@epoch.metaslash.com> I conducted an experiment to try to find out what versions of Python people use. In the last release of pychecker, I asked people to take a survey (http://metaslash.com/pyversion.html). While not scientific, it provides some info. There were 186 responses, with 2 apparent duplicates. Nobody used only one version of Python with that version being 2.1 or below. 110 people only use a single version of python with 10 using 2.2 only, 108 using 2.3 only, and 2 using 2.4 only. Here are the total number of responses by version:

1.5    5   (all 5 also use 2.3)
2.0    3
2.1   13
2.2   72
2.3  172
2.4   23

The raw responses are here: http://metaslash.com/pyver.txt Neal From guido at python.org Sun Nov 30 11:50:41 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 11:50:48 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sun, 30 Nov 2003 07:32:20 +0900."
<20031129223220.GA90372@i18n.org> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> <20031129223220.GA90372@i18n.org> Message-ID: <200311301650.hAUGofH28925@c-24-5-183-134.client.comcast.net> I lost David Eppstein's post, but I finally know what I want to say in response. David objected to the behavior whereby groupby() subiterators become invalidated when the outer iterator is moved on to the next subiterator. But I don't think there's a good use case for what he wants to do instead: save enough state so that the subiterators can be used in arbitrary order. An application that saves the subiterators for later will end up saving a copy of everything, so it might as well be written so explicitly, e.g.:

store = {}
for key, group in groupby(keyfunc, iterable):
    store[key] = list(group)
# now access the groups in random order:
for key in store:
    print store[key]

I don't think the implementation should be complexified to allow leaving out the explicit list() call in the first loop. --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d at hishome.net Sun Nov 30 15:44:59 2003 From: oren-py-d at hishome.net (Oren Tirosh) Date: Sun Nov 30 15:45:02 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <000201c3b71b$9facae40$e841fea9@oemcomputer> References: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> <000201c3b71b$9facae40$e841fea9@oemcomputer> Message-ID: <20031130204459.GA3275@hishome.net> On Sun, Nov 30, 2003 at 03:26:16AM -0500, Raymond Hettinger wrote: > [Anthony] > > > Then let's kill all use of backticks in the standard library. > There's > > > a lot of them. > > [Guido] > > As always, be careful with doing peephole changes to the standard > > library -- historically, we've seen a 1-5% error rate in these change > > sets that persists for months or years afterwards.
>
> FWIW, Walter and I did a bunch of these for Py2.3 and had excellent
> success because of a good process. Some ideas are:
>
> * start it now (don't wait until a beta release).
> * skip the packages like email which are maintained separately.
> * think out ways it could go wrong (operator precedence, double backticks, escaped backticks, backticks inside strings or comments, etc.).
> * do it manually (not brainlessly), then do it with automation to compare the results.

Here's an idea for verifying an automated translator: Instead of converting `expr` to repr(expr) convert it first to (`expr`) or even (`(expr)`) and make sure it still compiles into exactly the same bytecode. It should catch all the problems you mention except backticks in comments and strings. These need manual inspection. Oren From aleaxit at yahoo.com Sun Nov 30 15:57:49 2003 From: aleaxit at yahoo.com (Alex Martelli) Date: Sun Nov 30 15:57:58 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <20031130073109.GA1560@hishome.net> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> Message-ID: <200311302157.49205.aleaxit@yahoo.com> On Sunday 30 November 2003 08:31, Oren Tirosh wrote: > On Fri, Nov 28, 2003 at 06:24:30PM -0500, Raymond Hettinger wrote: > ...
>
> > students.sort(key=extract('grade'))  # key=lambda r: r.grade
> > students.sort(key=extract(2))        # key=lambda r: r[2]
>
> Why should the extract function interpret a string argument as getattr and an int argument as getitem?
> I find the explicit version more readable:
>
> students.sort(key=attrgetter('grade'))   # key=lambda r: r.grade
> students.sort(key=itemgetter(2))         # key=lambda r: r[2]
> students.sort(key=itemgetter('grade'))   # key=lambda r: r['grade']

I concur: "overloading" extract to mean (the equivalent of) either getattr or getitem depending on the argument type doesn't look good, besides making it unusable to extract some items from dicts. Since these functions or types are going to be in operator, I think we can afford to "spend" two names to distinguish functionality (even though attrgetter and itemgetter look nowhere as neat as extract -- I don't have better suggestions offhand). Alex From guido at python.org Sun Nov 30 16:54:23 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 16:54:29 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sun, 30 Nov 2003 21:57:49 +0100." <200311302157.49205.aleaxit@yahoo.com> References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> <200311302157.49205.aleaxit@yahoo.com> Message-ID: <200311302154.hAULsN229214@c-24-5-183-134.client.comcast.net> > I concur: "overloading" extract to mean (the equivalent of) either > getattr or getitem depending on the argument type doesn't look > good, besides making it unusable to extract some items from dicts. Agreed. I've seen too many of such "clever" overloading schemes in a past life. > Since these functions or types are going to be in operator, I think > we can afford to "spend" two names to distinguish functionality > (even though attrgetter and itemgetter look nowhere as neat as > extract -- I don't have better suggestions offhand). Right. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sun Nov 30 18:48:25 2003 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sun Nov 30 18:46:38 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200311302157.49205.aleaxit@yahoo.com> References: <20031130073109.GA1560@hishome.net> <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> Message-ID: <5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> At 09:57 PM 11/30/03 +0100, Alex Martelli wrote:

> > students.sort(key=attrgetter('grade'))   # key=lambda r: r.grade
> > students.sort(key=itemgetter(2))         # key=lambda r: r[2]
> > students.sort(key=itemgetter('grade'))   # key=lambda r: r['grade']
>
> I concur: "overloading" extract to mean (the equivalent of) either getattr or getitem depending on the argument type doesn't look good, besides making it unusable to extract some items from dicts.
>
> Since these functions or types are going to be in operator, I think we can afford to "spend" two names to distinguish functionality (even though attrgetter and itemgetter look nowhere as neat as extract -- I don't have better suggestions offhand).

How about:

extract(attr='grade')
extract(item=2)
extract(method='foo')   # returns the result of calling 'ob.foo()'

And following the pattern of Zope's old "query" package:

extract(extract(attr='foo'), attr='bar')   # extracts ob.foo.bar
extract(extract(item=10), method='spam')   # extracts ob[10].spam()

i.e., the first (optional) positional argument to extract is a function that's called on the outer extract's argument, and the return value is then used to perform the main extract operation on. IIRC, the Zope query package used __getitem__ instead of __call__ on its instances as a speed hack, but I don't think we should follow that example. :) From guido at python.org Sun Nov 30 19:18:37 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 19:18:48 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: Your message of "Sun, 30 Nov 2003 18:48:25 EST."
<5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> References: <20031130073109.GA1560@hishome.net> <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> <5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> Message-ID: <200312010018.hB10IbS29532@c-24-5-183-134.client.comcast.net> > How about:
>
> extract(attr='grade')
> extract(item=2)
> extract(method='foo')   # returns the result of calling 'ob.foo()'
>
> And following the pattern of Zope's old "query" package:
>
> extract(extract(attr='foo'), attr='bar')   # extracts ob.foo.bar
> extract(extract(item=10), method='spam')   # extracts ob[10].spam()
>
> i.e., the first (optional) positional argument to extract is a function that's called on the outer extract's argument, and the return value is then used to perform the main extract operation on.

I'm not sure what the advantage of this is. It seems more typing, more explanation, probably more code to implement (to check for contradicting keyword args). > IIRC, the Zope query package used __getitem__ instead of __call__ on its > instances as a speed hack, but I don't think we should follow that example. :) Right. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8 at osu.edu Sun Nov 30 20:25:46 2003 From: fincher.8 at osu.edu (Jeremy Fincher) Date: Sun Nov 30 19:27:53 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <20031130204459.GA3275@hishome.net> References: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> <000201c3b71b$9facae40$e841fea9@oemcomputer> <20031130204459.GA3275@hishome.net> Message-ID: <200311302025.46673.fincher.8@osu.edu> On Sunday 30 November 2003 03:44 pm, Oren Tirosh wrote: > Instead of converting `expr` to repr(expr) convert it first to (`expr`) > or even (`(expr)`) and make sure it still compiles into exactly the same > bytecode.
It should catch all the problems you mention except backticks > in comments and strings. These need manual inspection. I don't know if it should be *that* mechanical; there are a lot of places where I've seen " 'something %s' % repr(foo)" when I think it's much more clearly written as " 'something %r' % foo". I don't know which is the officially preferred style, but if it's the latter (and I hope it is ;)) then it may not be good to mechanically change backticks to a repr call. Jeremy From eppstein at ics.uci.edu Sun Nov 30 20:01:20 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Sun Nov 30 20:01:18 2003 Subject: [Python-Dev] Re: "groupby" iterator References: <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <000101c3b63f$c7fc4720$e841fea9@oemcomputer> <20031129223220.GA90372@i18n.org> <200311301650.hAUGofH28925@c-24-5-183-134.client.comcast.net> Message-ID: In article <200311301650.hAUGofH28925@c-24-5-183-134.client.comcast.net>, Guido van Rossum wrote: > But I don't think there's a good use case > for what he wants to do instead: save enough state so that the > subiterators can be used in arbitrary order. An application > that saves the subiterators for later will end up saving a copy of > everything, so it might as well be written so explicitly I don't have a good explicit use case in mind, but my objective is to be able to use itertools-like functionals without having to pay much attention to which ones iterate through their arguments immediately and which ones defer the iteration until later. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From guido at python.org Sun Nov 30 20:08:14 2003 From: guido at python.org (Guido van Rossum) Date: Sun Nov 30 20:08:30 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: Your message of "Sun, 30 Nov 2003 20:25:46 EST."
<200311302025.46673.fincher.8@osu.edu> References: <200311300427.hAU4Qxb20124@c-24-5-183-134.client.comcast.net> <000201c3b71b$9facae40$e841fea9@oemcomputer> <20031130204459.GA3275@hishome.net> <200311302025.46673.fincher.8@osu.edu> Message-ID: <200312010108.hB118EL29591@c-24-5-183-134.client.comcast.net> > I don't know if it should be *that* mechanical; there are a lot of > places where I've seen " 'something %s' % repr(foo)" when I think > it's much more clearly written as " 'something %r' % foo". I don't > know which is the officially preferred style, but if it's the latter > (and I hope it is ;)) then it may not be good to mechanically change > backticks to a repr call. If you're going to do that, I would beware of one thing. If x is a tuple, "foo %r" % x will not do the right thing: it will expect x to be a 1-tuple and produce the repr of x[0]:

>>> a = (42,)
>>> print "foo %s" % repr(a)
foo (42,)
>>> print "foo %r" % a
foo 42
>>> a = (4, 2)
>>> print "foo %r" % a
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not all arguments converted during string formatting
>>>

This is only a problem when there's only one % format in the string; if there are two or more, the argument is already a tuple and the substitution of %s/repr(x) to %r/x works fine. This also suggests a solution: if there's only one argument, create an explicit tuple:

>>> print "foo %r" % (a,)
foo (4, 2)
>>>

--Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sun Nov 30 21:20:16 2003 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sun Nov 30 21:18:24 2003 Subject: [Python-Dev] "groupby" iterator In-Reply-To: <200312010018.hB10IbS29532@c-24-5-183-134.client.comcast.net> References: <20031130073109.GA1560@hishome.net> <200311282146.hASLkr317367@c-24-5-183-134.client.comcast.net> <002701c3b606$c61304a0$e841fea9@oemcomputer> <20031130073109.GA1560@hishome.net> <5.1.0.14.0.20031130184217.02e3c1d0@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20031130210357.02f62d80@mail.telecommunity.com> At 04:18 PM 11/30/03 -0800, Guido van Rossum wrote:

> > How about:
> >
> > extract(attr='grade')
> > extract(item=2)
> > extract(method='foo')   # returns the result of calling 'ob.foo()'
> >
> > And following the pattern of Zope's old "query" package:
> >
> > extract(extract(attr='foo'), attr='bar')   # extracts ob.foo.bar
> > extract(extract(item=10), method='spam')   # extracts ob[10].spam()
> >
> > i.e., the first (optional) positional argument to extract is a function that's called on the outer extract's argument, and the return value is then used to perform the main extract operation on.
>
> I'm not sure what the advantage of this is.

The chaining part, or the idea at all? For the idea in general, I was just proposing a more explicit form of the last API proposal. For the chaining part, well, my use case is the same as the old Zope query library: being able to compose operators to craft OO queries from a high level description. No reason that needs to go in the standard library, but as long as we were dreaming, I figured I might help implement it if it solved enough problems for me. :) (Without the chaining part, I don't really care if there's a standard library 'extract()' or not, since I'll still need to write a chaining one sooner or later.) > It seems more typing, >more explanation, probably more code to implement (to check for >contradicting keyword args). Yes.
Really the whole extract thing isn't that useful, except to get extra speed over using 'lambda x: x.foo' or whatever, which is what I'd probably use in any code that wasn't composing functions or compiling an OO query language. :) From python at rcn.com Sun Nov 30 23:35:56 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 30 23:37:27 2003 Subject: [Python-Dev] Re: "groupby" iterator In-Reply-To: Message-ID: <003301c3b7c4$b9f1a400$e841fea9@oemcomputer> [Guido van Rossum] > > But I don't think there's a good use case > > for what he wants to do instead: save enough state so that the > > subiterators can be used in arbitrary order. An application > > that saves the subiterators for later will end up saving a copy of > > everything, so it might as well be written so explicitly [David Eppstein] > I don't have a good explicit use case in mind, but my objective is to be > able to use itertools-like functionals without having to pay much > attention to which ones iterate through their arguments immediately and > which ones defer the iteration until later. Okay, I've decided on this one. Though David's idea is attractive in its generality, the use cases favor the previous implementation. IOW, there is a reasonable use case for skipping or partially consuming the subiterators (e.g. "sort s | uniq" and "sort s | uniq -d"). For the delinquent subiterators, the user can just convert them to a list if they are going to be needed later:

groups = []
for k, g in groupby(seq, keyfunc):
    groups.append(list(g))

With respect to the principle of least surprise, it is the lesser evil between having a delinquent subiterator turn up empty or having an itertool unexpectedly fall into a memory intensive mode. The first can be flagged so it won't pass silently. The second is more problematic because it is silent and because it is inconsistent with the memory friendly nature of itertools.
Another minor argument against David's version is that the pure python version (which will be included in the docs) is longer and harder to follow. Raymond Hettinger P.S. I'm leaning toward Alex's suggested argument order. Having a default identity function is too attractive to pass up. So the choice is between a style like map(None, s) or something closer to list.sorted(s, key=). Though the latter is not consistent with other itertools, it wins in the beauty department and its similarity with the key= is an accurate, helpful analogy. From python at rcn.com Sun Nov 30 23:42:55 2003 From: python at rcn.com (Raymond Hettinger) Date: Sun Nov 30 23:43:39 2003 Subject: [Python-Dev] Banishing apply(), buffer(), coerce(), and intern() In-Reply-To: <200311300128.hAU1S0cE031343@maxim.off.ekorp.com> Message-ID: <003401c3b7c5$96b39e20$e841fea9@oemcomputer> > > Yes, backticks will be gone in 3.0. But I expect there's no hope of > > getting rid of them earlier -- they've been used too much. I suspect > Then let's kill all use of backticks in the standard library. There's > a lot of them. Advisory from a micro-performance hawk: Backticks are faster than repr()

>>> from timeit import Timer
>>> min(Timer('`x`', 'x=1').repeat(3))
1.4857213496706265
>>> min(Timer('repr(x)', 'x=1').repeat(3))
1.7748914665012876

Raymond Hettinger From eppstein at ics.uci.edu Sun Nov 30 23:42:18 2003 From: eppstein at ics.uci.edu (David Eppstein) Date: Mon Dec 1 11:53:22 2003 Subject: [Python-Dev] Re: "groupby" iterator In-Reply-To: <003301c3b7c4$b9f1a400$e841fea9@oemcomputer> References: <003301c3b7c4$b9f1a400$e841fea9@oemcomputer> Message-ID: <30187757.1070224938@[192.168.1.100]> On 11/30/03 11:35 PM -0500 Raymond Hettinger wrote: > Okay, I've decided on this one. > > Though David's idea is attractive in its generality, the use cases favor > the previous implementation. IOW, there is a reasonable use case for > skipping or partially consuming the subiterators (e.g.
"sort s | uniq" > and "sort s | uniq -d"). For the delinquent subiterators, the user can > just convert them to a list if they are going to be needed later: My implementation will skip or partially consume the subiterators, with only a very temporary additional use of memory, if you don't keep a reference to them. But I can see your arguments about visible vs silent failure modes and code complexity. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science
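The materialize-early pattern that Guido and Raymond recommend in this thread can be sketched with the groupby() that eventually shipped in itertools (note the final signature takes the iterable first: groupby(iterable, key=None); the data and key function below are made up for illustration):

```python
from itertools import groupby

data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]

# Convert each subiterator to a list right away; groupby shares its
# underlying iterator, so a group saved for later would turn up empty
# once the outer iterator moves on.
store = {}
for key, group in groupby(data, key=lambda pair: pair[0]):
    store.setdefault(key, []).append([v for _, v in group])

print(store)  # -> {'a': [[1, 2], [4]], 'b': [[3]]}
```

Because groupby only groups consecutive runs, the key 'a' produces two separate groups here; materializing each group as it is seen preserves both.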