From gstein@lyra.org Wed Mar 1 00:12:29 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:12:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) hehe... > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) What's the difference between a warning and an error? If you're running a program and it suddenly spits out a warning about a misuse of list.append, I'd certainly see that as "the program did something unexpected; that is an error." But this is all moot. Guido has already said that we would be amenable to a warning/error infrastructure which list.append could use. His description used some awkward sentences, so I'm not sure (without spending some brain cycles to parse the email) exactly what his desired defaults and behavior are. But hey... the possibility is there, and is just waiting for somebody to code it. IMO, Guido has left an out for people that are upset with the current hard-line approach. One of those people just needs to spend a bit of time coming up with a patch :-) And yes, Guido is also the Benevolent Dictator and can certainly have his mind changed, so people can definitely continue pestering him to back away from the hard-line approach... 
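The construct being debated, sketched in present-day Python (where the multi-argument form has since become a hard error rather than a warning):

```python
# Sketch of the list.append() change under discussion. In modern
# Python the old multi-argument form raises TypeError outright.
lst = []
lst.append((1, 2))        # the documented spelling: append one tuple
assert lst == [(1, 2)]

try:
    lst.append(1, 2)      # the deprecated multi-arg form being removed
except TypeError:
    pass                  # current interpreters reject it
```

On pre-1.6 interpreters the second call silently appended `(1, 2)` as a tuple, which is exactly why existing code depended on it.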
Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping@lfw.org Wed Mar 1 00:20:07 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 29 Feb 2000 18:20:07 -0600 (CST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > > What's the difference between a warning and an error? If you're running a > program and it suddenly spits out a warning about a misuse of list.append, > I'd certainly see that as "the program did something unexpected; that is > an error." A big, big difference. Perhaps to one of us, it's the minor inconvenience of reading the error message and inserting a couple of parentheses in the appropriate file -- but to the end user, it's the difference between the program working (albeit noisily) and *not* working. When the program throws an exception and stops, it is safe to say most users will declare it broken and give up. We can't assume that they're going to be able to figure out what to edit (or be brave enough to try) just by reading the error message... or even what interpreter flag to give, if errors (rather than warnings) are the default behaviour. -- ?!ng From klm@digicool.com Wed Mar 1 00:37:09 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 19:37:09 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) > > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. 
It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) I agree with Mark. Why the sudden rush?? It seems to me to be unfair to make such a change - one that will break people's code - without advance warning, which typically is handled by a deprecation period. There *are* going to be people who won't be informed of the change in the short span of less than a single release. Just because it won't cause you pain isn't a good reason to disregard the pain of those that will suffer, particularly when you can do something relatively low-cost to avoid it. Ken klm@digicool.com From gstein@lyra.org Wed Mar 1 00:57:56 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:57:56 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > make such a change - one that will break people's code - without advance > warning, which typically is handled by a deprecation period. There *are* > going to be people who won't be informed of the change in the short span > of less than a single release. Just because it won't cause you pain isn't > a good reason to disregard the pain of those that will suffer, > particularly when you can do something relatively low-cost to avoid it. Sudden rush?!? Mark said he knew about it for a couple years. Same here. It was a long while ago that .append()'s semantics were specified to "no longer" accept multiple arguments. I see in the HISTORY file that changes were made to Python 1.4 (October, 1996) to avoid calling append() with multiple arguments. So, that is over three years that append() has had multiple-args deprecated. There was probably discussion even before that, but I can't seem to find something to quote. Seems like plenty of time -- far from rushed. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From klm@digicool.com Wed Mar 1 01:02:02 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 20:02:02 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > > make such a change - one that will break people's code - without advance > > warning, which typically is handled by a deprecation period. There *are* > > going to be people who won't be informed of the change in the short span > > of less than a single release. Just because it won't cause you pain isn't > > a good reason to disregard the pain of those that will suffer, > > particularly when you can do something relatively low-cost to avoid it. > > Sudden rush?!? > > Mark said he knew about it for a couple years. Same here. It was a long > while ago that .append()'s semantics were specified to "no longer" accept > multiple arguments. > > I see in the HISTORY file that changes were made to Python 1.4 (October, > 1996) to avoid calling append() with multiple arguments. > > So, that is over three years that append() has had multiple-args > deprecated. There was probably discussion even before that, but I can't > seem to find something to quote. Seems like plenty of time -- far from > rushed. None the less, for those practicing it, the incorrectness of it will be fresh news. I would be less sympathetic with them if there was recent warning, eg, the schedule for changing it in the next release was part of the current release. But if you tell somebody you're going to change something, and then don't for a few years, you probably need to renew the warning before you make the change. Don't you think so? Why not? 
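The warn-now, break-later migration being argued for here can be sketched with the `warnings` module that later shipped with Python; the shim below is purely hypothetical, not anything that existed in the 1.6 tree:

```python
import warnings

def append_compat(lst, *args):
    # Hypothetical compatibility shim: warn on the multi-arg form but
    # preserve the old behavior (append the arguments as a tuple).
    if len(args) == 1:
        lst.append(args[0])
    else:
        warnings.warn("list.append() with more than one argument is "
                      "deprecated; pass a tuple instead",
                      DeprecationWarning, stacklevel=2)
        lst.append(args)

lst = []
append_compat(lst, (1, 2))        # normal single-argument call: silent
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    append_compat(lst, 3, 4)      # old form: warns, but still works
assert lst == [(1, 2), (3, 4)]
assert issubclass(caught[0].category, DeprecationWarning)
```

A deprecation period of this shape is what Ken and Mark are asking for: the old spelling keeps working for a release while announcing its own removal.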
Ken klm@digicool.com From paul@prescod.net Wed Mar 1 02:56:33 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 29 Feb 2000 18:56:33 -0800 Subject: [Python-Dev] breaking list.append() References: Message-ID: <38BC86E1.53F69776@prescod.net> Software configuration management is HARD. Every sudden backwards incompatible change (warranted or not) makes it harder. Multi-arg append is not hurting anyone as much as a sudden change to it would. It would be better to leave append() alone and publicize its near-term removal rather than cause random, part-time supported modules to stop working because their programmers may be too busy to update them right now. So no, I'm not stepping up to do it. But I'm also saying that the better "lazy" option is to put something in a prominent place in the documentation and otherwise leave it alone. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From guido@python.org Wed Mar 1 04:11:02 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Feb 2000 23:11:02 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Your message of "Tue, 29 Feb 2000 18:56:33 PST." <38BC86E1.53F69776@prescod.net> References: <38BC86E1.53F69776@prescod.net> Message-ID: <200003010411.XAA12988@eric.cnri.reston.va.us> > Software configuration management is HARD. Every sudden backwards > incompatible change (warranted or not) makes it harder. Multi-arg append > is not hurting anyone as much as a sudden change to it would. It would > be better to leave append() alone and publicize its near-term removal > rather than cause random, part-time supported modules to stop working > because their programmers may be too busy to update them right now. 
I'm tired of this rhetoric. It's not like I'm changing existing Python installations retroactively. I'm planning to release a new version of Python which no longer supports certain long-obsolete and undocumented behavior. If you maintain a non-core Python module, you should test it against the new release and fix anything that comes up. This is why we have an alpha and beta test cycle and even before that the CVS version. If you are a Python user who depends on a 3rd party module, you need to find out whether the new version is compatible with the 3rd party code you are using, or whether there's a newer version available that solves the incompatibility. There are people who still run Python 1.4 (really!) because they haven't upgraded. I don't have a problem with that -- they don't get much support, but it's their choice, and they may not need the new features introduced since then. I expect that lots of people won't upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the other modules/packages they need are compatible with 1.6. Multi-arg append probably won't be the only reason why e.g. Digital Creations may need to release an update to Zope for Python 1.6. Zope comes with its own version of Python anyway, so they have control over when they make the switch. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Wed Mar 1 05:04:35 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:04:35 -0500 Subject: [Python-Dev] Size of int across machines (was RE: Blowfish in Python?) In-Reply-To: Message-ID: <000201bf833b$a3b01bc0$412d153f@tim> [Markus Stenberg] > ... > speed was horrendous. > > I think the main reason was the fact that I had to use _long ints_ for > calculations, as the normal ints are signed, and apparently the bitwise > operators do not work as advertised when bit32 is set (=number is > negative). 
[Tim, takes "bitwise operators" to mean & | ^ ~, and expresses surprise] [Markus, takes umbrage, and expresses umbrage ] > Hmm.. As far as I'm concerned, shifts for example do screw up. Do you mean "for example" as in "there are so many let's just pick one at random", or as in "this is the only one I've stumbled into" <0.9 wink>? > i.e. > > 0xffffffff >> 30 > > [64bit Python: 3] > [32bit Python: -1] > > As far as I'm concerned, that should _not_ happen. Or maybe it's just me. I could not have guessed that your complaint was about 64-bit Python from your "when bit32 is set (=number is negative)" description . The behavior shown in a Python compiled under a C in which sizeof(long)==4 matches the Reference Manual (see the "Integer and long integer literals" and "shifting operations" sections). So that can't be considered broken (you may not *like* it, but it's functioning as designed & as documented). The behavior under a sizeof(long)==8 C seems more of an ill-documented (and debatable to me too) feature. The possibility is mentioned in the "The standard type hierarchy" section (under Numbers -> Integers -> Plain integers) but really not fleshed out, and the "Integer and long integer literals" section plainly contradicts it. Python's going to have to clean up its act here -- 64-bit machines are getting more common. There's a move afoot to erase the distinction between Python ints and longs (in the sense of auto-converting from one to the other under the covers, as needed). In that world, your example would work like the "64bit Python" one. There are certainly compatibility issues, though, in that int left shifts are end-off now, and on a 32-bit machine any int for which i & 0x80000000 is true "is negative" (and so sign-extends on a right shift; note that Python guarantees sign-extending right shifts *regardless* of what the platform C does (C doesn't define what happens here -- Python does)). 
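Tim's two cases can be reproduced in present-day Python, which has the unified arbitrary-precision ints he alludes to (so the "64bit Python" result is now the only result); the `as_int32` helper below is illustrative only, emulating a sizeof(long)==4 build:

```python
# Today's unified ints give the "64bit Python" answer directly:
assert 0xffffffff >> 30 == 3

def as_int32(x):
    # Reinterpret the low 32 bits of x as a signed 32-bit quantity,
    # mimicking a build where 0xffffffff was the int -1.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

assert as_int32(0xffffffff) == -1
# Python guarantees sign-extending right shifts, so the old 32-bit
# result follows:
assert as_int32(0xffffffff) >> 30 == -1
```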
[description of pain getting a fast C-like "mod 2**32 int +" to work too] Python really wasn't designed for high-performance bit-fiddling, so you're (as you've discovered ) swimming upstream with every stroke. Given that you can't write a C module here, there's nothing better than to do the ^ & | ~ parts with ints, and fake the rest slowly & painfully. Note that you can at least determine the size of a Python int via inspecting sys.maxint. sympathetically-unhelpfully y'rs - tim From guido@python.org Wed Mar 1 05:44:10 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 00:44:10 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: Your message of "Tue, 29 Feb 2000 15:34:21 MST." <20000229153421.A16502@acs.ucalgary.ca> References: <20000229153421.A16502@acs.ucalgary.ca> Message-ID: <200003010544.AAA13155@eric.cnri.reston.va.us> [I don't like to cross-post to patches and python-dev, but I think this belongs in patches because it's a followup to Neil's post there and also in -dev because of its longer-term importance.] Thanks for the new patches, Neil! We had a visitor here at CNRI today, Eric Tiedemann , who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes. Eric had commented earlier on Neil's previous code, and I had used the morning to make myself familiar with Neil's code. This was relatively easy because Neil's code is very clear. Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried. Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. 
I have a feeling that function objects should also be considered container types, because of the cycle involving globals. Eric's algorithm, then, consists of the following parts. Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".) We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.) All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.) When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object. Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.) Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine. We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list. 
Objects placed on the second list in this phase are considered colored gray (roots). Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray). Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction. (How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)? We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when we move the object to the second list. During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.) Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them in a way similar to how Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object. We discussed more things, but not as thoroughly. 
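The passes described above condense into a short toy model (illustrative Python with invented names, not the C implementation under discussion):

```python
# Toy model of Eric's algorithm: containers carry their true refcount
# and the list of container objects they reference.
class Container:
    def __init__(self, refcount):
        self.refcount = refcount   # total refs, internal + external
        self.refs = []             # container objects this one holds
        self.gc_refs = 0

def collect(objects):
    # Pass 1: set gc_refs equal to the refcount for each container.
    for o in objects:
        o.gc_refs = o.refcount
    # Pass 2: subtract one for every internal (container-held) reference.
    for o in objects:
        for child in o.refs:
            child.gc_refs -= 1
    # gc_refs > 0 means external references exist: these are roots (gray).
    gray = [o for o in objects if o.gc_refs > 0]
    white = {o for o in objects if o.gc_refs == 0}
    # Propagate: appending to 'gray' during the scan means newly moved
    # objects are themselves scanned before the pass terminates.
    for o in gray:
        for child in o.refs:
            if child in white:
                white.remove(child)
                gray.append(child)
    return white    # whatever stayed white is unreachable cycle garbage

# A two-object cycle with only internal references, plus one real root:
a, b = Container(1), Container(1)
a.refs, b.refs = [b], [a]
root = Container(1)    # referenced from "outside", holds nothing
garbage = collect([a, b, root])
assert garbage == {a, b}
```

The append-while-scanning loop is the key point: it replaces the recursion in Neil's version, because the list itself serves as the work queue.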
Eric & Eric stressed the importance of making excellent statistics available about the rate of garbage collection -- probably as data structures that Python code can read rather than debugging print statements. Eric T also sketched an incremental version of the algorithm, usable for real-time applications. This involved keeping the gc_refs field ("external" reference counts) up-to-date at all times, which would require two different versions of the INCREF/DECREF macros: one for adding/deleting a reference from a container, and another for adding/deleting a root reference. Also, a 4th color (red) was added, to distinguish between scanned roots and scanned non-roots. We decided not to work this out in more detail because the overhead cost appeared to be much higher than for the previous algorithm; instead, we recommend that for real-time requirements the whole GC be disabled (there should be run-time controls for this, not just compile-time). We also briefly discussed possibilities for generational schemes. The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made. I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?) The rest of the afternoon was spent discussing continuations, coroutines and generators, and the fundamental reason why continuations are so hard (the C stack getting in the way everywhere). But that's a topic for another mail, maybe. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Wed Mar 1 05:57:49 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:57:49 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: <200002291302.IAA04581@eric.cnri.reston.va.us> Message-ID: <000601bf8343$13575040$412d153f@tim> [Tim, runs checkappend.py over the entire CVS tree, comes up with surprisingly many remaining problems, and surprisingly few false hits] [Guido fixes mailerdaemon.py, and argues for nuking Demo\tkinter\www\ (the whole directory) Demo\sgi\video\VcrIndex.py (unclear whether the dir or just the file) Demo\sgi\gl\glstdwin\glstdwin.py (stdwin-related) Demo\ibrowse\ibrowse.py (stdwin-related) > All these are stdwin-related. Stdwin will also go out of service per > 1.6. ] Then the sooner someone nukes them from the CVS tree, the sooner my automated hourly checkappend complaint generator will stop pestering Python-Dev about them . > (Conclusion: most multi-arg append() calls are *very* old, But part of that is because we went thru this exercise a couple years ago too, and you repaired all the ones in the less obscure parts of the distribution then. > or contributed by others. Sigh. I must've given bad examples long > ago...) Na, I doubt that. Most people will not read a language defn, at least not until "something doesn't work". If the compiler accepts a thing, they simply *assume* it's correct. It's pretty easy (at least for me!) to make this particular mistake as a careless typo, so I assume that's the "source origin" for many of these too. As soon as you *notice* you've done it, and that nothing bad happened, the natural tendencies are to (a) believe it's OK, and (b) save 4 keystrokes (incl. the SHIFTs) over & over again in the glorious indefinite future . 
Reminds me of a c.l.py thread a while back, wherein someone did stuff like None, x, y, None = function_returning_a_4_tuple to mean that they didn't care what the 1st & 4th values were. It happened to work, so they did it more & more. Eventually a function containing this mistake needed to reference None after that line, and "suddenly for no reason at all Python stopped working". To the extent that you're serious about CP4E, you're begging for more of this, not less . newbies-even-keep-on-doing-things-that-*don't*-work!-ly y'rs - tim From tim_one@email.msn.com Wed Mar 1 06:50:44 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 01:50:44 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BBD1A2.CD29AADD@lemburg.com> Message-ID: <000701bf834a$77acdfe0$412d153f@tim> [M.-A. Lemburg] > ... > Currently, mapping tables map characters to Unicode characters > and vice-versa. Now the .translate method will use a different > kind of table: mapping integer ordinals to integer ordinals. You mean that if I want to map u"a" to u"A", I have to set up some sort of dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. > Question: What is more efficient: having lots of integers > in a dictionary or lots of characters ? My bet is "lots of integers", to reduce both space use and comparison time. > ... > Something else that changed is the way .capitalize() works. The > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > on the www.unicode.org site). #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case Mappings"). > Here's the new doc string: > > S.capitalize() -> unicode > > Return a capitalized version of S, i.e. words start with title case > characters, all remaining cased characters have lower case. > > Note that *all* characters are touched, not just the first one. > The change was needed to get it in sync with the .iscapitalized() > method which is based on the Unicode algorithm too. 
> > Should this change be propagated to the string implementation ? Unicode makes distinctions among "upper case", "lower case" and "title case", and you're trying to get away with a single "capitalize" function. Java has separate toLowerCase, toUpperCase and toTitleCase methods, and that's the way to do it. Whatever you do, leave .capitalize alone for 8-bit strings -- there's no reason to break code that currently works. "capitalize" seems a terrible choice of name for a titlecase method anyway, because of its baggage connotations from 8-bit strings. Since this stuff is complicated, I say it would be much better to use the same names for these things as the Unicode and Java folk do: there's excellent documentation elsewhere for all this stuff, and it's Bad to make users mentally translate unique Python terminology to make sense of the official docs. So my vote is: leave capitalize the hell alone . Do not implement capitalize for Unicode strings. Introduce a new titlecase method for Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode strings should also have methods to get at uppercase and lowercase (as Unicode defines those). From tim_one@email.msn.com Wed Mar 1 07:36:03 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 02:36:03 -0500 Subject: [Python-Dev] Re: Python / Haskell (fwd) In-Reply-To: Message-ID: <000801bf8350$cc4ec580$412d153f@tim> [Greg Wilson, quoting Philip Wadler] > Well, what I most want is typing. But you already know that. So invite him to contribute to the Types-SIG <0.5 wink>. > Next after typing? Full lexical scoping for closures. 
I want to write: > > fun x: fun y: x+y > > Not: > > fun x: fun y, x=x: x+y > > Lexically scoped closures would be a big help for the embedding technique > I described [GVW: in a posting to the Software Carpentry discussion list, > archived at > > http://software-carpentry.codesourcery.com/lists/sc-discuss/msg00068.html > > which discussed how to build a flexible 'make' alternative in Python]. So long as we're not deathly concerned over saving a few lines of easy boilerplate code, Python already supports this approach wonderfully well -- but via using classes with __call__ methods instead of lexical closures. I can't make time to debate this now, but suffice it to say dozens on c.l.py would be delighted to . Philip is understandably attached to the "functional way of spelling things", but Python's way is at least as usable for this (and many-- including me --would say more so). > Next after closures? Disjoint sums. E.g., > > fun area(shape) : > switch shape: > case Circle(r): > return pi*r*r > case Rectangle(h,w): > return h*w > > (I'm making up a Python-like syntax.) This is an alternative to the OO > approach. With the OO approach, it is hard to add area, unless you modify > the Circle and Rectangle class definitions. Python allows adding new methods to classes dynamically "from the outside" -- the original definitions don't need to be touched (although it's certainly preferable to add new methods directly!). Take this complaint to the extreme, and I expect you end up reinventing multimethods (suppose you need to add an intersection(shape1, shape2) method: N**2 nesting of "disjoint sums" starts to appear ludicrous ). 
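The spellings being compared, Wadler's default-argument workaround, the class-with-__call__ form, and the lexically scoped version that later Python versions support directly, side by side (function names are illustrative):

```python
# The workaround Wadler objects to: capture x via a default argument.
def make_adder_old(x):
    return lambda y, x=x: x + y

# Modern Python (2.2+) has full lexical scoping, so this just works:
def make_adder(x):
    def add(y):
        return x + y       # x is captured from the enclosing scope
    return add

# The class-with-__call__ spelling favored in the reply above:
class Adder:
    def __init__(self, x):
        self.x = x
    def __call__(self, y):
        return self.x + y

assert make_adder_old(3)(4) == 7
assert make_adder(3)(4) == 7
assert Adder(3)(4) == 7
```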
In any case, the Types-SIG already seems to have decided that some form of "typecase" stmt will be needed; see the archives for that; I expect the use above would be considered abuse, though; Python has no "switch" stmt of any kind today, and the use above can already be spelled via if isinstance(shape, Circle): etc elif isinstance(shape, Rectangle): etc else: raise TypeError(etc) From gstein@lyra.org Wed Mar 1 07:51:29 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 23:51:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > None the less, for those practicing it, the incorrectness of it will be > fresh news. I would be less sympathetic with them if there was recent > warning, eg, the schedule for changing it in the next release was part of > the current release. But if you tell somebody you're going to change > something, and then don't for a few years, you probably need to renew the > warning before you make the change. Don't you think so? Why not? I agree. Note that Guido posted a note to c.l.py on Monday. I believe that meets your notification criteria. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Mar 1 08:10:28 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:10:28 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Guido van Rossum wrote: > I'm tired of this rhetoric. It's not like I'm changing existing > Python installations retroactively. I'm planning to release a new > version of Python which no longer supports certain long-obsolete and > undocumented behavior. If you maintain a non-core Python module, you > should test it against the new release and fix anything that comes up. > This is why we have an alpha and beta test cycle and even before that > the CVS version. 
If you are a Python user who depends on a 3rd party > module, you need to find out whether the new version is compatible > with the 3rd party code you are using, or whether there's a newer > version available that solves the incompatibility. > > There are people who still run Python 1.4 (really!) because they > haven't upgraded. I don't have a problem with that -- they don't get > much support, but it's their choice, and they may not need the new > features introduced since then. I expect that lots of people won't > upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the > other modules/packages they need are compatible with 1.6. Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. I wholeheartedly support his approach. Just ask Mark Hammond :-) how many times I've said "let's change the code to make it Right; people aren't required to upgrade [and break their code]." Of course, his counter is that people need to upgrade to fix other, unrelated problems. So I relax and try again later :-). But I still maintain that they can independently grab the specific fixes and leave the other changes we make. Maybe it is grey, but I think this change is quite fine. Especially given Tim's tool. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Wed Mar 1 08:22:06 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 03:22:06 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: <000b01bf8357$3af08d60$412d153f@tim> [Greg Stein] > ... > Maybe it is grey, but I think this change is quite fine. Especially given > Tim's tool. What the heck does Tim's one-eyed trouser snake have to do with this? 
I know *it* likes to think it's the measure of all things, but, frankly, my tool barely affects the world at all a mere two feet beyond its base . tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- the-best-thing-ly y'rs - tim From Fredrik Lundh Message-ID: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Greg Stein wrote: > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. ahem. do you seriously believe that everyone in the Python universe reads comp.lang.python? afaik, most Python programmers don't. ... so as far as I'm concerned, this was officially deprecated with Guido's post. afaik, no official python documentation has explicitly mentioned this (and the fact that it doesn't explicitly allow it doesn't really matter, since the docs don't explicitly allow the x[a, b, c] syntax either. both work in 1.5.2). has anyone checked the recent crop of Python books, btw? the eff-bot guide uses old syntax in two examples out of 320. how about the others? ... sigh. running checkappend over a 50k LOC application, I just realized that it doesn't catch a very common append pydiom. how fun. even though 99% of all append calls are "legal", this "minor" change will break every single application and library we have :-( oh, wait. xmlrpclib isn't affected. always something! 
Ken said that he wanted notification along certain guidelines. I said that I believed Guido's post did just that. Period. Personally, I think it is fine. I also think that a CHANGES file that arrives with 1.6 that points out the incompatibility is also fine. >... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. And which is that? Care to help out? Maybe just a little bit? Or do you just want to talk about how bad this change is? :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Mar 1 09:01:52 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:01:52 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <000b01bf8357$3af08d60$412d153f@tim> Message-ID: On Wed, 1 Mar 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Maybe it is grey, but I think this change is quite fine. Especially given > > Tim's tool. > > What the heck does Tim's one-eyed trouser snake have to do with this? I > know *it* likes to think it's the measure of all things, but, frankly, my > tool barely affects the world at all a mere two feet beyond its base. > > tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- > the-best-thing-ly y'rs - tim Heh. Now how is one supposed to respond to *that* ??! All right. Fine. +3 cool points go to Tim. :-) -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Mar 1 09:03:32 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:03:32 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > You can already extract this from the updated documentation on the > > website (which has a list of obsolete modules). > > > > But you're right, it would be good to be open about this. I'll think > > about it.
> > Note that the updated documentation isn't yet "published"; there are > no links to it and it hasn't been checked as much as I need it to be > before announcing it. Isn't the documentation better than what has been released? In other words, if you release now, how could you make things worse? If something does turn up during a check, you can always release again... Cheers, -g -- Greg Stein, http://www.lyra.org/ From Fredrik Lundh" Message-ID: <011001bf835e$600d1da0$34aab5d4@hagrid> Greg Stein wrote: > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > Greg Stein wrote: > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > your notification criteria. > > > > ahem. do you seriously believe that everyone in the > > Python universe reads comp.lang.python? > > > > afaik, most Python programmers don't. > > Now you're simply taking my comments out of context. Not a proper thing to > do. Ken said that he wanted notification along certain guidelines. I said > that I believed Guido's post did just that. Period. my point was that most Python programmers won't see that notification. when these people download 1.6 final and find that all their apps just broke, they probably won't be happy with a pointer to dejanews. > And which is that? Care to help out? Maybe just a little bit? this rather common pydiom: append = list.append for x in something: append(...) it's used a lot where performance matters. > Or do you just want to talk about how bad this change is? :-( yes, I think it's bad. I've been using Python since 1.2, and no other change has had the same consequences (wrt. time/money required to fix it) call me a crappy programmer if you want, but I'm sure there are others out there who are nearly as bad. and lots of them won't be aware of this change until someone upgrades the python interpreter on their server. From mal@lemburg.com Wed Mar 1 08:38:52 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 01 Mar 2000 09:38:52 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> Message-ID: <38BCD71C.3592E6A@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Currently, mapping tables map characters to Unicode characters > > and vice-versa. Now the .translate method will use a different > > kind of table: mapping integer ordinals to integer ordinals. > > You mean that if I want to map u"a" to u"A", I have to set up some sort of > dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. I meant: 'a': u'A' vs. ord('a'): ord(u'A') The latter wins ;-) Reasoning for the first was that it allows character sequences to be handled by the same mapping algorithm. I decided to leave those techniques to some future implementation, since mapping integers has the nice side-effect of also allowing sequences to be used as mapping tables... resulting in some speedup at the cost of memory consumption. BTW, there are now three different ways to do char translations: 1. char -> unicode (char mapping codec's decode) 2. unicode -> char (char mapping codec's encode) 3. unicode -> unicode (unicode's .translate() method) > > Question: What is more efficient: having lots of integers > > in a dictionary or lots of characters ? > > My bet is "lots of integers", to reduce both space use and comparison time. Right. That's what I found too... it's "lots of integers" now :-) > > ... > > Something else that changed is the way .capitalize() works. The > > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > > on the www.unicode.org site). > > #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case > Mappings"). Dang. You're right. Here's the URL in case someone wants to join in: http://www.unicode.org/unicode/reports/tr21/tr21-2.html > > Here's the new doc string: > > > > S.capitalize() -> unicode > > > > Return a capitalized version of S, i.e.
words start with title case > > characters, all remaining cased characters have lower case. > > > > Note that *all* characters are touched, not just the first one. > > The change was needed to get it in sync with the .iscapitalized() > > method which is based on the Unicode algorithm too. > > > > Should this change be propagated to the string implementation ? > > Unicode makes distinctions among "upper case", "lower case" and "title > case", and you're trying to get away with a single "capitalize" function. > Java has separate toLowerCase, toUpperCase and toTitleCase methods, and > that's the way to do it. The Unicode implementation has the corresponding: .upper(), .lower() and .capitalize() They work just like .toUpperCase, .toLowerCase, .toTitleCase resp. (well at least they should ;). > Whatever you do, leave .capitalize alone for 8-bit > strings -- there's no reason to break code that currently works. > "capitalize" seems a terrible choice of name for a titlecase method anyway, > because of its baggage connotations from 8-bit strings. Since this stuff is > complicated, I say it would be much better to use the same names for these > things as the Unicode and Java folk do: there's excellent documentation > elsewhere for all this stuff, and it's Bad to make users mentally translate > unique Python terminology to make sense of the official docs. Hmm, that's an argument but it breaks the current method naming scheme of all lowercase letters. Perhaps I should simply provide a new method for .toTitleCase(), e.g. .title(), and leave the previous definition of .capitalize() intact... > So my vote is: leave capitalize the hell alone. Do not implement > capitalize for Unicode strings. Introduce a new titlecase method for > Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode > strings should also have methods to get at uppercase and lowercase (as > Unicode defines those).
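The three-way distinction Tim is arguing for is how things eventually landed: today's Python strings have .upper(), .lower(), .title() and .capitalize() side by side. A small illustration of the behavior under discussion, using modern spelling (str is Unicode throughout; the U+01F3 "dz" digraph is one of the characters where titlecase and uppercase genuinely differ):

```python
# The case operations Tim contrasts, as they exist in today's Python.
s = "monty PYTHON"

# .capitalize(): first character up, everything else forced lower.
assert s.capitalize() == "Monty python"

# .title(): first character of each word titlecased, the rest lowered.
assert s.title() == "Monty Python"

# .upper()/.lower(): whole-string case mappings.
assert s.upper() == "MONTY PYTHON"
assert s.lower() == "monty python"

# A character where titlecase differs from uppercase: U+01F3 "dz".
# Its uppercase is U+01F1 "DZ", but its titlecase is U+01F2 "Dz".
dz = "\u01f3"
assert dz.upper() == "\u01f1"
assert dz.title() == "\u01f2"
```

This is exactly why a single "capitalize" could not carry all three meanings: for plain ASCII the mappings coincide, but the Unicode case tables keep them distinct.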
...looks like you're more or less on the same wave length here ;-) Here's what I'll do: * implement .capitalize() in the traditional way for Unicode objects (simply convert the first char to uppercase) * implement u.title() to mean the same as Java's toTitleCase() * don't implement s.title(): the reasoning here is that it would confuse the user when she gets different return values for the same string (titlecase chars usually live in higher Unicode code ranges not reachable in Latin-1) Thanks for the feedback, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one@email.msn.com Wed Mar 1 10:06:58 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:06:58 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: <000e01bf8365$e1e0b9c0$412d153f@tim> [/F] > ... > so as far as I'm concerned, this was officially deprecated > with Guido's post. afaik, no official python documentation > has explicitly mentioned this (and the fact that it doesn't > explicitly allow it doesn't really matter, since the docs don't > explicitly allow the x[a, b, c] syntax either. both work in > 1.5.2). The "Subscriptions" section of the Reference Manual explicitly allows for dict[a, b, c] and explicitly does not allow for sequence[a, b, c] The "Mapping Types" section of the Library Ref does not explicitly allow for it, though, and if you read it as implicitly allowing for it (based on the Reference Manual's clarification of "key" syntax), you would also have to read the Library Ref as allowing for dict.has_key(a, b, c) Which 1.5.2 does allow, but which Guido very recently patched to treat as a syntax error. > ... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append
[And, later, after prodding by GregS] > this rather common pydiom: > > append = list.append > for x in something: > append(...) This limitation was pointed out in checkappend's module docstring. Doesn't make it any easier for you to swallow, but I needed to point out that you didn't *have* to stumble into this the hard way. > how fun. even though 99% of all append calls are "legal", > this "minor" change will break every single application and > library we have :-( > > oh, wait. xmlrpclib isn't affected. always something! What would you like to do, then? The code will be at least as broken a year from now, and probably more so -- unless you fix it. So this sounds like an indirect argument for never changing Python's behavior here. Frankly, I expect you could fix the 50K LOC in less time than it took me to write this naggy response <0.50K wink>. embrace-change-ly y'rs - tim From tim_one@email.msn.com Wed Mar 1 10:31:12 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:31:12 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <001001bf8369$453e9fc0$412d153f@tim> [Tim, needing sleep] > dict.has_key(a, b, c) > > Which 1.5.2 does allow, but which Guido very recently patched to > treat as a syntax error. No, a runtime error. haskeynanny.py, anyone? not-me-ly y'rs - tim From fredrik@pythonware.com Wed Mar 1 11:14:18 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 12:14:18 +0100 Subject: [Python-Dev] breaking list.append() References: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <002101bf836f$4a012220$f29b12c2@secret.pythonware.com> Tim Peters wrote: > The "Subscriptions" section of the Reference Manual explicitly allows for > > dict[a, b, c] > > and explicitly does not allow for > > sequence[a, b, c] I'd thought we'd agreed that nobody reads the reference manual ;-) > What would you like to do, then? more time to fix it, perhaps?
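The bound-method pydiom /F quotes defeats any purely textual checker: once the method is stored under another name, a scan for multi-argument ".append(" calls can no longer see the call site. A deliberately naive, checkappend-style scanner (a hypothetical sketch, not Tim's actual tool) makes the blind spot concrete:

```python
import re

# Hypothetical minimal checker: flag source lines that call .append
# with a comma at the top level of the argument list (the old
# multi-arg form that 1.6 turns into an error).
SUSPECT = re.compile(r"\.append\s*\(\s*[^(),]+\s*,")

direct = "history.append(pos, line)"            # old multi-arg form
fixed = "history.append((pos, line))"           # the 1.6-safe spelling
aliased = "append = history.append\nappend(pos, line)"  # the missed pydiom

assert SUSPECT.search(direct) is not None   # caught: comma inside .append(...)
assert SUSPECT.search(fixed) is None        # passes: single tuple argument
assert SUSPECT.search(aliased) is None      # missed: the call has no ".append"
```

Catching the aliased form would require following assignments (i.e. data-flow analysis, or just running the code), which is exactly why a source-level nanny can only ever be a heuristic here.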
it's surely a minor code change, but fixing it can be harder than you think (just witness Gerrit's bogus patches) after all, python might be free, but more and more people are investing lots of money in using it [1]. > The code will be at least as broken a year > from now, and probably more so -- unless you fix it. sure. we've already started. but it's a lot of work, and it's quite likely that it will take a while until we can be 100% confident that all the changes are properly done. (not all software has a 100% complete test suite that simply says "yes, this works" or "no, it doesn't") 1) fwiw, some poor soul over here posted a short note to the pythonworks mailing list, mentioning that we've now fixed the price. a major flamewar erupted, and my mailbox is now full of mail from unknowns telling me that I must be a complete moron that doesn't understand that Python is just a toy system, which everyone uses just because they cannot afford anything better... From tim_one@email.msn.com Wed Mar 1 11:26:21 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 06:26:21 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <001101bf8370$f881dfa0$412d153f@tim> Very briefly: [Guido] > ... > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. Eric expects that this will run faster, although this > obviously needs to be tried. No, it doesn't: it will run faster. > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I > have a feeling that function objects should also be considered > container types, because of the cycle involving globals.
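The "cycle involving globals" that makes function objects container types is the one every module-level function creates: the function references its module's namespace dict, and that dict references the function back by name. A small sketch using today's gc module (which grew out of exactly this kind of scheme; the names below are modern Python, not the 1.6-era API):

```python
import gc

# The classic uncollectable-by-refcounting case: a container that
# participates in a cycle.
node = []
node.append(node)          # the list references itself

# A function is a container too: it holds its globals dict, and the
# dict holds the function, closing a cycle through the namespace.
ns = {}
exec(compile("def g():\n    return g\n", "<example>", "exec"), ns)
assert ns["g"].__globals__ is ns   # function -> namespace dict
assert ns["g"]() is ns["g"]        # namespace dict -> function, by name

# Dropping our reference leaves the list cycle unreachable; the cycle
# collector (absent in 1.5.2, standard since 2.0) reclaims it.
del node
unreachable = gc.collect()
assert unreachable >= 1
```

Pure reference counting can never free either cycle on its own, which is the whole motivation for tracking container types separately.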
Note that the list-migrating steps you sketch later are basically the same as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few years ago, right down to using appending to effect a breadth-first traversal without requiring recursion -- except M&S doesn't have to bother accounting for sources of refcounts. Since *this* scheme does more work per item per scan, to be as fast in the end it has to touch less stuff than M&S. But the more kinds of types you track, the more stuff this scheme will have to chase. The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable meta-point: you balked at M&S the last time around because of the apparent need for two link fields + a bit or two per object of a "chaseable type". If that's no longer perceived as being a showstopper, M&S should be reconsidered too. I happen to be a fan of both approaches. The worst part of M&S-on-RC (== the one I never had a good answer for) is that a non-cooperating extension type E can't be chased, hence objects reachable only from objects of type E never get marked, so are vulnerable to bogus collection. In the Neil/Toby scheme, objects of type E merely act as sources of "external" references, so the scheme fails safe (in the sense of never doing a bogus collection due to non-cooperating types). Hmm ... if both approaches converge on keeping a list of all chaseable objects, and being careful of uncooperating types, maybe the only real difference in the end is whether the root set is given explicitly (as in traditional M&S) or inferred indirectly (but where "root set" has a different meaning in the scheme you sketched). > ... > In our case, we may need a type-specific "clear" function for containers > in the type object. I think definitely, yes. full-speed-sideways-ly y'rs - tim From mal@lemburg.com Wed Mar 1 10:40:36 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 01 Mar 2000 11:40:36 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <38BCF3A4.1CCADFCE@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > > Greg Stein wrote: > > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > > your notification criteria. > > > > > > ahem. do you seriously believe that everyone in the > > > Python universe reads comp.lang.python? > > > > > > afaik, most Python programmers don't. > > > > Now you're simply taking my comments out of context. Not a proper thing to > > do. Ken said that he wanted notification along certain guidelines. I said > > that I believed Guido's post did just that. Period. > > my point was that most Python programmers won't > see that notification. when these people download > 1.6 final and find that all their apps just broke, they > probably won't be happy with a pointer to dejanews. Ditto. Anyone remember the str(2L) == '2' change, BTW? That one will cost lots of money in case someone implemented an eShop using the common str(2L)[:-1] idiom... There will need to be a big warning sign somewhere that people see *before* finding the download link. (IMHO, anyways.) > > And which is that? Care to help out? Maybe just a little bit? > > this rather common pydiom: > > append = list.append > for x in something: > append(...) > > it's used a lot where performance matters. Same here. checkappend.py doesn't find these (a great tool BTW, thanks Tim; I noticed that it leaks memory badly though). > > Or do you just want to talk about how bad this change is? :-( > > yes, I think it's bad. I've been using Python since 1.2, > and no other change has had the same consequences > (wrt. time/money required to fix it) > > call me a crappy programmer if you want, but I'm sure > there are others out there who are nearly as bad.
and > lots of them won't be aware of this change until someone > upgrades the python interpreter on their server. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Mar 1 12:07:42 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:07:42 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: Your message of "Wed, 01 Mar 2000 00:57:49 EST." <000601bf8343$13575040$412d153f@tim> References: <000601bf8343$13575040$412d153f@tim> Message-ID: <200003011207.HAA13342@eric.cnri.reston.va.us> > To the extent that you're serious about CP4E, you're begging for more of > this, not less. Which is exactly why I am breaking multi-arg append now -- this is my last chance. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Mar 1 12:27:10 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:27:10 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Wed, 01 Mar 2000 09:38:52 +0100." <38BCD71C.3592E6A@lemburg.com> References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> Message-ID: <200003011227.HAA13396@eric.cnri.reston.va.us> > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) > * implement u.title() to mean the same as Java's toTitleCase() > * don't implement s.title(): the reasoning here is that it would > confuse the user when she gets different return values for > the same string (titlecase chars usually live in higher Unicode > code ranges not reachable in Latin-1) Huh?
For ASCII at least, titlecase seems to map to ASCII; in your current implementation, only two Latin-1 characters (u'\265' and u'\377', I have no easy way to show them in Latin-1) map outside the Latin-1 range. Anyway, I would suggest adding a title() call to 8-bit strings as well; then we can do away with string.capwords(), which does something similar but different, mostly by accident. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@oratrix.nl Wed Mar 1 12:34:42 2000 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 01 Mar 2000 13:34:42 +0100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Message by Guido van Rossum, Mon, 28 Feb 2000 12:35:12 -0500 , <200002281735.MAA27771@eric.cnri.reston.va.us> Message-ID: <20000301123442.7DEF8371868@snelboot.oratrix.nl> > > What about adding a command-line switch for enabling warnings, as has > > been suggested long ago? The .append() change could then print a > > warning in 1.6alphas (and betas?), but still run, and be turned into > > an error later. > > That's better. I propose that the warnings are normally on, and that > there are flags to turn them off or turn them into errors. Can we then please have an interface to the "give warning" call (instead of a simple fprintf)? On the mac (and possibly also in PythonWin) it's probably better to pop up a dialog (possibly with a "don't show again" button) than do a printf which may get lost. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido@python.org Wed Mar 1 12:55:42 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:55:42 -0500 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Your message of "Wed, 01 Mar 2000 13:34:42 +0100."
<20000301123442.7DEF8371868@snelboot.oratrix.nl> References: <20000301123442.7DEF8371868@snelboot.oratrix.nl> Message-ID: <200003011255.HAA13489@eric.cnri.reston.va.us> > Can we then please have an interface to the "give warning" call (instead > of a simple fprintf)? On the mac (and possibly also in > PythonWin) it's probably better to pop up a dialog (possibly with a > "don't show again" button) than do a printf which may get lost. Sure. All you have to do is code it (or get someone else to code it). <0.9 wink> --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Mar 1 13:32:02 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 14:32:02 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> <200003011227.HAA13396@eric.cnri.reston.va.us> Message-ID: <38BD1BD2.792E9B73@lemburg.com> Guido van Rossum wrote: > > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > * implement u.title() to mean the same as Java's toTitleCase() > > * don't implement s.title(): the reasoning here is that it would > > confuse the user when she gets different return values for > > the same string (titlecase chars usually live in higher Unicode > > code ranges not reachable in Latin-1) > > Huh? For ASCII at least, titlecase seems to map to ASCII; in your > current implementation, only two Latin-1 characters (u'\265' and > u'\377', I have no easy way to show them in Latin-1) map outside the > Latin-1 range. You're right, sorry for the confusion. I was thinking of other encodings like e.g. cp437 which have corresponding characters in the higher Unicode ranges. > Anyway, I would suggest to add a title() call to 8-bit strings as > well; then we can do away with string.capwords(), which does something > similar but different, mostly by accident.
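Guido's "similar but different, mostly by accident" is easy to make concrete: the two calls disagree as soon as the input contains a non-letter inside a word or irregular whitespace. A quick illustration with today's str.title() and string.capwords() (both of which did end up in the language, much as proposed here):

```python
import string

s = "monty PYTHON's   flying circus"

# str.title() starts a new "word" after *any* non-letter, so the
# apostrophe triggers titlecasing of the following 's' -- and the
# run of spaces is preserved as-is.
assert s.title() == "Monty Python'S   Flying Circus"

# string.capwords() splits on whitespace, capitalizes each chunk, and
# rejoins with single spaces -- so it also normalizes the spacing and
# leaves the apostrophe-s alone.
assert string.capwords(s) == "Monty Python's Flying Circus"
```

Neither behavior is "the" right one; they simply draw word boundaries differently, which is why the two functions coexist to this day.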
Ok, I'll do it this way then: s.title() will use C's toupper() and tolower() for case mapping and u.title() the Unicode routines. This will be in sync with the rest of the 8-bit string world (which is locale aware on many platforms AFAIK), even though it might not return the same string as the corresponding u.title() call. u.capwords() will be disabled in the Unicode implementation... it wasn't even implemented for the string implementation, so there's no breakage ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin@mems-exchange.org Wed Mar 1 14:59:07 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Wed, 1 Mar 2000 09:59:07 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <011001bf835e$600d1da0$34aab5d4@hagrid> References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <14525.12347.120543.804804@amarok.cnri.reston.va.us> Fredrik Lundh writes: >yes, I think it's bad. I've been using Python since 1.2, >and no other change has had the same consequences >(wrt. time/money required to fix it) There are more things in 1.6 that might require fixing existing code: str(2L) returning '2', the int/long changes, the Unicode changes, and if it gets added, garbage collection -- and bugs caused by those changes might not be catchable by a nanny. IMHO it's too early to point at the .append() change as breaking too much existing code; there may be changes that break a lot more. I'd wait and see what happens once the 1.6 alphas become available; if c.l.p is filled with shrieks and groans, GvR might decide to back the offending change out. (Or he might not...) -- A.M. Kuchling http://starship.python.net/crew/amk/ I have no skills with machines. I fear them, and because I cannot help attributing human qualities to them, I suspect that they hate me and will kill me if they can.
-- Robertson Davies, "Reading" From klm@digicool.com Wed Mar 1 15:37:49 2000 From: klm@digicool.com (Ken Manheimer) Date: Wed, 1 Mar 2000 10:37:49 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > None the less, for those practicing it, the incorrectness of it will be > > fresh news. I would be less sympathetic with them if there was recent > > warning, eg, the schedule for changing it in the next release was part of > > the current release. But if you tell somebody you're going to change > > something, and then don't for a few years, you probably need to renew the > > warning before you make the change. Don't you think so? Why not? > > I agree. > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. Actually, by "part of the current release", I meant having the deprecation/impending-deletion warning in the release notes for the release before the one where the deletion happens - saying it's being deprecated now, will be deleted next time around. Ken klm@digicool.com I mean, you tell one guy it's blue. He tells his guy it's brown, and it lands on the page sorta purple. Wavy Gravy/Hugh Romney From Vladimir.Marangozov@inrialpes.fr Wed Mar 1 17:07:07 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 1 Mar 2000 18:07:07 +0100 (CET) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> from "Guido van Rossum" at Mar 01, 2000 12:44:10 AM Message-ID: <200003011707.SAA01310@python.inrialpes.fr> Guido van Rossum wrote: > > Thanks for the new patches, Neil! Thanks from me too! I notice, however, that hash_resize() still uses a malloc call instead of PyMem_NEW.
Neil, please correct this in your version immediately ;-) > > We had a visitor here at CNRI today, Eric Tiedemann > , who had a look at your patches before. Eric > knows his way around the Scheme, Lisp and GC literature, and presented > a variant on your approach which takes the bite out of the recursive > passes. Avoiding the recursion is valuable, as long as we're optimizing the implementation of one particular scheme. It doesn't bother me that Neil's scheme is recursive, because I still perceive his code as a proof of concept. You're presenting here another scheme based on refcount arithmetic, generalized for all container types. The linked list implementation of this generalized scheme is not directly related to the logic. I have some suspicions on the logic, so you'll probably want to elaborate a bit more on it, and convince me that this scheme would actually work. > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. I cannot agree so easily with this statement, but you should have expected this from me :-) If we're about to optimize storage, I have good reasons to believe that we don't need 3 additional slots per container (but 1 for gc_refs, yes). We could certainly envision allocating the containers within memory pools of 4K (just as it is done in pymalloc, and close to what we have for ints & floats). These pools would be labeled as "container's memory", they would obviously be under our control, and we'd have additional slots per pool, not per object. As long as we isolate the containers from the rest, we can enumerate them easily by walking through the pools.
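Vladimir's pool idea can be modeled abstractly: if every container is handed out from a pool the allocator owns, the collector can enumerate containers by walking the pools instead of threading link fields through every object. A toy model (hypothetical names, nothing like the eventual pymalloc code) of that enumeration pass:

```python
# Toy model of pool-based container enumeration: bookkeeping lives
# per pool, not per object, so no gc_next/gc_prev fields are needed.
POOL_CAPACITY = 4  # stand-in for "as many containers as fit in 4K"

class Allocator:
    def __init__(self):
        self.pools = [[]]              # each pool tracks its own objects

    def alloc_container(self, obj):
        if len(self.pools[-1]) >= POOL_CAPACITY:
            self.pools.append([])      # open a fresh pool when one fills
        self.pools[-1].append(obj)
        return obj

    def containers(self):
        # The collector's enumeration pass: walk pools, not a per-object
        # linked list.
        for pool in self.pools:
            yield from pool

alloc = Allocator()
for i in range(10):
    alloc.alloc_container([i])

assert len(list(alloc.containers())) == 10
assert len(alloc.pools) == 3           # 4 + 4 + 2 containers
```

The trade-off sketched here is the one under discussion: one word of per-pool overhead amortized over many objects, versus three extra fields in every single container.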
But I'm willing to defer this question for now, as it involves the object allocators (the builtin allocators + PyObject_NEW for extension types E -- user objects of type E would be automatically taken into account for GC if there's a flag in the type struct which identifies them as containers). > Eric expects that this will run faster, although this obviously needs > to be tried. Definitely, although I trust Eric & Tim :-) > > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I have a > feeling that function objects should also be considered container > types, because of the cycle involving globals. + other extension container types. And I insist. Don't forget that we're planning to merge types and classes... > > Eric's algorithm, then, consists of the following parts. > > Each container object has three new fields: gc_next, gc_prev, and > gc_refs. (Eric calls the gc_refs "refcount-zero".) > > We color objects white (initial), gray (root), black (scanned root). > (The terms are explained later; we believe we don't actually need bits > in the objects to store the color; see later.) > > All container objects are chained together in a doubly-linked list -- > this is the same as Neil's code except Neil does it only for dicts. > (Eric postulates that you need a list header.) > > When GC is activated, all objects are colored white; we make a pass > over the entire list and set gc_refs equal to the refcount for each > object. Step 1: for all containers, c->gc_refs = c->ob_refcnt > > Next, we make another pass over the list to collect the internal > references. Internal references are (just like in Neil's version) > references from other container types. In Neil's version, this was > recursive; in Eric's version, we don't need recursion, since the list > already contains all containers. 
So we simply visit the containers in > the list in turn, and for each one we go over all the objects it > references and subtract one from *its* gc_refs field. (Eric left out > the little detail that we need to be able to distinguish between > container and non-container objects amongst those references; this can > be a flag bit in the type field.) Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c I guess that you realize that after this step, gc_refs can be zero or negative. I'm not sure that you collect "internal" references here (references from other container types). A list referencing 20 containers, being itself referenced by one container + one static variable + two times from the runtime stack, has an initial refcount == 4, so we'll end up with gc_refs == -16. A tuple referencing 1 list, referenced once by the stack, will end up with gc_refs == 0. Neil's scheme doesn't seem to have this "property". > > Now, similar to Neil's version, all objects for which gc_refs == 0 > have only internal references, and are potential garbage; all objects > for which gc_refs > 0 are "roots". These have references to them from > other places, e.g. from globals or stack frames in the Python virtual > machine. > Agreed, some roots have gc_refs > 0. I'm not sure that all of them have it, though... Do they? > We now start a second list, to which we will move all roots. The way > to do this is to go over the first list again and to move each object > that has gc_refs > 0 to the second list. Objects placed on the second > list in this phase are considered colored gray (roots). > Step 3: Roots with gc_refs > 0 go to the 2nd list. All c->gc_refs <= 0 stay in the 1st list. > Of course, some roots will reference some non-roots, which keeps those > non-roots alive. We now make a pass over the second list, where for > each object on the second list, we look at every object it references.
> If a referenced object is a container and is still in the first list > (colored white) we *append* it to the second list (colored gray). > Because we append, objects thus added to the second list will > eventually be considered by this same pass; when we stop finding > objects that are still white, we stop appending to the second list, > and we will eventually terminate this pass. Conceptually, objects on > the second list that have been scanned in this pass are colored black > (scanned root); but there is no need to actually make the > distinction. >

Step 4: Closure on reachable containers, which are all moved to the 2nd list. (Assuming that the objects are checked only via their type, without involving gc_refs)

> (How do we know whether an object pointed to is white (in the first > list) or gray or black (in the second)?

Good question? :-)

> We could use an extra bitfield, but that's a waste of space. > Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when > we move the object to the second list.

I doubt that this would work, for the reasons mentioned above.

> During the meeting, I proposed to set the back pointer to NULL; that > might work too but I think the gc_refs field is more elegant. We could > even just test for a non-zero gc_refs field; the roots moved to the > second list initially all have a non-zero gc_refs field already, and > for the objects with a zero gc_refs field we could indeed set it to > something arbitrary.)

Not sure that "arbitrary" is a good choice if the differentiation is based solely on gc_refs.

> > Once we reach the end of the second list, all objects still left in > the first list are garbage. We can destroy them in a similar way to the > way Neil does this in his code. Neil calls PyDict_Clear on the > dictionaries, and ignores the rest. Under Neil's assumption that all > cycles (that he detects) involve dictionaries, that is sufficient.
In > our case, we may need a type-specific "clear" function for containers > in the type object.

Couldn't this be done in the object's dealloc function?

Note that both Neil's and this scheme assume that garbage _detection_ and garbage _collection_ is an atomic operation. I must say that I don't mind having some living garbage if it doesn't hurt my work. IOW, the criterion used for triggering the detection phase _may_ eventually differ from the one used for the collection phase. But this is where we reach the incremental approaches, implying different reasoning as a whole. My point is that the introduction of a "clear" function depends on the adopted scheme, whose logic depends on pertinent statistics on memory consumption of the cyclic garbage. To make it simple, we first need stats on memory consumption, then we can discuss objectively how to implement some particular GC scheme. I second Eric on the need for excellent statistics.

> > The general opinion was that we should first implement and test the > algorithm as sketched above, and then changes or extensions could be > made.

I'd like to see it discussed first in conjunction with (1) the possibility of having a proprietary malloc, (2) the envisioned type/class unification. Perhaps I'm getting too deep, but once something gets in, it's difficult to take it out, even when a better solution is found subsequently. Although I'm enthusiastic about this work on GC, I'm not in a position to evaluate the true benefits of the proposed schemes, as I still don't have a basis for evaluating how much garbage my program generates and whether it hurts the interpreter compared to its overall memory consumption.

> > I was pleasantly surprised to find Neil's code in my inbox when we > came out of the meeting; I think it would be worthwhile to compare and > contrast the two approaches. (Hm, maybe there's a paper in it?)

I'm all for it!
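Pulling the quoted steps together: the four passes can be modelled in a few lines of toy Python. The names here are purely illustrative (the real thing would be C code walking the gc_next/gc_prev chain), and the decrement in step 2 follows Jeremy's clarification later in this thread: each container decrements its *referents'* gc_refs, so the count never goes negative.

```python
class Container:
    # Stand-in for a container object; ob_refcnt models the true refcount.
    def __init__(self, ob_refcnt, children=()):
        self.ob_refcnt = ob_refcnt
        self.children = list(children)
        self.gc_refs = 0

def collect(containers):
    # Step 1: copy each refcount into gc_refs.
    for c in containers:
        c.gc_refs = c.ob_refcnt
    # Step 2: subtract internal references -- each container decrements
    # the gc_refs of every container it *references*.
    for c in containers:
        for child in c.children:
            if isinstance(child, Container):
                child.gc_refs -= 1
    # Step 3: containers still referenced from outside are roots (gray).
    reachable = [c for c in containers if c.gc_refs > 0]
    white = [c for c in containers if c.gc_refs <= 0]
    # Step 4: closure -- whatever a reachable container references is
    # reachable too; appending while scanning terminates once no white
    # object is reachable any more.
    for c in reachable:
        for child in c.children:
            if child in white:
                white.remove(child)
                reachable.append(child)
    return white    # self-contained cycles: the collectable garbage
```

After collect(), what remains in the returned list is exactly the objects reachable only from within the container set, i.e. the cyclic garbage.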
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jeremy@cnri.reston.va.us Wed Mar 1 17:53:13 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Wed, 1 Mar 2000 12:53:13 -0500 (EST) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr> References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <14525.22793.963077.707198@goon.cnri.reston.va.us>

>>>>> "VM" == Vladimir Marangozov writes: [">>" == Guido explaining Eric Tiedemann's GC design]

>> Next, we make another pass over the list to collect the internal >> references. Internal references are (just like in Neil's >> version) references from other container types. In Neil's >> version, this was recursive; in Eric's version, we don't need >> recursion, since the list already contains all containers. So we >> simply visit the containers in the list in turn, and for each one >> we go over all the objects it references and subtract one from >> *its* gc_refs field. (Eric left out the little detail that we >> need to be able to distinguish between container and >> non-container objects amongst those references; this can be a >> flag bit in the type field.)

VM> Step 2: c->gc_refs = c->gc_refs - VM> Nb_referenced_containers_from_c

VM> I guess that you realize that after this step, gc_refs can be VM> zero or negative.

I think Guido's explanation is slightly ambiguous. When he says, "subtract one from *its* gc_refs field" he means subtract one from the _contained_ object's gc_refs field.

VM> I'm not sure that you collect "internal" references here VM> (references from other container types). A list referencing 20 VM> containers, being itself referenced by one container + one VM> static variable + two times from the runtime stack, has an VM> initial refcount == 4, so we'll end up with gc_refs == -16.
The strategy is not that the container's gc_refs is decremented once for each object it contains. Rather, the container decrements each contained object's gc_refs by one. So you should never end up with gc_refs < 0.

>> During the meeting, I proposed to set the back pointer to NULL; >> that might work too but I think the gc_refs field is more >> elegant. We could even just test for a non-zero gc_refs field; >> the roots moved to the second list initially all have a non-zero >> gc_refs field already, and for the objects with a zero gc_refs >> field we could indeed set it to something arbitrary.)

I believe we discussed this further and concluded that setting the back pointer to NULL would not work. If we make the second list doubly-linked (like the first one), it is trivial to end GC by swapping the first and second lists. If we've zapped the pointers to NULL, then we have to go back and re-set them all.

Jeremy

From mal@lemburg.com Wed Mar 1 18:44:58 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 19:44:58 +0100 Subject: [Python-Dev] Unicode Snapshot 2000-03-01 Message-ID: <38BD652A.EA2EB0A3@lemburg.com>

There is a new Unicode implementation snapshot available at the secret URL. It contains quite a few small changes to the internal APIs, doc strings for all methods and some new methods (e.g. .title()) on the Unicode and the string objects. The code page mappings are now integer->integer, which should make them more performant. Some of the C codec APIs have changed, so you may need to adapt code that already uses these (Fredrik ?!). Still missing is a MSVC project file... haven't gotten around yet to building one. The code does compile on WinXX though, as Finn Bock told me in private mail. Please try out the new stuff... Most interesting should be the code in Lib/codecs.py as it provides a very high level interface to all those builtin codecs. BTW: I would like to implement a .readline() method using only the .read() method as basis.
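One answer to the buffering question (which continues just below): if .read() is the only primitive available, a line can be assembled one character at a time, so no buffer survives between calls, at the price of many tiny reads. A hedged sketch, not the implementation that eventually landed; the set of line-break characters here is deliberately simplistic (Unicode's full set is larger, and treating '\r\n' as a single break would already require one character of lookahead, i.e. a little buffering):

```python
def readline_via_read(stream, linebreaks=('\n', '\r')):
    # Build a line by reading one character at a time; nothing is kept
    # between calls, at the cost of many small .read() calls.
    chars = []
    while True:
        ch = stream.read(1)
        if not ch:
            break            # EOF
        chars.append(ch)
        if ch in linebreaks:
            break
    return ''.join(chars)
```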
Does anyone have a good idea on how this could be done without buffering ? (Unicode has a slightly larger choice of line break chars than C; the .splitlines() method will deal with these)

Gotta run...

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From Fredrik Lundh <011001bf835e$600d1da0$34aab5d4@hagrid> <14525.12347.120543.804804@amarok.cnri.reston.va.us> Message-ID: <034a01bf83b3$e97c8620$34aab5d4@hagrid>

Andrew M. Kuchling wrote: > There are more things in 1.6 that might require fixing existing code: > str(2L) returning '2', the int/long changes, the Unicode changes, and > if it gets added, garbage collection -- and bugs caused by those > changes might not be catchable by a nanny.

hey, you make it sound like "1.6" should really be "2.0" ;-)

From nascheme@enme.ucalgary.ca Wed Mar 1 19:29:02 2000 From: nascheme@enme.ucalgary.ca (nascheme@enme.ucalgary.ca) Date: Wed, 1 Mar 2000 12:29:02 -0700 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>; from marangoz@python.inrialpes.fr on Wed, Mar 01, 2000 at 06:07:07PM +0100 References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <20000301122902.B7773@acs.ucalgary.ca>

On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote: > Guido van Rossum wrote: > > Once we reach the end of the second list, all objects still left in > > the first list are garbage. We can destroy them in a similar way to the > > way Neil does this in his code. Neil calls PyDict_Clear on the > > dictionaries, and ignores the rest. Under Neil's assumption that all > > cycles (that he detects) involve dictionaries, that is sufficient. In > > our case, we may need a type-specific "clear" function for containers > > in the type object.
> > Couldn't this be done in the object's dealloc function?

No, I don't think so. The object still has references to it. You have to be careful about how you break cycles so that memory is not accessed after it is freed.

Neil

-- "If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie

From gvwilson@nevex.com Wed Mar 1 20:19:30 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Wed, 1 Mar 2000 15:19:30 -0500 (EST) Subject: [Python-Dev] DDJ article on Python GC Message-ID:

Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an article on what's involved in adding garbage collection to Python. Please email me if you're interested in tackling it...

Thanks, Greg

From fdrake@acm.org Wed Mar 1 20:37:49 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 1 Mar 2000 15:37:49 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: References: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: <14525.32669.909212.716484@weyr.cnri.reston.va.us>

Greg Stein writes: > Isn't the documentation better than what has been released? In other > words, if you release now, how could you make things worse? If something > does turn up during a check, you can always release again...

Releasing is still somewhat tedious, and I don't want to ask people to do several substantial downloads & installs. So far, a major navigation bug has been found in the test version I posted (just now fixed online); *that's* why I don't like to release too hastily! I don't think waiting two more weeks is a problem.

-Fred

-- Fred L. Drake, Jr. Corporation for National Research Initiatives

From guido@python.org Wed Mar 1 22:53:26 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 17:53:26 -0500 Subject: [Python-Dev] DDJ article on Python GC In-Reply-To: Your message of "Wed, 01 Mar 2000 15:19:30 EST."
References: Message-ID: <200003012253.RAA16056@eric.cnri.reston.va.us>

> Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an > article on what's involved in adding garbage collection to Python. Please > email me if you're interested in tackling it...

I might -- although I should get Neil, Eric and Tim as co-authors. I'm halfway through implementing the scheme that Eric showed yesterday. It's very elegant, but I don't have an idea of its performance impact yet.

Say hi to Jon -- we've met a few times. I liked his March editorial, having just read the same book and had the same feeling of "wow, an open source project in the 19th century!"

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mhammond@skippinet.com.au Wed Mar 1 23:09:23 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 2 Mar 2000 10:09:23 +1100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: <200003011255.HAA13489@eric.cnri.reston.va.us> Message-ID:

> > Can we then please have an interface to the "give warning" call (in > > stead of a simple fprintf)? On the mac (and possibly also in > > PythonWin) it's probably better to pop up a dialog (possibly with a > > "don't show again" button) than do a printf which may get lost. > > Sure. All you have to do is code it (or get someone else to code it).

How about just having either a "sys.warning" function, or maybe even a sys.stdwarn stream? Then a simple C API to call this, and we are done :-) sys.stdwarn sounds OK - it just defaults to sys.stdout, so the Mac and Pythonwin etc should "just work" by sending the output wherever sys.stdout goes today...

Mark.

From tim_one@email.msn.com Thu Mar 2 05:08:39 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:08:39 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BCF3A4.1CCADFCE@lemburg.com> Message-ID: <001001bf8405$5f9582c0$732d153f@tim>

[/F] > append = list.append > for x in something: > append(...)

[M.-A. Lemburg] > Same here.
checkappend.py doesn't find these

As detailed in a c.l.py posting, I have yet to find a single instance of this actually called with multiple arguments. Pointing out that it's *possible* isn't the same as demonstrating it's an actual problem. I'm quite willing to believe that it is, but haven't yet seen evidence of it. For whatever reason, people seem much (and, in my experience so far, infinitely) more prone to make the

list.append(1, 2, 3)

error than the

maybethisisanappend(1, 2, 3)

error.

> (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > though).

Which Python? Which OS? How do you know? What were you running it over?

Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the total (code + data) virtual memory allocated to it peaked at about 2Mb a few seconds into the run, and actually decreased as time went on. So, akin to the bound method multi-argument append problem, the "checkappend leak problem" is something I simply have no reason to believe. Check your claim again? checkappend.py itself obviously creates no cycles or holds on to any state across files, so if you're seeing a leak it must be a bug in some other part of the version of Python + std libraries you're using. Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us what you were running. Has anyone else seen a leak?

From tim_one@email.msn.com Thu Mar 2 05:50:19 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:50:19 -0500 Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) Message-ID: <001401bf840b$3177ba60$732d153f@tim>

Another unsolicited testimonial that countless users are oppressed by auto-repr (as opposed to auto-str) at the interpreter prompt. Just trying to keep a once-hot topic from going stone cold forever.
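For context on the complaint forwarded below: at the time of this thread the prompt's call to repr() was hard-wired, but later Pythons (2.0 onwards) expose it as sys.displayhook, so the str-flavoured behaviour the poster asks for is a retrospective one-liner sketch:

```python
import sys

def str_displayhook(value):
    # Show str() rather than repr() for expression results at the prompt.
    # (The real default hook also stores the result in the builtin '_';
    # that detail is omitted here for brevity.)
    if value is not None:
        print(str(value))

sys.displayhook = str_displayhook
```

Assigning to sys.displayhook only affects the interactive prompt; scripts are untouched.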
-----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org] On Behalf Of Ted Drain Sent: Wednesday, March 01, 2000 5:42 PM To: python-list@python.org Subject: String printing behavior?

Hi all, I've got a question about the string printing behavior. If I define a function as:

>>> def foo():
...     return "line1\nline2"
>>> foo()
'line1\012line2'
>>> print foo()
line1
line2
>>>

It seems to me that the default printing behavior for strings should match the behavior of the print routine. I realize that some people may want to see embedded control codes, but I would advocate a separate method for printing raw byte sequences. We are using the Python interactive prompt as a pseudo-Matlab-like user interface and the current printing behavior is very confusing to users. It also means that functions that return text (like help routines) must print the string rather than returning it. Returning the string is much more flexible because it allows the string to be captured easily and redirected. Any thoughts?

Ted

-- Ted Drain Jet Propulsion Laboratory Ted.Drain@jpl.nasa.gov -- http://www.python.org/mailman/listinfo/python-list

From mal@lemburg.com Thu Mar 2 07:42:33 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 02 Mar 2000 08:42:33 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> Message-ID: <38BE1B69.E0B88B41@lemburg.com>

Tim Peters wrote: > > [/F] > > append = list.append > > for x in something: > > append(...) > > [M.-A. Lemburg] > > Same here. checkappend.py doesn't find these > > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it.
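The bound-method idiom quoted above keeps working under the 1.6 rules; only the multi-argument call form changes. A before/after sketch:

```python
items = []
append = items.append       # the idiom from /F's snippet

# Pre-1.6, append(x, 1, 2) silently appended the tuple (x, 1, 2).
# That call now raises TypeError; the compatible spelling constructs
# the tuple explicitly:
for x in range(3):
    append((x, 1, 2))       # note the extra parentheses
```

The explicit tuple costs one allocation per call, which is exactly the saving MAL mentions giving up.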
Note that I did in fact code like this on purpose: it saves a tuple construction for every append, which can make a difference in tight loops... > For whatever reason, people seem much (and, in my experience so far, > infinitely ) more prone to make the > > list.append(1, 2, 3) > > error than the > > maybethisisanappend(1, 2, 3) > > error. Of course... still there are hidden instances of the problem which are yet to be revealed. For my own code the siutation is even worse, since I sometimes did: add = list.append for x in y: add(x,1,2) > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > though). > > Which Python? Which OS? How do you know? What were you running it over? That's Python 1.5 on Linux2. I let the script run over a large lib directory and my projects directory. In the projects directory the script consumed as much as 240MB of process size. > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > seconds into the run, and actually decreased as time went on. So, akin to > the bound method multi-argument append problem, the "checkappend leak > problem" is something I simply have no reason to believe . Check your > claim again? checkappend.py itself obviously creates no cycles or holds on > to any state across files, so if you're seeing a leak it must be a bug in > some other part of the version of Python + std libraries you're using. > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > what you were running. I'll try the same thing again using Python1.5.2 and the CVS version. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Thu Mar 2 07:46:49 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Thu, 02 Mar 2000 08:46:49 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> <38BE1B69.E0B88B41@lemburg.com> Message-ID: <38BE1C69.C8A9E6B0@lemburg.com>

"M.-A. Lemburg" wrote: > > > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > > though). > > > > Which Python? Which OS? How do you know? What were you running it over? > > That's Python 1.5 on Linux2. I let the script run over > a large lib directory and my projects directory. In the > projects directory the script consumed as much as 240MB > of process size. > > > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > > seconds into the run, and actually decreased as time went on. So, akin to > > the bound method multi-argument append problem, the "checkappend leak > > problem" is something I simply have no reason to believe. Check your > > claim again? checkappend.py itself obviously creates no cycles or holds on > > to any state across files, so if you're seeing a leak it must be a bug in > > some other part of the version of Python + std libraries you're using. > > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > > what you were running. > > I'll try the same thing again using Python 1.5.2 and the CVS version.

Using the Unicode patched CVS version there's no leak anymore. Couldn't find a 1.5.2 version on my machine... I'll build one later.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From guido@python.org Thu Mar 2 15:32:32 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 10:32:32 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
Message-ID: <200003021532.KAA17088@eric.cnri.reston.va.us>

I was looking at the code that invokes __del__, with the intent to implement a feature from Java: in Java, a finalizer is only called once per object, even if calling it makes the object live longer. To implement this, we need a flag in each instance that means "__del__ was called". I opened the creation code for instances, looking for the right place to set the flag. I then realized that it might be smart, now that we have this flag anyway, to set it to "true" during initialization. There are a number of exits from the initialization where the object is created but not fully initialized, where the new object is DECREF'ed and NULL is returned. When such an exit is taken, __del__ is called on an incompletely initialized object! Example:

>>> class C:
...     def __del__(self):
...         print "deleting", self
...
>>> x = C(1)
!--> deleting <__main__.C instance at 1686d8>
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: this constructor takes no arguments
>>>

Now I have a choice to make. If the class has an __init__, should I clear the flag only after __init__ succeeds? This means that if __init__ raises an exception, __del__ is never called. This is an incompatibility. It's possible that someone has written code that relies on __del__ being called even when __init__ fails halfway, and then their code would break. But it is just as likely that calling __del__ on a partially uninitialized object is a bad mistake, and I am doing all these cases a favor by not calling __del__ when __init__ failed!

Any opinions? If nobody speaks up, I'll make the change.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Thu Mar 2 16:44:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) (Barry A. Warsaw) Date: Thu, 2 Mar 2000 11:44:00 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <14526.39504.36065.657527@anthem.cnri.reston.va.us>

>>>>> "GvR" == Guido van Rossum writes:

GvR> Now I have a choice to make. If the class has an __init__, GvR> should I clear the flag only after __init__ succeeds? This GvR> means that if __init__ raises an exception, __del__ is never GvR> called. This is an incompatibility. It's possible that GvR> someone has written code that relies on __del__ being called GvR> even when __init__ fails halfway, and then their code would GvR> break.

It reminds me of the separation between object allocation and initialization in ObjC.

GvR> But it is just as likely that calling __del__ on a partially GvR> uninitialized object is a bad mistake, and I am doing all GvR> these cases a favor by not calling __del__ when __init__ GvR> failed!

GvR> Any opinions? If nobody speaks up, I'll make the change.

I think you should set the flag right before you call __init__(), i.e. after (nearly all) the C level initialization has occurred. Here's why: your "favor" can easily be accomplished by Python constructs in the __init__():

class MyBogo:
    def __init__(self):
        self.get_delified = 0
        do_sumtin_exceptional()
        self.get_delified = 1

    def __del__(self):
        if self.get_delified:
            ah_sweet_release()

-Barry

From gstein@lyra.org Thu Mar 2 17:14:35 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 2 Mar 2000 09:14:35 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID:

On Thu, 2 Mar 2000, Guido van Rossum wrote: >... > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change.

+1 on calling __del__ IFF __init__ completes successfully.
Cheers, -g

-- Greg Stein, http://www.lyra.org/

From jeremy@cnri.reston.va.us Thu Mar 2 17:15:14 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 2 Mar 2000 12:15:14 -0500 (EST) Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) In-Reply-To: <001401bf840b$3177ba60$732d153f@tim> References: <001401bf840b$3177ba60$732d153f@tim> Message-ID: <14526.41378.374653.497993@goon.cnri.reston.va.us>

>>>>> "TP" == Tim Peters writes:

TP> Another unsolicited testimonial that countless users are TP> oppressed by auto-repr (as opposed to auto-str) at the TP> interpreter prompt. Just trying to keep a once-hot topic from TP> going stone cold forever.

[Signature from the included message:] >> -- Ted Drain Jet Propulsion Laboratory Ted.Drain@jpl.nasa.gov --

This guy is probably a rocket scientist. We want the language to be useful for everybody, not just rocket scientists.

Jeremy

From guido@python.org Thu Mar 2 22:45:37 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 17:45:37 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Thu, 02 Mar 2000 11:44:00 EST." <14526.39504.36065.657527@anthem.cnri.reston.va.us> References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> Message-ID: <200003022245.RAA20265@eric.cnri.reston.va.us>

> >>>>> "GvR" == Guido van Rossum writes: > > GvR> Now I have a choice to make. If the class has an __init__, > GvR> should I clear the flag only after __init__ succeeds? This > GvR> means that if __init__ raises an exception, __del__ is never > GvR> called. This is an incompatibility. It's possible that > GvR> someone has written code that relies on __del__ being called > GvR> even when __init__ fails halfway, and then their code would > GvR> break.

[Barry] > It reminds me of the separation between object allocation and > initialization in ObjC.

Is that good or bad?
> GvR> But it is just as likely that calling __del__ on a partially > GvR> uninitialized object is a bad mistake, and I am doing all > GvR> these cases a favor by not calling __del__ when __init__ > GvR> failed! > > GvR> Any opinions? If nobody speaks up, I'll make the change. > > I think you should set the flag right before you call __init__(), > i.e. after (nearly all) the C level initialization has occurred. > Here's why: your "favor" can easily be accomplished by Python > constructs in the __init__(): > > class MyBogo: > def __init__(self): > self.get_delified = 0 > do_sumtin_exceptional() > self.get_delified = 1 > > def __del__(self): > if self.get_delified: > ah_sweet_release()

But the other behavior (call __del__ even when __init__ fails) can also easily be accomplished in Python:

class C:
    def __init__(self):
        try:
            ...stuff that may fail...
        except:
            self.__del__()
            raise

    def __del__(self):
        ...cleanup...

I believe that in almost all cases the programmer would be happier if __del__ wasn't called when their __init__ fails. This makes it easier to write a __del__ that can assume that all the object's fields have been properly initialized. In my code, typically when __init__ fails, this is a symptom of a really bad bug (e.g. I just renamed one of __init__'s arguments and forgot to fix all references), and I don't care much about cleanup behavior.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From bwarsaw@cnri.reston.va.us Thu Mar 2 22:52:31 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Thu, 2 Mar 2000 17:52:31 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> <200003022245.RAA20265@eric.cnri.reston.va.us> Message-ID: <14526.61615.362973.624022@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> But the other behavior (call __del__ even when __init__ GvR> fails) can also easily be accomplished in Python: It's a fair cop. GvR> I believe that in almost all cases the programmer would be GvR> happier if __del__ wasn't called when their __init__ fails. GvR> This makes it easier to write a __del__ that can assume that GvR> all the object's fields have been properly initialized. That's probably fine; I don't have strong feelings either way. -Barry P.S. Interesting what X-Oblique-Strategy was randomly inserted in this message (but I'm not sure which approach is more "explicit" :). -Barry From tim_one@email.msn.com Fri Mar 3 05:38:59 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 00:38:59 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <000001bf84d2$c711e2e0$092d153f@tim> [Guido] > I was looking at the code that invokes __del__, with the intent to > implement a feature from Java: in Java, a finalizer is only called > once per object, even if calling it makes the object live longer. Why? That is, in what way is this an improvement over current behavior? Note that Java is a bit subtle: a finalizer is only called once by magic; explicit calls "don't count". The Java rules add up to quite a confusing mish-mash. Python's rules are *currently* clearer. I deal with possible exceptions in Python constructors the same way I do in C++ and Java: if there's a destructor, don't put anything in __init__ that may raise an uncaught exception. Anything dangerous is moved into a separate .reset() (or .clear() or ...) method. This works well in practice. 
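Tim's convention can be sketched as follows (the class and its names are invented for illustration, not taken from any real library): __init__ only binds fields that cannot fail, and everything that can raise lives in reset(), so __del__ never sees a half-made object:

```python
class Connection:
    """Illustrative only: risky work lives in reset(), not __init__."""

    def __init__(self):
        self.handle = None          # binding a default cannot fail...

    def reset(self, target):
        self.close()
        if not target:
            raise ValueError("no target")   # ...failures happen here instead
        self.handle = "open:" + target      # hypothetical "dangerous" step

    def close(self):
        self.handle = None

    def __del__(self):
        self.close()                # safe: every field is always initialized
```

With this split, an exception leaves a valid (if empty) object behind, and the destructor needs no did-construction-finish bookkeeping.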
> To implement this, we need a flag in each instance that means "__del__ > was called".

At least.

> I opened the creation code for instances, looking for the right place > to set the flag. I then realized that it might be smart, now that we > have this flag anyway, to set it to "true" during initialization. There > are a number of exits from the initialization where the object is created > but not fully initialized, where the new object is DECREF'ed and NULL is > returned. When such an exit is taken, __del__ is called on an > incompletely initialized object!

I agree *that* isn't good. Taken on its own, though, it argues for adding an "instance construction completed" flag that __del__ later checks, as if its body were:

if self.__instance_construction_completed:
    body

That is, the problem you've identified here could be addressed directly.

> Now I have a choice to make. If the class has an __init__, should I > clear the flag only after __init__ succeeds? This means that if > __init__ raises an exception, __del__ is never called. This is an > incompatibility. It's possible that someone has written code that > relies on __del__ being called even when __init__ fails halfway, and > then their code would break. > > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change.

I'd be in favor of fixing the actual problem; I don't understand the point of the rest of it, especially as it has the potential to break existing code and I don't see a compensating advantage (surely not compatibility w/ JPython -- JPython doesn't invoke __del__ methods at all by magic, right? or is that changing, and that's what's driving this?).

too-much-magic-is-dizzying-ly y'rs - tim

From bwarsaw@cnri.reston.va.us Fri Mar 3 05:50:16 2000 From: bwarsaw@cnri.reston.va.us (Barry A.
Warsaw) Date: Fri, 3 Mar 2000 00:50:16 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <14527.21144.9421.958311@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> (surely not compatibility w/ JPython -- JPython doesn't invoke TP> __del__ methods at all by magic, right? or is that changing, TP> and that's what's driving this?). No, JPython doesn't invoke __del__ methods by magic, and I don't have any plans to change that. -Barry From ping@lfw.org Fri Mar 3 09:00:21 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 3 Mar 2000 01:00:21 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Message-ID: On Thu, 2 Mar 2000, Greg Stein wrote: > On Thu, 2 Mar 2000, Guido van Rossum wrote: > >... > > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > +1 on calling __del__ IFF __init__ completes successfully. That would be my vote as well. What convinced me of this is the following: If it's up to the implementation of __del__ to deal with a problem that happened during initialization, you only know about the problem with very coarse granularity. It's a pain (or even impossible) to then rediscover the information you need to recover adequately. If on the other hand you deal with the problem in __init__, then you have much better control over what is happening, because you can position try/except blocks precisely where you need them to deal with specific potential problems. Each block can take care of its case appropriately, and re-raise if necessary. 
In general, it seems to me that what you want to do when __init__ runs afoul is going to be different from what you want to do to take care of object cleanup in __del__. So it doesn't belong there -- it belongs in an except: clause in __init__. Even though it's an incompatibility, i really think this is the right behaviour. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido@python.org Fri Mar 3 16:13:16 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:13:16 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 00:38:59 EST." <000001bf84d2$c711e2e0$092d153f@tim> References: <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <200003031613.LAA21571@eric.cnri.reston.va.us> > [Guido] > > I was looking at the code that invokes __del__, with the intent to > > implement a feature from Java: in Java, a finalizer is only called > > once per object, even if calling it makes the object live longer. [Tim] > Why? That is, in what way is this an improvement over current behavior? > > Note that Java is a bit subtle: a finalizer is only called once by magic; > explicit calls "don't count". Of course. Same in my proposal. But I wouldn't call it "by magic" -- just "on behalf of the garbage collector". > The Java rules add up to quite a confusing mish-mash. Python's rules are > *currently* clearer. I don't find the Java rules confusing. It seems quite useful that the GC promises to call the finalizer at most once -- this can simplify the finalizer logic. (Otherwise it may have to ask itself, "did I clean this already?" and leave notes for itself.) Explicit finalizer calls are always a mistake and thus "don't count" -- the response to that should in general be "don't do that" (unless you have particularly stupid callers -- or very fearful lawyers :-). 
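The "notes for itself" Guido mentions look like this in practice: without an at-most-once guarantee, cleanup must be made idempotent by hand. A sketch with an invented `Holder` class:

```python
released = []

class Holder:
    def __init__(self):
        self._cleaned = False

    def cleanup(self):
        if self._cleaned:      # "did I clean this already?"
            return
        self._cleaned = True
        released.append("resource")

    def __del__(self):
        self.cleanup()         # finalizer and explicit calls share one path

h = Holder()
h.cleanup()                    # explicit early cleanup...
del h                          # ...and the finalizer does not release twice
```

With a call-at-most-once guarantee from the GC, the `_cleaned` bookkeeping would be unnecessary for the magic call, which is exactly the simplification being claimed.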
> I deal with possible exceptions in Python constructors the same way I do in
> C++ and Java: if there's a destructor, don't put anything in __init__ that
> may raise an uncaught exception. Anything dangerous is moved into a
> separate .reset() (or .clear() or ...) method. This works well in practice.

Sure, but the rule "if __init__ fails, __del__ won't be called" means that we
don't have to program our __init__ or __del__ quite so defensively. Most
people who design a __del__ probably assume that __init__ has run to
completion. The typical scenario (which has happened to me! And I
*implemented* the damn thing!) is this: __init__ opens a file and assigns it
to an instance variable; __del__ closes the file. This is tested a few times
and it works great. Now in production the file somehow unexpectedly fails to
be openable. Sure, the programmer should've expected that, but she didn't.
Now, at best, the failed __del__ creates an additional confusing error
message on top of the traceback generated by IOError. At worst, the failed
__del__ could wreck the original traceback.

Note that I'm not proposing to change the C level behavior; when a Py_New()
function is halfway through its initialization and decides to bail out, it
does a DECREF(self) and you bet that at this point the _dealloc() function
gets called (via self->ob_type->tp_dealloc). Occasionally I need to
initialize certain fields to NULL so that the dealloc() function doesn't try
to free memory that wasn't allocated. Often it's as simple as using XDECREF
instead of DECREF in the dealloc() function (XDECREF is safe when the
argument is NULL; DECREF dumps core, saving a load-and-test if you are sure
its arg is a valid object).

> > To implement this, we need a flag in each instance that means "__del__
> > was called".
>
> At least.
>
> > I opened the creation code for instances, looking for the right place
> > to set the flag.
I then realized that it might be smart, now that we > > have this flag anyway, to set it to "true" during initialization. There > > are a number of exits from the initialization where the object is created > > but not fully initialized, where the new object is DECREF'ed and NULL is > > returned. When such an exit is taken, __del__ is called on an > > incompletely initialized object! > > I agree *that* isn't good. Taken on its own, though, it argues for adding > an "instance construction completed" flag that __del__ later checks, as if > its body were: > > if self.__instance_construction_completed: > body > > That is, the problem you've identified here could be addressed directly. Sure -- but I would argue that when __del__ returns, __instance_construction_completed should be reset to false, because the destruction (conceptually, at least) cancels out the construction! > > Now I have a choice to make. If the class has an __init__, should I > > clear the flag only after __init__ succeeds? This means that if > > __init__ raises an exception, __del__ is never called. This is an > > incompatibility. It's possible that someone has written code that > > relies on __del__ being called even when __init__ fails halfway, and > > then their code would break. > > > > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > I'd be in favor of fixing the actual problem; I don't understand the point > to the rest of it, especially as it has the potential to break existing code > and I don't see a compensating advantage (surely not compatibility w/ > JPython -- JPython doesn't invoke __del__ methods at all by magic, right? > or is that changing, and that's what's driving this?). JPython's a red herring here. 
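The flag quoted above can be spelled out in pure Python. Note that in today's CPython a __del__ does run even when __init__ raises (the object already exists once __new__ returns), so the guard has a visible effect. The class and attribute names are illustrative:

```python
import gc

events = []

class Resource:
    def __init__(self, fail=False):
        self._construction_completed = False
        if fail:
            raise RuntimeError("init failed halfway")
        self._construction_completed = True

    def __del__(self):
        if getattr(self, "_construction_completed", False):
            events.append("cleaned up")
        else:
            events.append("skipped half-built object")

r = Resource()
del r                       # completed object: normal cleanup

try:
    Resource(fail=True)     # half-built object: the guard suppresses cleanup
except RuntimeError:
    pass
gc.collect()                # make sure the traceback's frames are released
```

The proposal under discussion amounts to having the interpreter maintain this flag itself, so every __del__ gets the guard for free.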
I think that the proposed change probably *fixes* much more code that is
subtly wrong than it breaks code that is relying on __del__ being called
after a partial __init__. All the rules relating to __del__ are confusing
(e.g. what __del__ can expect to survive in its globals). Also note Ping's
observation:

| If it's up to the implementation of __del__ to deal with a problem
| that happened during initialization, you only know about the problem
| with very coarse granularity. It's a pain (or even impossible) to
| then rediscover the information you need to recover adequately.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim_one@email.msn.com Fri Mar 3 16:49:52 2000
From: tim_one@email.msn.com (Tim Peters)
Date: Fri, 3 Mar 2000 11:49:52 -0500
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: <200003031613.LAA21571@eric.cnri.reston.va.us>
Message-ID: <000501bf8530$7f8c78a0$b0a0143f@tim>

[Tim]
>> Note that Java is a bit subtle: a finalizer is only called
>> once by magic; explicit calls "don't count".

[Guido]
> Of course. Same in my proposal.

OK -- that wasn't clear.

> But I wouldn't call it "by magic" -- just "on behalf of the garbage
> collector".

Yup, magically called.

>> The Java rules add up to quite a confusing mish-mash. Python's
>> rules are *currently* clearer.

> I don't find the Java rules confusing.

"add up" == "taken as a whole"; include the Java spec's complex state
machine for cleanup semantics, and the later complications added by three
(four?) distinct flavors of weak reference, and I doubt 1 Java programmer
in 1,000 actually understands the rules. This is why I'm wary of moving in
the Java *direction* here. Note that Java programmers in past c.l.py
threads have generally claimed Java's finalizers are so confusing &
unpredictable they don't use them at all! Which, in the end, is probably a
good idea in Python too <0.5 wink>.
> It seems quite useful that the GC promises to call the finalizer at
> most once -- this can simplify the finalizer logic.

Granting that explicit calls are "use at your own risk", the only
user-visible effect of "called only once" is in the presence of
resurrection. Now in my Python experience, on the few occasions I've
resurrected an object in __del__, *of course* I expected __del__ to get
called again if the object is about to die again! Typical:

    def __del__(self):
        if oops_i_still_need_to_stay_alive:
            resurrect(self)
        else:
            # really going away
            release(self.critical_resource)

Call __del__ only once, and code like this is busted bigtime. OTOH, had I
written __del__ logic that relied on being called only once, switching the
implementation to call it more than once would break *that* bigtime.
Neither behavior is an obvious all-cases win to me, or even a plausibly
most-cases win. But Python already took a stand on this & so I think you
need a *good* reason to change semantics now.

> ...
> Sure, but the rule "if __init__ fails, __del__ won't be called" means
> that we don't have to program our __init__ or __del__ quite so
> defensively. Most people who design a __del__ probably assume that
> __init__ has run to completion. ...

This is (or can easily be made) a separate issue, & I agreed the first time
this seems worth fixing (although if nobody has griped about it in a decade
of use, it's hard to call it a major bug).

> ...
> Sure -- but I would argue that when __del__ returns,
> __instance_construction_completed should be reset to false, because
> the destruction (conceptually, at least) cancels out the construction!

In the __del__ above (which is typical of the cases of resurrection I've
seen), there is no such implication. Perhaps this is philosophical abuse of
Python's intent, but if so it relied only on trusting its advertised
semantics.
> I think that the proposed change probably *fixes* much morecode that > is subtly wrong than it breaks code that is relying on __del__ being > called after a partial __init__. Yes, again, I have no argument against refusing to call __del__ unless __init__ succeeded. Going beyond that to a new "called at most once" rule is indeed going beyond that, *will* break reasonable old code, and holds no particular attraction that I can see (it trades making one kind of resurrection scenario easier at the cost of making other kinds harder). If there needs to be incompatible change here, curiously enough I'd be more in favor of making resurrection illegal period (which could *really* simplify gc's headaches). > All the rules relating to __del__ are confusing (e.g. what __del__ can > expect to survive in its globals). Problems unique to final shutdown don't seem relevant here. > Also note Ping's observation: ... I can't agree with that yet another time without being quadruply redundant . From guido@python.org Fri Mar 3 16:50:08 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:50:08 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Wed, 01 Mar 2000 00:44:10 EST." <200003010544.AAA13155@eric.cnri.reston.va.us> References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <200003031650.LAA21647@eric.cnri.reston.va.us> We now have two implementations of Eric Tiedemann's idea: Neil and I both implemented it. It's too soon to post the patch sets (both are pretty rough) but I've got another design question. Once we've identified a bunch of objects that are only referring to each other (i.e., one or more cycles) we have to dispose of them. The question is, how? We can't just call free on each of the objects; some may not be allocated with malloc, and some may contain pointers to other malloc'ed memory that also needs to be freed. 
So we have to get their destructors involved. But how? Calling
ob->ob_type->tp_dealloc(ob) for an object whose reference count is nonzero
is unsafe -- this will destroy the object while there are still references
to it! Those references are all coming from other objects that are part of
the same cycle; those objects will also be deallocated and they will
reference the deallocated objects (if only to DECREF them).

Neil uses the same solution that I use when finalizing the Python
interpreter -- find the dictionaries and call PyDict_Clear() on them. (In
his unpublished patch, he also clears the lists using
PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized it so
that *every* object can define a tp_clear function in its type object.) As
long as every cycle contains at least one dictionary or list object, this
will break cycles reliably and get rid of all the garbage. (If you wonder
why: clearing the dict DECREFs the next object(s) in the cycle; if the last
dict referencing a particular object is cleared, the last DECREF will
deallocate that object, which will in turn DECREF the objects it
references, and so forth. Since none of the objects in the cycle has
incoming references from outside the cycle, we can prove that this will
delete all objects as long as there's a dict or list in each cycle.)

However, there's a snag. It's the same snag as what finalizing the Python
interpreter runs into -- it has to do with __del__ methods and the
undefined order in which the dictionaries are cleared. For example, it's
quite possible that the first dictionary we clear is the __dict__ of an
instance, so this zaps all its instance variables. Suppose this breaks the
cycle, so then the instance itself gets DECREFed to zero. Its deallocator
will be called. If it's got a __del__, this __del__ will be called -- but
all the instance variables have already been zapped, so it will fail
miserably!
It's also possible that the __dict__ of a class involved in a cycle gets cleared first, in which case the __del__ no longer "exists", and again the cleanup is skipped. So the question is: What to *do*? My solution is to make an extra pass over all the garbage objects *before* we clear dicts and lists, and for those that are instances and have __del__ methods, call their __del__ ("by magic", as Tim calls it in another post). The code in instance_dealloc() already does the right thing here: it calls __del__, then discovers that the reference count is > 0 ("I'm not dead yet" :-), and returns without freeing the object. (This is also why I want to introduce a flag ensuring that __del__ gets called by instance_dealloc at most once: later when the instance gets DECREFed to 0, instance_dealloc is called again and will correctly free the object; but we don't want __del__ called again.) [Note for Neil: somehow I forgot to add this logic to the code; in_del_called isn't used! The change is obvious though.] This still leaves a problem for the user: if two class instances reference each other and both have a __del__, we can't predict whose __del__ is called first when they are called as part of cycle collection. The solution is to write each __del__ so that it doesn't depend on the other __del__. Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever. This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle! 
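Years later, PEP 442 settled this design question for CPython: __del__ is now called on objects in cyclic garbage, but the relative order of the two finalizers below is still unspecified, which is exactly why Guido advises writing each __del__ so it does not depend on the other. A minimal demonstration (class and attribute names invented):

```python
import gc

order = []

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        # must not rely on self.partner's __del__ having run (or not)
        order.append(self.name)

a, b = Node("a"), Node("b")
a.partner, b.partner = b, a     # a reference cycle with two finalizers
del a, b                        # refcounting alone cannot reclaim these
gc.collect()                    # the cycle collector runs both __del__s
```

Both finalizers run, so a temp file opened by either object can be reliably cleaned up, but nothing about their mutual order is promised.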
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jack@oratrix.nl Fri Mar 3 16:57:54 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Fri, 03 Mar 2000 17:57:54 +0100
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Message by Guido van Rossum , Fri, 03 Mar 2000 11:50:08 -0500 , <200003031650.LAA21647@eric.cnri.reston.va.us>
Message-ID: <20000303165755.490EA371868@snelboot.oratrix.nl>

The __init__ rule for calling __del__ has me confused. Is this per-class or
per-object? I.e. what will happen in the following case:

    class Purse:
        def __init__(self):
            self.balance = WithdrawCashFromBank(1000)

        def __del__(self):
            PutCashBackOnBank(self.balance)
            self.balance = 0

    class LossyPurse(Purse):
        def __init__(self):
            Purse.__init__(self)
            raise 'kaboo! kaboo!'

If the new scheme means that the __del__ method of Purse isn't called I
think I don't like it. In the current scheme I can always program
defensively:

    def __del__(self):
        try:
            b = self.balance
            self.balance = 0
        except AttributeError:
            pass
        else:
            PutCashBackOnBank(b)

but in a new scheme with a per-object "__del__ must be called" flag I
can't...

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From guido@python.org Fri Mar 3 17:05:00 2000
From: guido@python.org (Guido van Rossum)
Date: Fri, 03 Mar 2000 12:05:00 -0500
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: Your message of "Fri, 03 Mar 2000 11:49:52 EST." <000501bf8530$7f8c78a0$b0a0143f@tim>
References: <000501bf8530$7f8c78a0$b0a0143f@tim>
Message-ID: <200003031705.MAA21700@eric.cnri.reston.va.us>

OK, so we're down to this one point: if __del__ resurrects the object,
should __del__ be called again later? Additionally, should resurrection be
made illegal?
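Resurrection itself is easy to demonstrate with a free-list-style __del__. This hypothetical sketch still works in today's CPython, which kept resurrection legal but adopted the at-most-once rule (PEP 442): if the pooled object later dies for good, __del__ is not run a second time.

```python
class PooledConnection:
    """Instead of dying, instances park themselves on a class-level pool."""
    _pool = []

    def __del__(self):
        if len(self._pool) < 2:
            self._pool.append(self)   # refcount rises again: resurrected

c = PooledConnection()
del c                                 # __del__ runs... and the object survives
```

This is the "smart object recycling" scenario raised later in the thread, reduced to its smallest form.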
I can easily see how __del__ could *accidentally* resurrect the object as
part of its normal cleanup -- e.g. you make a call to some other routine
that helps with the cleanup, passing self as an argument, and this other
routine keeps a helpful cache of the last argument for some reason. I don't
see how we could forbid this type of resurrection. (What are you going to
do? You can't raise an exception from instance_dealloc, since it is called
from DECREF. You can't track down the reference and replace it with a None
easily.) In this example, the helper routine will eventually delete the
object from its cache, at which point it is truly deleted. It would be
harmful, not helpful, if __del__ was called again at this point.

Now, it is true that the current docs for __del__ imply that resurrection
is possible. The intention of that note was to warn __del__ writers that in
the case of accidental resurrection __del__ might be called again. The
intention certainly wasn't to allow or encourage intentional resurrection.

Would there really be someone out there who uses *intentional*
resurrection? I severely doubt it. I've never heard of this.

[Jack just finds a snag]
> The __init__ rule for calling __del__ has me confused. Is this per-class
> or per-object? I.e. what will happen in the following case:
>
>     class Purse:
>         def __init__(self):
>             self.balance = WithdrawCashFromBank(1000)
>
>         def __del__(self):
>             PutCashBackOnBank(self.balance)
>             self.balance = 0
>
>     class LossyPurse(Purse):
>         def __init__(self):
>             Purse.__init__(self)
>             raise 'kaboo! kaboo!'
>
> If the new scheme means that the __del__ method of Purse isn't called I
> think I don't like it. In the current scheme I can always program
> defensively:
>
>     def __del__(self):
>         try:
>             b = self.balance
>             self.balance = 0
>         except AttributeError:
>             pass
>         else:
>             PutCashBackOnBank(b)
>
> but in a new scheme with a per-object "__del__ must be called" flag I
> can't...

Yes, that's a problem.
But there are other ways for the subclass to break the base class's
invariant (e.g. it could override __del__ without calling the base class'
__del__). So I think it's a red herring. In Python 3000, typechecked
classes may declare invariants that are enforced by the inheritance
mechanism; then we may need to keep track of which base class constructors
succeeded and only call corresponding destructors.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal@lemburg.com Fri Mar 3 18:17:11 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 03 Mar 2000 19:17:11 +0100
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us>
Message-ID: <38C001A7.6CF8F365@lemburg.com>

Guido van Rossum wrote:
> OK, so we're down to this one point: if __del__ resurrects the object,
> should __del__ be called again later? Additionally, should
> resurrection be made illegal?

Yes and no :-)

One example comes to mind: implementations of weak references, which manage
weak object references themselves (as soon as __del__ is called the weak
reference implementation takes over the object). Another example is that of
free-list-like implementations which reduce object creation times by
implementing smart object recycling, e.g. objects could keep allocated
dictionaries alive or connections to databases open, etc.

As for the second point: Calling __del__ again is certainly needed to keep
application logic sane... after all, __del__ should be called whenever the
refcount reaches 0 -- and that can happen more than once in the object's
lifetime if reanimation occurs.

> I can easily see how __del__ could *accidentally* resurrect the object
> as part of its normal cleanup -- e.g. you make a call to some other
> routine that helps with the cleanup, passing self as an argument, and
> this other routine keeps a helpful cache of the last argument for some
> reason.
I don't see how we could forbid this type of resurrection. > (What are you going to do? You can't raise an exception from > instance_dealloc, since it is called from DECREF. You can't track > down the reference and replace it with a None easily.) > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. I'd say this is an application logic error -- nothing that the mechanism itself can help with automagically. OTOH, turning multi calls to __del__ off, would make certain techniques impossible. > Now, it is true that the current docs for __del__ imply that > resurrection is possible. The intention of that note was to warn > __del__ writers that in the case of accidental resurrection __del__ > might be called again. The intention certainly wasn't to allow or > encourage intentional resurrection. I don't think that docs are the right argument here ;-) It is simply the reference counting logic that plays its role: __del__ is called when refcount reaches 0, which usually means that the object is about to be garbage collected... unless the object is rereferenced by some other object and thus gets reanimated. > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. BTW, I can't see what the original question has to do with this discussion ... calling __del__ only after successful __init__ is ok, IMHO, but what does this have to do with the way __del__ itself is implemented ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Mar 3 18:30:36 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:30:36 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <38C004CC.1FE0A501@lemburg.com> [Guido about ways to cleanup cyclic garbage] FYI, I'm using a special protocol for disposing of cyclic garbage: the __cleanup__ protocol. The purpose of this call is probably similar to Neil's tp_clear: it is intended to let objects break possible cycles in their own storage scope, e.g. instances can delete instance variables which they know can cause cyclic garbage. The idea is simple: give all power to the objects rather than try to solve everything with one magical master plan. The mxProxy package has details on the protocol. The __cleanup__ method is called by the Proxy when the Proxy is about to be deleted. If all references to an object go through the Proxy, the __cleanup__ method call can easily break cycles to have the refcount reach zero in which case __del__ is called. Since the object knows about this scheme it can take precautions to make sure that __del__ still works after __cleanup__ was called. Anyway, just a thought... there are probably many ways to do all this. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Fri Mar 3 18:51:55 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 03 Mar 2000 19:51:55 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C009CB.72BD49CA@tismer.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? [much stuff] Just a random note: What if we had a __del__ with zombie behavior? 
Assume an instance that is about to be destructed. Then __del__ is called
via normal method lookup. What we want is to let this happen only once.
Here's the zombie: after method lookup, place a dummy __del__ into the
to-be-deleted instance dict, and we are sure that this does no harm. Kinda
"yes, it's there, but a broken link". The zombie always works by doing
nothing. Makes some sense?

ciao - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home

From gstein@lyra.org Fri Mar 3 23:09:48 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:09:48 -0800 (PST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID:

You may as well remove the entire "vi" concept from ConfigParser. Since
"vi" can be *only* a '=' or ':', you aren't truly checking anything in the
"if" statement. Further, "vi" is used nowhere else, so that variable and
the corresponding regex group can be nuked altogether. I'm not sure why the
";" comment form was initially restricted to just one option format in the
first place.
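The behavior under discussion (';' comments stripped after both '=' and ':' separators, but only when preceded by whitespace) can be checked against today's configparser, where inline comments must be opted into explicitly. Section and key names below are invented:

```python
import configparser

cp = configparser.ConfigParser(inline_comment_prefixes=(';',))
cp.read_string("""\
[server]
host = example.org  ; stripped after '='
port: 8080          ; and after ':' too
""")
# Both values come back without the trailing comments.
```

A ';' with no whitespace before it (as in a value like `a;b`) is still kept, matching the "only if it follows a spacing character" rule in the patched code.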
Cheers, -g

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Update of /projects/cvsroot/python/dist/src/Lib
> In directory bitdiddle:/home/jhylton/python/src/Lib
>
> Modified Files:
> ConfigParser.py
> Log Message:
> allow comments beginning with ; in key: value as well as key = value
>
> Index: ConfigParser.py
> ===================================================================
> RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v
> retrieving revision 1.16
> retrieving revision 1.17
> diff -C2 -r1.16 -r1.17
> *** ConfigParser.py 2000/02/28 23:23:55 1.16
> --- ConfigParser.py 2000/03/03 20:43:57 1.17
> ***************
> *** 359,363 ****
> optname, vi, optval = mo.group('option', 'vi', 'value')
> optname = string.lower(optname)
> ! if vi == '=' and ';' in optval:
> # ';' is a comment delimiter only if it follows
> # a spacing character
> --- 359,363 ----
> optname, vi, optval = mo.group('option', 'vi', 'value')
> optname = string.lower(optname)
> ! if vi in ('=', ':') and ';' in optval:
> # ';' is a comment delimiter only if it follows
> # a spacing character
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://www.python.org/mailman/listinfo/python-checkins

-- Greg Stein, http://www.lyra.org/

From jeremy@cnri.reston.va.us Fri Mar 3 23:15:32 2000
From: jeremy@cnri.reston.va.us (Jeremy Hylton)
Date: Fri, 3 Mar 2000 18:15:32 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>

Thanks for catching that. I didn't look at the context. I'm going to wait,
though, until I talk to Fred to mess with the code any more.

General question for python-dev readers: What are your experiences with
ConfigParser?
I just used it to build a simple config parser for IDLE and found it hard
to use for several reasons. The biggest problem was that the file format is
undocumented. I also found it clumsy to have to specify section and option
arguments. I ended up writing a proxy that specializes on section so that
get takes only an option argument.

It sounds like ConfigParser code and docs could use a general cleanup. Are
there any other issues to take care of as part of that cleanup?

Jeremy

From gstein@lyra.org Fri Mar 3 23:35:09 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST)
Subject: [Python-Dev] ConfigParser stuff (was: CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17)
In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>
Message-ID:

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Thanks for catching that. I didn't look at the context. I'm going to
> wait, though, until I talk to Fred to mess with the code any more.

Not a problem. I'm glad that diffs are now posted to -checkins. :-)

> General question for python-dev readers: What are your experiences
> with ConfigParser?

Love it!

> I just used it to build a simple config parser for
> IDLE and found it hard to use for several reasons. The biggest
> problem was that the file format is undocumented.

In my most complex use of ConfigParser, I had to override SECTCRE to allow
periods in the section name. Of course, that was quite interesting since
the variable is __SECTRE in 1.5.2 (i.e. I had to compensate for the
munging). I also changed OPTCRE to allow a few more characters ("@" in
particular, which even the update doesn't do). Not a problem nowadays since
those are public.

My subclass also defines a set() method and a delsection() method. These
are used because I write the resulting changes back out to a file. It might
be nice to have a method which writes out a config file (with an
"AUTOGENERATED BY ConfigParser.py -- DO NOT EDIT BY HAND"; or maybe
"... BY ...").
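These wishes were eventually granted: the modern configparser has set(), remove_section(), and a write() method, so the subclassing described here is no longer necessary. A quick sketch (section and key names invented):

```python
import configparser
import io

cp = configparser.ConfigParser()
cp.add_section("paths")
cp.set("paths", "home", "/tmp/idle")

buf = io.StringIO()
buf.write("# AUTOGENERATED -- DO NOT EDIT BY HAND\n")
cp.write(buf)                     # round-trippable INI text

cp.remove_section("paths")        # deletion is built in as well
```

The output of write() can be read back by read_string(), giving the edit-and-save workflow described above without any overriding.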
> I also found it > clumsy to have to specify section and option arguments. I found these were critical in my application. I also take advantage of the sections in my "edna" application for logical organization. > I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? A set() method and a writefile() type of method would be nice. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Sat Mar 4 01:38:43 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 20:38:43 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <000001bf857a$60b45ac0$c6a0143f@tim> [Guido] > ... > Someone (Tim?) in the past suggested a different solution (probably > found in another language): for objects that are collected as part of > a cycle, the destructor isn't called at all. The memory is freed > (since it's no longer reachable), but the destructor is not called -- > it is as if the object lives on forever. Stroustrup has written in favor of this for C++. It's exactly the kind of overly slick "good argument" he would never accept from anyone else <0.1 wink>. > This is theoretically superior, but not practical: when I have an > object that creates a temp file, I want to be able to reliably delete > the temp file in my destructor, even when I'm part of a cycle! A member of the C++ committee assured me Stroustrup is overwhelmingly opposed on this. I don't even agree it's theoretically superior: it relies on the fiction that gc "may never occur", and that's just silly in practice. You're moving down the Java path. I can't possibly do a better job of explaining the Java rules than the Java Language Spec. does for itself. 
So pick that up and study section 12.6 (Finalization of Class Instances). The end result makes little sense to users, but is sufficient to guarantee that Java itself never blows up. Note, though, that there is NO good answer to finalizers in cycles! The implementation cannot be made smart enough to both avoid trouble and "do the right thing" from the programmer's POV, because the latter is unknowable. Somebody has to lose, one way or another. Rather than risk doing a wrong thing, the BDW collector lets cycles with finalizers leak. But it also has optional hacks to support exceptions for use with C++ (which sometimes creates self-cycles) and Java. See http://reality.sgi.com/boehm_mti/finalization.html for Boehm's best concentrated thoughts on the subject. The only principled approach I know of comes out of the Scheme world. Scheme has no finalizers, of course. But it does have gc, and the concept of "guardians" was invented to address all gc finalization problems in one stroke. It's extremely Scheme-like in providing a perfectly general mechanism with no policy whatsoever. You (the Scheme programmer) can create guardian objects, and "register" other objects with a guardian. At any time, you can ask a guardian whether some object registered with it is "ready to die" (i.e., the only thing keeping it alive is its registration with the guardian). If so, you can ask it to give you one. Everything else is up to you: if you want to run a finalizer, your problem. If there are cycles, also your problem. Even if there are simple non-cyclic dependencies, your problem. Etc. So those are the extremes: BDW avoids blame by refusing to do anything. Java avoids blame by exposing an impossibly baroque implementation-driven finalization model. Scheme avoids blame by refusing to do anything "by magic", but helps you to shoot yourself with the weapon of your choice. The bad news is that I don't know of a scheme *not* at an extreme! 
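[Python has no guardians, but the "ready to die" test can be approximated as a toy in CPython using reference counts. Everything below -- the class, its API -- is hypothetical and CPython-specific, not a real library:]

```python
import sys

class Guardian:
    """Toy, CPython-only sketch of a Scheme-style guardian."""

    def __init__(self):
        self._registered = []

    def register(self, obj):
        self._registered.append(obj)

    def pop_ready(self):
        """Hand back objects kept alive only by this guardian."""
        ready, alive = [], []
        for obj in self._registered:
            # references at this point: the list slot, the loop variable,
            # and getrefcount's own argument -- so a count of exactly 3
            # means "only the guardian keeps this object alive"
            if sys.getrefcount(obj) == 3:
                ready.append(obj)
            else:
                alive.append(obj)
        self._registered = alive
        return ready

class Resource:
    pass

guardian = Guardian()
r = Resource()
guardian.register(r)
first = guardian.pop_ready()    # r is still externally referenced: not ready
del r
second = guardian.pop_ready()   # now only the guardian held it
```

[True to the Scheme semantics Tim describes, a cycle registered here simply never becomes "ready" -- it leaks, and that's your problem.]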
It's extremely un-Pythonic to let things leak (despite that it has let things leak for a decade ), but also extremely un-Pythonic to make some wild-ass guess. So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare. in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs. IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility. when-a-problem-is-impossible-to-solve-favor-sanity-ly y'rs - tim From gstein@lyra.org Sat Mar 4 02:59:26 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 18:59:26 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > Note, though, that there is NO good answer to finalizers in cycles! The "Note" ?? Not just a note, but I'd say an axiom :-) By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best? >... > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. If a trash cycle > contains a finalizer (my, but that has to be rare. 
in practice, in > well-designed code!), don't guess, but make it available to the user. A > gc.guardian() call could expose such beasts, or perhaps a callback could be > registered, invoked when gc finds one of these things. Anyone crazy enough > to create cyclic trash with finalizers then has to take responsibility for > breaking the cycle themself. This puts the burden on the person creating > the problem, and they can solve it in the way most appropriate to *their* > specific needs. IOW, the only people who lose under this scheme are the > ones begging to lose, and their "loss" consists of taking responsibility. I'm not sure if Tim is saying the same thing, but I'll write down a concrete idea for cleaning garbage cycles. First, a couple observations: * Some objects can always be reliably "cleaned": lists, dicts, tuples. They just drop their contents, with no invocations against any of them. Note that an instance without a __del__ has no opinion on how it is cleaned. (this is related to Tim's point about whether a cycle has a finalizer) * The other objects may need to *use* their referenced objects in some way to clean out cycles. Since the second set of objects (possibly) need more care during their cleanup, we must concentrate on how to solve their problem. Back up a step: to determine where an object falls, let's define a tp_clean type slot. It returns an integer and takes one parameter: an operation integer. Py_TPCLEAN_CARE_CHECK /* check whether care is needed */ Py_TPCLEAN_CARE_EXEC /* perform the careful cleaning */ Py_TPCLEAN_EXEC /* perform a non-careful cleaning */ Given a set of objects that require special cleaning mechanisms, there is no way to tell where to start first. So... just pick the first one. Call its tp_clean type slot with CARE_EXEC. For instances, this maps to __clean__. If the instance does not have a __clean__, then tp_clean returns FALSE meaning that it could not clean this object. The algorithm moves on to the next object in the set. 
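[The careful pass described above can be mocked up in illustrative Python. tp_clean, __clean__, and GCImpossibleError are proposals from this thread, not real Python APIs; this sketch only models the bookkeeping:]

```python
class GCImpossibleError(Exception):
    """Stand-in for the proposed exception; not a real Python exception."""

def careful_pass(care_set):
    """One sweep over the 'care needed' objects, per the proposal above."""
    cleaned, still_dirty = [], []
    for obj in care_set:
        clean = getattr(obj, "__clean__", None)   # tp_clean(CARE_EXEC) analogue
        if clean is not None:
            clean()                 # object shuts itself down gracefully
            cleaned.append(obj)     # now just awaits its refcount hitting zero
        else:
            still_dirty.append(obj)  # tp_clean returned FALSE
    return cleaned, still_dirty

class Conn:
    def __init__(self):
        self.open = True
    def __clean__(self):            # hypothetical protocol from this thread
        self.open = False           # e.g. self.close()
    def __del__(self):
        assert not self.open        # __del__ must not blow up after cleaning

class Stubborn:
    def __del__(self):
        pass                        # has a finalizer but no __clean__

a, b = Conn(), Stubborn()
cleaned, dirty = careful_pass([a, b])
```

[In the full proposal, a non-empty `dirty` set at the end of the pass is what raises GCImpossibleError.]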
If tp_clean returns TRUE, then the object has been "cleaned" and is moved to the "no special care needed" list of objects, awaiting its reference count to hit zero. Note that objects in the "care" and "no care" lists may disappear during the careful-cleaning process. If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError. The objects in this set each said they could not be cleaned carefully AND they were not dealloc'd during other objects' cleaning. [ it could be possible to define a *dynamic* CARE_EXEC that will succeed if you call it during a second pass; I'm not sure this is a Good Thing to allow, however. ] This also implies that a developer should almost *always* consider writing a __clean__ method whenever they write a __del__ method. That method MAY be called when cycles need to be broken; the object should delete any non-essential variables in such a way that integrity is retained (e.g. it fails gracefully when methods are called and __del__ won't raise an error). For example, __clean__ could call a self.close() to shut down its operation. Whatever... you get the idea. At the end of the iteration of the "care" set, then you may have objects remaining in the "no care" set. By definition, these objects don't care about their internal references to other objects (they don't need them during deallocation). We iterate over this set, calling tp_clean(EXEC). For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out the references to other objects (but does not dealloc the object!). Again: objects in the "no care" set will go away during this process. By the end of the iteration over the "no care" set, it should be empty. [ note: the iterations over these sets should probably INCREF/DECREF across the calls; otherwise, the object could be dealloc'd during the tp_clean call. 
] [ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible references to other objects; not sure what this means. is it an error? maybe you just force a tp_dealloc on the remaining objects. ] Note that the tp_clean mechanism could probably be used during the Python finalization, where Python does a bunch of special-casing to clean up modules. Specifically: a module does not care about its contents during its deallocation, so it is a "no care" object; it responds to tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they can clear their dict (which contains a module reference which usually causes a loop) during tp_clean(EXEC). Module cleanup is easy once objects with CARE_CHECK have been handled -- all that funny logic in there is to deal with "care" objects. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Sat Mar 4 03:26:54 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 22:26:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000401bf8589$7d1364e0$c6a0143f@tim> [Tim] > Note, though, that there is NO good answer to finalizers in cycles! The [Greg Stein] > "Note" ?? Not just a note, but I'd say an axiom :-) An axiom is accepted without proof: we have plenty of proof that there's no thoroughly good answer (i.e., every language that has ever addressed this issue -- along with every language that ever will ). > By definition, you have two objects referring to each other in some way. > How can you *definitely* know how to break the link between them? Do you > call A's finalizer or B's first? If they're instances, do you just whack > their __dict__ and hope for the best? Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. 
Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this. [a nice implementation sketch, of what seems an overly elaborate scheme, if you believe cycles with finalizers are rare in intelligently designed code) ] Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>. One caution: > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. Since gc "can happen at any time", this is very severe (c.f. Guido's objection to making resurrection illegal). Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't). realistically y'rs - tim From gstein@lyra.org Sat Mar 4 08:43:45 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 00:43:45 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] Nah. Quite simple to code up, but a bit longer to explain in English :-) The hardest part is finding the cycles, but Guido already posted a long explanation about that. 
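[Historical note: Tim's "hand the trash cycle back to the programmer" approach is roughly what CPython later shipped -- the gc module finds the cycles, and before PEP 442 (Python 3.4) it parked uncollectable cycles containing __del__ methods in gc.garbage for the programmer to break. The cycle-finding half is observable today:]

```python
import gc

class Node:
    def __init__(self):
        self.other = None

# build a two-object cycle, then drop all external references
a, b = Node(), Node()
a.other, b.other = b, a
del a, b

# the collector finds the unreachable cycle; collect() returns the
# number of unreachable objects it found (the nodes plus their dicts)
found = gc.collect()
```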
Once that spits out the doubly-linked list of objects, then you're set. 1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed" objects to a second list 2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is returned, then the object was cleaned and moves to the "no care" list. 3) assert len(care-needed list) == 0 4) scan the no-care list calling tp_clean(EXEC) 5) (questionable) assert len(no-care list) == 0 The background makes it longer. The short description of the algorithm is easy. Step (1) could probably be merged right into one of the scans in the GC algorithm (e.g. during the placement into the "these are cyclical garbage" list) > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. hehe... Agreed. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). GCImpossibleError would simply be a subclass of MemoryError. Makes sense to me, and definitely allows for its "spontaneity." > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- anything). I'd > focus on the cycles themselves, not on the types of objects involved. I'm > not pretending to address the "order of finalization at shutdown" question, > though (although I'd agree they're deeply related: how do you follow a > topological sort when there *isn't* one? well, you don't, because you > can't). I disagree. I don't think a Python-level function is going to have a very good idea of what to do. 
IMO, this kind of semantics belong down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves. I think that standard pattern should be a language definition. Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives? Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 4 09:50:19 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 11:50:19 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? One thing that bothered me once: I want to be able to have something like: [section] tag = 1 tag = 2 And be able to retrieve ("section", "tag") -> ["1", "2"]. Can be awfully useful for things that make sense several times. Perhaps there should be two functions, one that reads a single-tag and one that reads a multi-tag? File format: I'm sure I'm going to get yelled at, but why don't we make it XML? Hard to edit, yadda, yadda, but you can easily write a special purpose widget to edit XConfig (that's what we'll call the DTD) files. hopefull-yet-not-naive-ly y'rs, Z. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html From gstein@lyra.org Sat Mar 4 10:05:15 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 02:05:15 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Fri, 3 Mar 2000, Jeremy Hylton wrote: > > It sounds like ConfigParser code and docs could use a general cleanup. > > Are there any other issues to take care of as part of that cleanup? > > One thing that bothered me once: > > I want to be able to have something like: > > [section] > tag = 1 > tag = 2 > > And be able to retrieve ("section", "tag") -> ["1", "2"]. > Can be awfully useful for things that make sense several time. > Perhaps there should be two functions, one that reads a single-tag and > one that reads a multi-tag? Structured values would be nice. Several times, I've needed to decompose the right hand side into lists. > File format: I'm sure I'm going to get yelled at, but why don't we > make it XML? Hard to edit, yadda, yadda, but you can easily write a > special purpose widget to edit XConfig (that's what we'll call the DTD) > files. Write a whole new module. ConfigParser is for files that look like the above. There isn't a reason to NOT use XML, but it shouldn't go into ConfigParser. I find the above style much easier for *humans*, than an XML file, to specify options. XML is good for computers; not so good for humans. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 4 10:46:40 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 12:46:40 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: [Tim Peters] > ...If a trash cycle > contains a finalizer (my, but that has to be rare. in practice, in > well-designed code!), This shows something Tim himself has often said -- he never programmed a GUI. 
It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to break system-allocated resources. So, it's not as rare as we would like to believe, which is the reason I haven't given this answer. which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From Moshe Zadka Sat Mar 4 11:16:19 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 13:16:19 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do Much better than the Python interpreter... > Throwing it out to Python won't help > what happens when an unexpected cycle arrives? Don't delete it. It's as simple as that, since it's a bug. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From Moshe Zadka Sat Mar 4 11:29:33 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 13:29:33 +0200 (IST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > Write a whole new module. ConfigParser is for files that look like the > above. Gotcha. One problem: two configuration modules might cause the classic "which should I use?" confusion. > > I find the above style much easier for *humans*, than an XML file, to > specify options. XML is good for computers; not so good for humans. > Of course: what human could delimit his text with <tag> and </tag>? oh-no-another-c.l.py-bot-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein@lyra.org Sat Mar 4 11:38:46 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:38:46 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > I disagree. I don't think a Python-level function is going to have a very > > good idea of what to do > > > Much better then the Python interpreter... If your function receives two instances (A and B), what are you going to do? How can you know what their policy is for cleaning up in the face of a cycle? I maintain that you would call the equivalent of my proposed __clean__. There isn't much else you'd be able to do, unless you had a completely closed system, you expected cycles between specific types of objects, and you knew a way to clean them up. Even then, you would still be calling something like __clean__ to let the objects do whatever they needed. I'm suggesting that __clean__ should be formalized (as part of tp_clean). Throwing the handling "up to Python" isn't going to do much for you. Seriously... I'm all for coding more stuff in Python rather than C, but this just doesn't feel right. Getting the objects GC'd is a language feature, and a specific pattern/method/recommendation is best formulated as an interpreter mechanism. > > > Throwing it out to Python won't help > > > what happens when an unexpected cycle arrives? > > Don't delete it. > It's as simple as that, since it's a bug. The point behind this stuff is to get rid of it, rather than let it linger on. If the objects have finalizers (which is how we get to this step!), then it typically means there is a resource they must release. Getting the object cleaned and dealloc'd becomes quite important. Cheers, -g p.s. did you send in a patch for the instance_contains() thing yet? 
-- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Mar 4 11:43:12 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:43:12 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > Write a whole new module. ConfigParser is for files that look like the > > above. > > Gotcha. > > One problem: two configuration modules might cause the classic "which > should I use?" confusion. Nah. They wouldn't *both* be called ConfigParser. And besides, I see the XML format more as a persistence mechanism rather than a configuration mechanism. I'd call the module something like "XMLPersist". > > > > I find the above style much easier for *humans*, than an XML file, to > > specify options. XML is good for computers; not so good for humans. > > > > Of course: what human could delimit his text with <tag> and </tag>? Feh. As a communication mechanism, dropping in that <tag> stuff... it's easy. But<tag>I would<tag>not<tag>want ... bleck. I wouldn't want to use XML for configuration stuff. It just gets ugly. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson@nevex.com Sat Mar 4 16:46:24 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Sat, 4 Mar 2000 11:46:24 -0500 (EST) Subject: [Python-Dev] HTMLgen-style interface to SQL? Message-ID: [short form] I'm looking for an object-oriented toolkit that will do for SQL what Perl's CGI.pm module, or Python's HTMLgen, does for HTML. Pointers, examples, or expressions of interest would be welcome. [long form] Lincoln Stein's CGI.pm module for Perl allows me to build HTML in an object-oriented way, instead of getting caught in the Turing tarpit of string substitution and printf. DOM does the same (in a variety of languages) for XML. Right now, if I want to interact with an SQL database from Perl or Python, I have to embed SQL strings in my programs. 
I would like to have a DOM-like ability to build and manipulate queries as objects, then call a method that translates the query structure into SQL to send to the database. Alternatively, if there is an XML DTD for SQL (how's that for a chain of TLAs?), and some tool to convert the XML/SQL to pure SQL, so that I could build my query using DOM, that would be cool too. RSVP, Greg Wilson gvwilson@nevex.com From Moshe Zadka Sat Mar 4 18:02:54 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 20:02:54 +0200 (IST) Subject: [Python-Dev] Re: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: On Sat, 4 Mar 2000, Guido van Rossum wrote: > Before we all start writing nannies and checkers, how about a standard > API design first? I thoroughly agree -- we should have a standard API. I tried to write selfnanny so it could be callable from any API possible (e.g., it can take either a file, a string, an ast or a tuple representation) > I will want to call various nannies from a "Check" > command that I plan to add to IDLE. Very cool: what I imagine is a sort of modular PyLint. > I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. Mine definitely isn't: it's designed to run both like a script and like a module. One outstanding bug: no docos. To be supplied upon request <0.5 wink>. I just wanted to float it out and see if people think that this particular nanny is worth while. > Since parsing is expensive, we probably want to share the parse tree. Yes. Probably as an AST, and transform to tuples/lists inside the checkers. > Ideas? Here's a strawman API: There's a package called Nanny Every module in that package should have a function called check_ast. 
Its argument is an AST object, and its output should be a list of three-tuples: (line-number, error-message, None) or (line-number, error-message, (column-begin, column-end)) (each tuple can be a different form). Problems? (I'm CCing to python-dev. Please follow up to that discussion to python-dev only, as I don't believe it belongs in patches) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gvwilson@nevex.com Sat Mar 4 18:26:20 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Sat, 4 Mar 2000 13:26:20 -0500 (EST) Subject: [Python-Dev] Re: selfnanny.py / nanny architecture In-Reply-To: Message-ID: > > Guido van Rossum wrote: > > Before we all start writing nannies and checkers, how about a standard > > API design first? > Moshe Zadka wrote: > Here's a strawman API: > There's a package called Nanny > Every module in that package should have a function called check_ast. > Its argument is an AST object, and its output should be a list > of three-tuples: (line-number, error-message, None) or > (line-number, error-message, (column-begin, column-end)) (each tuple can > be a different form). Greg Wilson wrote: The SUIF (Stanford University Intermediate Format) group has been working on an extensible compiler framework for about ten years now. The framework is based on an extensible AST spec; anyone can plug in a new analysis or optimization algorithm by writing one or more modules that read and write decorated ASTs. (See http://suif.stanford.edu for more information.) Based on their experience, I'd suggest that every nanny take an AST as an argument, and add complaints in place as decorations to the nodes. A terminal nanny could then collect these and display them to the user. I think this architecture will make it simpler to write meta-nannies. 
I'd further suggest that the AST be something that can be manipulated through DOM, since (a) it's designed for tree-crunching, (b) it's already documented reasonably well, (c) it'll save us re-inventing a wheel, and (d) generating human-readable output in a variety of customizable formats ought to be simple (well, simpler than the alternatives). Greg From jeremy@cnri.reston.va.us Sun Mar 5 02:10:28 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Sat, 4 Mar 2000 21:10:28 -0500 (EST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: Message-ID: <14529.49684.219826.466310@bitdiddle.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Sat, 4 Mar 2000, Greg Stein wrote: >> Write a whole new module. ConfigParser is for files that look >> like the above. MZ> Gotcha. MZ> One problem: two configurations modules might cause the classic MZ> "which should I use?" confusion. I don't think this is a hard decision to make. ConfigParser is good for simple config files that are going to be maintained by humans with a text editor. An XML-based configuration file is probably the right solution when humans aren't going to maintain the config files by hand. Perhaps XML will eventually be the right solution in both cases, but only if XML editors are widely available. >> I find the above style much easier for *humans*, than an >> XML file, to specify options. XML is good for computers; not so >> good for humans. MZ> Of course: what human could delimit his text with and MZ> ? Could? I'm sure there are more ways on Linux and Windows to mark up text than are dreamt of in your philosophy, Moshe . The question is what is easiest to read and understand? 
Jeremy From tim_one@email.msn.com Sun Mar 5 02:22:16 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 21:22:16 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: <000201bf8649$a17383e0$f42d153f@tim> [Guido van Rossum] > Before we all start writing nannies and checkers, how about a standard > API design first? I will want to call various nannies from a "Check" > command that I plan to add to IDLE. I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. I like Moshe's suggestion fine, except with an abstract base class named Nanny with a virtual method named check_ast. Nannies should (of course) derive from that. > Since parsing is expensive, we probably want to share the parse tree. What parse tree? Python's parser module produces an AST not nearly "A enough" for reasonably productive nanny writing. GregS & BillT have improved on that, but it's not in the std distrib. Other "problems" include the lack of original source lines in the trees, and lack of column-number info. Note that by the time Python has produced a parse tree, all evidence of the very thing tabnanny is looking for has been removed. That's why she used the tokenize module to begin with. God knows tokenize is too funky to use too when life gets harder (check out checkappend.py's tokeneater state machine for a preliminary taste of that). So the *only* solution is to adopt Christian's Stackless so I can rewrite tokenize as a coroutine like God intended . Seriously, I don't know of anything that produces a reasonably usable (for nannies) parse tree now, except via modifying a Python grammar for use with John Aycock's SPARK; the latter also comes with very pleasant & powerful tree pattern-matching abilities. But it's probably too slow for everyday "just folks" use. 
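[Today the standard-library ast module, which postdates this thread, makes Moshe's strawman easy to prototype. A hedged sketch of Tim's base-class variant, with a selfnanny-style checker emitting Moshe's (line-number, error-message, columns) tuples; class names are made up:]

```python
import ast

class Nanny:
    """Abstract base class in the spirit of Tim's suggestion."""
    def check_ast(self, tree):
        raise NotImplementedError

class SelfNanny(Nanny):
    """Complain about methods whose first argument is not 'self'."""
    def check_ast(self, tree):
        complaints = []
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                for item in node.body:
                    # decorators like @staticmethod are ignored in this sketch
                    if isinstance(item, ast.FunctionDef):
                        args = item.args.args
                        if not args or args[0].arg != "self":
                            complaints.append(
                                (item.lineno,
                                 "first argument of %r is not 'self'" % item.name,
                                 None))
        return complaints

source = "class C:\n    def meth(x):\n        pass\n"
problems = SelfNanny().check_ast(ast.parse(source))
```

[Sharing the parse tree, as Guido asks, then amounts to calling ast.parse() once and handing the same tree to every nanny's check_ast().]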
Grabbing the GregS/BillT enhancement is probably the most practical thing we could build on right now (but tabnanny will have to remain a special case). unsure-about-the-state-of-simpleparse-on-mxtexttools-for-this-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 03:24:18 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 22:24:18 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BE1B69.E0B88B41@lemburg.com> Message-ID: <000301bf8652$4aadaf00$f42d153f@tim> Just noting that two instances of this were found in Zope. [/F] > append = list.append > for x in something: > append(...) [Tim] > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. From fdrake@acm.org Sun Mar 5 03:55:27 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 4 Mar 2000 22:55:27 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: <14529.55983.263225.691427@weyr.cnri.reston.va.us> Jeremy Hylton writes: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. I did it that way since the .ini format allows comments after values (the ';' comments after a '='; '#' comments are a ConfigParser thing), but there's no equivalent concept for RFC822 parsing, other than '(...)' in addresses. The code was trying to allow what was expected from the .ini crowd without breaking the "native" use of ConfigParser. > General question for python-dev readers: What are your experiences > with ConfigParser?
I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. I also found it > clumsy to have to specify section and option arguments. I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? I agree that the API to ConfigParser sucks, and I think also that the use of it as a general solution is a big mistake. It's a messy bit of code that doesn't need to be, supports a really nasty mix of syntaxes, and can easily bite users who think they're getting something .ini-like (the magic names and interpolation is a bad idea!). While it suited the original application well enough, something with .ini syntax and interpolation from a subclass would have been *much* better. I think we should create a new module, inilib, that implements exactly .ini syntax in a base class that can be intelligently extended. ConfigParser should be deprecated. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one@email.msn.com Sun Mar 5 04:11:12 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:11:12 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <000601bf8658$d81d34e0$f42d153f@tim> [Guido] > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? I give up on the latter, so it really is just one. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup ... > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. 
It would be > harmful, not helpful, if __del__ was called again at this point. If this is something that happens easily, and current behavior is harmful, don't you think someone would have complained about it by now? That is, __del__ *is* "called again at this point" now, and has been for years & years. And if it happens easily, it *is* happening now, and in an unknown amount of existing code. (BTW, I doubt it happens at all -- people tend to write very simple __del__ methods, so far as I've ever seen) > Now, it is true that the current docs for __del__ imply that > resurrection is possible. "imply" is too weak. The Reference Manual's "3.3.1 Basic customization" flat-out says it's possible ("though not recommended"). The precise meaning of the word "may" in the following sentence is open to debate, though. > The intention of that note was to warn __del__ writers that in the case > of accidental resurrection Sorry, but I can't buy this: saying that *accidents* are "not recommended" is just too much of a stretch . > __del__ might be called again. That's a plausible reading of the following "may", but not the only one. I believe it's the one you intended, but it's not the meaning I took prior to this. > The intention certainly wasn't to allow or encourage intentional resurrection. Well, I think it plainly says it's supported ("though not recommended"). I used it intentionally at KSR, and even recommended it on c.l.py in the dim past (in one of those "dark & useless" threads ). > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. Why would anyone tell you about something that *works*?! You rarely hear the good stuff, you know. I gave the typical pattern in the preceding msg. To flesh out the motivation more, you have some external resource that's very expensive to set up (in KSR's case, it was an IPC connection to a remote machine). 
Rights to use that resource are handed out in the form of an object. When a client is done using the resource, they *should* explicitly use the object's .release() method, but you can't rely on that. So the object's __del__ method looks like (for example):

    def __del__(self):
        # Code not shown to figure out whether to disconnect: the downside to
        # disconnecting is that it can cost a bundle to create a new connection.
        # If the whole app is shutting down, then of course we want to disconnect.
        # Or if a timestamp trace shows that we haven't been making good use of
        # all the open connections lately, we may want to disconnect too.
        if decided_to_disconnect:
            self.external_resource.disconnect()
        else:
            # keep the connection alive for reuse
            global_available_connection_objects.append(self)

This is simple & effective, and it relies on both intentional resurrection and __del__ getting called repeatedly. I don't claim there's no other way to write it, just that there's *been* no problem doing this for a millennium . Note that MAL spontaneously sketched similar examples, although I can't say whether he's actually done stuff like this. Going back up a level, in another msg you finally admitted that you want "__del__ called only once" for the same reason Java wants it: because gc has no idea what to do when faced with finalizers in a trash cycle, and settles for an unprincipled scheme whose primary virtue is that "it doesn't blow up" -- and "__del__ called only once" happens to be convenient for that scheme. But toss such cycles back to the user to deal with at the Python level, and all those problems go away (along with the artificial need to change __del__). The user can break the cycles in an order that makes sense to the app (or they can let 'em leak! up to them). >>> print gc.get_cycle.__doc__ Return a list of objects comprising a single garbage cycle; [] if none. At least one of the objects has a finalizer, so Python can't determine the intended order of destruction.
If you don't break the cycle, Python will neither run any finalizers for the contained objects nor reclaim their memory. If you do break the cycle, and dispose of the list, Python will follow its normal reference-counting rules for running finalizers and reclaiming memory. That this "won't blow up" either is just the least of its virtues . you-break-it-you-buy-it-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 04:56:54 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000001bf865f$3acb99a0$432d153f@tim> [Tim sez "toss insane cycles back on the user"] [Greg Stein] > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do. You've already assumed that Python coders know exactly what to do, else they couldn't have coded the new __clean__ method your proposal relies on. I'm taking what strikes me as the best part of Scheme's Guardian idea: don't assume *anything* about what users "should" do to clean up their trash. Leave it up to them: their problem, their solution. I think finalizers in trash cycles should be so rare in well-written code that it's just not worth adding much of anything in the implementation to cater to it. > IMO, this kind of semantics belong down in the interpreter with a > specific, documented algorithm. Throwing it out to Python won't help > -- that function will still have to use a "standard pattern" for getting > the cyclical objects to toss themselves. They can use any pattern they want, and if the pattern doesn't *need* to be coded in C as part of the implementation, it shouldn't be. > I think that standard pattern should be a language definition. 
I distrust our ability to foresee everything users may need over the next 10 years: how can we know today that the first std pattern you dreamed up off the top of your head is the best approach to an unbounded number of problems we haven't yet seen a one of ? > Without a standard pattern, then you're saying the application will know > what to do, but that is kind of weird -- what happens when an unexpected > cycle arrives? With the hypothetical gc.get_cycle() function I mentioned before, they should inspect objects in the list they get back, and if they find they don't know what to do with them, they can still do anything they want. Examples include raising an exception, dialing my home pager at 3am to insist I come in to look at it, or simply let the list go away (at which point the objects in the list will again become a trash cycle containing a finalizer). If several distinct third-party modules get into this act, I *can* see where it could become a mess. That's why Scheme "guardians" is plural: a given module could register its "problem objects" in advance with a specific guardian of its own, and query only that guardian later for things ready to die. This probably can't be implemented in Python, though, without support for weak references (or lots of brittle assumptions about specific refcount values). agreeably-disagreeing-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 04:56:58 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:58 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000101bf865f$3cb0d460$432d153f@tim> [Tim] > ...If a trash cycle contains a finalizer (my, but that has to be rare. > in practice, in well-designed code!), [Moshe Zadka] > This shows something Tim himself has often said -- he never programmed a > GUI. 
It's very hard to build a GUI (especially with Tkinter) which is > cycle-less, but the classes implementing the GUI often have __del__'s > to break system-allocated resources. > > So, it's not as rare as we would like to believe, which is the reason > I haven't given this answer. I wrote Cyclops.py when trying to track down leaks in IDLE. The extraordinary thing we discovered is that "even real gc" would not have reclaimed the cycles. They were legitimately reachable, because, indeed, "everything points to everything else". Guido fixed almost all of them by explicitly calling new "close" methods. I believe IDLE has no __del__ methods at all now. Tkinter.py currently contains two. so-they-contained-__del__-but-weren't-trash-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 06:05:24 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 01:05:24 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BCD71C.3592E6A@lemburg.com> Message-ID: <000601bf8668$cbbdd640$432d153f@tim> [M.-A. Lemburg] > ... > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) Given .title(), is .capitalize() of use for Unicode strings? Or is it just a temptation to do something senseless in the Unicode world? If it doesn't make sense, leave it out (this *seems* like compulsion to implement all current string methods in *some* way for Unicode, whether or not they make sense). From Moshe Zadka Sun Mar 5 06:16:22 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 5 Mar 2000 08:16:22 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000201bf8649$a17383e0$f42d153f@tim> Message-ID: On Sat, 4 Mar 2000, Tim Peters wrote: > I like Moshe's suggestion fine, except with an abstract base class named > Nanny with a virtual method named check_ast. Nannies should (of course) > derive from that. Why? 
The C++ you're programming damaged your common sense cycles? > > Since parsing is expensive, we probably want to share the parse tree. > > What parse tree? Python's parser module produces an AST not nearly "A > enough" for reasonably productive nanny writing. As a note, selfnanny uses the parser module AST. > GregS & BillT have > improved on that, but it's not in the std distrib. Other "problems" include > the lack of original source lines in the trees, The parser module has source lines. > and lack of column-number info. Yes, that sucks. > Note that by the time Python has produced a parse tree, all evidence of the > very thing tabnanny is looking for has been removed. That's why she used > the tokenize module to begin with. Well, it's one of the few nannies which would be in that position. > God knows tokenize is too funky to use too when life gets harder (check out > checkappend.py's tokeneater state machine for a preliminary taste of that). Why doesn't checkappend.py use the parser module? > Grabbing the GregS/BillT enhancement is probably the most > practical thing we could build on right now You got some pointers? > (but tabnanny will have to remain a special case). tim-will-always-be-a-special-case-in-our-hearts-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one@email.msn.com Sun Mar 5 07:01:12 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 02:01:12 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Message-ID: <000901bf8670$97d8f320$432d153f@tim> [Tim] >> [make Nanny a base class] [Moshe Zadka] > Why? Because it's an obvious application for OO design. A common base class formalizes the interface and can provide useful utilities for subclasses. > The C++ you're programming damaged your common sense cycles? Yes, very, but that isn't relevant here . It's good Python sense too.
>> [parser module produces trees far too concrete for comfort] > As a note, selfnanny uses the parser module AST. Understood, but selfnanny has a relatively trivial task. Hassling with tuples nested dozens deep for even relatively simple stmts is both a PITA and a time sink. >> [parser doesn't give source lines] > The parser module has source lines. No, it does not (it only returns terminals, as isolated strings). The tokenize module does deliver original source lines in their entirety (as well as terminals, as isolated strings; and column numbers). >> and lack of column-number info. > Yes, that sucks. > ... > Why doesn't checkappend.py use the parser module? Because it wanted to display the actual source line containing an offending "append" (which, again, the parse module does not supply). Besides, it was a trivial variation on tabnanny.py, of which I have approximately 300 copies on my disk . >> Grabbing the GregS/BillT enhancement is probably the most >> practical thing we could build on right now > You got some pointers? Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab transformer.py from the zip file. The latter supplies a very useful post-processing pass over the parse module's output, squashing it *way* down. From Moshe Zadka Sun Mar 5 07:08:41 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 5 Mar 2000 09:08:41 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> Message-ID: On Sun, 5 Mar 2000, Tim Peters wrote: > [Tim] > >> [make Nanny a base class] > > [Moshe Zadka] > > Why? > > Because it's an obvious application for OO design. A common base class > formalizes the interface and can provide useful utilities for subclasses. The interface is just one function. You're welcome to have a do-nothing nanny that people *can* derive from: I see no point in making them derive from a base class. > > As a note, selfnanny uses the parser module AST.
> > Understood, but selfnanny has a relatively trivial task. That it does, and it was painful. > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). Sorry, misunderstanding: it seemed obvious to me you wanted line numbers. For lines, use the linecache module... > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. I'll have a look. Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From Fredrik Lundh" from "Python for Lisp Programmers": http://www.norvig.com/python-lisp.html > Don't forget return. Writing def twice(x): x+x is tempting > and doesn't signal a warning or > exception, but you probably > meant to have a return in there. This is particularly irksome > because in a lambda you are prohibited from writing return, > but the semantics is to do the return. maybe adding an (optional but encouraged) "return" to lambda would be an improvement? lambda x: x + 10 vs. lambda x: return x + 10 or is this just more confusing... opinions? From guido@python.org Sun Mar 5 12:04:56 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:04:56 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Your message of "Sat, 04 Mar 2000 22:55:27 EST." <14529.55983.263225.691427@weyr.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> Message-ID: <200003051204.HAA05367@eric.cnri.reston.va.us> [Fred] > I agree that the API to ConfigParser sucks, and I think also that > the use of it as a general solution is a big mistake.
It's a messy > bit of code that doesn't need to be, supports a really nasty mix of > syntaxes, and can easily bite users who think they're getting > something .ini-like (the magic names and interpolation is a bad > idea!). While it suited the original application well enough, > something with .ini syntax and interpolation from a subclass would > have been *much* better. > I think we should create a new module, inilib, that implements > exactly .ini syntax in a base class that can be intelligently > extended. ConfigParser should be deprecated. Amen. Some thoughts:

- You could put it all in ConfigParser.py but with new classnames. (Not sure though, since the ConfigParser class, which is really a kind of weird variant, will be assumed to be the main class because its name is that of the module.)

- Variants on the syntax could be given through some kind of option system rather than through subclassing -- they should be combinable independently. Some possible options (maybe I'm going overboard here) could be:

  - comment characters: ('#', ';', both, others?)
  - comments after variables allowed? on sections?
  - variable characters: (':', '=', both, others?)
  - quoting of values with "..." allowed?
  - backslashes in "..." allowed?
  - does backslash-newline mean a continuation?
  - case sensitivity for section names (default on)
  - case sensitivity for option names (default off)
  - variables allowed before first section name?
  - first section name? (default "main")
  - character set allowed in section names
  - character set allowed in variable names
  - %(...) substitution?

(Well maybe the whole substitution thing should really be done through a subclass -- it's too weird for normal use.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 5 12:17:31 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:17:31 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Sun, 05 Mar 2000 01:05:24 EST."
<000601bf8668$cbbdd640$432d153f@tim> References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <200003051217.HAA05395@eric.cnri.reston.va.us> > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) [Tim] > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). The intention of this is to make code that does something using strings do exactly the same thing if those strings happen to be Unicode strings with the same values. The capitalize method returns self[0].upper() + self[1:] -- that may not make sense for e.g. Japanese, but it certainly does for Russian or Greek. It also does this in JPython. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 5 12:24:41 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:24:41 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Your message of "Sun, 05 Mar 2000 02:01:12 EST." <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <200003051224.HAA05410@eric.cnri.reston.va.us> > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). The > tokenize module does deliver original source lines in their entirety (as > well as terminals, as isolated strings; and column numbers). Moshe meant line numbers -- it has those. > > Why doesn't checkappend.py use the parser module? > > Because it wanted to display the actual source line containing an offending > "append" (which, again, the parse module does not supply).
Besides, it was > a trivial variation on tabnanny.py, of which I have approximately 300 copies > on my disk . Of course another argument for making things more OO. (The code used in tabnanny.py to process files and recursively directories from sys.argv is replicated a thousand times in various scripts of mine -- Tim took it from my now-defunct takpolice.py. This should be in the std library somehow...) > >> Grabbing the GregS/BillT enhancement is probably the most > >> practical thing we could build on right now > > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. The latter supplies a very useful > post-processing pass over the parse module's output, squashing it *way* > down. Those of you who have seen the compiler-sig should know that Jeremy made an improvement which will find its way into p2c. It's currently on display in the Python CVS tree in the nondist branch: see http://www.python.org/pipermail/compiler-sig/2000-February/000011.html and the ensuing thread for more details. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 5 13:46:13 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 08:46:13 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Fri, 03 Mar 2000 22:26:54 EST." <000401bf8589$7d1364e0$c6a0143f@tim> References: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: <200003051346.IAA05539@eric.cnri.reston.va.us> I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning, and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either).
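For concreteness, the resurrection pattern at issue can be demonstrated in a few lines (a minimal sketch with an invented cache name; note that CPython eventually settled this very debate the other way in PEP 442, Python 3.4, which calls __del__ at most once even when the object is resurrected):

```python
graveyard = []   # stands in for a cache of reusable objects
calls = []       # counts __del__ invocations

class Resource:
    def __del__(self):
        calls.append(1)
        graveyard.append(self)   # resurrect: keep self alive for reuse

r = Resource()
del r                            # refcount hits zero, __del__ runs...
assert len(graveyard) == 1       # ...but the object survives

graveyard.clear()                # drop the last reference again; on
assert len(calls) == 1           # CPython >= 3.4, __del__ is not re-run
```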
There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?). Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called). Of course, it's also possible that a trash cycle is created by code outside the responsibility of the finalizer. But still, I have a hard time understanding how Tim's version would be used. Greg or Marc-Andre's version I understand. What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.) If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( ) > Exactly. The *programmer* may know the right thing to do, but the Python > implementation can't possibly know. Facing both facts squarely constrains > the possibilities to the only ones that are all of understandable, > predictable and useful. Cycles with finalizers must be a Magic-Free Zone > else you lose at least one of those three: even Guido's kung fu isn't > strong enough to outguess this.
> > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code > ] > > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. My current tendency is to go with the basic __cleanup__ and nothing more, calling each instance's __cleanup__ before clobbering dictionaries and lists -- which should break all cycles safely. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). Not quite. Cycle detection is presumably only called every once in a while on memory allocation, and memory *allocation* (as opposed to deallocation) is allowed to fail. Of course, this will probably run into various coding bugs where allocation failure isn't dealt with properly, because in practice this happens so rarely... > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- > anything). That was the idea with calling the finalizer too: it would be called between INCREF/DECREF, so the object would be considered alive for the duration of the finalizer call. Here's another way of looking at my error: for dicts and lists, I would call a special *clear* function; but for instances, I would call *dealloc*, however intending it to perform a *clear*.
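The effect of a clear-style cleanup can be sketched at the Python level (the cleanup method here is a hypothetical stand-in for the proposed __cleanup__; weakrefs are used only to observe that plain reference counting then reclaims the cycle):

```python
import weakref

class Node:
    def __init__(self):
        self.other = None
    def cleanup(self):
        # break the reference that closes the cycle,
        # without deallocating anything ourselves
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a            # a <=> b: a reference cycle
wa, wb = weakref.ref(a), weakref.ref(b)

a.cleanup()                        # one cleared edge is enough...
del a, b                           # ...for refcounting to reclaim both
assert wa() is None and wb() is None
```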
I wish we didn't have to special-case finalizers on class instances (since each dealloc function is potentially a combination of a finalizer and a deallocation routine), but the truth is that they *are* special -- __del__ has no responsibility for deallocating memory, only for deallocating external resources (such as temp files). And even if we introduced a tp_clean protocol that would clear dicts and lists and call __cleanup__ for instances, we'd still want to call it first for instances, because an instance depends on its __dict__ for its __cleanup__ to succeed (but the __dict__ doesn't depend on the instance for its cleanup). Greg's 3-phase tp_clean protocol seems indeed overly elaborate but I guess it deals with such dependencies in the most general fashion. > I'd focus on the cycles themselves, not on the types of objects > involved. I'm not pretending to address the "order of finalization > at shutdown" question, though (although I'd agree they're deeply > related: how do you follow a topological sort when there *isn't* > one? well, you don't, because you can't). In theory, you just delete the last root (a C global pointing to sys.modules) and you run the garbage collector. It might be more complicated in practice to track down all roots. Another practical consideration is that now there are cycles of the form function <=> module, which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call). Back to the drawing board...
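The function <=> module cycle is easy to exhibit: a function holds a reference to its defining module's namespace, and that namespace holds the function (shown with the modern __globals__ attribute; in 1.5 it was spelled func_globals):

```python
import types

mod = types.ModuleType("demo")    # a scratch module to define f in
exec(compile("def f():\n    pass\n", "<demo>", "exec"), mod.__dict__)

f = mod.__dict__["f"]
assert f.__globals__ is mod.__dict__   # function -> module namespace
assert f.__globals__["f"] is f         # module namespace -> function
```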
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Sun Mar 5 16:42:30 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sun, 5 Mar 2000 10:42:30 -0600 (CST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> References: <000401bf8589$7d1364e0$c6a0143f@tim> <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <14530.36471.11654.666900@beluga.mojam.com> Guido> What keeps nagging me though is what to do when there's a Guido> finalizer but no cleanup method. I guess the trash cycle remains Guido> alive. Is this acceptable? (I guess so, because we've given the Guido> programmer a way to resolve the trash: provide a cleanup method.) That assumes the programmer even knows there's a cycle, right? I'd like to see this scheme help provide debugging assistance. If a cycle is discovered but the programmer hasn't declared a cleanup method for the object it wants to cleanup, a default cleanup method is called if it exists (e.g. sys.default_cleanup), which would serve mostly as an alert (print magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of death, ...) as opposed to actually breaking any cycles. Presumably the programmer would define sys.default_cleanup during development and leave it undefined during production. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From paul@prescod.net Sat Mar 4 01:04:43 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 03 Mar 2000 17:04:43 -0800 Subject: [Python-Dev] breaking list.append() References: <38BC86E1.53F69776@prescod.net> <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: <38C0612B.7C92F8C4@prescod.net> Guido van Rossum wrote: > > .. > Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. 
Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. My concern is when I want to build an application with a module that only works with Python 1.5.2 and another one that only works with Python 1.6. If we can avoid that situation by making 1.6 compatible with 1.5.2, we should. By the time 1.7 comes around I will accept that everyone has had enough time to update their modules. Remember that many module authors are just part-time volunteers. They may only use Python every few months when they get a spare weekend! I really hope that Andrew is wrong when he predicts that there may be lots of different places where Python 1.6 breaks code! I'm in favor of being a total jerk when it comes to Py3K but Python has been pretty conservative thus far. Could someone remind in one sentence what the downside is for treating this as a warning condition as Java does with its deprecated features? Then the CP4E people don't get into bad habits and those same CP4E people trying to use older modules don't run into frustrating runtime errors. Do it for the CP4E people! (how's that for rhetoric) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "We still do not know why mathematics is true and whether it is certain. But we know what we do not know in an immeasurably richer way than we did. And learning this has been a remarkable achievement, among the greatest and least known of the modern era." 
- from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From jeremy@cnri.reston.va.us Sun Mar 5 17:46:14 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Sun, 5 Mar 2000 12:46:14 -0500 (EST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <14530.40294.593407.777859@bitdiddle.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: >>> Grabbing the GregS/BillT enhancement is probably the most >>> practical thing we could build on right now >> You got some pointers? TP> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and TP> grab transformer.py from the zip file. The latter supplies a TP> very useful post-processing pass over the parse module's output, TP> squashing it *way* down. The compiler tools in python/nondist/src/Compiler include Bill & Greg's transformer code, a class-based AST (each node is a subclass of the generic node), and a visitor framework for walking the AST. The APIs and organization are in a bit of flux; Mark Hammond suggested some reorganization that I've not finished yet. I may finish it up this evening. The transformer module does a good job of including line numbers, but I've occasionally run into a node that didn't have a lineno attribute when I expected it would. I haven't taken the time to figure out if my expectation was unreasonable or if the transformer should be fixed. The compiler-sig might be a good place to discuss this further. A warning framework was one of my original goals for the SIG. I imagine we could convince Guido to move warnings + compiler tools into the standard library if they end up being useful. Jeremy From mal@lemburg.com Sun Mar 5 19:57:32 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sun, 05 Mar 2000 20:57:32 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <38C2BC2C.FFEB72C3@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). .capitalize() only touches the first char of the string - not sure whether it makes sense in both worlds ;-) Anyhow, the difference is there but subtle: string.capitalize() will use C's toupper() which is locale dependent, while unicode.capitalize() uses Unicode's toTitleCase() for the first character. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sun Mar 5 20:15:47 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 21:15:47 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000601bf8658$d81d34e0$f42d153f@tim> Message-ID: <38C2C073.CD51688@lemburg.com> Tim Peters wrote: > > [Guido] > > Would there really be someone out there who uses *intentional* > > resurrection? I severely doubt it. I've never heard of this. > > Why would anyone tell you about something that *works*?! You rarely hear > the good stuff, you know. I gave the typical pattern in the preceding msg. > To flesh out the motivation more, you have some external resource that's > very expensive to set up (in KSR's case, it was an IPC connection to a > remote machine). 
Rights to use that resource are handed out in the form of > an object. When a client is done using the resource, they *should* > explicitly use the object's .release() method, but you can't rely on that. > So the object's __del__ method looks like (for example):
>
> def __del__(self):
>     # Code not shown to figure out whether to disconnect: the downside to
>     # disconnecting is that it can cost a bundle to create a new connection.
>     # If the whole app is shutting down, then of course we want to disconnect.
>     # Or if a timestamp trace shows that we haven't been making good use of
>     # all the open connections lately, we may want to disconnect too.
>     if decided_to_disconnect:
>         self.external_resource.disconnect()
>     else:
>         # keep the connection alive for reuse
>         global_available_connection_objects.append(self)
>
> This is simple & effective, and it relies on both intentional resurrection > and __del__ getting called repeatedly. I don't claim there's no other way > to write it, just that there's *been* no problem doing this for a millennium > . > > Note that MAL spontaneously sketched similar examples, although I can't say > whether he's actually done stuff like this. Not exactly this, but similar things in the weak reference implementation of mxProxy. The idea came from a different area: the C implementation of Python uses free lists a lot and these are basically implementations of the same idiom: save an allocated resource for reviving it at some later point. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nascheme@enme.ucalgary.ca Mon Mar 6 00:27:54 2000 From: nascheme@enme.ucalgary.ca (nascheme@enme.ucalgary.ca) Date: Sun, 5 Mar 2000 17:27:54 -0700 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim>; from tim_one@email.msn.com on Fri, Mar 03, 2000 at 08:38:43PM -0500 References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: <20000305172754.A14998@acs.ucalgary.ca> On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. I like Marc's suggestion. Here is my proposal: Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary. When collecting cycles, call tp_clear for instances first. It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods. Neil From tim_one@email.msn.com Mon Mar 6 07:13:21 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:13:21 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38C0612B.7C92F8C4@prescod.net> Message-ID: <000401bf873b$745f8320$ea2d153f@tim> [Paul Prescod] > ... > Could someone remind in one sentence what the downside is for treating > this as a warning condition as Java does with its deprecated features? Simply the lack of anything to build on: Python has no sort of runtime warning system now, and nobody has volunteered to create one. If you do , remember that stdout & stderr may go to the bit bucket in a GUI app. The bit about dropping the "L" suffix on longs seems unwarnable-about in any case (short of warning every time anyone uses long()). 
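[The runtime warning system Tim says nobody has built yet is exactly what Python later grew as the `warnings` module (PEP 230). As a latter-day sketch, here is how the deprecated multi-arg `list.append(a, b)` could have warned instead of failing -- using a hypothetical `append()` shim function, since the real change would have lived inside the list type itself.]

```python
import warnings

def append(lst, *args):
    """Hypothetical shim: old-style lst.append(a, b) triggers a
    DeprecationWarning and appends the tuple (a, b), instead of
    raising a hard TypeError."""
    if len(args) == 1:
        lst.append(args[0])
    else:
        warnings.warn(
            "list.append with %d arguments is deprecated; "
            "append a single tuple instead" % len(args),
            DeprecationWarning, stacklevel=2)
        lst.append(tuple(args))

items = []
append(items, 1)                          # normal one-argument case
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    append(items, 2, 3)                   # old multi-arg case: warns, still works
assert items == [1, (2, 3)]
assert issubclass(caught[0].category, DeprecationWarning)
```

Warnings go through a filterable channel rather than straight to stderr, which also answers Tim's point about GUI apps losing stdout/stderr.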
remember-that-you-asked-for-the-problems-not-for-solutions-ly y'rs - tim From tim_one@email.msn.com Mon Mar 6 07:33:49 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:33:49 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <38C2C073.CD51688@lemburg.com> Message-ID: <000701bf873e$5032eca0$ea2d153f@tim> [M.-A. Lemburg, on the resurrection/multiple-__del__ "idiom"] > ... > The idea came from a different area: the C implementation > of Python uses free lists a lot and these are basically > implementations of the same idiom: save an allocated > resource for reviving it at some later point. Excellent analogy! Thanks. Now that you phrased it in this clarifying way, I recall that very much the same point was raised in the papers that resulted in the creation of guardians in Scheme. I don't know that anyone is actually using Python __del__ this way today (I am not), but you reminded me why I thought it was natural at one time . generally-__del__-aversive-now-except-in-c++-where-destructors-are-guaranteed-to-be-called-when-you-expect-them-to-be-ly y'rs - tim From tim_one@email.msn.com Mon Mar 6 08:12:06 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 03:12:06 -0500 Subject: [Python-Dev] return statements in lambda In-Reply-To: <006f01bf8686$391ced80$34aab5d4@hagrid> Message-ID: <000901bf8743$a9f61aa0$ea2d153f@tim> [/F] > maybe adding an (optional but encouraged) "return" > to lambda would be an improvement? > > lambda x: x + 10 > > vs. > > lambda x: return x + 10 > > or is this just more confusing... opinions? It was an odd complaint to begin with, since Lisp-heads aren't used to using "return" anyway. More of a symptom of taking a shallow syntactic approach to a new (to them) language. For non-Lisp heads, I think it's more confusing in the end, blurring the distinction between stmts and expressions ("the body of a lambda must be an expression" ... 
"ok, i lied, unless it's a 'return' stmt"). If Guido had it to do over again, I vote he rejects the original patch . Short of that, would have been better if the lambda arglist required parens, and if the body were required to be a single return stmt (that would sure end the "lambda x: print x" FAQ -- few would *expect* "return print x" to work!). hindsight-is-great-ly y'rs - tim From tim_one@email.msn.com Mon Mar 6 09:09:45 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 04:09:45 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <000b01bf874b$b6fe9da0$ea2d153f@tim> [Guido] > I'm beginning to believe that handing cycles with finalizers to the > user is better than calling __del__ with a different meaning, You won't be sorry: Python has the chance to be the first language that's both useful and sane here! > and I tentatively withdraw my proposal to change the rules for when > __del__ is called (even when __init__ fails; I haven't had any complaints > about that either). Well, everyone liked the parenthetical half of that proposal, although Jack's example did point out a real surprise with it. > There seem to be two competing suggestions for solutions: (1) call > some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the > object; (2) Tim's proposal of an interface to ask the garbage > collector for a trash cycle with a finalizer (or for an object with a > finalizer in a trash cycle?). Or a maximal strongly-connected component, or *something* -- unsure. > Somehow Tim's version looks less helpful to me, because it *seems* > that whoever gets to handle the cycle (the main code of the program?) > isn't necessarily responsible for creating it (some library you didn't > even know was used under the covers of some other library you called). Yes, to me too. 
This is the Scheme "guardian" idea in a crippled form (Scheme supports as many distinct guardians as the programmer cares to create), and even in its full-blown form it supplies "a perfectly general mechanism with no policy whatsoever". Greg convinced me (although I haven't admitted this yet ) that "no policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't be pushing the guardian idea any more (although see immediately below for an immediate backstep on that ). > ... > What keeps nagging me though is what to do when there's a finalizer > but no cleanup method. I guess the trash cycle remains alive. Is > this acceptable? (I guess so, because we've given the programmer a > way to resolve the trash: provide a cleanup method.) BDW considers it better to leak than to risk doing a wrong thing, and I agree wholeheartedly with that. GC is one place you want to have a "100% language". This is where something like a guardian can remain useful: while leaking is OK because you've given them an easy & principled alternative, leaking without giving them a clear way to *know* about it is not OK. If gc pushes the leaked stuff off to the side, the gc module should (say) supply an entry point that returns all the leaked stuff in a list. Then users can *know* they're leaking, know how badly they're leaking, and examine exactly the objects that are leaking. Then they've got the info they need to repair their program (or at least track down the 3rd-party module that's leaking). As with a guardian, they *could* also build a reclamation scheme on top of it, but that would no longer be the main (or even an encouraged) thrust. > If we detect individual cycles (the current algorithm doesn't do that > yet, though it seems easy enough to do another scan), could we > special-case cycles with only one finalizer and no cleaner-upper? 
> (I'm tempted to call the finalizer because it seems little harm can be > done -- but then of course there's the problem of the finalizer being > called again when the refcount really goes to zero. :-( ) "Better safe than sorry" is my immediate view on this -- you can't know that the finalizer won't resurrect the cycle, and "finalizer called iff refcount hits 0" is a wonderfully simple & predictable rule. That's worth a lot to preserve, unless & until it proves to be a disaster in practice. As to the details of cleanup, I haven't succeeded in making the time to understand all the proposals. But I've done my primary job here if I've harassed everyone into not repeating the same mistakes all previous languages have made <0.9 wink>. > ... > I wish we didn't have to special-case finalizers on class instances > (since each dealloc function is potentially a combination of a > finalizer and a deallocation routine), but the truth is that they > *are* special -- __del__ has no responsibility for deallocating > memory, only for deallocating external resources (such as temp files). And the problem is that __del__ can do anything whatsoever than can be expressed in Python, so there's not a chance in hell of outguessing it. > ... > Another practical consideration is that now there are cycles of the form > > <=> > > which suggests that we should make function objects traceable. Also, > modules can cross-reference, so module objects should be made > traceable. I don't think that this will grow the sets of traced > objects by too much (since the dicts involved are already traced, and > a typical program has way fewer functions and modules than it has > class instances). On the other hand, we may also have to trace > (un)bound method objects, and these may be tricky because they are > allocated and deallocated at high rates (once per typical method > call). 
This relates to what I was trying to get at with my response to your gc implementation sketch: mark-&-sweep needs to chase *everything*, so the set of chased types is maximal from the start. Adding chased types to the "indirectly infer what's unreachable via accounting for internal refcounts within the transitive closure" scheme can end up touching nearly as much as a full M-&-S pass per invocation. I don't know where the break-even point is, but the more stuff you chase in the latter scheme the less often you want to run it. About high rates, so long as a doubly-linked list allows efficient removal of stuff that dies via refcount exhaustion, you won't actually *chase* many bound method objects (i.e., they'll usually go away by themselves). Note in passing that bound method objects often showed up in cycles in IDLE, although you usually managed to break those in other ways. > Back to the drawing board... Good! That means you're making real progress . glad-someone-is-ly y'rs - tim From mal@lemburg.com Mon Mar 6 10:01:31 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 11:01:31 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> <20000305172754.A14998@acs.ucalgary.ca> Message-ID: <38C381FB.E222D6E4@lemburg.com> nascheme@enme.ucalgary.ca wrote: > > On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > > So here's what I'd consider doing: explicit is better than implicit, and in > > the face of ambiguity refuse the temptation to guess. > > I like Marc's suggestion. Here is my proposal: > > Allow classes to have a new method, __cleanup__ or whatever you > want to call it. When tp_clear is called for an instance, it > checks for this method. If it exists, call it, otherwise delete > the container objects from the instance's dictionary. When > collecting cycles, call tp_clear for instances first. 
> > Its simple and allows the programmer to cleanly break cycles if > they insist on creating them and using __del__ methods. Right :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Mar 6 11:57:29 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 12:57:29 +0100 Subject: [Python-Dev] Unicode character property methods Message-ID: <38C39D29.A29CE67F@lemburg.com> As you may have noticed, the Unicode objects provide new methods .islower(), .isupper() and .istitle(). Finn Bock mentioned that Java also provides .isdigit() and .isspace(). Question: should Unicode also provide these character property methods: .isdigit(), .isnumeric(), .isdecimal() and .isspace() ? Plus maybe .digit(), .numeric() and .decimal() for the corresponding decoding ? Similar APIs are already available through the unicodedata module, but could easily be moved to the Unicode object (they cause the builtin interpreter to grow a bit in size due to the new mapping tables). BTW, string.atoi et al. are currently not mapped to string methods... should they be ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Mar 6 13:29:04 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 08:29:04 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 12:57:29 +0100." <38C39D29.A29CE67F@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> Message-ID: <200003061329.IAA09529@eric.cnri.reston.va.us> > As you may have noticed, the Unicode objects provide > new methods .islower(), .isupper() and .istitle(). Finn Bock > mentioned that Java also provides .isdigit() and .isspace(). 
> > Question: should Unicode also provide these character > property methods: .isdigit(), .isnumeric(), .isdecimal() > and .isspace() ? Plus maybe .digit(), .numeric() and > .decimal() for the corresponding decoding ? What would be the difference between isdigit, isnumeric, isdecimal? I'd say don't do more than Java. I don't understand what the "corresponding decoding" refers to. What would "3".decimal() return? > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? They are mapped to int() c.s. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Mon Mar 6 15:09:55 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 10:09:55 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <14531.51779.650532.881626@weyr.cnri.reston.va.us> Guido van Rossum writes: > - You could put it all in ConfigParser.py but with new classnames. > (Not sure though, since the ConfigParser class, which is really a > kind of weird variant, will be assumed to be the main class because > its name is that of the module.) The ConfigParser class could be clearly marked as deprecated both in the source/docstring and in the documentation. But the class itself should not be used in any way. > - Variants on the syntax could be given through some kind of option > system rather than through subclassing -- they should be combinable > independently. 
Some possible options (maybe I'm going overboard here) > could be: Yes, you are going overboard. It should contain exactly what's right for .ini files, and that's it. There are really three aspects to the beast: reading, using, and writing. I think there should be a class which does the right thing for using the information in the file, and reading & writing can be handled through functions or helper classes. That separates the parsing issues from the use issues, and alternate syntaxes will be easy enough to implement by subclassing the helper or writing a new function. An "editable" version that allows loading & saving without throwing away comments, ordering, etc. would require a largely separate implementation of all three aspects (or at least the reader and writer). > (Well maybe the whole substitution thing should really be done through > a subclass -- it's too weird for normal use.) That and the ad hoc syntax are my biggest beefs with ConfigParser. But it can easily be added by a subclass as long as the method to override is clearly specified in the documentation (it should only require one!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Mon Mar 6 17:47:44 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 12:47:44 -0500 (EST) Subject: [Python-Dev] PyBufferProcs Message-ID: <14531.61248.941076.803617@weyr.cnri.reston.va.us> While working on the documentation, I've noticed a naming inconsistency regarding PyBufferProcs; its peers are all named Py*Methods (PySequenceMethods, PyNumberMethods, etc.). I'd like to propose that a synonym, PyBufferMethods, be made for PyBufferProcs, and use that in the core implementations and the documentation. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy@cnri.reston.va.us Mon Mar 6 19:28:12 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 6 Mar 2000 14:28:12 -0500 (EST) Subject: [Python-Dev] example checkers based on compiler package Message-ID: <14532.1740.90292.440395@goon.cnri.reston.va.us> There was some discussion on python-dev over the weekend about generating warnings, and Moshe Zadka posted a selfnanny that warned about methods that didn't have self as the first argument. I think these kinds of warnings are useful, and I'd like to see a more general framework for them built around the Python abstract syntax originally from P2C. Ideally, they would be available as command line tools and integrated into GUIs like IDLE in some useful way. I've included a couple of quick examples I coded up last night based on the compiler package (recently re-factored) that is resident in python/nondist/src/Compiler. The analysis in the one that checks for name errors is a bit of a mess, but the overall structure seems right. I'm hoping to collect a few more examples of checkers and generalize from them to develop a framework for checking for errors and reporting them. 
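[The checkers attached below are built on the old compiler package. For comparison, the same self-argument check can be sketched against the `ast` module that modern Python ships instead; `check_self` is a hypothetical name, and this sketch ignores refinements like decorated methods or staticmethods.]

```python
import ast

def check_self(source, filename="<string>"):
    """Return warning strings for methods whose first argument isn't 'self'."""
    messages = []
    tree = ast.parse(source, filename)
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        # only direct members of the class body count as methods here
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            args = item.args.args
            if not args:
                messages.append("%s:%s %s.%s: no arguments"
                                % (filename, item.lineno, node.name, item.name))
            elif args[0].arg != "self":
                messages.append("%s:%s %s.%s: self slot is named %s"
                                % (filename, item.lineno, node.name,
                                   item.name, args[0].arg))
    return messages

for msg in check_self("class Foo:\n    def bar(this): pass\n"):
    print(msg)   # -> <string>:2 Foo.bar: self slot is named this
```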
Jeremy

------------ checkself.py ------------
"""Check for methods that do not have self as the first argument"""

from compiler import parseFile, walk, ast, misc

class Warning:
    def __init__(self, filename, klass, method, lineno, msg):
        self.filename = filename
        self.klass = klass
        self.method = method
        self.lineno = lineno
        self.msg = msg

    _template = "%(filename)s:%(lineno)s %(klass)s.%(method)s: %(msg)s"

    def __str__(self):
        return self._template % self.__dict__

class NoArgsWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno):
        self.super_init(filename, klass, method, lineno, "no arguments")

class NotSelfWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno, argname):
        self.super_init(filename, klass, method, lineno,
                        "self slot is named %s" % argname)

class CheckSelf:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = misc.Stack()

    def inClass(self):
        if self.scope:
            return isinstance(self.scope.top(), ast.Class)
        return 0

    def visitClass(self, klass):
        self.scope.push(klass)
        self.visit(klass.code)
        self.scope.pop()
        return 1

    def visitFunction(self, func):
        if self.inClass():
            classname = self.scope.top().name
            if len(func.argnames) == 0:
                w = NoArgsWarning(self.filename, classname,
                                  func.name, func.lineno)
                self.warnings.append(w)
            elif func.argnames[0] != "self":
                w = NotSelfWarning(self.filename, classname,
                                   func.name, func.lineno,
                                   func.argnames[0])
                self.warnings.append(w)
        self.scope.push(func)
        self.visit(func.code)
        self.scope.pop()
        return 1

def check(filename):
    global p, check
    p = parseFile(filename)
    check = CheckSelf(filename)
    walk(p, check)
    for w in check.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badself.py ------------
def foo():
    return 12

class Foo:
    def __init__(): pass
    def foo(self, foo): pass
    def bar(this, that):
        def baz(this=that):
            return this
        return baz

def bar():
    class Quux:
        def __init__(self):
            self.sum = 1
        def quam(x, y):
            self.sum = self.sum + (x * y)
    return Quux()

------------ checknames.py ------------
"""Check for NameErrors"""

from compiler import parseFile, walk
from compiler.misc import Stack, Set
import __builtin__
from UserDict import UserDict

class Warning:
    def __init__(self, filename, funcname, lineno):
        self.filename = filename
        self.funcname = funcname
        self.lineno = lineno

    def __str__(self):
        return self._template % self.__dict__

class UndefinedLocal(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, funcname, lineno, name):
        self.super_init(filename, funcname, lineno)
        self.name = name

    _template = "%(filename)s:%(lineno)s %(funcname)s undefined local %(name)s"

class NameError(UndefinedLocal):
    _template = "%(filename)s:%(lineno)s %(funcname)s undefined name %(name)s"

class NameSet(UserDict):
    """Track names and the line numbers where they are referenced"""
    def __init__(self):
        self.data = self.names = {}

    def add(self, name, lineno):
        l = self.names.get(name, [])
        l.append(lineno)
        self.names[name] = l

class CheckNames:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = Stack()
        self.gUse = NameSet()
        self.gDef = NameSet()
        # _locals is the stack of local namespaces
        # locals is the top of the stack
        self._locals = Stack()
        self.lUse = None
        self.lDef = None
        self.lGlobals = None  # var declared global
        # holds scope,def,use,global triples for later analysis
        self.todo = []

    def enterNamespace(self, node):
##        print node.name
        self.scope.push(node)
        self.lUse = use = NameSet()
        self.lDef = _def = NameSet()
        self.lGlobals = gbl = NameSet()
        self._locals.push((use, _def, gbl))

    def exitNamespace(self):
##        print
        self.todo.append((self.scope.top(), self.lDef, self.lUse,
                          self.lGlobals))
        self.scope.pop()
        self._locals.pop()
        if self._locals:
            self.lUse, self.lDef, self.lGlobals = self._locals.top()
        else:
            self.lUse = self.lDef = self.lGlobals = None

    def warn(self, warning, funcname, lineno, *args):
        args = (self.filename, funcname, lineno) + args
        self.warnings.append(apply(warning, args))

    def defName(self, name, lineno, local=1):
##        print "defName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gDef.add(name, lineno)
        elif local == 0:
            self.gDef.add(name, lineno)
            self.lGlobals.add(name, lineno)
        else:
            self.lDef.add(name, lineno)

    def useName(self, name, lineno, local=1):
##        print "useName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gUse.add(name, lineno)
        elif local == 0:
            self.gUse.add(name, lineno)
            self.lUse.add(name, lineno)
        else:
            self.lUse.add(name, lineno)

    def check(self):
        for s, d, u, g in self.todo:
            self._check(s, d, u, g, self.gDef)
        # XXX then check the globals

    def _check(self, scope, _def, use, gbl, globals):
        # check for NameError
        # a name is defined iff it is in def.keys()
        # a name is global iff it is in gdefs.keys()
        gdefs = UserDict()
        gdefs.update(globals)
        gdefs.update(__builtin__.__dict__)
        defs = UserDict()
        defs.update(gdefs)
        defs.update(_def)
        errors = Set()
        for name in use.keys():
            if not defs.has_key(name):
                firstuse = use[name][0]
                self.warn(NameError, scope.name, firstuse, name)
                errors.add(name)
        # check for UndefinedLocalNameError
        # order == use & def sorted by lineno
        # elements are lineno, flag, name
        # flag = 0 if use, flag = 1 if def
        order = []
        for name, lines in use.items():
            if gdefs.has_key(name) and not _def.has_key(name):
                # this is a global ref, we can skip it
                continue
            for lineno in lines:
                order.append((lineno, 0, name))
        for name, lines in _def.items():
            for lineno in lines:
                order.append((lineno, 1, name))
        order.sort()
        # ready contains names that have been defined or warned about
        ready = Set()
        for lineno, flag, name in order:
            if flag == 0:  # use
                if not ready.has_elt(name) and not errors.has_elt(name):
                    self.warn(UndefinedLocal, scope.name, lineno, name)
                    ready.add(name)  # don't warn again
            else:
                ready.add(name)

    # below are visitor methods
    def visitFunction(self, node, noname=0):
        for expr in node.defaults:
            self.visit(expr)
        if not noname:
            self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        for name in node.argnames:
            self.defName(name, node.lineno)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitLambda(self, node):
        return self.visitFunction(node, noname=1)

    def visitClass(self, node):
        for expr in node.bases:
            self.visit(expr)
        self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitName(self, node):
        self.useName(node.name, node.lineno)

    def visitGlobal(self, node):
        for name in node.names:
            self.defName(name, node.lineno, local=0)

    def visitImport(self, node):
        for name in node.names:
            self.defName(name, node.lineno)

    visitFrom = visitImport

    def visitAssName(self, node):
        self.defName(node.name, node.lineno)

def check(filename):
    global p, checker
    p = parseFile(filename)
    checker = CheckNames(filename)
    walk(p, checker)
    checker.check()
    for w in checker.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badnames.py ------------
# XXX can we detect race conditions on accesses to global variables?
# probably can (conservatively) by noting variables _created_ by
# global decls in funcs

import string
import time

def foo(x):
    return x + y

def foo2(x):
    return x + z

a = 4

def foo3(x):
    a, b = x, a

def bar(x):
    z = x
    global z

def bar2(x):
    f = string.strip
    a = f(x)
    import string
    return string.lower(a)

def baz(x, y):
    return x + y + z

def outer(x):
    def inner(y):
        return x + y
    return inner

From gstein@lyra.org Mon Mar 6 21:09:33 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 6 Mar 2000 13:09:33 -0800 (PST) Subject: [Python-Dev] PyBufferProcs In-Reply-To: <14531.61248.941076.803617@weyr.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Fred L. Drake, Jr. 
wrote: > While working on the documentation, I've noticed a naming > inconsistency regarding PyBufferProcs; its peers are all named > Py*Methods (PySequenceMethods, PyNumberMethods, etc.). > I'd like to propose that a synonym, PyBufferMethods, be made for > PyBufferProcs, and use that in the core implementations and the > documentation. +0 Although.. I might say that it should be renamed, and a synonym (#define or typedef?) be provided for the old name. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Mon Mar 6 22:04:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 23:04:14 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> Message-ID: <38C42B5E.42801755@lemburg.com> Guido van Rossum wrote: > > > As you may have noticed, the Unicode objects provide > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > mentioned that Java also provides .isdigit() and .isspace(). > > > > Question: should Unicode also provide these character > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > and .isspace() ? Plus maybe .digit(), .numeric() and > > .decimal() for the corresponding decoding ? > > What would be the difference between isdigit, isnumeric, isdecimal? > I'd say don't do more than Java. I don't understand what the > "corresponding decoding" refers to. What would "3".decimal() return? These originate in the Unicode database; see ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html Here are the descriptions:

"""
6  Decimal digit value
   normative
   This is a numeric field. If the character has the decimal digit
   property, as specified in Chapter 4 of the Unicode Standard, the
   value of that digit is represented with an integer value in this field
7  Digit value
   normative
   This is a numeric field. If the character represents a digit, not
   necessarily a decimal digit, the value is here.
   This covers digits which do not form decimal radix forms, such as
   the compatibility superscript digits
8  Numeric value
   normative
   This is a numeric field. If the character has the numeric property,
   as specified in Chapter 4 of the Unicode Standard, the value of that
   character is represented with an integer or rational number in this
   field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR
   FRACTION ONE FIFTH Also included are numerical values for
   compatibility characters such as circled numbers.
"""

u"3".decimal() would return 3. u"\u2155".

Some more examples from the unicodedata module (which makes all fields of the database available in Python):

>>> unicodedata.decimal(u"3")
3
>>> unicodedata.decimal(u"²")
2
>>> unicodedata.digit(u"²")
2
>>> unicodedata.numeric(u"²")
2.0
>>> unicodedata.numeric(u"\u2155")
0.2
>>> unicodedata.numeric(u'\u215b')
0.125

> > Similar APIs are already available through the unicodedata
> > module, but could easily be moved to the Unicode object
> > (they cause the builtin interpreter to grow a bit in size
> > due to the new mapping tables).
> >
> > BTW, string.atoi et al. are currently not mapped to
> > string methods... should they be ?
>
> They are mapped to int() c.s.

Hmm, I just noticed that int() et friends don't like Unicode... shouldn't they use the "t" parser marker instead of requiring a string or tp_int compatible type ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Mar 6 23:12:33 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 18:12:33 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 23:04:14 +0100."
<38C42B5E.42801755@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> Message-ID: <200003062312.SAA11697@eric.cnri.reston.va.us> [MAL] > > > As you may have noticed, the Unicode objects provide > > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > > mentioned that Java also provides .isdigit() and .isspace(). > > > > > > Question: should Unicode also provide these character > > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > > and .isspace() ? Plus maybe .digit(), .numeric() and > > > .decimal() for the corresponding decoding ? [Guido] > > What would be the difference between isdigit, isnumeric, isdecimal? > > I'd say don't do more than Java. I don't understand what the > > "corresponding decoding" refers to. What would "3".decimal() return? [MAL] > These originate in the Unicode database; see > > ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html > > Here are the descriptions: > > """ > 6 > Decimal digit value > normative > This is a numeric field. If the > character has the decimal digit > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that digit is represented > with an integer value in this field > 7 > Digit value > normative > This is a numeric field. If the > character represents a digit, not > necessarily a decimal digit, the > value is here. This covers digits > which do not form decimal radix > forms, such as the compatibility > superscript digits > 8 > Numeric value > normative > This is a numeric field. If the > character has the numeric > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that character is > represented with an integer or > rational number in this field. This > includes fractions as, e.g., "1/5" for > U+2155 VULGAR FRACTION > ONE FIFTH Also included are > numerical values for compatibility > characters such as circled > numbers. 
>
> u"3".decimal() would return 3. u"\u2155".
>
> Some more examples from the unicodedata module (which makes
> all fields of the database available in Python):
>
> >>> unicodedata.decimal(u"3")
> 3
> >>> unicodedata.decimal(u"²")
> 2
> >>> unicodedata.digit(u"²")
> 2
> >>> unicodedata.numeric(u"²")
> 2.0
> >>> unicodedata.numeric(u"\u2155")
> 0.2
> >>> unicodedata.numeric(u'\u215b')
> 0.125

Hm, very Unicode centric. Probably best left out of the general string methods. Isspace() seems useful, and an isdigit() that is only true for ASCII '0' - '9' also makes sense. What about "123".isdigit()? What does Java say? Or do these only apply to single chars there? I think "123".isdigit() should be true if "abc".islower() is true.

> > > Similar APIs are already available through the unicodedata
> > > module, but could easily be moved to the Unicode object
> > > (they cause the builtin interpreter to grow a bit in size
> > > due to the new mapping tables).
> > >
> > > BTW, string.atoi et al. are currently not mapped to
> > > string methods... should they be ?
> >
> > They are mapped to int() c.s.
>
> Hmm, I just noticed that int() et friends don't like
> Unicode... shouldn't they use the "t" parser marker
> instead of requiring a string or tp_int compatible
> type ?

Good catch. Go ahead. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Tue Mar 7 05:25:43 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 7 Mar 2000 07:25:43 +0200 (IST) Subject: [Python-Dev] Re: example checkers based on compiler package In-Reply-To: <14532.1740.90292.440395@goon.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Jeremy Hylton wrote:

> I think these kinds of warnings are useful, and I'd like to see a more
> general framework for them built around Python abstract syntax originally
> from P2C. Ideally, they would be available as command line tools and
> integrated into GUIs like IDLE in some useful way.

Yes! Guido already suggested we have a standard API to them.
One thing I suggested was that the abstract API include not only the input (one form or another of an AST), but the output: so IDE's wouldn't have to parse strings, but get a warning class. Something like a: An output of a warning can be a subclass of GeneralWarning, and should implement the following methods:

1. line-no() -- returns an integer
2. columns() -- returns either a pair of integers, or None
3. message() -- returns a string containing a message
4. __str__() -- comes for free if inheriting GeneralWarning,
   and formats the warning message.

> I've included a couple of quick examples I coded up last night based
> on the compiler package (recently re-factored) that is resident in
> python/nondist/src/Compiler. The analysis on the one that checks for
> name errors is a bit of a mess, but the overall structure seems right.

One thing I had trouble with is that in my implementation of selfnanny, I used Python's stack for recursion while you used an explicit stack. It's probably because of the visitor pattern, which is just another argument for co-routines and generators.

> I'm hoping to collect a few more examples of checkers and generalize
> from them to develop a framework for checking for errors and reporting
> them.

Cool! Brainstorming: what kind of warnings would people find useful? In selfnanny, I wanted to include checking for assignment to self, and checking for "possible use before definition of local variables" sounds good. Another check could be a CP4E "checking that no two identifiers differ only by case". I might code up a few if I have the time... What I'd really want (but it sounds really hard) is a framework for partial ASTs: warning people as they write code. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From mwh21@cam.ac.uk Tue Mar 7 08:31:23 2000 From: mwh21@cam.ac.uk (Michael Hudson) Date: 07 Mar 2000 08:31:23 +0000 Subject: [Python-Dev] Re: [Compiler-sig] Re: example checkers based on compiler package In-Reply-To: Moshe Zadka's message of "Tue, 7 Mar 2000 07:25:43 +0200 (IST)" References: Message-ID: Moshe Zadka writes:

> On Mon, 6 Mar 2000, Jeremy Hylton wrote:
>
> > I think these kinds of warnings are useful, and I'd like to see a more
> > general framework for them built around Python abstract syntax originally
> > from P2C. Ideally, they would be available as command line tools and
> > integrated into GUIs like IDLE in some useful way.
>
> Yes! Guido already suggested we have a standard API to them. One thing
> I suggested was that the abstract API include not only the input (one form
> or another of an AST), but the output: so IDE's wouldn't have to parse
> strings, but get a warning class.

That would be seriously cool.

> Something like a:
>
> An output of a warning can be a subclass of GeneralWarning, and should
> implement the following methods:
>
> 1. line-no() -- returns an integer
> 2. columns() -- returns either a pair of integers, or None
> 3. message() -- returns a string containing a message
> 4. __str__() -- comes for free if inheriting GeneralWarning,
>    and formats the warning message.

Wouldn't it make sense to include function/class name here too? A checker is likely to know, and it would save reparsing to find it out. [little snip]

> > I'm hoping to collect a few more examples of checkers and generalize
> > from them to develop a framework for checking for errors and reporting
> > them.
>
> Cool!
> Brainstorming: what kind of warnings would people find useful? In
> selfnanny, I wanted to include checking for assignment to self, and
> checking for "possible use before definition of local variables" sounds
> good. Another check could be a CP4E "checking that no two identifiers
> differ only by case".
> I might code up a few if I have the time...

Is there stuff in the current Compiler code to do control flow analysis? You'd need that to check for use before definition in meaningful cases, and also if you ever want to do any optimisation...

> What I'd really want (but it sounds really hard) is a framework for
> partial ASTs: warning people as they write code.

I agree (on both points). Cheers, M. -- very few people approach me in real life and insist on proving they are drooling idiots. -- Erik Naggum, comp.lang.lisp From mal@lemburg.com Tue Mar 7 09:14:25 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:14:25 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> <200003062312.SAA11697@eric.cnri.reston.va.us> Message-ID: <38C4C871.F47E17A3@lemburg.com> Guido van Rossum wrote:

> [MAL about adding .isdecimal(), .isdigit() and .isnumeric()]
> > Some more examples from the unicodedata module (which makes
> > all fields of the database available in Python):
> >
> > >>> unicodedata.decimal(u"3")
> > 3
> > >>> unicodedata.decimal(u"²")
> > 2
> > >>> unicodedata.digit(u"²")
> > 2
> > >>> unicodedata.numeric(u"²")
> > 2.0
> > >>> unicodedata.numeric(u"\u2155")
> > 0.2
> > >>> unicodedata.numeric(u'\u215b')
> > 0.125
>
> Hm, very Unicode centric. Probably best left out of the general
> string methods. Isspace() seems useful, and an isdigit() that is only
> true for ASCII '0' - '9' also makes sense.

Well, how about having all three on Unicode objects and only .isdigit() on string objects ?

> What about "123".isdigit()? What does Java say? Or do these only
> apply to single chars there? I think "123".isdigit() should be true
> if "abc".islower() is true.

In the current uPython implementation u"123".isdigit() is true; same for the other two methods.
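The three-way distinction being discussed maps directly onto the Unicode database fields quoted earlier in the thread. A quick sketch against the unicodedata module (modern Python 3 spelling; the values are the ones given in the examples above):

```python
import unicodedata

# Field 6 (decimal digit value), field 7 (digit value), field 8 (numeric value):
assert unicodedata.decimal("3") == 3
assert unicodedata.digit("\u00b2") == 2      # SUPERSCRIPT TWO
assert unicodedata.numeric("\u2155") == 0.2  # VULGAR FRACTION ONE FIFTH

# The corresponding predicates form a hierarchy: isdecimal => isdigit => isnumeric.
assert "123".isdecimal() and "123".isdigit() and "123".isnumeric()
assert "\u00b2".isdigit() and not "\u00b2".isdecimal()
assert "\u2155".isnumeric() and not "\u2155".isdigit()
```

This also answers Guido's question about multi-character strings: the predicates hold when every character in the (non-empty) string has the property, just like islower().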
> > > > Similar APIs are already available through the unicodedata
> > > > module, but could easily be moved to the Unicode object
> > > > (they cause the builtin interpreter to grow a bit in size
> > > > due to the new mapping tables).
> > > >
> > > > BTW, string.atoi et al. are currently not mapped to
> > > > string methods... should they be ?
> > >
> > > They are mapped to int() c.s.
> >
> > Hmm, I just noticed that int() et friends don't like
> > Unicode... shouldn't they use the "t" parser marker
> > instead of requiring a string or tp_int compatible
> > type ?
>
> Good catch. Go ahead.

Done. float(), int() and long() now accept charbuf compatible objects as argument. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Mar 7 09:23:35 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:23:35 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects Message-ID: <38C4CA97.5D0AA9D@lemburg.com> Before starting to code away, I would like to know which of the new Unicode methods should also be available on string objects. Here are the currently available methods:

Unicode objects     string objects
------------------------------------
capitalize          capitalize
center
count               count
encode
endswith            endswith
expandtabs
find                find
index               index
isdecimal
isdigit
islower
isnumeric
isspace
istitle
isupper
join                join
ljust
lower               lower
lstrip              lstrip
replace             replace
rfind               rfind
rindex              rindex
rjust
rstrip              rstrip
split               split
splitlines
startswith          startswith
strip               strip
swapcase            swapcase
title               title
translate           translate (*)
upper               upper
zfill

(*) The two have slightly different implementations, e.g. deletions are handled differently.
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@pythonware.com Tue Mar 7 11:54:56 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 7 Mar 2000 12:54:56 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> Message-ID: <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com>

> Unicode objects     string objects
> expandtabs

yes. I'm pretty sure there's "expandtabs" code in the strop module. maybe barry missed it?

> center
> ljust
> rjust

probably. the implementation is trivial, and ljust/rjust are somewhat useful, so you might as well add them all (just cut and paste from the unicode class). what about rguido and lguido, btw?

> zfill

no. From guido@python.org Tue Mar 7 13:52:00 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 08:52:00 -0500 Subject: [Python-Dev] finalization again Message-ID: <200003071352.IAA13571@eric.cnri.reston.va.us> Warning: long message. If you're not interested in reading all this, please skip to "Conclusion" at the end. At Tim's recommendation I had a look at what section 12.6 of the Java language spec says about finalizers. The stuff there is sure seductive for language designers... Have a look at the diagram at http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its (seeming) complexity, it helped me understand some of the issues of finalization better. Rather than the complex 8-state state machine that it appears to be, think of it as a simple 3x3 table. The three rows represent the categories reachable, finalizer-reachable (abbreviated in the diagram as f-reachable), and unreachable.
These categories correspond directly to categories of objects that the Schemenauer-Tiedemann cycle-reclamation scheme deals with: after moving all the reachable objects to the second list (first the roots and then the objects reachable from the roots), the first list is left with the unreachable and finalizer-reachable objects. If we want to distinguish between unreachable and finalizer-reachable at this point, a straightforward application of the same algorithm will work well: Create a third list (this will contain the finalizer-reachable objects). Start by filling it with all the objects from the first list (which contains the potential garbage at this point) that have a finalizer. We can look for objects that have __del__ or __clean__ or for which tp_clean(CARE_EXEC)==true, it doesn't matter here.(*) Then walk through the third list, following each object's references, and move all referenced objects that are still in the first list to the third list. Now, we have: List 1: truly unreachable objects. These have no finalizers and can be discarded right away. List 2: truly reachable objects. (Roots and objects reachable from roots.) Leave them alone. List 3: finalizer-reachable objects. This contains objects that are unreachable but have a finalizer, and objects that are only reachable through those. We now have to decide on a policy for invoking finalizers. Java suggests the following: Remember the "roots" of the third list -- the nodes that were moved there directly from the first list because they have a finalizer. These objects are marked *finalizable* (a category corresponding to the second *column* of the Java diagram). The Java spec allows the Java garbage collector to call all of these finalizers in any order -- even simultaneously in separate threads. Java never allows an object to go back from the finalizable to the unfinalized state (there are no arrows pointing left in the diagram). 
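Backing up a step, the three-list split described above can be sketched on a toy object graph. This is purely illustrative (hypothetical names and a dict-based graph; a real collector works on C-level object headers, not dicts):

```python
def partition(graph, finalizers, roots):
    """Sketch of the three-list split described above.

    graph:      name -> list of referenced names
    finalizers: set of names whose objects have a finalizer
    roots:      names reachable from outside
    Returns (unreachable, reachable, finalizer_reachable).
    """
    def closure(seed, allowed):
        # Everything transitively reachable from seed, staying inside allowed.
        seen, stack = set(), [n for n in seed if n in allowed]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(m for m in graph[n] if m in allowed)
        return seen

    everything = set(graph)
    reachable = closure(roots, everything)                       # list 2
    candidates = everything - reachable                          # old list 1
    f_reachable = closure(finalizers & candidates, candidates)   # list 3
    unreachable = candidates - f_reachable                       # truly dead
    return unreachable, reachable, f_reachable

# "x" and "y" form a dead cycle with a finalizer on "x"; "z" is plain trash.
g = {"root": ["a"], "a": [], "x": ["y"], "y": ["x"], "z": []}
dead, live, fin = partition(g, finalizers={"x"}, roots={"root"})
assert live == {"root", "a"} and fin == {"x", "y"} and dead == {"z"}
```

The roots of list 3 (here, "x") are exactly the objects that become *finalizable* in the Java terminology that follows.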
The first finalizer that is called could make its object reachable again (up arrow), thereby possibly making other finalizable objects reachable too. But this does not cancel their scheduled finalization! The conclusion is that Java can sometimes call finalization on unreachable objects -- but only if those objects have gone through a phase in their life where they were unreachable or at least finalizer-unreachable. I agree that this is the best that Java can do: if there are cycles containing multiple objects with finalizers, there is no way (short of asking the programmer(s)) to decide which object to finalize first. We could pick one at random, run its finalizer, and start garbage collection all over -- if the finalizer doesn't resurrect anything, this will give us the same set of unreachable objects, from which we could pick the next finalizable object, and so on. That looks very inefficient, might not terminate (the same object could repeatedly show up as the candidate for finalization), and it's still arbitrary: the programmer(s) still can't predict which finalizer in a cycle with multiple finalizers will be called first. Assuming the recommended characteristics of finalizers (brief and robust), it won't make much difference if we call all finalizers (of the now-finalizeable objects) "without looking back". Sure, some objects may find themselves in a perfectly reachable position with their finalizer called -- but they did go through a "near-death experience". I don't find this objectionable, and I don't see how Java could possibly do better for cycles with multiple finalizers. Now let's look again at the rule that an object's finalizer will be called at most once automatically by the garbage collector. The transitions between the colums of the Java diagram enforce this: the columns are labeled from left to right with unfinalized, finalizable, and finalized, and there are no transition arrows pointing left. 
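For the record, modern CPython (since PEP 442) does invoke finalizers in trash cycles, and the point made above still holds there: which of two finalizers in the same cycle runs first is the collector's arbitrary choice, not the programmer's. A minimal sketch of the situation under discussion:

```python
import gc

log = []

class A:
    def __del__(self):
        log.append("A")

class B:
    def __del__(self):
        log.append("B")

a, b = A(), B()
a.partner, b.partner = b, a   # trash cycle with two finalizers
del a, b
gc.collect()                  # both finalizers run; their relative order is unspecified

assert sorted(log) == ["A", "B"]
```

Note the assertion sorts the log: asserting a particular order would be relying on exactly the arbitrariness the text warns about.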
(In my description above, already finalized objects are considered not to have a finalizer.) I think this rule makes a lot of sense given Java's multi-threaded garbage collection: the invocation of finalizers could run concurrently with another garbage collection, and we don't want this to find some of the same finalizable objects and call their finalizers again! We could mark them with a "finalization in progress" flag only while their finalizer is running, but in a cycle with multiple finalizers it seems we should keep this flag set until *all* finalizers for objects in the cycle have run. But we don't actually know exactly what the cycles are: all we know is "these objects are involved in trash cycles". More detailed knowledge would require yet another sweep, plus a more hairy two-dimensional data structure (a list of separate cycles). And for what? as soon as we run finalizers from two separate cycles, those cycles could be merged again (e.g. the first finalizer could resurrect its cycle, and the second one could link to it). Now we have a pool of objects that are marked "finalization in progress" until all their finalizations terminate. For an incremental concurrent garbage collector, this seems a pain, since it may continue to find new finalizable objects and add them to the pile. Java takes the logical conclusion: the "finalization in progress" flag is never cleared -- and renamed to "finalized".

Conclusion
----------

Are the Java rules complex? Yes. Are there better rules possible? I'm not so sure, given the requirement of allowing concurrent incremental garbage collection algorithms that haven't even been invented yet. (Plus the implied requirement that finalizers in trash cycles should be invoked.) Are the Java rules difficult for the user? Only for users who think they can trick finalizers into doing things for them that they were not designed to do. I would think the following guidelines should do nicely for the rest of us:

1.
Avoid finalizers if you can; use them only to release *external* (e.g. OS) resources.

2. Write your finalizer as robust as you can, with as little use of other objects as you can.

3. You only get one chance. Use it.

Unlike Scheme guardians or the proposed __cleanup__ mechanism, you don't have to know whether your object is involved in a cycle -- your finalizer will still be called. I am reconsidering using the __del__ method as the finalizer. As a compromise to those who want their __del__ to run whenever the reference count reaches zero, the finalized flag can be cleared explicitly. I am considering the following implementation: after retrieving the __del__ method, but before calling it, self.__del__ is set to None (better, self.__dict__['__del__'] = None, to avoid confusing __setattr__ hooks). The object can remove self.__del__ to clear the finalized flag. I think I'll use the same mechanism to prevent __del__ from being called upon a failed initialization. Final note: the semantics "__del__ is called whenever the reference count reaches zero" cannot be defended in the light of a migration to different forms of garbage collection (e.g. JPython). There may not be a reference count. --Guido van Rossum (home page: http://www.python.org/~guido/) ____ (*) Footnote: there's one complication: to ask a Python class instance if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If the object's class has a __getattr__ hook, this can invoke arbitrary Python code -- even if the answer to the question is "no"! This can make the object reachable again (in the Java diagram, arrows pointing up or up and right). We could either use instance_getattr1(), which avoids the __getattr__ hook, or mark all class instances as finalizable until proven innocent.
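Guido's "at most once" idea -- shadow the finalizer before calling it, and let the object explicitly re-arm itself -- can be sketched with an instance-level hook. This is purely illustrative: the names run_finalizer and __finalize__ are stand-ins for this sketch, not the mechanism CPython actually adopted:

```python
calls = []

class Resource:
    def __finalize__(self):
        calls.append("released")

def run_finalizer(obj):
    # Fetch the finalizer; if the instance already shadows it with None,
    # the object counts as finalized and nothing runs.
    fin = getattr(obj, "__finalize__", None)
    if fin is None:
        return False
    # Mark "finalized" *before* the call, via the instance dict
    # (cf. self.__dict__['__del__'] = None in the message above).
    obj.__dict__["__finalize__"] = None
    fin()
    return True

r = Resource()
assert run_finalizer(r) is True    # finalizer runs once
assert run_finalizer(r) is False   # already finalized: skipped
assert calls == ["released"]

# Re-arming, as the message suggests: the object clears the flag explicitly.
del r.__dict__["__finalize__"]
assert run_finalizer(r) is True
```

Setting the flag before the call, not after, is the detail that makes the scheme safe against a finalizer that resurrects its object mid-collection.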
From gward@cnri.reston.va.us Tue Mar 7 14:04:30 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 7 Mar 2000 09:04:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us>; from guido@python.org on Sun, Mar 05, 2000 at 07:04:56AM -0500 References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <20000307090430.A16948@cnri.reston.va.us> On 05 March 2000, Guido van Rossum said:

> - Variants on the syntax could be given through some kind of option
>   system rather than through subclassing -- they should be combinable
>   independently. Some possible options (maybe I'm going overboard here)
>   could be:
>
> - comment characters: ('#', ';', both, others?)
> - comments after variables allowed? on sections?
> - variable characters: (':', '=', both, others?)
> - quoting of values with "..." allowed?
> - backslashes in "..." allowed?
> - does backslash-newline mean a continuation?
> - case sensitivity for section names (default on)
> - case sensitivity for option names (default off)
> - variables allowed before first section name?
> - first section name? (default "main")
> - character set allowed in section names
> - character set allowed in variable names
> - %(...) substitution?

I agree with Fred that this level of flexibility is probably overkill for a config file parser; you don't want every application author who uses the module to have to explain his particular variant of the syntax. However, if you're interested in a class that *does* provide some of the above flexibility, I have written such a beast. It's currently used to parse the Distutils MANIFEST.in file, and I've considered using it for the mythical Distutils config files. (And it also gets heavy use in my day job.)
It's really a class for reading a file in preparation for "text processing the Unix way", though: it doesn't say anything about syntax, it just worries about blank lines, comments, continuations, and a few other things. Here's the class docstring:

class TextFile:
    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your
       comment character), skip blank lines, join adjacent lines by
       escaping the newline (ie. backslash at end of line), strip
       leading and/or trailing whitespace, and collapse internal
       whitespace.  All of these are optional and independently
       controllable.

       Provides a 'warn()' method so you can generate warning messages
       that report physical line number, even if the logical line in
       question spans multiple physical lines.  Also provides
       'unreadline()' for implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile (filename=None, file=None, **options)

       It bombs (RuntimeError) if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It
       is recommended that you supply at least 'filename', so that
       TextFile can include it in warning messages.  If 'file' is not
       supplied, TextFile creates its own using the 'open()' builtin.

       The options are all boolean, and affect the value returned by
       'readline()':

         strip_comments [default: true]
             strip from "#" to end-of-line, as well as any whitespace
             leading up to the "#" -- unless it is escaped by a backslash
         lstrip_ws [default: false]
             strip leading whitespace from each line before returning it
         rstrip_ws [default: true]
             strip trailing whitespace (including line terminator!) from
             each line before returning it
         skip_blanks [default: true]
             skip lines that are empty *after* stripping comments and
             whitespace.
             (If both lstrip_ws and rstrip_ws are true, then some lines
             may consist of solely whitespace: these will *not* be
             skipped, even if 'skip_blanks' is true.)
         join_lines [default: false]
             if a backslash is the last non-newline character on a line
             after stripping comments and whitespace, join the following
             line to it to form one "logical line"; if N consecutive
             lines end with a backslash, then N+1 physical lines will be
             joined to form one logical line.
         collapse_ws [default: false]
             after stripping comments and whitespace and joining physical
             lines into logical lines, all internal whitespace (strings
             of whitespace surrounded by non-whitespace characters, and
             not at the beginning or end of the logical line) will be
             collapsed to a single space.

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin
       file object's 'readline()' method!  In particular, 'readline()'
       returns None for end-of-file: an empty string might just be a
       blank line (or an all-whitespace line), if 'rstrip_ws' is true
       but 'skip_blanks' is not."""

Interested in having something like this in the core? Adding more options is possible, but the code is already on the hairy side to support all of these. And I'm not a big fan of the subtle difference in semantics with file objects, but honestly couldn't think of a better way at the time. If you're interested, you can download it from http://www.mems-exchange.org/exchange/software/python/text_file/ or just use the version in the Distutils CVS tree. Greg From mal@lemburg.com Tue Mar 7 14:38:09 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 15:38:09 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> Message-ID: <38C51451.D38B21FE@lemburg.com> Fredrik Lundh wrote:
>
> > Unicode objects     string objects
> > expandtabs
>
> yes.
>
> I'm pretty sure there's "expandtabs" code in the
> strop module. maybe barry missed it?
>
> > center
> > ljust
> > rjust
>
> probably.
>
> the implementation is trivial, and ljust/rjust are
> somewhat useful, so you might as well add them
> all (just cut and paste from the unicode class).
>
> what about rguido and lguido, btw?

Ooops, forgot those, thanks :-)

> > zfill
>
> no.

Why not ? Since the string implementation had all of the above marked as TBD, I added all four. What about the other new methods (.isXXX() and .splitlines()) ? .isXXX() are mostly needed due to the extended character properties in Unicode. They would be new to the string object world. .splitlines() is Unicode aware and also treats CR/LF combinations across platforms:

S.splitlines([maxsplit]) -> list of strings

Return a list of the lines in S, breaking at line boundaries. If maxsplit is given, at most maxsplit are done. Line breaks are not included in the resulting list.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Tue Mar 7 15:38:18 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 10:38:18 -0500 Subject: [Python-Dev] Adding Unicode methods to string objects In-Reply-To: Your message of "Tue, 07 Mar 2000 15:38:09 +0100." <38C51451.D38B21FE@lemburg.com> References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <200003071538.KAA13977@eric.cnri.reston.va.us>

> > > zfill
> >
> > no.
>
> Why not ?

Zfill is (or ought to be) deprecated. It stems from times before we had things like "%08d" % x and no longer serves a useful purpose. I doubt anyone would miss it. (Of course, now /F will claim that PIL will break in 27 places because of this.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Tue Mar 7 17:07:40 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 7 Mar 2000 12:07:40 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003071352.IAA13571@eric.cnri.reston.va.us> Message-ID: <000701bf8857$a56ed660$a72d153f@tim> [Guido] > ... > Conclusion > ---------- > > Are the Java rules complex? Yes. Are there better rules possible? I'm > not so sure, given the requirement of allowing concurrent incremental > garbage collection algorithms that haven't even been invented > yet. Guy Steele worked his ass off on Java's rules. He had as much real-world experience with implementing GC as anyone, via his long & deep Lisp implementation background (both SW & HW), and indeed invented several key techniques in high-performance GC. But he had no background in GC with user-defined finalizers -- and it shows! > (Plus the implied requirement that finalizers in trash cycles > should be invoked.) Are the Java rules difficult for the user? Only > for users who think they can trick finalizers into doing things for > them that they were not designed to do. This is so implementation-centric it's hard to know what to say <0.5 wink>. The Java rules weren't designed to do much of anything except guarantee that Java (1) would eventually reclaim all unreachable objects, and (2) wouldn't expose dangling pointers to user finalizers, or chase any itself. Whatever *useful* finalizer semantics may remain are those that just happened to survive. > ... > Unlike Scheme guardians or the proposed __cleanup__ mechanism, you > don't have to know whether your object is involved in a cycle -- your > finalizer will still be called. This is like saying a user doesn't have to know whether the new drug prescribed for them by their doctor has potentially fatal side effects -- they'll be forced to take it regardless . > ... 
> Final note: the semantics "__del__ is called whenever the reference > count reaches zero" cannot be defended in the light of a migration to > different forms of garbage collection (e.g. JPython). There may not > be a reference count. 1. I don't know why JPython doesn't execute __del__ methods at all now, but have to suspect that the Java rules imply an implementation so grossly inefficient in the presence of __del__ that Barry simply doesn't want to endure the speed complaints. The Java spec itself urges implementations to special-case the snot out of classes that don't override the default do-nothing finalizer, for "go fast" reasons too. 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete way to get across the idea of "destruction occurs in an order consistent with a topological sort of the points-to graph". The latter is explicit in the BDW collector, which has no refcounts; the topsort concept is applicable and thoroughly natural in all languages; refcounts in CPython give an exploitable hint about *when* collection will occur, but add no purely semantic constraint beyond the topsort requirement (they neatly *imply* the topsort requirement). There is no topsort in the presence of cycles, so cycles create problems in all languages. The same "throw 'em back at the user" approach makes just as much sense from the topsort view as the RC view; it doesn't rely on RC at all. stop-the-insanity-ly y'rs - tim From guido@python.org Tue Mar 7 17:33:31 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 12:33:31 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 12:07:40 EST." <000701bf8857$a56ed660$a72d153f@tim> References: <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <200003071733.MAA14926@eric.cnri.reston.va.us> [Tim tells Guido again that he finds the Java rules bad, slinging some mud at Guy Steele, but without explaining what the problem with them is, and then asks:] > 1.
I don't know why JPython doesn't execute __del__ methods at all now, but > have to suspect that the Java rules imply an implementation so grossly > inefficient in the presence of __del__ that Barry simply doesn't want to > endure the speed complaints. The Java spec itself urges implementations to > special-case the snot out of classes that don't override the default > do-nothing finalizer, for "go fast" reasons too. Something like that, yes, although it was Jim Hugunin. I have a feeling it has to do with the dynamic of __del__ -- this would imply that *all* Python class instances would appear to Java to have a finalizer -- just in most cases it would do a failing lookup of __del__ and bail out quickly. Maybe some source code or class analysis looking for a __del__ could fix this, at the cost of not allowing one to patch __del__ into an existing class after instances have already been created. I don't find that breach of dynamicism a big deal -- e.g. CPython keeps copies of __getattr__, __setattr__ and __delattr__ in the class for similar reasons. > 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete > way to get across the idea of "destruction occurs in an order consistent > with a topological sort of the points-to graph". The latter is explicit in > the BDW collector, which has no refcounts; the topsort concept is applicable > and thoroughly natural in all languages; refcounts in CPython give an > exploitable hint about *when* collection will occur, but add no purely > semantic constraint beyond the topsort requirement (they neatly *imply* the > topsort requirement). There is no topsort in the presence of cycles, so > cycles create problems in all languages. The same "throw 'em back at the > user" approach makes just as much sense from the topsort view as the RC > view; it doesn't rely on RC at all. Indeed. I propose to throw it back at the user by calling __del__. 
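[Editorial aside: this is, with hindsight, roughly where CPython eventually landed. Since PEP 442 (Python 3.4), __del__ *is* called for objects in trash cycles, exactly once, by the cycle collector. A minimal sketch of that behavior under modern-CPython semantics:]

```python
import gc

log = []

class Node:
    # Each node holds a reference to its peer, forming a cycle.
    def __init__(self, name):
        self.name = name
        self.peer = None
    def __del__(self):
        log.append(self.name)

a, b = Node('a'), Node('b')
a.peer, b.peer = b, a    # reference cycle: refcounts never reach zero
del a, b                 # only the cycle itself keeps the objects alive
gc.collect()             # cycle collector runs both finalizers once each
print(sorted(log))       # -> ['a', 'b']
```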
The typical user defines __del__ because they want to close a file, say goodbye nicely on a socket connection, or delete a temp file. That sort of thing. This is what finalizers are *for*. As an author of this kind of finalizer, I don't see why I need to know whether I'm involved in a cycle or not. I want my finalizer called when my object goes away, and I don't want my object kept alive by unreachable cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Tue Mar 7 17:39:15 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:39:15 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <38C53EC3.5292ECF@lemburg.com> I've ported most of the Unicode methods to strings now. Here's the new table:

Unicode objects     string objects
------------------------------------------------------------
capitalize          capitalize
center              center
count               count
encode
endswith            endswith
expandtabs          expandtabs
find                find
index               index
isdecimal
isdigit             isdigit
islower             islower
isnumeric
isspace             isspace
istitle             istitle
isupper             isupper
join                join
ljust               ljust
lower               lower
lstrip              lstrip
replace             replace
rfind               rfind
rindex              rindex
rjust               rjust
rstrip              rstrip
split               split
splitlines          splitlines
startswith          startswith
strip               strip
swapcase            swapcase
title               title
translate           translate
upper               upper
zfill               zfill

I don't think that .isdecimal() and .isnumeric() are needed for strings since most of the added mappings refer to Unicode char points. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Mar 7 17:42:53 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Tue, 07 Mar 2000 18:42:53 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> <200003071538.KAA13977@eric.cnri.reston.va.us> Message-ID: <38C53F9D.44C3A0F3@lemburg.com> Guido van Rossum wrote: > > > > > zfill > > > > > > no. > > > > Why not ? > > Zfill is (or ought to be) deprecated. It stems from times before we > had things like "%08d" % x and no longer serves a useful purpose. > I doubt anyone would miss it. > > (Of course, now /F will claim that PIL will break in 27 places because > of this. :-) Ok, I'll remove it from both implementations again... (there was some email overlap). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw@cnri.reston.va.us Tue Mar 7 19:24:39 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 7 Mar 2000 14:24:39 -0500 (EST) Subject: [Python-Dev] finalization again References: <200003071352.IAA13571@eric.cnri.reston.va.us> <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <14533.22391.447739.901802@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> 1. I don't know why JPython doesn't execute __del__ methods at TP> all now, but have to suspect that the Java rules imply an TP> implementation so grossly inefficient in the presence of TP> __del__ that Barry simply doesn't want to endure the speed TP> complaints. Actually, it was JimH that discovered this performance gotcha. The problem is that if you want to support __del__, you've got to take the finalize() hit for every instance (i.e. PyInstance object) and it's just not worth it. I just realized that it would be relatively trivial to add a subclass of PyInstance differing only in that it has a finalize() method which would invoke __del__().
Now when the class gets defined, the __del__() would be mined and cached and we'd look at that cache when creating an instance. If there's a function there, we create a PyFinalizableInstance, otherwise we create a PyInstance. The cache means you couldn't dynamically add a __del__ later, but I don't think that's a big deal. It wouldn't be hard to look up the __del__ every time, but that'd be a hit for every instance creation (as opposed to class creation), so again, it's probably not worth it. I just did a quick and dirty hack and it seems at first blush to work. I'm sure there's something I'm missing :). For those of you who don't care about JPython, you can skip the rest. Okay, first the Python script to exercise this, then the PyFinalizableInstance.java file, and then the diffs to PyClass.java. JPython-devers, is it worth adding this?

-------------------- snip snip -------------------- del.py
class B:
    def __del__(self):
        print 'In my __del__'

b = B()
del b

from java.lang import System
System.gc()

-------------------- snip snip -------------------- PyFinalizableInstance.java
// Copyright © Corporation for National Research Initiatives

// These are just like normal instances, except that their classes included
// a definition for __del__(), i.e. Python's finalizer.  These two instance
// types have to be separated due to Java performance issues.

package org.python.core;

public class PyFinalizableInstance extends PyInstance
{
    public PyFinalizableInstance(PyClass iclass) {
        super(iclass);
    }

    // __del__ method is invoked upon object finalization.
    protected void finalize() {
        __class__.__del__.__call__(this);
    }
}

-------------------- snip snip --------------------
Index: PyClass.java
===================================================================
RCS file: /projects/cvsroot/jpython/dist/org/python/core/PyClass.java,v
retrieving revision 2.8
diff -c -r2.8 PyClass.java
*** PyClass.java	1999/10/04 20:44:28	2.8
--- PyClass.java	2000/03/07 19:02:29
***************
*** 21,27 ****

      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
--- 21,27 ----

      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__, __del__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
***************
*** 111,116 ****
--- 111,117 ----
          __setattr__ = lookup("__setattr__", false);
          __delattr__ = lookup("__delattr__", false);
          __tojava__ = lookup("__tojava__", false);
+         __del__ = lookup("__del__", false);
      }

      protected void findModule(PyObject dict) {
***************
*** 182,188 ****
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst = new PyInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }
--- 183,194 ----
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst;
!         if (__del__ == null)
!             inst = new PyInstance(this);
!         else
!             // the class defined an __del__ method
!             inst = new PyFinalizableInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }

From bwarsaw@cnri.reston.va.us Tue Mar 7 19:35:44 2000 From: bwarsaw@cnri.reston.va.us (Barry A.
Warsaw) Date: Tue, 7 Mar 2000 14:35:44 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf8857$a56ed660$a72d153f@tim> <200003071733.MAA14926@eric.cnri.reston.va.us> Message-ID: <14533.23056.517661.633574@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Maybe some source code or class analysis looking for a GvR> __del__ could fix this, at the cost of not allowing one to GvR> patch __del__ into an existing class after instances have GvR> already been created. I don't find that breach of dynamicism GvR> a big deal -- e.g. CPython keeps copies of __getattr__, GvR> __setattr__ and __delattr__ in the class for similar reasons. For those of you who enter the "Being Guido van Rossum" door like I just did, please keep in mind that it dumps you out not on the NJ Turnpike, but in the little ditch back behind CNRI. Stop by and say hi after you brush yourself off. -Barry From Tim_Peters@Dragonsys.com Tue Mar 7 22:30:16 2000 From: Tim_Peters@Dragonsys.com (Tim_Peters@Dragonsys.com) Date: Tue, 7 Mar 2000 17:30:16 -0500 Subject: [Python-Dev] finalization again Message-ID: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> [Guido] > Tim tells Guido again that he finds the Java rules bad, slinging some > mud at Guy Steele, but without explaining what the problem with them > is ... Slinging mud? Let's back off here. You've read the Java spec and were impressed. That's fine -- it is impressive . But go on from there and see where it leads in practice. That Java's GC model did a masterful job but includes a finalization model users dislike is really just conventional wisdom in the Java world. My sketch of Guy Steele's involvement was an attempt to explain why both halves of that are valid. I didn't think "explaining the problem" was necessary, as it's been covered in depth multiple times in c.l.py threads, by Java programmers as well as by me. 
Searching the web for articles about this turns up many; the first one I hit is typical: http://www.quoininc.com/quoininc/Design_Java0197.html eventually concludes Consequently we recommend that [Java] programmers support but do not rely on finalization. That is, place all finalization semantics in finalize() methods, but call those methods explicitly and in the order required. The points below provide more detail. That's par for the Java course: advice to write finalizers to survive being called multiple times, call them explicitly, and do all you can to ensure that the "by magic" call is a nop. The lack of ordering rules in the language forces people to "do it by hand" (as the Java spec acknowledges: "It is straightforward to implement a Java class that will cause a set of finalizer-like methods to be invoked in a specified order for a set of objects when all the objects become unreachable. Defining such a class is left as an exercise for the reader." But from what I've seen, that exercise is beyond the imagination of most Java programmers! The perceived need for ordering is not.). It's fine that you want to restrict finalizers to "simple" cases; it's not so fine if the language can't ensure that simple cases are the only ones the user can write, & can neither detect & complain at runtime about cases it didn't intend to support. The Java spec is unhelpful here too: Therefore, we recommend that the design of finalize methods be kept simple and that they be programmed defensively, so that they will work in all cases. Mom and apple pie, but what does it mean, exactly? The spec realizes that you're going to be tempted to try things that won't work, but can't really explain what those are in terms simpler than the full set of implementation consequences. As a result, users hate it -- but don't take my word for that! 
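[Editorial aside: the "exercise for the reader" Tim quotes — invoking a set of cleanup actions in a specified order once a group of objects is done — is, for what it's worth, directly supported in today's Python standard library. A sketch using contextlib.ExitStack, whose registered callbacks run in LIFO order:]

```python
from contextlib import ExitStack

order = []
with ExitStack() as stack:
    for name in ('outer', 'middle', 'inner'):
        # callbacks run in reverse registration order when the
        # with-block exits, giving a deterministic cleanup order
        stack.callback(order.append, name)

print(order)   # -> ['inner', 'middle', 'outer']
```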
If you look & don't find that Java's finalization rules are widely viewed as "a problem to be wormed around" by serious Java programmers, fine -- then you've got a much better search engine than mine . As for why I claim following topsort rules is very likely to work out better, they follow from the nature of the problem, and can be explained as such, independent of implementation details. See the Boehm reference for more about topsort. will-personally-use-python-regardless-ly y'rs - tim From guido@python.org Wed Mar 8 00:50:38 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 19:50:38 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 17:30:16 EST." <8525689B.007AB2BA.00@notes-mta.dragonsys.com> References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> Message-ID: <200003080050.TAA19264@eric.cnri.reston.va.us> > [Guido] > > Tim tells Guido again that he finds the Java rules bad, slinging some > > mud at Guy Steele, but without explaining what the problem with them > > is ... > > Slinging mud? Let's back off here. You've read the Java spec and were > impressed. That's fine -- it is impressive . But go on from > there and see where it leads in practice. That Java's GC model did a > masterful job but includes a finalization model users dislike is really > just conventional wisdom in the Java world. My sketch of Guy Steele's > involvement was an attempt to explain why both halves of that are valid. Granted. I can read Java code and sometimes I write some, but I'm not a Java programmer by any measure, and I wasn't aware that finalize() has a general bad rep. > I didn't think "explaining the problem" was necessary, as it's been > covered in depth multiple times in c.l.py threads, by Java programmers > as well as by me. 
Searching the web for articles about this turns up > many; the first one I hit is typical: > > http://www.quoininc.com/quoininc/Design_Java0197.html > > eventually concludes > > Consequently we recommend that [Java] programmers support but do > not rely on finalization. That is, place all finalization semantics > in finalize() methods, but call those methods explicitly and in the > order required. The points below provide more detail. > > That's par for the Java course: advice to write finalizers to survive > being called multiple times, call them explicitly, and do all you can > to ensure that the "by magic" call is a nop. It seems the authors make one big mistake: they recommend to call finalize() explicitly. This may be par for the Java course: the quality of the materials is often poor, and that has to be taken into account when certain features have gotten a bad rep. (These authors also go on at length about the problems of GC in a real-time situation -- attempts to use Java in situations for which it is inappropriate are also par for the course, inspired by all the hype.) Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that you should never call finalize() explicitly (except that you should always call super.finalize() in your finalize() method). (Bruce goes on at length explaining that there aren't a lot of things you should use finalize() for -- except to observe the garbage collector. :-) > The lack of ordering > rules in the language forces people to "do it by hand" (as the Java > spec acknowledges: "It is straightforward to implement a Java class > that will cause a set of finalizer-like methods to be invoked in a > specified order for a set of objects when all the objects become > unreachable. Defining such a class is left as an exercise for the > reader." But from what I've seen, that exercise is beyond the > imagination of most Java programmers! The perceived need for ordering > is not.).
True, but note that Python won't have the ordering problem, at least not as long as we stick to reference counting as the primary means of GC. The ordering problem in Python will only happen when there are cycles, and there you really can't blame the poor GC design! > It's fine that you want to restrict finalizers to "simple" cases; it's > not so fine if the language can't ensure that simple cases are the only > ones the user can write, & can neither detect & complain at runtime > about cases it didn't intend to support. The Java spec is unhelpful > here too: > > Therefore, we recommend that the design of finalize methods be kept > simple and that they be programmed defensively, so that they will > work in all cases. > > Mom and apple pie, but what does it mean, exactly? The spec realizes > that you're going to be tempted to try things that won't work, but > can't really explain what those are in terms simpler than the full set > of implementation consequences. As a result, users hate it -- but > don't take my word for that! If you look & don't find that Java's > finalization rules are widely viewed as "a problem to be wormed around" > by serious Java programmers, fine -- then you've got a much better > search engine than mine . Hm. Of course programmers hate finalizers. They hate GC as well. But they hate even more not to have it (witness the relentless complaints about Python's "lack of GC" -- and Java's GC is often touted as one of the reasons for its superiority over C++). I think this stuff is just hard! (Otherwise why would we be here having this argument?) > As for why I claim following topsort rules is very likely to work out > better, they follow from the nature of the problem, and can be > explained as such, independent of implementation details. See the > Boehm reference for more about topsort. Maybe we have a disconnect? We *are* using topsort -- for non-cyclical data structures. Reference counting ensure that. Nothing in my design changes that. 
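[Editorial aside: the topsort-via-refcounting point is easy to demonstrate. Under CPython's reference counting, an acyclic referrer is always finalized before the object it refers to — note this timing is CPython-specific:]

```python
order = []

class N:
    def __init__(self, name, child=None):
        self.name = name
        self.child = child        # an edge in the points-to graph
    def __del__(self):
        order.append(self.name)

leaf = N('leaf')
root = N('root', leaf)
del leaf    # still reachable through root.child: no finalizer runs yet
del root    # root's count hits zero first; then its ref to leaf drops
print(order)   # -> ['root', 'leaf'], a topological order of the graph
```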
The issue at hand is what to do with *cyclical* data structures, where topsort doesn't help. Boehm, on http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, says: "Cycles involving one or more finalizable objects are never finalized." The question remains, what to do with trash cycles? I find having a separate __cleanup__ protocol cumbersome. I think that the "finalizer only called once by magic" rule is reasonable. I believe that the ordering problems will be much less than in Java, because we use topsort whenever we can. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Wed Mar 8 06:25:56 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 01:25:56 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <001401bf88c7$29f2a320$452d153f@tim> [Guido] > Granted. I can read Java code and sometimes I write some, but I'm not > a Java programmer by any measure, and I wasn't aware that finalize() > has a general bad rep. It does, albeit often for bad reasons. 1. C++ programmers seeking to emulate techniques based on C++'s rigid specification of the order and timing of destruction of autos. 2. People pushing the limits (as in the URL I happened to post). 3. People trying to do anything . Java's finalization semantics are very weak, and s-l-o-w too (under most current implementations). Now I haven't used Java for real in about two years, and avoided finalizers completely when I did use it. I can't recall any essential use of __del__ I make in Python code, either. So what Python does here makes no personal difference to me. However, I frequently respond to complaints & questions on c.l.py, and don't want to get stuck trying to justify Java's uniquely baroque rules outside of comp.lang.java <0.9 wink>. 
>> [Tim, passes on the first relevant URL he finds: >> http://www.quoininc.com/quoininc/Design_Java0197.html] > It seems the authors make one big mistake: they recommend to call > finalize() explicitly. This may be par for the Java course: the > quality of the materials is often poor, and that has to be taken into > account when certain features have gotten a bad rep. Well, in the "The Java Programming Language", Gosling recommends to: a) Add a method called close(), that tolerates being called multiple times. b) Write a finalize() method whose body calls close(). People tended to do that at first, but used a bunch of names other than "close" too. I guess people eventually got weary of having two methods that did the same thing, so decided to just use the single name Java guaranteed would make sense. > (These authors also go on at length about the problems of GC in a real- > time situation -- attempts to use Java in sutations for which it is > inappropriate are also par for the course, inspired by all the hype.) I could have picked any number of other URLs, but don't regret picking this one: you can't judge a ship in smooth waters, and people will push *all* features beyond their original intents. Doing so exposes weaknesses. Besides, Sun won't come out & say Java is unsuitable for real-time, no matter how obvious it is . > Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that > you should never call finalize() explicitly (except that you should > always call super.fuinalize() in your finalize() method). You'll find lots of conflicting advice here, be it about Java or C++. Java may be unique, though, in the universality of the conclusion Bruce draws here: > (Bruce goes on at length explaining that there aren't a lot of things > you should use finalize() for -- except to observe the garbage collector. :-) Frankly, I think Java would be better off without finalizers. 
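[Editorial aside: Gosling's (a)/(b) recipe translates directly to Python. A sketch, with hypothetical names, of an explicit idempotent close() plus a finalizer reduced to a backstop:]

```python
class Resource:
    def __init__(self):
        self.closed = False

    def close(self):
        # (a) explicit close() that tolerates being called multiple times
        if self.closed:
            return
        self.closed = True
        # ... release the underlying resource here ...

    def __del__(self):
        # (b) the finalizer's body just calls close(); if the user
        # already closed explicitly, the "by magic" call is a no-op
        self.close()

r = Resource()
r.close()
r.close()   # harmless second call
```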
Python could do fine without __del__ too -- if you and I were the only users <0.6 wink>. [on Java's lack of ordering promises] > True, but note that Python won't have the ordering problem, at least > not as long as we stick to reference counting as the primary means of > GC. The ordering problem in Python will only happen when there are > cycles, and there you really can't blame the poor GC design! I cannot. Nor do I intend to. The cyclic ordering problem isn't GC's fault, it's the program's; but GC's *response* to it is entirely GC's responsibility. >> ... The Java spec is unhelpful here too: >> >> Therefore, we recommend that the design of finalize methods be kept >> simple and that they be programmed defensively, so that they will >> work in all cases. >> >> Mom and apple pie, but what does it mean, exactly? The spec realizes >> that you're going to be tempted to try things that won't work, but >> can't really explain what those are in terms simpler than the full set >> of implementation consequences. As a result, users hate it -- but >> don't take my word for that! If you look & don't find that Java's >> finalization rules are widely viewed as "a problem to be wormed around" >> by serious Java programmers, fine -- then you've got a much better >> search engine than mine . > Hm. Of course programmers hate finalizers. Oh no! C++ programmers *love* destructors! I mean it, they're absolutely gaga over them. I haven't detected signs that CPython programmers hate __del__ either, except at shutdown time. Regardless of language, they love them when they're predictable and work as expected, they hate them when they're unpredictable and confusing. C++ auto destructors are extremely predictable (e.g., after "{SomeClass a, b; ...}", b is destructed before a, and both destructions are guaranteed before leaving the block they're declared in, regardless of whether via return, exception, goto or falling off the end). 
CPython's __del__ is largely predictable (modulo shutdown, cycles, and sometimes exceptions). The unhappiness in the Java world comes from Java finalizers' unpredictability and consequent all-around uselessness in messy real life. > They hate GC as well. Yes, when it's unpredictable and confusing . > But they hate even more not to have it (witness the relentless > complaints about Python's "lack of GC" -- and Java's GC is often > touted as one of the reasons for its superiority over C++). Back when JimF & I were looking at gc, we may have talked each other into really believing that paying careful attention to RC issues leads to cleaner and more robust designs. In fact, I still believe that, and have never clamored for "real gc" in Python. Jim now may even be opposed to "real gc". But Jim and I and you all think a lot about the art of programming, and most users just don't have time or inclination for that -- the slowly changing nature of c.l.py is also clear evidence of this. I'm afraid this makes growing "real GC" a genuine necessity for Python's continued growth. It's not a *bad* thing in any case. Think of it as a marketing requirement <0.7 wink>. > I think this stuff is just hard! (Otherwise why would we be here > having this argument?) Honest to Guido, I think it's because you're sorely tempted to go down an un-Pythonic path here, and I'm fighting that. I said early on there are no thoroughly good answers (yes, it's hard), but that's nothing new for Python! We're having this argument solely because you're confusing Python with some other language . [a 2nd or 3rd plug for taking topsort seriously] > Maybe we have a disconnect? Not in the technical analysis, but in what conclusions to take from it. > We *are* using topsort -- for non-cyclical data structures. Reference > counting ensure that. Nothing in my design changes that. And it's great! 
Everyone understands the RC rules pretty quickly, lots of people like them a whole lot, and if it weren't for cyclic trash everything would be peachy. > The issue at hand is what to do with *cyclical* data structures, where > topsort doesn't help. Boehm, on > http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, > says: "Cycles involving one or more finalizable objects are never > finalized." This is like some weird echo chamber, where the third time I shout something the first one comes back without any distortion at all . Yes, Boehm's first rule is "Do No Harm". It's a great rule. Python follows the same rule all over the place; e.g., when you see x = "4" + 2 you can't possibly know what was intended, so you refuse to guess: you would rather *kill* the program than make a blind guess! I see cycles with finalizers as much the same: it's plain wrong to guess when you can't possibly know what was intended. Because topsort is the only principled way to decide order of finalization, and they've *created* a situation where a topsort doesn't exist, what they're handing you is no less ambiguous than in trying to add a string to an int. This isn't the time to abandon topsort as inconvenient, it's the time to defend it as inviolate principle! The only thoroughly rational response is "you know, this doesn't make sense -- since I can't know what you want here, I refuse to pretend that I can". Since that's "the right" response everywhere else in Python, what the heck is so special about this case? It's like you decided Python *had* to allow adding strings to ints, and now we're going to argue about whether Perl, Awk or Tcl makes the best unprincipled guess . > The question remains, what to do with trash cycles? A trash cycle without a finalizer isn't a problem, right? In that case, topsort rules have no visible consequence so it doesn't matter in what order you merely reclaim the memory.
If it has an object with a finalizer, though, at the very worst you can let it leak, and make the collection of leaked objects available for inspection. Even that much is a *huge* "improvement" over what they have today: most cycles won't have a finalizer and so will get reclaimed, and for the rest they'll finally have a simple way to identify exactly where the problem is, and a simple criterion for predicting when it will happen. If that's not "good enough", then without abandoning principle the user needs to have some way to reduce such a cycle *to* a topsort case themself. > I find having a separate __cleanup__ protocol cumbersome. Same here, but if you're not comfortable leaking, and you agree Python is not in the business of guessing in inherently ambiguous situations, maybe that's what it takes! MAL and GregS both gravitated to this kind of thing at once, and that's at least suggestive; and MAL has actually been using his approach. It's explicit, and that's Pythonic on the face of it. > I think that the "finalizer only called once by magic" rule is reasonable. If it weren't for its specific use in emulating Java's scheme, would you still be in favor of that? It's a little suspicious that it never came up before. > I believe that the ordering problems will be much less than in Java, because > we use topsort whenever we can. No argument here, except that I believe there's never sufficient reason to abandon topsort ordering. Note that BDW's adamant refusal to yield on this hasn't stopped "why doesn't Python use BDW?" from becoming a FAQ. a-case-where-i-expect-adhering-to-principle-is-more-pragmatic-in-the-end-ly y'rs - tim From tim_one@email.msn.com Wed Mar 8 07:48:24 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 02:48:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Message-ID: <001801bf88d2$af0037c0$452d153f@tim> Mike has a darned good point here.
Anyone have a darned good answer ? -----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org] On Behalf Of Mike Fletcher Sent: Tuesday, March 07, 2000 2:08 PM To: Python Listserv (E-mail) Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage all over the hard-disk, traffic rerouted through the bit-bucket, you aren't getting to work anytime soon Mrs. Programmer) and wondering why we have a FAQ instead of having the win32pipe stuff rolled into the os module to fix it. Is there some incompatibility? Is there a licensing problem? Ideas? Mike __________________________________ Mike C. Fletcher Designer, VR Plumber http://members.home.com/mcfletch -- http://www.python.org/mailman/listinfo/python-list From mal@lemburg.com Wed Mar 8 08:36:57 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:36:57 +0100 Subject: [Python-Dev] finalization again References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <38C61129.2F8C9E95@lemburg.com> > [Guido] > The question remains, what to do with trash cycles? I find having a > separate __cleanup__ protocol cumbersome. I think that the "finalizer > only called once by magic" rule is reasonable. I believe that the > ordering problems will be much less than in Java, because we use > topsort whenever we can. Note that the __cleanup__ protocol is intended to break cycles *before* calling the garbage collector. After those cycles are broken, ordering is not a problem anymore, and because __cleanup__ can do its task on a per-object basis, all magic is left in the hands of the programmer. The __cleanup__ protocol as I use it is designed to be called in situations where the system knows that all references into a cycle are about to be dropped (I typically use small cyclic object systems in my application, e.g.
ones that create and reference namespaces which include a reference to the hosting object itself). In my application that is done by using mxProxies at places where I know these cyclic object subsystems are being referenced. In Python the same could be done whenever the interpreter knows that a certain object is about to be deleted, e.g. during shutdown (important for embedding Python in other applications such as Apache) or some other major subsystem finalization, e.g. unload of a module or killing of a thread (yes, I know these are no-nos, but they could be useful, esp. the thread kill operation in multi-threaded servers). After __cleanup__ has done its thing, the finalizer can either choose to leave all remaining cycles in memory (and leak) or apply its own magic to complete the task. In any case, __del__ should be called when the refcount reaches 0. (I find it somewhat strange that people are arguing to keep external resources alive even though there is a chance of freeing them.) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 8 08:46:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:46:14 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> Message-ID: <38C61356.E0598DBF@lemburg.com> Tim Peters wrote: > > Mike has a darned good point here. Anyone have a darned good answer ? > > -----Original Message----- > From: python-list-admin@python.org [mailto:python-list-admin@python.org] > On Behalf Of Mike Fletcher > Sent: Tuesday, March 07, 2000 2:08 PM > To: Python Listserv (E-mail) > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > adopted?
> > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > getting to work anytime soon Mrs. Programmer) and wondering why we have a > FAQ instead of having the win32pipe stuff rolled into the os module to fix > it. Is there some incompatibility? Is there a licensing problem? > > Ideas? I'd suggest moving the popen from the C modules into os.py as a Python API and then applying all necessary magic to either use the win32pipe implementation (if available) or the native C one from the posix module in os.py. Unless, of course, the win32 stuff (or some of it) makes it into the core. I'm mostly interested in this for my platform.py module... BTW, is there any interest in moving it into the core ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Mar 8 12:10:53 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 07:10:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 09:46:14 +0100." <38C61356.E0598DBF@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> Message-ID: <200003081210.HAA19931@eric.cnri.reston.va.us> > Tim Peters wrote: > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > adopted? > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > it. Is there some incompatibility?
Is there a licensing problem? MAL: > I'd suggest moving the popen from the C modules into os.py > as Python API and then applying all necessary magic to either > use the win32pipe implementation (if available) or the native > C one from the posix module in os.py. > > Unless, of course, the win32 stuff (or some of it) makes it into > the core. No concrete plans -- except that I think the registry access is supposed to go in. Haven't seen the code on patches@python.org yet though. > I'm mostly interested in this for my platform.py module... > BTW, is there any interest of moving it into the core ? "it" == platform.py? Little interest from me personally; I suppose it could go in Tools/scripts/... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Mar 8 14:06:53 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:06:53 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Wed, 08 Mar 2000 01:25:56 EST." <001401bf88c7$29f2a320$452d153f@tim> References: <001401bf88c7$29f2a320$452d153f@tim> Message-ID: <200003081406.JAA20033@eric.cnri.reston.va.us> > A trash cycle without a finalizer isn't a problem, right? In that case, > topsort rules have no visible consequence so it doesn't matter in what order > you merely reclaim the memory. When we have a pile of garbage, we don't know whether it's all connected or whether it's lots of little cycles. So if we find [objects with -- I'm going to omit this] finalizers, we have to put those on a third list and put everything reachable from them on that list as well (the algorithm I described before). What's left on the first list then consists of finalizer-free garbage. We dispose of this garbage by clearing dicts and lists. Hopefully this makes the refcount of some of the finalizers go to zero -- those are finalized in the normal way. And now we have to deal with the inevitable: finalizers that are part of cycles.
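The partitioning step Guido describes (move every object reachable from a finalizer onto a separate list, leaving finalizer-free garbage behind) can be sketched in Python. The edge table passed in is a toy stand-in for the traversal only the real collector can do, and all names here are invented for illustration:

```python
def partition_garbage(garbage, edges):
    """Split `garbage` into (finalizer_free, finalizer_reachable).

    edges: dict mapping id(obj) -> list of objects it references
    (a stand-in for the collector's real pointer traversal).
    """
    # Objects with a __del__ are the "third list" seeds.
    roots = [o for o in garbage if hasattr(type(o), '__del__')]
    reachable = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in reachable:
            continue
        reachable.add(id(obj))
        stack.extend(edges.get(id(obj), []))
    finalizer_free = [o for o in garbage if id(o) not in reachable]
    finalizer_reachable = [o for o in garbage if id(o) in reachable]
    return finalizer_free, finalizer_reachable
```

Everything on the `finalizer_free` list can be safely cleared (dicts and lists emptied), exactly as in the algorithm above; the rest is the hard case the thread goes on to discuss.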
It makes sense to reduce the graph of objects to a graph of finalizers only. Example:

    A <=> b -> C <=> d

A and C have finalizers. C is part of a cycle (C-d) that contains no other finalizers, but C is also reachable from A. A is part of a cycle (A-b) that keeps it alive. The interesting thing here is that if we only look at the finalizers, there are no cycles! If we reduce the graph to only finalizers (setting aside for now the problem of how to do that -- we may need to allocate more memory to hold the reduced graph), we get:

    A -> C

We can now finalize A (even though its refcount is nonzero!). And that's really all we can do! A could break its own cycle, thereby disposing of itself and b. It could also break C's cycle, disposing of C and d. It could do nothing. Or it could resurrect A, thereby resurrecting all of A, b, C, and d. This leads to (there's that weird echo again :-) Boehm's solution: Call A's finalizer and leave the rest to the next time the garbage collection runs. Note that we're now calling finalizers on objects with a non-zero refcount. At some point (probably as a result of finalizing A) its refcount will go to zero. We should not finalize it again -- this would serve no purpose. Possible solution:

    INCREF(A);
    A->__del__();
    if (A->ob_refcnt == 1)
        A->__class__ = NULL; /* Make A finalizer-less */
    DECREF(A);

This avoids finalizing twice if the first finalization broke all cycles in which A is involved. But if it doesn't, A is still cyclical garbage with a finalizer! Even if it didn't resurrect itself. Instead of the code fragment above, we could mark A as "just finalized" and when it shows up at the head of the tree (of finalizers in cyclical trash) again on the next garbage collection, discard it without calling the finalizer again (because this clearly means that it didn't resurrect itself -- at least not for a very long time).
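The "don't finalize twice" idea amounts to the collector keeping a "just finalized" mark per object. A toy version of that bookkeeping (purely illustrative -- CPython's real collector works at the C level, and keying on id() is only safe while the objects stay alive):

```python
# Hypothetical collector-side state: objects whose finalizer already ran.
_already_finalized = set()

def maybe_finalize(obj):
    """Run obj's __del__ at most once; report whether it ran this time."""
    if id(obj) in _already_finalized:
        return False                    # the once-only rule kicks in
    _already_finalized.add(id(obj))
    finalizer = getattr(type(obj), '__del__', None)
    if finalizer is not None:
        finalizer(obj)                  # may break cycles or resurrect
    return True
```

The hard part, as the discussion below makes clear, is not this bookkeeping but deciding what the mark should mean when the object might have resurrected itself in between collections.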
I would be happier if we could still have a rule that says that a finalizer is called only once by magic -- even if we have two forms of magic: refcount zero or root of the tree. Tim: I don't know if you object to this rule as a matter of principle (for the sake of finalizers that resurrect the object) or if your objection is really against the unordered calling of finalizers legitimized by Java's rules. I hope the latter, since I think that this rule (__del__ called only once by magic) by itself is easy to understand and easy to deal with, and I believe it may be necessary to guarantee progress for the garbage collector. The problem is that the collector can't easily tell whether A has resurrected itself. Sure, if the refcount is 1 after the finalizer run, I know it didn't resurrect itself. But even if it's higher than before, that doesn't mean it's resurrected: it could have linked to itself. Without doing a full collection I can't tell the difference. If I wait until a full collection happens again naturally, and look at the "just finalized" flag, I can't tell the difference between the case whereby the object resurrected itself but died again before the next collection, and the case where it was dead already. So I don't know how many times it was expecting the "last rites" to be performed, and the object can't know whether to expect them again or not. This seems worse than the only-once rule to me. Even if someone once found a good use for resurrecting inside __del__, against all recommendations, I don't mind breaking their code, if it's for a good cause. The Java rules aren't a good cause. But top-sorted finalizer calls seem a worthy cause. So now we get to discuss what to do with multi-finalizer cycles, like:

    A <=> b <=> C

Here the reduced graph is:

    A <=> C

About this case you say: > If it has an object with a finalizer, though, at the very worst you can let > it leak, and make the collection of leaked objects available for > inspection.
Even that much is a *huge* "improvement" over what they have > today: most cycles won't have a finalizer and so will get reclaimed, and > for the rest they'll finally have a simple way to identify exactly where the > problem is, and a simple criterion for predicting when it will happen. If > that's not "good enough", then without abandoning principle the user needs > to have some way to reduce such a cycle *to* a topsort case themself. > > > I find having a separate __cleanup__ protocol cumbersome. > > Same here, but if you're not comfortable leaking, and you agree Python is > not in the business of guessing in inherently ambiguous situations, maybe > that's what it takes! MAL and GregS both gravitated to this kind of thing > at once, and that's at least suggestive; and MAL has actually been using his > approach. It's explicit, and that's Pythonic on the face of it. > > > I think that the "finalizer only called once by magic" rule is reasonable. > > If it weren't for its specific use in emulating Java's scheme, would you > still be in favor of that? It's a little suspicious that it never came up > before. Suspicious or not, it still comes up. I still like it. I still think that playing games with resurrection is evil. (Maybe my spiritual beliefs shine through here -- I'm a convinced atheist. :-) Anyway, once-only rule aside, we still need a protocol to deal with cyclical dependencies between finalizers. The __cleanup__ approach is one solution, but it also has a problem: we have a set of finalizers. Whose __cleanup__ do we call? Any? All? Suggestions? Note that I'd like some implementation freedom: I may not want to bother with the graph reduction algorithm at first (which seems very hairy) so I'd like to have the right to use the __cleanup__ API as soon as I see finalizers in cyclical trash.
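From the programmer's side, the __cleanup__ protocol under discussion would look roughly like this. (The protocol never entered the language as such; the method name follows MAL's proposal, and the explicit cleanup loop stands in for whatever the interpreter would do.)

```python
import weakref

class Resource:
    """Illustrative object that can end up in a reference cycle."""
    def __init__(self):
        self.peer = None

    def __cleanup__(self):
        # Break the cycle explicitly; plain refcounting finishes the job.
        self.peer = None

a, b = Resource(), Resource()
a.peer, b.peer = b, a            # a trash cycle in the making
probe = weakref.ref(a)

for obj in (a, b):               # the cleanup pass, in arbitrary order
    obj.__cleanup__()
del a, b                         # refcounts now reach zero normally

print(probe() is None)           # True in CPython: freed immediately
```

Because __cleanup__ only severs references, it can run in any order without the topsort ambiguity that plagues __del__ -- which is exactly why it sidesteps the ordering debate above.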
I don't mind disposing of finalizer-free cycles first, but once I have more than one finalizer left in the remaining cycles, I'd like the right not to reduce the graph for topsort reasons -- that algorithm seems hard. So we're back to the __cleanup__ design. Strawman proposal: for all finalizers in a trash cycle, call their __cleanup__ method, in arbitrary order. After all __cleanup__ calls are done, if the objects haven't all disposed of themselves, they are all garbage-collected without calling __del__. (This seems to require another garbage collection cycle -- so perhaps there should also be a once-only rule for __cleanup__?) Separate question: what if there is no __cleanup__? This should probably be reported: "You have cycles with finalizers, buddy! What do you want to do about them?" This same warning could be given when there is a __cleanup__ but it doesn't break all cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Mar 8 13:34:06 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 14:34:06 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <38C656CE.B0ACFF35@lemburg.com> Guido van Rossum wrote: > > > Tim Peters wrote: > > > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > > adopted? > > > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > > it. Is there some incompatibility? Is there a licensing problem?
> > MAL: > > I'd suggest moving the popen from the C modules into os.py > > as Python API and then applying all necessary magic to either > > use the win32pipe implementation (if available) or the native > > C one from the posix module in os.py. > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > the core. > > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches@python.org yet > though. Ok, what about the optional "use win32pipe if available" idea then ? > > I'm mostly interested in this for my platform.py module... > > BTW, is there any interest of moving it into the core ? > > "it" == platform.py? Right. > Little interest from me personally; I suppose it > could go in Tools/scripts/... Hmm, it wouldn't help much in there I guess... after all, it defines APIs which are to be queried by other scripts. The default action to print the platform information to stdout is just a useful addition. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Mar 8 14:33:53 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:33:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 14:34:06 +0100." <38C656CE.B0ACFF35@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> Message-ID: <200003081433.JAA20177@eric.cnri.reston.va.us> > > MAL: > > > I'd suggest moving the popen from the C modules into os.py > > > as Python API and then applying all necessary magic to either > > > use the win32pipe implementation (if available) or the native > > > C one from the posix module in os.py. 
> > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > the core. [Guido] > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches@python.org yet > > though. > > Ok, what about the optional "use win32pipe if available" idea then ? Sorry, I meant please send me the patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Mar 8 14:59:46 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 8 Mar 2000 09:59:46 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <14534.27362.139106.701784@weyr.cnri.reston.va.us> Guido van Rossum writes: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... I think platform.py is pretty nifty, but I'm not entirely sure how it's expected to be used. Perhaps Marc-Andre could explain further the motivation behind the module? My biggest requirement is that it be accompanied by documentation. The coolness factor and shared use of hackerly knowledge would probably get *me* to put it in, but there are a lot of things about which I'll disagree with Guido just to hear his (well-considered) thoughts on the matter. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Wed Mar 8 17:37:43 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:37:43 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 ... code for thought. 
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> <200003081433.JAA20177@eric.cnri.reston.va.us> Message-ID: <38C68FE7.63943C5C@lemburg.com> Guido van Rossum wrote: > > > > MAL: > > > > I'd suggest moving the popen from the C modules into os.py > > > > as Python API and then applying all necessary magic to either > > > > use the win32pipe implementation (if available) or the native > > > > C one from the posix module in os.py. > > > > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > > the core. > [Guido] > > > No concrete plans -- except that I think the registry access is > > > supposed to go in. Haven't seen the code on patches@python.org yet > > > though. > > > > Ok, what about the optional "use win32pipe if available" idea then ? > > Sorry, I meant please send me the patch! Here's the popen() interface I use in platform.py. It should serve well as a basis for an os.popen patch... (don't have time to do it myself right now):

class _popen:

    """ Fairly portable (alternative) popen implementation.

        This is mostly needed in case os.popen() is not available, or
        doesn't work as advertised, e.g. in Win9X GUI programs like
        PythonWin or IDLE.

        XXX Writing to the pipe is currently not supported.

    """
    tmpfile = ''
    pipe = None
    bufsize = None
    mode = 'r'

    def __init__(self, cmd, mode='r', bufsize=None):
        if mode != 'r':
            raise ValueError, 'popen()-emulation only supports read mode'
        import tempfile
        self.tmpfile = tmpfile = tempfile.mktemp()
        os.system(cmd + ' > %s' % tmpfile)
        self.pipe = open(tmpfile, 'rb')
        self.bufsize = bufsize
        self.mode = mode

    def read(self):
        return self.pipe.read()

    def readlines(self):
        return self.pipe.readlines()

    def close(self, remove=os.unlink, error=os.error):
        if self.pipe:
            rc = self.pipe.close()
        else:
            rc = 255
        if self.tmpfile:
            try:
                remove(self.tmpfile)
            except error:
                pass
        return rc

    # Alias
    __del__ = close

def popen(cmd, mode='r', bufsize=None):

    """ Portable popen() interface.
    """
    # Find a working popen implementation preferring win32pipe.popen
    # over os.popen over _popen
    popen = None
    if os.environ.get('OS', '') == 'Windows_NT':
        # On NT win32pipe should work; on Win9x it hangs due to bugs
        # in the MS C lib (see MS KnowledgeBase article Q150956)
        try:
            import win32pipe
        except ImportError:
            pass
        else:
            popen = win32pipe.popen
    if popen is None:
        if hasattr(os, 'popen'):
            popen = os.popen
            # Check whether it works... it doesn't in GUI programs
            # on Windows platforms
            if sys.platform == 'win32':  # XXX Others too ?
                try:
                    popen('')
                except os.error:
                    popen = _popen
        else:
            popen = _popen
    if bufsize is None:
        return popen(cmd, mode)
    else:
        return popen(cmd, mode, bufsize)

if __name__ == '__main__':
    print """
I confirm that, to the best of my knowledge and belief, this
contribution is free of any claims of third parties under
copyright, patent or other rights or interests ("claims").
To the extent that I have any such claims, I hereby grant to CNRI a nonexclusive, irrevocable, royalty-free, worldwide license to reproduce, distribute, perform and/or display publicly, prepare derivative versions, and otherwise use this contribution as part of the Python software and its related documentation, or any derivative versions thereof, at no cost to CNRI or its licensed users, and to authorize others to do so. I acknowledge that CNRI may, at its sole discretion, decide whether or not to incorporate this contribution in the Python software and its related documentation. I further grant CNRI permission to use my name and other identifying information provided to CNRI by me for use in connection with the Python software and its related documentation. """ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 8 17:44:59 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:44:59 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <14534.27362.139106.701784@weyr.cnri.reston.va.us> Message-ID: <38C6919B.EA3EE2E7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Guido van Rossum writes: > > "it" == platform.py? Little interest from me personally; I suppose it > > could go in Tools/scripts/... > > I think platform.py is pretty nifty, but I'm not entirely sure how > it's expected to be used. Perhaps Marc-Andre could explain further > the motivation behind the module? It was first intended to provide a way to format a platform identifying file name for the mxCGIPython project and then quickly moved on to provide many different APIs to query platform specific information. 
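(platform.py did eventually land in the standard library, in Python 2.3; on a modern Python the query functions MAL describes can be exercised directly:)

```python
import platform

# A few of the query functions discussed in this thread, as they exist
# in the standard library today:
print(platform.system())              # e.g. 'Linux', 'Windows', 'Darwin'
print(platform.machine())             # e.g. 'x86_64'
print(platform.platform(terse=True))  # one-line platform summary
```

Each returns a string, with '' when the value cannot be determined, matching the conventions documented below.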
architecture(executable='/usr/local/bin/python', bits='', linkage='') :

    Queries the given executable (defaults to the Python interpreter
    binary) for various architecture information. Returns a tuple
    (bits,linkage) which contains information about the bit architecture
    and the linkage format used for the executable. Both values are
    returned as strings. Values that cannot be determined are returned
    as given by the parameter presets. If bits is given as '', the
    sizeof(long) is used as indicator for the supported pointer size.
    The function relies on the system's "file" command to do the actual
    work. This is available on most if not all Unix platforms. On some
    non-Unix platforms, and then only if the executable points to the
    Python interpreter, defaults from _default_architecture are used.

dist(distname='', version='', id='') :

    Tries to determine the name of the OS distribution. The function
    first looks for a distribution release file in /etc and then
    reverts to _dist_try_harder() in case no suitable files are found.
    Returns a tuple (distname,version,id) which defaults to the args
    given as parameters.

java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')) :

    Version interface for JPython. Returns a tuple
    (release,vendor,vminfo,osinfo) with vminfo being a tuple
    (vm_name,vm_release,vm_vendor) and osinfo being a tuple
    (os_name,os_version,os_arch). Values which cannot be determined
    are set to the defaults given as parameters (which all default
    to '').

libc_ver(executable='/usr/local/bin/python', lib='', version='') :

    Tries to determine the libc version against which the file
    executable (defaults to the Python interpreter) is linked. Returns
    a tuple of strings (lib,version) which default to the given
    parameters in case the lookup fails. Note that the function has
    intimate knowledge of how different libc versions add symbols to
    the executable and is probably only usable for executables compiled
    using gcc. The file is read and scanned in chunks of chunksize
    bytes.
mac_ver(release='', versioninfo=('', '', ''), machine='') :

    Get MacOS version information and return it as tuple (release,
    versioninfo, machine) with versioninfo being a tuple (version,
    dev_stage, non_release_version). Entries which cannot be determined
    are set to ''. All tuple entries are strings. Thanks to Mark R.
    Levinson for mailing documentation links and code examples for this
    function. Documentation for the gestalt() API is available online
    at: http://www.rgaros.nl/gestalt/

machine() :

    Returns the machine type, e.g. 'i386'. An empty string is returned
    if the value cannot be determined.

node() :

    Returns the computer's network name (may not be fully qualified !)
    An empty string is returned if the value cannot be determined.

platform(aliased=0, terse=0) :

    Returns a single string identifying the underlying platform with
    as much useful information as possible (but no more :). The output
    is intended to be human readable rather than machine parseable. It
    may look different on different platforms and this is intended. If
    "aliased" is true, the function will use aliases for various
    platforms that report system names which differ from their common
    names, e.g. SunOS will be reported as Solaris. The system_alias()
    function is used to implement this. Setting terse to true causes
    the function to return only the absolute minimum information needed
    to identify the platform.

processor() :

    Returns the (true) processor name, e.g. 'amdk6'. An empty string is
    returned if the value cannot be determined. Note that many platforms
    do not provide this information or simply return the same value as
    for machine(), e.g. NetBSD does this.

release() :

    Returns the system's release, e.g. '2.2.0' or 'NT'. An empty string
    is returned if the value cannot be determined.

system() :

    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'. An
    empty string is returned if the value cannot be determined.
system_alias(system, release, version) :

    Returns (system,release,version) aliased to common marketing names
    used for some systems. It also does some reordering of the
    information in some cases where it would otherwise cause confusion.

uname() :

    Fairly portable uname interface. Returns a tuple of strings
    (system,node,release,version,machine,processor) identifying the
    underlying platform. Note that unlike the os.uname function this
    also returns possible processor information as an additional tuple
    entry. Entries which cannot be determined are set to ''.

version() :

    Returns the system's release version, e.g. '#3 on degas'. An empty
    string is returned if the value cannot be determined.

win32_ver(release='', version='', csd='', ptype='') :

    Get additional version information from the Windows Registry and
    return a tuple (version,csd,ptype) referring to version number,
    CSD level and OS type (multi/single processor). As a hint: ptype
    returns 'Uniprocessor Free' on single processor NT machines and
    'Multiprocessor Free' on multi processor machines. The 'Free'
    refers to the OS version being free of debugging code. It could
    also state 'Checked' which means the OS version uses debugging
    code, i.e. code that checks arguments, ranges, etc. (Thomas
    Heller). Note: this function only works if Mark Hammond's win32
    package is installed and obviously only runs on Win32 compatible
    platforms. XXX Is there any way to find out the processor type on
    WinXX ? XXX Is win32 available on Windows CE ? Adapted from code
    posted by Karl Putland to comp.lang.python.

> My biggest requirement is that it be accompanied by documentation. > The coolness factor and shared use of hackerly knowledge would > probably get *me* to put it in, but there are a lot of things about > which I'll disagree with Guido just to hear his (well-considered) > thoughts on the matter. ;) The module is doc-string documented (see above). This should serve well as a basis for the LaTeX docs.
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From DavidA@ActiveState.com Wed Mar 8 18:36:01 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 8 Mar 2000 10:36:01 -0800 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... FWIW, I think it belongs in the standard path. It allows one to do the equivalent of if os.platform == '...' but in a much more useful way. --david From mhammond@skippinet.com.au Wed Mar 8 21:36:12 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 08:36:12 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches@python.org yet > though. FYI, that is off with Trent who is supposed to be testing it on the Alpha. Re win32pipe - I responded to that post suggesting that we do with os.pipe and win32pipe what was done with os.path.abspath/win32api - optionally try to import the win32 specific module and use it. My only "concern" is that this then becomes more code for Guido to maintain in the core, even though Guido has expressed a desire to get out of the installers business. Assuming the longer term plan is for other people to put together installation packages, and that these people are free to redistribute win32api/win32pipe, Im wondering if it is worth bothering with? Mark. 
From trentm@ActiveState.com Wed Mar 8 14:42:06 2000 From: trentm@ActiveState.com (Trent Mick) Date: Wed, 8 Mar 2000 14:42:06 -0000 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C6919B.EA3EE2E7@lemburg.com> Message-ID: MAL: > architecture(executable='/usr/local/bin/python', bits='', > linkage='') : > > Values that cannot be determined are returned as given by the > parameter presets. If bits is given as '', the sizeof(long) is > used as indicator for the supported pointer size. Just a heads up, using sizeof(long) will not work on forthcoming WIN64 (LLP64 data model) to determine the supported pointer size. You would want to use the 'P' struct format specifier instead, I think (I am speaking in relative ignorance). However, the docs say that a PyInt is used to store 'P' specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. The keyword perhaps is "forthcoming". This is the code in question in platform.py: # Use the sizeof(long) as default number of bits if nothing # else is given as default. if not bits: import struct bits = str(struct.calcsize('l')*8) + 'bit' Guido: > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches@python.org yet > > though. > Mark Hammond: > FYI, that is off with Trent who is supposed to be testing it on the Alpha. My Alpha is in pieces right now! I will get to it soon. I will try it on Win64 as well, if I can. Trent Trent Mick trentm@activestate.com From guido@python.org Thu Mar 9 02:59:51 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 21:59:51 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Thu, 09 Mar 2000 08:36:12 +1100." 
References: Message-ID: <200003090259.VAA20928@eric.cnri.reston.va.us> > My only "concern" is that this then becomes more code for Guido to maintain > in the core, even though Guido has expressed a desire to get out of the > installers business. Theoretically, it shouldn't need much maintenance. I'm more concerned that it will have different semantics than on Unix so that in practice you'd need to know about the platform anyway (apart from the fact that the installed commands are different, of course). > Assuming the longer term plan is for other people to put together > installation packages, and that these people are free to redistribute > win32api/win32pipe, Im wondering if it is worth bothering with? So that everybody could use os.popen() regardless of whether they're on Windows or Unix. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Thu Mar 9 03:31:21 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 14:31:21 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003090259.VAA20928@eric.cnri.reston.va.us> Message-ID: [Me] > > Assuming the longer term plan is for other people to put together > > installation packages, and that these people are free to redistribute > > win32api/win32pipe, Im wondering if it is worth bothering with? [Guido] > So that everybody could use os.popen() regardless of whether they're > on Windows or Unix. Sure. But what I meant was "should win32pipe code move into the core, or should os.pipe() just auto-detect and redirect to win32pipe if installed?" 
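The auto-detect approach Mark describes (the same trick used for os.path.abspath and win32api) amounts to an optional import. A minimal sketch of the idiom — the assumption that win32pipe exposes a drop-in popen replacement is mine, not something settled in this thread:

```python
import os

# Optional-import idiom: prefer the platform-specific implementation
# when the win32pipe extension is installed, and fall back to the
# portable os.popen otherwise.  On non-Windows systems the import
# simply fails and the standard function is used.
try:
    import win32pipe
    popen = win32pipe.popen
except ImportError:
    popen = os.popen
```

The caller then uses `popen()` without caring which implementation was selected.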
I was suggesting that over the longer term, it may be reasonable to assume that win32pipe _will_ be installed, as everyone who releases installers for Python should include it :-) It could also be written in such a way that it prints a warning message when win32pipe doesn't exist, so in 99% of cases, it will answer the FAQ before they have had a chance to ask it :-) It also should be noted that the win32pipe support for popen on Windows 95/98 includes a small, dedicated .exe - this just adds to the maintenance burden. But it doesn't worry me at all what happens - I was just trying to save you work . Anyone is free to take win32pipe and move the relevant code into the core anytime they like, with my and Bill's blessing. It quite suits me that people have to download win32all to get this working, so I doubt I will get around to it any time soon :-) Mark. From tim_one@email.msn.com Thu Mar 9 03:52:58 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 22:52:58 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Message-ID: <000401bf897a$f5a7e620$0d2d153f@tim> I had another take on all this, which I'll now share since nobody seems inclined to fold in the Win32 popen: perhaps os.popen should not be supported at all under Windows! The current function is a mystery wrapped in an enigma -- sometimes it works, sometimes it doesn't, and I've never been able to outguess which one will obtain (there's more to it than just whether a console window is attached). If it's not reliable (it's not), and we can't document the conditions under which it can be used safely (I can't), Python shouldn't expose it. Failing that, the os.popen docs should caution it's "use at your own risk" under Windows, and that this is directly inherited from MS's popen implementation. 
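One workaround for the unreliable pipe machinery is to let the shell redirect the command's output to a temporary file and read that back. A minimal sketch — the helper name and the use of os.system here are illustrative, not anything proposed in the thread:

```python
import os
import tempfile

def backtick(cmd):
    # Run `cmd` via the shell with stdout redirected to a temp file,
    # then read the file back.  This sidesteps the flaky pipe code
    # entirely, at the cost of a file on disk.
    path = tempfile.mktemp()  # matches the 2000-era API; not secure
    try:
        status = os.system("%s > %s" % (cmd, path))
        f = open(path)
        try:
            output = f.read()
        finally:
            f.close()
        return status, output
    finally:
        if os.path.exists(path):
            os.remove(path)

status, out = backtick("echo hello")
```

The obvious downsides are that the child's output is not available incrementally and that stdin cannot be fed to it, but for the common "capture a command's output" case it behaves the same on every platform that has a shell.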
From tim_one@email.msn.com Thu Mar 9 09:40:26 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 04:40:26 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003081406.JAA20033@eric.cnri.reston.va.us> Message-ID: <000701bf89ab$80cb8e20$0d2d153f@tim> [Guido, with some implementation details and nice examples] Normally I'd eat this up -- today I'm gasping for air trying to stay afloat. I'll have to settle for sketching the high-level approach I've had in the back of my mind. I start with the pile of incestuous stuff Toby/Neil discovered have no external references. It consists of dead cycles, and perhaps also non-cycles reachable only from dead cycles. 1. The "points to" relation on this pile defines a graph G. 2. From any graph G, we can derive a related graph G' consisting of the maximal strongly connected components (SCCs) of G. Each (super)node of G' is an SCC of G, where (super)node A' of G' points to (super)node B' of G' iff there exists a node A in A' that points to (wrt G) some node B in B'. It's not obvious, but the SCCs can be found in linear time (via Tarjan's algorithm, which is simple but subtle; Cyclops.py uses a much dumber brute-force approach, which is nevertheless perfectly adequate in the absence of massively large cycles -- premature optimization is the root etc <0.5 wink>). 3. G' is necessarily a DAG. For if distinct A' and B' are both reachable from each other in G', then every pair of A in A' and B in B' are reachable from each other in G, contradicting that A' and B' are distinct maximal SCCs (that is, the union of A' and B' is also an SCC). 4. The point to all this: Every DAG can be topsorted. Start with the nodes of G' without predecessors. There must be at least one, because G' is a DAG. 5. For every node A' in G' without predecessors (wrt G'), it either does or does not contain an object with a potentially dangerous finalizer. If it does not, let's call it a safe node. 
If there are no safe nodes without predecessors, GC is stuck, and for good reason: every object in the whole pile is reachable from an object with a finalizer, which could change the topology in near-arbitrary ways. The unsafe nodes without predecessors (and again, by #4, there must be at least one) are the heart of the problem, and this scheme identifies them precisely. 6. Else there is a safe node A'. For each A in A', reclaim it, following the normal refcount rules (or in an implementation w/o RC, by following a topsort of "points to" in the original G). This *may* cause reclamation of an object X with a finalizer outside of A'. But doing so cannot cause resurrection of anything in A' (X is reachable from A' else cleaning up A' couldn't have affected X, and if anything in A' were also reachable from X, X would have been in A' to begin with (SCC!), contradicting that A' is safe). So the objects in A' can get reclaimed without difficulty. 7. The simplest thing to do now is just stop: rebuild it from scratch the next time the scheme is invoked. If it was *possible* to make progress without guessing, we did; and if it was impossible, we identified the precise SCC(s) that stopped us. Anything beyond that is optimization <0.6 wink>. Seems the most valuable optimization would be to keep track of whether an object with a finalizer gets reclaimed in step 6 (so long as that doesn't happen, the mutations that can occur to the structure of G' seem nicely behaved enough that it should be possible to loop back to step #5 without crushing pain). On to Guido's msg: [Guido] > When we have a pile of garbage, we don't know whether it's all > connected or whether it's lots of little cycles. So if we find > [objects with -- I'm going to omit this] finalizers, we have to put > those on a third list and put everything reachable from them on that > list as well (the algorithm I described before). SCC determination gives precise answers to all that. 
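For what it's worth, steps 1-5 of the scheme fit in a short sketch. This is my illustration, not the Cyclops code: a recursive Tarjan (so very deep graphs would need an explicit stack), with the "points to" relation given as a dict of successor lists:

```python
def strongly_connected_components(graph):
    # Tarjan's algorithm.  `graph` maps node -> list of successors.
    # SCCs are emitted in reverse topological order of the derived
    # DAG G', so predecessor-free supernodes come out last.
    index_of, lowlink, on_stack = {}, {}, {}
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack[v] = True
        for w in graph.get(v, []):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif on_stack.get(w):
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:   # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index_of:
            visit(v)
    return sccs

def predecessor_free(graph, sccs):
    # Supernodes of G' with no predecessors -- per step 5, collection
    # can only start with these, and only if one of them is "safe".
    comp = {}
    for i, scc in enumerate(sccs):
        for v in scc:
            comp[v] = i
    has_pred = set()
    for v in graph:
        for w in graph.get(v, []):
            if comp[v] != comp[w]:
                has_pred.add(comp[w])
    return [scc for i, scc in enumerate(sccs) if i not in has_pred]

# Guido's example below: A <=> b -> C <=> d (A and C have finalizers).
g = {'A': ['b'], 'b': ['A', 'C'], 'C': ['d'], 'd': ['C']}
sccs = strongly_connected_components(g)
roots = predecessor_free(g, sccs)
# The only predecessor-free SCC is {A, b}; it contains a finalizer
# (A), so there is no safe node and the scheme refuses to guess.
```

Running this on the example yields the two supernodes {A, b} and {C, d}, with {A, b} as the sole predecessor-free one — exactly the "unsafe node without predecessors" the scheme identifies as the heart of the problem.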
> What's left on the first list then consists of finalizer-free garbage. > We dispose of this garbage by clearing dicts and lists. Hopefully > this makes the refcount of some of the finalizers go to zero -- those > are finalized in the normal way. In Python it's even possible for a finalizer to *install* a __del__ method that didn't previously exist, into the class of one of the objects on your "first list". The scheme above is meant to be bulletproof in the face of abuses even I can't conceive of . More mundanely, clearing an item on your first list can cause a chain of events that runs a finalizer, which in turn can resurrect one of the objects on your first list (and so it should *not* get reclaimed). Without doing the SCC bit, I don't think you can out-think that (the reasoning above showed that the finalizer can't resurrect something in the *same* SCC as the object that started it all, but that argument cannot be extended to objects in other safe SCCs: they're vulnerable). > And now we have to deal with the inevitable: finalizers that are part > of cycles. It makes sense to reduce the graph of objects to a graph > of finalizers only. Example: > > A <=> b -> C <=> d > > A and C have finalizers. C is part of a cycle (C-d) that contains no > other finalizers, but C is also reachable from A. A is part of a > cycle (A-b) that keeps it alive. The interesting thing here is that > if we only look at the finalizers, there are no cycles! The scheme above derives G': A' -> C' where A' consists of the A<=>b cycle and C' the C<=>d cycle. That there are no cycles in G' isn't surprising, it's just the natural consequence of doing the natural analysis . The scheme above refuses to do anything here, because the only node in G' without a predecessor (namely A') isn't "safe". 
> If we reduce the graph to only finalizers (setting aside for now the > problem of how to do that -- we may need to allocate more memory to > hold the reduced graph), we get: > > A -> C You should really have self-loops on both A and C, right? (because A is reachable from itself via chasing pointers; ditto for C) > We can now finalize A (even though its refcount is nonzero!). And > that's really all we can do! A could break its own cycle, thereby > disposing of itself and b. It could also break C's cycle, disposing > of C and d. It could do nothing. Or it could resurrect A, thereby > resurrecting all of A, b, C, and d. > > This leads to (there's that weird echo again :-) Boehm's solution: > Call A's finalizer and leave the rest to the next time the garbage > collection runs. This time the echo came back distorted : [Boehm] Cycles involving one or more finalizable objects are never finalized. A<=>b is "a cycle involving one or more finalizable objects", so he won't touch it. The scheme at the top doesn't either. If you handed him your *derived* graph (but also without the self-loops), he would; me too. KISS! > Note that we're now calling finalizers on objects with a non-zero > refcount. I don't know why you want to do this. As the next several paragraphs confirm, it creates real headaches for the implementation, and I'm unclear on what it buys in return. Is "we'll do something by magic for cycles with no more than one finalizer" a major gain for the user over "we'll do something by magic for cycles with no finalizer"? 0, 1 and infinity *are* the only interesting numbers , but the difference between 0 and 1 *here* doesn't seem to me worth signing up for any pain at all. > At some point (probably as a result of finalizing A) its > refcount will go to zero. We should not finalize it again -- this > would serve no purpose. 
I don't believe BDW (or the scheme at the top) has this problem (simply because the only way to run finalizer in a cycle under them is for the user to break the cycle explicitly -- so if an object's finalizer gets run, the user caused it directly, and so can never claim surprise). > Possible solution: > > INCREF(A); > A->__del__(); > if (A->ob_refcnt == 1) > A->__class__ = NULL; /* Make a finalizer-less */ > DECREF(A); > > This avoids finalizing twice if the first finalization broke all > cycles in which A is involved. But if it doesn't, A is still cyclical > garbage with a finalizer! Even if it didn't resurrect itself. > > Instead of the code fragment above, we could mark A as "just > finalized" and when it shows up at the head of the tree (of finalizers > in cyclical trash) again on the next garbage collection, to discard it > without calling the finalizer again (because this clearly means that > it didn't resurrect itself -- at least not for a very long time). I don't think you need to do any of this -- unless you think you need to do the thing that created the need for this, which I didn't think you needed to do either . > I would be happier if we could still have a rule that says that a > finalizer is called only once by magic -- even if we have two forms of > magic: refcount zero or root of the tree. Tim: I don't know if you > object against this rule as a matter of principle (for the sake of > finalizers that resurrect the object) or if your objection is really > against the unordered calling of finalizers legitimized by Java's > rules. I hope the latter, since I think it that this rule (__del__ > called only once by magic) by itself is easy to understand and easy to > deal with, and I believe it may be necessary to guarantee progress for > the garbage collector. My objections to Java's rules have been repeated enough. I would have no objection to "__del__ called only once" if it weren't for that Python currently does something different. 
I don't know whether people rely on that now; if they do, it's a much more dangerous thing to change than adding a new keyword (the compiler gives automatic 100% coverage of the latter; but nothing mechanical can help people track down reliance-- whether deliberate or accidental --on the former). My best *guess* is that __del__ is used rarely; e.g., there are no more than 40 instances of it in the whole CVS tree, including demo directories; and they all look benign (at least three have bodies consisting of "pass"!). The most complicated one I found in my own code is: def __del__(self): self.break_cycles() def break_cycles(self): for rule in self.rules: if rule is not None: rule.cleanse() But none of this self-sampling is going to comfort some guy in France who has a megaline of code relying on it. Good *bet*, though . > [and another cogent explanation of why breaking the "leave cycles with > finalizers" alone injunction creates headaches] > ... > Even if someone once found a good use for resurrecting inside __del__, > against all recommendations, I don't mind breaking their code, if it's > for a good cause. The Java rules aren't a good cause. But top-sorted > finalizer calls seem a worthy cause. They do to me too, except that I say even a cycle involving but a single object (w/ finalizer) looping on itself is the user's problem. > So now we get to discuss what to do with multi-finalizer cycles, like: > > A <=> b <=> C > > Here the reduced graph is: > > A <=> C The SCC reduction is simply to A and, right, the scheme at the top punts. > [more the on once-only rule chopped] > ... > Anyway, once-only rule aside, we still need a protocol to deal with > cyclical dependencies between finalizers. The __cleanup__ approach is > one solution, but it also has a problem: we have a set of finalizers. > Whose __cleanup__ do we call? Any? All? Suggestions? 
This is why a variant of guardians were more appealing to me at first: I could ask a guardian for the entire SCC, so I get the *context* of the problem as well as the final microscopic symptom. I see Marc-Andre already declined to get sucked into the magical part of this . Greg should speak for his scheme, and I haven't made time to understand it fully; my best guess is to call x.__cleanup__ for every object in the SCC (but there's no clear way to decide which order to call them in, and unless they're more restricted than __del__ methods they can create all the same problems __del__ methods can!). > Note that I'd like some implementation freedom: I may not want to > bother with the graph reduction algorithm at first (which seems very > hairy) so I'd like to have the right to use the __cleanup__ API > as soon as I see finalizers in cyclical trash. I don't mind disposing > of finalizer-free cycles first, but once I have more than one > finalizer left in the remaining cycles, I'd like the right not to > reduce the graph for topsort reasons -- that algorithm seems hard. I hate to be realistic , but modern GC algorithms are among the hardest you'll ever see in any field; even the outer limits of what we've talked about here is baby stuff. Sun's Java group (the one in Chelmsford, MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. Steele) working full-time for over a year on the last iteration of Java's GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- state of the art GC can be crushingly hard. So I've got nothing against taking shortcuts at first -- there's actually no realistic alternative. I think we're overlooking the obvious one, though: if any finalizer appears in any trash cycle, tough luck. Python 3000 -- which may be a spelling of 1.7 , but doesn't *need* to be a spelling of 1.6. > So we're back to the __cleanup__ design. 
Strawman proposal: for all > finalizers in a trash cycle, call their __cleanup__ method, in > arbitrary order. After all __cleanup__ calls are done, if the objects > haven't all disposed of themselves, they are all garbage-collected > without calling __del__. (This seems to require another garbage > collection cycle -- so perhaps there should also be a once-only rule > for __cleanup__?) > > Separate question: what if there is no __cleanup__? This should > probably be reported: "You have cycles with finalizers, buddy! What > do you want to do about them?" This same warning could be given when > there is a __cleanup__ but it doesn't break all cycles. If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly 1" isn't special to me), I will consider it to be a bug. So I want a way to get it back from gc, so I can see what the heck it is, so I can fix my code (or harass whoever did it to me). __cleanup__ suffices for that, so the very act of calling it is all I'm really after ("Python invoked __cleanup__ == Tim has a bug"). But after I outgrow that , I'll certainly want the option to get another kind of complaint if __cleanup__ doesn't break the cycles, and after *that* I couldn't care less. I've given you many gracious invitations to say that you don't mind leaking in the face of a buggy program , but as you've declined so far, I take it that never hearing another gripe about leaking is a Primary Life Goal. So collection without calling __del__ is fine -- but so is collection with calling it! If we're going to (at least implicitly) approve of this stuff, it's probably better *to* call __del__, if for no other reason than to catch your case of some poor innocent object caught in a cycle not of its making that expects its __del__ to abort starting World War III if it becomes unreachable . whatever-we-don't-call-a-mistake-is-a-feature-ly y'rs - tim From fdrake@acm.org Thu Mar 9 14:25:35 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu, 9 Mar 2000 09:25:35 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf897a$f5a7e620$0d2d153f@tim> References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Tim Peters writes: > Failing that, the os.popen docs should caution it's "use at your own risk" > under Windows, and that this is directly inherited from MS's popen > implementation. Tim (& others), Would this additional text be sufficient for the os.popen() documentation? \strong{Note:} This function behaves unreliably under Windows due to the native implementation of \cfunction{popen()}. If someone cares to explain what's weird about it, that might be appropriate as well, but I've never used this under Windows. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Thu Mar 9 14:42:37 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:42:37 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C7B85D.E6090670@lemburg.com> Trent Mick wrote: > > MAL: > > architecture(executable='/usr/local/bin/python', bits='', > > linkage='') : > > > > Values that cannot be determined are returned as given by the > > parameter presets. If bits is given as '', the sizeof(long) is > > used as indicator for the supported pointer size. > > Just a heads up, using sizeof(long) will not work on forthcoming WIN64 > (LLP64 data model) to determine the supported pointer size. You would want > to use the 'P' struct format specifier instead, I think (I am speaking in > relative ignorance). However, the docs say that a PyInt is used to store 'P' > specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. > The keyword perhaps is "forthcoming". 
> > This is the code in question in platform.py: > > # Use the sizeof(long) as default number of bits if nothing > # else is given as default. > if not bits: > import struct > bits = str(struct.calcsize('l')*8) + 'bit' Python < 1.5.2 doesn't support 'P', but anyway, I'll change those lines according to your suggestion. Does struct.calcsize('P')*8 return 64 on 64bit-platforms as it should (probably ;) ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim@interet.com Thu Mar 9 15:45:54 2000 From: jim@interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 10:45:54 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <38C7C732.D9086C34@interet.com> Tim Peters wrote: > > I had another take on all this, which I'll now share since nobody > seems inclined to fold in the Win32 popen: perhaps os.popen should not be > supported at all under Windows! > > The current function is a mystery wrapped in an enigma -- sometimes it > works, sometimes it doesn't, and I've never been able to outguess which one > will obtain (there's more to it than just whether a console window is > attached). If it's not reliable (it's not), and we can't document the > conditions under which it can be used safely (I can't), Python shouldn't > expose it. OK, I admit I don't understand this either, but here goes... It looks like Python popen() uses the Windows _popen() function. The _popen() docs say that it creates a spawned copy of the command processor (shell) with the given string argument. It further states that it does NOT work in a Windows program and ONLY works when called from a Windows Console program. 
From this I assume that popen() works from python.exe (it is a Console app) if the command can be directly executed by the shell (like "dir"), or if the command starts a Console Windows application. It can't work when starting a regular Windows program because those don't have stdin or stdout. But Console apps do have stdin and stdout, and these are inherited by other Console programs in Unix fashion. Is this what doesn't work? If so, there is a bug in _popen(). Otherwise we are just expecting Unix behavior from Windows. Or perhaps we expect popen() to work from a Windows non-Console app, which _popen() is guaranteed not to do.

If there is something wrong with _popen() then the way to fix it is to avoid using it and create the pipes directly. For an example look in the docs under:

    Platform SDK
      Windows Base Services
        Executables
          Processes and Threads
            Using Processes and Threads
              Creating a Child Process with Redirected Input and Output

The sample code can be extracted and put into posixmodule.c. Note that this is what OS2 does. See the #ifdef.

> Failing that, the os.popen docs should caution it's "use at your own risk"
> under Windows, and that this is directly inherited from MS's popen
> implementation.

Of course, the strength of Python is portable code. popen() should be fixed the right way.

JimA

From tim_one@email.msn.com Thu Mar 9 17:14:17 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 12:14:17 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C7C732.D9086C34@interet.com> Message-ID: <000401bf89ea$e6e54180$79a0143f@tim>

[James C. Ahlstrom]
> OK, I admit I don't understand this either, but here goes...
>
> It looks like Python popen() uses the Windows _popen() function.
> The _popen() docs say ...

Screw the docs. Pretend you're a newbie and *try* it. 
Here:

    import os
    p = os.popen("dir")
    while 1:
        line = p.readline()
        if not line:
            break
        print line

Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program). Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work. You have to bring up the task manager and kill it that way. I once traced this under the debugger -- it's hung inside an MS DLL. "dir" is not entirely arbitrary here: for *some* cmds it works fine, for others not. The set of which work appears to vary across Windows flavors. Sometimes you can worm around it by wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but sometimes not. After hours of poke-&-hope (in the past), as I said, I've never been able to predict which cases will work.

> ...
> It further states that it does NOT work in a Windows program and ONLY
> works when called from a Windows Console program.

The latter is a necessary condition but not sufficient; don't know what *is* sufficient, and AFAIK nobody else does either.

> From this I assume that popen() works from python.exe (it is a Console
> app) if the command can be directly executed by the shell (like "dir"),

See above for a counterexample to both . I actually have much better luck with cmds command.com *doesn't* know anything about. So this appears to vary by shell too.

> ...
> If there is something wrong with _popen() then the way to fix it is
> to avoid using it and create the pipes directly.

libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

> ...
> Of course, the strength of Python is portable code. popen() should be
> fixed the right way.

pipes too, but users get baffled by popen much more often simply because they try popen much more often. 
there's-no-question-about-whether-it-works-right-it-doesn't-ly y'rs - tim From gstein@lyra.org Thu Mar 9 17:47:23 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 09:47:23 -0800 (PST) Subject: [Python-Dev] platform.py (was: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?) In-Reply-To: <38C7B85D.E6090670@lemburg.com> Message-ID: On Thu, 9 Mar 2000, M.-A. Lemburg wrote: >... > Python < 1.5.2 doesn't support 'P', but anyway, I'll change > those lines according to your suggestion. > > Does struct.calcsize('P')*8 return 64 on 64bit-platforms as > it should (probably ;) ? Yes. It returns sizeof(void *). Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Thu Mar 9 14:55:36 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:55:36 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <38C7BB68.9FAE3BE9@lemburg.com> "Fred L. Drake, Jr." wrote: > > Tim Peters writes: > > Failing that, the os.popen docs should caution it's "use at your own risk" > > under Windows, and that this is directly inherited from MS's popen > > implementation. > > Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. > > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. Ehm, hasn't anyone looked at the code I posted yesterday ? It goes a long way to deal with these inconsistencies... even though its not perfect (yet ;). 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake@acm.org Thu Mar 9 18:52:40 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 9 Mar 2000 13:52:40 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C7BB68.9FAE3BE9@lemburg.com> References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> <38C7BB68.9FAE3BE9@lemburg.com> Message-ID: <14535.62200.158087.102380@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Ehm, hasn't anyone looked at the code I posted yesterday ? > It goes a long way to deal with these inconsistencies... even > though its not perfect (yet ;). I probably sent that before I'd read everything, and I'm not the one to change the popen() implementation. At this point, I'm waiting for someone who understands the details to decide what happens (if anything) to the implementation before I check in any changes to the docs. My inclination is to fix popen() on Windows to do the right thing, but I don't know enough about pipes & process management on Windows to get into that fray. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From nascheme@enme.ucalgary.ca Thu Mar 9 19:37:31 2000 From: nascheme@enme.ucalgary.ca (nascheme@enme.ucalgary.ca) Date: Thu, 9 Mar 2000 12:37:31 -0700 Subject: [Python-Dev] finalization again Message-ID: <20000309123731.A3664@acs.ucalgary.ca> [Tim, explaining something I was thinking about more clearly than I ever could] >It's not obvious, but the SCCs can be found in linear time (via Tarjan's >algorithm, which is simple but subtle; Wow, it seems like it should be more expensive than that. What are the space requirements? Also, does the simple algorithm you used in Cyclops have a name? 
>If there are no safe nodes without predecessors, GC is stuck, >and for good reason: every object in the whole pile is reachable >from an object with a finalizer, which could change the topology >in near-arbitrary ways. The unsafe nodes without predecessors >(and again, by #4, there must be at least one) are the heart of >the problem, and this scheme identifies them precisely. Exactly. What is our policy on these unsafe nodes? Guido seems to feel that it is okay for the programmer to create them and Python should have a way of collecting them. Tim seems to feel that the programmer should not create them in the first place. I agree with Tim. If topological finalization is used, it is possible for the programmer to design their classes so that this problem does not happen. This is explained on Hans Boehm's finalization web page. If the programmer can or does not redesign their classes I don't think it is unreasonable to leak memory. We can link these cycles to a global list of garbage or print a debugging message. This is a large improvement over the current situation (ie. leaking memory with no debugging even for cycles without finalizers). Neil -- "If you're a great programmer, you make all the routines depend on each other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985. From gstein@lyra.org Thu Mar 9 19:50:29 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:50:29 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: On Thu, 9 Mar 2000 nascheme@enme.ucalgary.ca wrote: >... > If the programmer can or does not redesign their classes I don't > think it is unreasonable to leak memory. We can link these > cycles to a global list of garbage or print a debugging message. > This is a large improvement over the current situation (ie. > leaking memory with no debugging even for cycles without > finalizers). I think we throw an error (as a subclass of MemoryError). 
As an alternative, is it possible to move those cycles to the garbage list and then never look at them again? That would speed up future collection processing. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Thu Mar 9 19:51:46 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:51:46 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 11:50:29 PST." References: Message-ID: <200003091951.OAA26184@eric.cnri.reston.va.us> > As an alternative, is it possible to move those cycles to the garbage list > and then never look at them again? That would speed up future collection > processing. With the current approach, that's almost automatic :-) I'd rather reclaim the memory too. --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm@hypernet.com Thu Mar 9 19:54:16 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 9 Mar 2000 14:54:16 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf89ea$e6e54180$79a0143f@tim> References: <38C7C732.D9086C34@interet.com> Message-ID: <1259490837-400325@hypernet.com> [Tim re popen on Windows] ... > the debugger -- it's hung inside an MS DLL. "dir" is not entirely arbitrary > here: for *some* cmds it works fine, for others not. The set of which work > appears to vary across Windows flavors. Sometimes you can worm around it by > wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but > sometimes not. It doesn't work for commands builtin to whatever "shell" you're using. That's different between cmd and command, and the various flavors, versions and extensions thereof. FWIW, I gave up a long time ago. I use redirection and a tempfile. The few times I've wanted "interactive" control, I've used Win32Process, dup'ed, inherited handles... the whole 9 yards. Why? 
Look at all the questions about popen and child processes in general, on platforms where it *works*, (if it weren't for Donn Cave, nobody'd get it to work anywhere ). To reiterate Tim's point: *none* of the c runtime routines for process control on Windows are adequate (beyond os.system and living with a DOS box popping up). The raw Win32 CreateProcess does everything you could possibly want, but takes a week or more to understand, (if this arg is a that, then that arg is a whatsit, and the next is limited to the values X and Z unless...). your-brain-on-Windows-ly y'rs - Gordon From guido@python.org Thu Mar 9 19:55:23 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:55:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 04:40:26 EST." <000701bf89ab$80cb8e20$0d2d153f@tim> References: <000701bf89ab$80cb8e20$0d2d153f@tim> Message-ID: <200003091955.OAA26217@eric.cnri.reston.va.us> [Tim describes a more formal approach based on maximal strongly connected components (SCCs).] I like the SCC approach -- it's what I was struggling to invent but came short of discovering. However: [me] > > What's left on the first list then consists of finalizer-free garbage. > > We dispose of this garbage by clearing dicts and lists. Hopefully > > this makes the refcount of some of the finalizers go to zero -- those > > are finalized in the normal way. [Tim] > In Python it's even possible for a finalizer to *install* a __del__ method > that didn't previously exist, into the class of one of the objects on your > "first list". The scheme above is meant to be bulletproof in the face of > abuses even I can't conceive of . Are you *sure* your scheme deals with this? Let's look at an example. (Again, lowercase nodes have no finalizers.) Take G: a <=> b -> C This is G' (a and b are strongly connected): a' -> C' C is not reachable from any root node. We decide to clear a and b. Let's suppose we happen to clear b first. 
This removes the last reference to C, C's finalizer runs, and it installs a finalizer on a.__class__. So now a' has turned into A', and we're halfway to committing a crime we said we would never commit (touching cyclical trash with finalizers). I propose to disregard this absurd possibility, except to the extent that Python shouldn't crash -- but we make no guarantees to the user. > More mundanely, clearing an item on your first list can cause a chain of > events that runs a finalizer, which in turn can resurrect one of the objects > on your first list (and so it should *not* get reclaimed). Without doing > the SCC bit, I don't think you can out-think that (the reasoning above > showed that the finalizer can't resurrect something in the *same* SCC as the > object that started it all, but that argument cannot be extended to objects > in other safe SCCs: they're vulnerable). I don't think so. While my poor wording ("finalizer-free garbage") didn't make this clear, my references to earlier algorithms were intended to imply that this is garbage that consists of truly unreachable objects. I have three lists: let's call them T(rash), R(oot-reachable), and F(inalizer-reachable). The Schemenauer c.s. algorithm moves all reachable nodes to R. I then propose to move all finalizers to F, and to run another pass of Schemenauer c.s. to also move all finalizer-reachable (but not root-reachable) nodes to F. I truly believe that (barring the absurdity of installing a new __del__) the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers: by virtue of Schemenauer c.s. (which computes a reachability closure given some roots) anything reachable from a finalizer is on F by now (if it isn't on R -- again, nothing on T is reachable from R, because R is calculated as a closure).
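[Editor's note: the T/R/F partition Guido just described can be sketched abstractly. This is an illustration of the idea only, not the Schemenauer implementation; the function and parameter names are invented:

```python
def partition(objs, roots, get_refs, has_finalizer):
    """Split candidate garbage into (T, R, F) per Guido's scheme.

    objs: set of candidate (possibly-garbage) objects
    roots: objects known to be reachable from outside the candidates
    get_refs(o): objects directly referenced by o
    has_finalizer(o): whether o has a __del__
    """
    def closure(seed):
        # Reachability closure within objs, starting from seed.
        seen = set(seed)
        todo = list(seen)
        while todo:
            o = todo.pop()
            for n in get_refs(o):
                if n in objs and n not in seen:
                    seen.add(n)
                    todo.append(n)
        return seen

    # First pass: everything root-reachable goes to R.
    R = closure(r for r in roots if r in objs)
    # Second pass: finalizers, and everything they reach, go to F.
    finalizers = [o for o in objs if has_finalizer(o) and o not in R]
    F = closure(finalizers)
    # What remains is truly unreachable, finalizer-free trash.
    T = set(objs) - R - F
    return T, R, F
```

Running it on the example graph above (a <=> b -> C, with only C finalizable and nothing rooted) puts a and b on T and C on F, matching the discussion.]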
So, unless there's still a bug in my thinking here, I think that as long as we only want to clear SCCs with 0 finalizers, T is exactly the set of nodes we're looking for. > This time the echo came back distorted : > > [Boehm] > Cycles involving one or more finalizable objects are never finalized. > > A<=>b is "a cycle involving one or more finalizable objects", so he won't > touch it. The scheme at the top doesn't either. If you handed him your > *derived* graph (but also without the self-loops), he would; me too. KISS! > > > Note that we're now calling finalizers on objects with a non-zero > > refcount. > > I don't know why you want to do this. As the next several paragraphs > confirm, it creates real headaches for the implementation, and I'm unclear > on what it buys in return. Is "we'll do something by magic for cycles with > no more than one finalizer" a major gain for the user over "we'll do > something by magic for cycles with no finalizer"? 0, 1 and infinity *are* > the only interesting numbers , but the difference between 0 and 1 > *here* doesn't seem to me worth signing up for any pain at all. I do have a reason: if a maximal SCC has only one finalizer, there can be no question about the ordering between finalizer calls. And isn't the whole point of this discussion to have predictable ordering of finalizer calls in the light of trash recycling? > I would have no objection to "__del__ called only once" if it weren't for > that Python currently does something different. I don't know whether people > rely on that now; if they do, it's a much more dangerous thing to change > than adding a new keyword (the compiler gives automatic 100% coverage of the > latter; but nothing mechanical can help people track down reliance-- whether > deliberate or accidental --on the former). [...] > But none of this self-sampling is going to comfort some guy in France who > has a megaline of code relying on it. Good *bet*, though . 
OK -- so your objection is purely about backwards compatibility. Apart from that, I strongly feel that the only-once rule is a good one. And I don't think that the compatibility issue weighs very strongly here (given all the other problems that typically exist with __del__). > I see Marc-Andre already declined to get sucked into the magical part of > this . Greg should speak for his scheme, and I haven't made time to > understand it fully; my best guess is to call x.__cleanup__ for every object > in the SCC (but there's no clear way to decide which order to call them in, > and unless they're more restricted than __del__ methods they can create all > the same problems __del__ methods can!). Yes, but at least since we're defining a new API (in a reserved portion of the method namespace) there are no previous assumptions to battle. > > Note that I'd like some implementation freedom: I may not want to > > bother with the graph reduction algorithm at first (which seems very > > hairy) so I'd like to have the right to use the __cleanup__ API > > as soon as I see finalizers in cyclical trash. I don't mind disposing > > of finalizer-free cycles first, but once I have more than one > > finalizer left in the remaining cycles, I'd like the right not to > > reduce the graph for topsort reasons -- that algorithm seems hard. > > I hate to be realistic , but modern GC algorithms are among the > hardest you'll ever see in any field; even the outer limits of what we've > talked about here is baby stuff. Sun's Java group (the one in Chelmsford, > MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. > Steele) working full-time for over a year on the last iteration of Java's > GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- > state of the art GC can be crushingly hard. > > So I've got nothing against taking shortcuts at first -- there's actually no > realistic alternative. 
I think we're overlooking the obvious one, though: > if any finalizer appears in any trash cycle, tough luck. Python 3000 -- > which may be a spelling of 1.7 , but doesn't *need* to be a spelling > of 1.6. Kind of sad though -- finally knowing about cycles and then not being able to do anything about them. > > So we're back to the __cleanup__ design. Strawman proposal: for all > > finalizers in a trash cycle, call their __cleanup__ method, in > > arbitrary order. After all __cleanup__ calls are done, if the objects > > haven't all disposed of themselves, they are all garbage-collected > > without calling __del__. (This seems to require another garbage > > collection cycle -- so perhaps there should also be a once-only rule > > for __cleanup__?) > > > > Separate question: what if there is no __cleanup__? This should > > probably be reported: "You have cycles with finalizers, buddy! What > > do you want to do about them?" This same warning could be given when > > there is a __cleanup__ but it doesn't break all cycles. > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > 1" isn't special to me), I will consider it to be a bug. So I want a way to > get it back from gc, so I can see what the heck it is, so I can fix my code > (or harass whoever did it to me). __cleanup__ suffices for that, so the > very act of calling it is all I'm really after ("Python invoked __cleanup__ > == Tim has a bug"). > > But after I outgrow that , I'll certainly want the option to get > another kind of complaint if __cleanup__ doesn't break the cycles, and after > *that* I couldn't care less. I've given you many gracious invitations to > say that you don't mind leaking in the face of a buggy program , but > as you've declined so far, I take it that never hearing another gripe about > leaking is a Primary Life Goal. So collection without calling __del__ is > fine -- but so is collection with calling it!
If we're going to (at least > implicitly) approve of this stuff, it's probably better *to* call __del__, > if for no other reason than to catch your case of some poor innocent object > caught in a cycle not of its making that expects its __del__ to abort > starting World War III if it becomes unreachable . I suppose we can print some obnoxious message to stderr like """Your program has created cyclical trash involving one or more objects with a __del__ method; calling their __cleanup__ method didn't resolve the cycle(s). I'm going to call the __del__ method(s) but I can't guarantee that they will be called in a meaningful order, because of the cyclical dependencies.""" But I'd still like to reclaim the memory. If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. (Because of this, I will also need to trace functions, methods and modules -- these create massive cycles that currently require painful cleanup. Of course I also need to track down all the roots then... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Thu Mar 9 19:59:48 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:59:48 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091951.OAA26184@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: > > As an alternative, is it possible to move those cycles to the garbage list > > and then never look at them again? That would speed up future collection > > processing. > > With the current approach, that's almost automatic :-) > > I'd rather reclaim the memory too. Well, yah. I would too :-) I'm at ApacheCon right now, so haven't read the thread in detail, but it seems that people saw my algorithm as a bit too complex. Bah. IMO, it's a pretty straightforward way for the interpreter to get cycles cleaned up. 
(whether the objects in the cycles are lists/dicts, class instances, or extension types!) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Mar 9 20:18:06 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:18:06 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: >... > I don't think so. While my poor wording ("finalizer-free garbage") > didn't make this clear, my references to earlier algorithms were > intended to imply that this is garbage that consists of truly > unreachable objects. I have three lists: let's call them T(rash), > R(oot-reachable), and F(inalizer-reachable). The Schemenauer > c.s. algorithm moves all reachable nodes to R. I then propose to move > all finalizers to F, and to run another pass of Schemenauer c.s. to > also move all finalizer-reachable (but not root-reachable) nodes to F. >... > [Tim Peters] > > I see Marc-Andre already declined to get sucked into the magical part of > > this . Greg should speak for his scheme, and I haven't made time to > > understand it fully; my best guess is to call x.__cleanup__ for every object > > in the SCC (but there's no clear way to decide which order to call them in, > > and unless they're more restricted than __del__ methods they can create all > > the same problems __del__ methods can!). My scheme was to identify objects in F, but only those with a finalizer (not the closure). Then call __cleanup__ on each of them, in arbitrary order. If any are left after the sequence of __cleanup__ calls, then I call it an error. [ note that my proposal defined checking for a finalizer by calling tp_clean(TPCLEAN_CARE_CHECK); this accounts for class instances and for extension types with "heavy" processing in tp_dealloc ] The third step was to use tp_clean to try and clean all other objects in a safe fashion. 
Specifically: the objects have no finalizers, so there is no special care needed in finalizing, so this third step should nuke references that are stored in the object. This means object pointers are still valid (we haven't dealloc'd), but the insides have been emptied. If the third step does not remove all cycles, then one of the PyType objects did not remove all references during the tp_clean call. >... > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > > 1" isn't special to me), I will consider it to be a bug. So I want a way to > > get it back from gc, so I can see what the heck it is, so I can fix my code > > (or harass whoever did it to me). __cleanup__ suffices for that, so the > > very act of calling it is all I'm really after ("Python invoked __cleanup__ > > == Tim has a bug"). Agreed. >... > I suppose we can print some obnoxious message to stderr like A valid alternative to raising an exception, but it falls into the whole trap of "where does stderr go?" >... > But I'd still like to reclaim the memory. If this is some > long-running server process that is executing arbitrary Python > commands sent to it by clients, it's not nice to leak, period. If an exception is raised, the top-level server loop can catch it, log the error, and keep going. But yes: it will leak. > (Because of this, I will also need to trace functions, methods and > modules -- these create massive cycles that currently require painful > cleanup. Of course I also need to track down all the roots > then... :-) Yes. It would be nice to have these participate in the "cleanup protocol" that I've described. It should help a lot at Python finalization time, effectively moving some special casing from import.c to the objects themselves. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim@interet.com Thu Mar 9 20:20:23 2000 From: jim@interet.com (James C. 
Ahlstrom) Date: Thu, 09 Mar 2000 15:20:23 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf89ea$e6e54180$79a0143f@tim> Message-ID: <38C80787.7791A1A6@interet.com> Tim Peters wrote: > Screw the docs. Pretend you're a newbie and *try* it. I did try it.
>
> import os
> p = os.popen("dir")
> while 1:
>     line = p.readline()
>     if not line:
>         break
>     print line
>
> Type that in by hand, or stick it in a file & run it from a cmdline > python.exe (which is a Windows console program). Under Win95 the process > freezes solid, and even trying to close the DOS box doesn't work. You have > to bring up the task manager and kill it that way. I once traced this under Point on the curve: This program works perfectly on my machine running NT. > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > only versions of these things that come close to working under Windows (he > wraps the native Win32 spellings of these things; MS's libc entry points > (which Python uses now) are much worse). I believe you when you say popen() is flakey. It is a little harder to believe it is not possible to write a _popen() replacement using pipes which works. Of course I wanted you to do it instead of me! Well, if I get any time before 1.6 comes out... JimA From gstein@lyra.org Thu Mar 9 20:31:38 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:31:38 -0800 (PST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: On Thu, 9 Mar 2000, James C. Ahlstrom wrote: >... > > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > > only versions of these things that come close to working under Windows (he > > wraps the native Win32 spellings of these things; MS's libc entry points > > (which Python uses now) are much worse). > > I believe you when you say popen() is flakey.
It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. > > Of course I wanted you to do it instead of me! Well, if > I get any time before 1.6 comes out... It *has* been done. Bill Tutt did it a long time ago. That's what win32pipe is all about. -g -- Greg Stein, http://www.lyra.org/ From jim@interet.com Thu Mar 9 21:04:59 2000 From: jim@interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 16:04:59 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C811FB.B6096FA4@interet.com> Greg Stein wrote: > > On Thu, 9 Mar 2000, James C. Ahlstrom wrote: > > Of course I wanted you to do it instead of me! Well, if > > I get any time before 1.6 comes out... > > It *has* been done. Bill Tutt did it a long time ago. That's what > win32pipe is all about. Thanks for the heads up! Unfortunately, win32pipe is not in the core, and probably covers more ground than just popen() and so might be a maintenance problem. And popen() is not written in it anyway. So we are Not There Yet (TM). Which I guess was Tim's original point. JimA From mhammond@skippinet.com.au Thu Mar 9 21:36:14 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 10 Mar 2000 08:36:14 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: > Point on the curve: This program works perfectly on my > machine running NT. And running from Python.exe. I bet you didn't try it from a GUI. The situation is worse WRT Windows 95. MS has a knowledge base article describing the bug, and telling you how to work around it by using a dedicated .EXE. So, out of the box, popen works only on NT from a console - pretty sorry state of affairs :-( > I believe you when you say popen() is flakey.
It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. Which is what I believe win32pipe.popen* are. Mark. From guido@python.org Fri Mar 10 01:13:51 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 20:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe Message-ID: <200003100113.UAA27337@eric.cnri.reston.va.us> Christian Tismer just did an exhaustive search for thread unsafe use of Python operations, and found two weaknesses. One is posix.listdir(), which I had already found; the other is file.writelines(). Here's a program that demonstrates the bug; basically, while writelines is walking down the list, another thread could truncate the list, causing PyList_GetItem() to fail or a string object to be deallocated while writelines is using it. On my Solaris 7 system it typically crashes in the first or second iteration. It's easy to fix: just don't release the interpreter lock (get rid of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other threads from doing any work while this thread may be blocked for I/O. An alternative solution is to put Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but would require a lot of lock operations and would probably slow things down too much. Ideas? --Guido van Rossum (home page: http://www.python.org/~guido/)

import os
import sys
import thread
import random
import time
import tempfile

def good_guy(fp, list):
    t0 = time.time()
    fp.seek(0)
    fp.writelines(list)
    t1 = time.time()
    print fp.tell(), "bytes written"
    return t1-t0

def bad_guy(dt, list):
    time.sleep(random.random() * dt)
    del list[:]

def main():
    infn = "/usr/dict/words"
    if sys.argv[1:]:
        infn = sys.argv[1]
    print "reading %s..." % infn
    fp = open(infn)
    list = fp.readlines()
    fp.close()
    print "read %d lines" % len(list)
    tfn = tempfile.mktemp()
    fp = None
    try:
        fp = open(tfn, "w")
        print "calibrating..."
        dt = 0.0
        n = 3
        for i in range(n):
            dt = dt + good_guy(fp, list)
        dt = dt / n  # average time it took to write the list to disk
        print "dt =", round(dt, 3)
        i = 0
        while 1:
            i = i+1
            print "test", i
            copy = map(lambda x: x[1:], list)
            thread.start_new_thread(bad_guy, (dt, copy))
            good_guy(fp, copy)
    finally:
        if fp:
            fp.close()
        try:
            os.unlink(tfn)
        except os.error:
            pass

main()

From tim_one@email.msn.com Fri Mar 10 02:13:51 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: <200003100113.UAA27337@eric.cnri.reston.va.us> Message-ID: <000601bf8a36$46ebf880$58a2143f@tim> [Guido van Rossum] > Christian Tismer just did an exhaustive search for thread unsafe use > of Python operations, and found two weaknesses. One is > posix.listdir(), which I had already found; the other is > file.writelines(). Here's a program that demonstrates the bug; > basically, while writelines is walking down the list, another thread > could truncate the list, causing PyList_GetItem() to fail or a string > object to be deallocated while writelines is using it. On my Solaris > 7 system it typically crashes in the first or second iteration. > > It's easy to fix: just don't release the interpreter lock (get rid > of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other > threads from doing any work while this thread may be blocked for I/O. > > An alternative solution is to put Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but > would require a lot of lock operations and would probably slow things > down too much. > > Ideas? 2.5: 1: Before releasing the lock, make a shallow copy of the list. 1.5: As in #1, but iteratively peeling off "the next N" values, for some N balancing the number of lock operations against the memory burden (I don't care about the speed of a shallow copy here ...). 2.
Pull the same trick list.sort() uses: make the list object immutable for the duration (I know you think that's a hack, and it is , but it costs virtually nothing and would raise an appropriate error when they attempted the insane mutation). I actually like #2 best now, but won't in the future, because file_writelines() should really accept an argument of any sequence type. This makes 1.5 a better long-term hack. although-adding-1.5-to-1.6-is-confusing-ly y'rs - tim From tim_one@email.msn.com Fri Mar 10 02:52:26 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:26 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <1259490837-400325@hypernet.com> Message-ID: <000901bf8a3b$ab314660$58a2143f@tim> [Gordon McM, aspires to make sense of the mess] > It doesn't work for commands builtin to whatever "shell" you're > using. That's different between cmd and command, and the > various flavors, versions and extensions thereof. It's not that simple, either; e.g., old apps invoking the 16-bit subsystem can screw up too. Look at Tcl's man page for "exec" and just *try* to wrap your brain around all the caveats they were left with after throwing a few thousand lines of C at this under their Windows port . > FWIW, I gave up a long time ago. I use redirection and a > tempfile. The few times I've wanted "interactive" control, I've > used Win32Process, dup'ed, inherited handles... the whole 9 > yards. Why? Look at all the questions about popen and child > processes in general, on platforms where it *works*, (if it > weren't for Donn Cave, nobody'd get it to work anywhere ). Donn is downright scary that way. I stopped using 'em too, of course. > To reiterate Tim's point: *none* of the c runtime routines for > process control on Windows are adequate (beyond os.system > and living with a DOS box popping up).
No, os.system is a problem under command.com flavors of Windows too, as system spawns a new shell and command.com's exit code is *always* 0. So Python's os.system returns 0 no matter what app the user *thinks* they were running, and whether it worked or set the baby on fire. > The raw Win32 CreateProcess does everything you could possibly want, but > takes a week or more to understand, (if this arg is a that, then that arg > is a whatsit, and the next is limited to the values X and Z unless...). Except that CreateProcess doesn't handle shell metacharacters, right? Tcl is the only language I've seen that really works hard at making cmdline-style process control portable. so-all-we-need-to-do-is-a-single-createprocess-to-invoke-tcl-ly y'rs - tim From tim_one@email.msn.com Fri Mar 10 02:52:24 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <000801bf8a3b$aa0c4e60$58a2143f@tim> [Fred L. Drake, Jr.] > Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. Yes, that's good! If Mark/Bill's alternatives don't make it in, would also be good to point to the PythonWin extensions (although MarkH will have to give us the Official Name for that). > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. As the rest of this thread should have made abundantly clear by now <0.9 wink>, it's such a mess across various Windows flavors that nobody can explain it. 
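[Editor's note: the "redirection and a tempfile" workaround Gordon describes earlier in this thread can be sketched as follows. This is a hypothetical illustration -- run_command is an invented name, not part of any module under discussion -- and it sidesteps popen entirely at the cost of losing streaming output:

```python
import os
import tempfile

def run_command(cmd):
    # Hypothetical sketch: capture a command's output via a temp file
    # instead of a pipe.  (mktemp matches the era of this thread;
    # modern code would use tempfile.mkstemp.)
    path = tempfile.mktemp()
    try:
        status = os.system('%s > %s 2>&1' % (cmd, path))
        f = open(path)
        output = f.read()
        f.close()
    finally:
        try:
            os.unlink(path)
        except OSError:
            pass
    return status, output
```

On command.com flavors of Windows the returned status is still unreliable for the reason Tim gives (command.com always exits 0), but the captured output at least arrives intact.]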
From tim_one@email.msn.com Fri Mar 10 03:15:18 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 22:15:18 -0500 Subject: [Python-Dev] RE: finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: <000a01bf8a3e$dc8878c0$58a2143f@tim> Quickie: [Tim] >> It's not obvious, but the SCCs can be found in linear time (via Tarjan's >> algorithm, which is simple but subtle; [NeilS] > Wow, it seems like it should be more expensive than that. Oh yes! Many bright people failed to discover the trick; Tarjan didn't discover it until (IIRC) the early 70's, and it was a surprise. It's just a few lines of simple code added to an ordinary depth-first search. However, while the code is simple, a correctness proof is not. BTW, if it wasn't clear, when talking about graph algorithms "linear" is usually taken to mean "in the sum of the number of nodes and edges". Cyclops.py finds all the cycles in linear time in that sense, too (but does not find the SCCs in linear time, at least not in theory -- in practice you can't tell the difference ). > What are the space requirements? Same as depth-first search, plus a way to associate an SCC id with each node, plus a single global "id" vrbl. So it's worst-case linear (in the number of nodes) space. See, e.g., any of the books in Sedgewick's "Algorithms in [Language du Jour]" series for working code. > Also, does the simple algorithm you used in Cyclops have a name? Not officially, but it answers to "hey, dumb-ass!" . then-again-so-do-i-so-make-eye-contact-ly y'rs - tim From bwarsaw@cnri.reston.va.us Fri Mar 10 04:21:46 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 9 Mar 2000 23:21:46 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Okay, I had a flash of inspiration on the way home from my gig tonight.
Of course, I'm also really tired so I'm sure Tim will shoot this down in his usual witty but humbling way. I just had to get this out or I wouldn't sleep tonight. What if you timestamp instances when you create them? Then when you have trash cycles with finalizers, you sort them and finalize in chronological order. The nice thing here is that the user can have complete control over finalization order by controlling object creation order. Some random thoughts: - Finalization order of cyclic finalizable trash is completely deterministic. - Given sufficient resolution of your system clock, you should never have two objects with the same timestamp. - You could reduce the memory footprint by only including a timestamp for objects whose classes have __del__'s at instance creation time. Sticking an __del__ into your class dynamically would have no effect on objects that are already created (and I wouldn't poke you with a pointy stick if even post-twiddle instances didn't get timestamped). Thus, such objects would never be finalized -- tough luck. - FIFO order /seems/ more natural to me than FILO, but then I rarely create cyclic objects, and almost never use __del__, so this whole argument has been somewhat academic to me :). - The rule seems easy enough to implement, describe, and understand. I think I came up with a few more points on the drive home, but my post jam, post lightbulb endorphodrenalin rush is quickly subsiding, so I leave the rest until tomorrow. its-simply-a-matter-of-time-ly y'rs, -Barry From Moshe Zadka Fri Mar 10 05:32:41 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 10 Mar 2000 07:32:41 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: On Thu, 9 Mar 2000, Greg Stein wrote: > > But I'd still like to reclaim the memory. If this is some > > long-running server process that is executing arbitrary Python > > commands sent to it by clients, it's not nice to leak, period. 
> > If an exception is raised, the top-level server loop can catch it, log the > error, and keep going. But yes: it will leak. And Tim's version stops the leaking if the server is smart enough: occasionally, it will call gc.get_dangerous_cycles(), and nuke everything it finds there. (E.g., clean up dicts and lists). Some destructor raises an exception? Ignore it (or whatever). And no willy-nilly "but I'm using a silly OS which has hardly any concept of stderr" problems! If the server wants, it can just send a message to the log. rooting-for-tim-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one@email.msn.com Fri Mar 10 08:18:29 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 03:18:29 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <000001bf8a69$37d57b40$812d153f@tim> This is getting to be fun, but afraid I can only make time for the first easy one tonight: [Tim, conjures a horrid vision of finalizers installing new __del__ methods, then sez ... ] > The scheme above is meant to be bulletproof in the face of abuses even > I can't conceive of . [Guido] > Are you *sure* your scheme deals with this? Never said it did -- only that it *meant* to . Ya, you got me. The things I thought I had *proved* I put in the numbered list, and in a rush put the speculative stuff in the reply body. One practical thing I think I can prove today: after finding SCCs, and identifying the safe nodes without predecessors, all such nodes S1, S2, ... can be cleaned up without fear of resurrection, or of cleaning something in Si causing anything in Sj (i!=j) to get reclaimed either (at the time I wrote it, I could only prove that cleaning *one* Si was non-problematic). Barring, of course, this "__del__ from hell" pathology. 
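[The Tarjan trick Tim describes -- a few lines bolted onto an ordinary depth-first search -- is easy to show in Python. This is a generic sketch, not the code from Cyclops.py; the function and variable names are invented here:]

```python
def tarjan_scc(graph):
    """Strongly connected components of a directed graph
    (dict: node -> list of successors), in O(nodes + edges)
    time and worst-case linear space. Returns a list of SCCs."""
    index = {}              # discovery order of each node
    lowlink = {}            # smallest index reachable from node's subtree
    stack, on_stack = [], set()
    counter = [0]           # the single global "id" variable
    sccs = []

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:      # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

# A cycle a <=> b with an acyclic appendage c:
print(tarjan_scc({"a": ["b"], "b": ["a", "c"], "c": []}))
```

[A handy side effect: the SCCs come out in reverse topological order, so successor components are emitted before their predecessors.]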
Also suspect that this claim is isomorphic to your later elaboration on why the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers That is, exactly the same is true of "the safe (SCC super)nodes without predecessors", so I expect we've just got two ways of identifying the same set here. Perhaps yours is bigger, though (I realize that isn't clear; later). > Let's look at an example. > (Again, lowercase nodes have no finalizers.) Take G: > > a <=> b -> C > > [and cleaning b can trigger C.__del__ which can create > a.__class__.__del__ before a is decref'ed ...] > > ... and we're halfway committing a crime we said we would never commit > (touching cyclical trash with finalizers). Wholly agreed. > I propose to disregard this absurd possibility, How come you never propose to just shoot people <0.9 wink>? > except to the extent that Python shouldn't crash -- but we make no > guarantees to the user. "Shouldn't crash" is essential, sure. Carry it another step: after C is finalized, we get back to the loop clearing b.__dict__, and the refcount on "a" falls to 0 next. So the new a.__del__ gets called. Since b was visible to a, it's possible for a.__del__ to resurrect b, which latter is now in some bizarre (from the programmer's POV) cleared state (or even in the bit bucket, if we optimistically reclaim b's memory "early"!). I can't (well, don't want to ) believe it will be hard to stop this. It's just irksome to need to think about it at all. making-java's-gc-look-easy?-ly y'rs - tim From guido@python.org Fri Mar 10 13:46:43 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 08:46:43 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 23:21:46 EST." 
<14536.30810.720836.886023@anthem.cnri.reston.va.us> References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <200003101346.IAA27847@eric.cnri.reston.va.us> > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. The nice thing here is that the user can have > complete control over finalization order by controlling object > creation order. > > Some random thoughts: > > - Finalization order of cyclic finalizable trash is completely > deterministic. > > - Given sufficient resolution of your system clock, you should never > have two objects with the same timestamp. Forget the clock -- just use a counter that is incremented on each allocation. > - You could reduce the memory footprint by only including a timestamp > for objects whose classes have __del__'s at instance creation time. > Sticking an __del__ into your class dynamically would have no effect > on objects that are already created (and I wouldn't poke you with a > pointy stick if even post-twiddle instances didn't get > timestamped). Thus, such objects would never be finalized -- tough > luck. > > - FIFO order /seems/ more natural to me than FILO, but then I rarely > create cyclic objects, and almost never use __del__, so this whole > argument has been somewhat academic to me :). Ai, there's the rub. Suppose I have a tree with parent and child links. And suppose I have a rule that children need to be finalized before their parents (maybe they represent a Unix directory tree, where you must rm the files before you can rmdir the directory). This suggests that we should choose LIFO: you must create the parents first (you have to create a directory before you can create files in it). However, now we add operations to move nodes around in the tree. Suddenly you can have a child that is older than its parent! 
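[The reparenting problem can be made concrete with a toy sketch. The Node class and its serial counter are invented for illustration; the counter plays the role of the proposed creation timestamp:]

```python
import itertools

_serial = itertools.count()

class Node:
    """Toy tree node stamped with an allocation serial number
    (a stand-in for the proposed creation timestamp)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.serial = next(_serial)
        self.parent = parent

# Create a directory-like tree: parents first, children later.
root = Node("root")
a = Node("a", parent=root)
b = Node("b", parent=a)

# LIFO-by-creation finalizes children before parents -- so far so good:
assert b.serial > a.serial > root.serial

# Now move `a` under a freshly created directory:
new_home = Node("new_home", parent=root)
a.parent = new_home

# The child `a` is now *older* than its parent -- creation order
# no longer matches the containment relationship.
assert a.serial < new_home.serial
```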
Conclusion: the creation time is useless; the application logic and actual link relationships are needed. > - The rule seems easy enough to implement, describe, and understand. > > I think I came up with a few more points on the drive home, but my > post jam, post lightbulb endorphodrenalin rush is quickly subsiding, > so I leave the rest until tomorrow. > > its-simply-a-matter-of-time-ly y'rs, > -Barry Time flies like an arrow -- fruit flies like a banana. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 10 15:06:48 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 10:06:48 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: Your message of "Thu, 09 Mar 2000 21:13:51 EST." <000601bf8a36$46ebf880$58a2143f@tim> References: <000601bf8a36$46ebf880$58a2143f@tim> Message-ID: <200003101506.KAA28358@eric.cnri.reston.va.us> OK, here's a patch for writelines() that supports arbitrary sequences and fixes the lock problem using Tim's solution #1.5 (slicing 1000 items at a time). It contains a fast path for when the argument is a list, using PyList_GetSlice; otherwise it uses PyObject_GetItem and a fixed list. Please have a good look at this; I've only tested it lightly. --Guido van Rossum (home page: http://www.python.org/~guido/) Index: fileobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/fileobject.c,v retrieving revision 2.70 diff -c -r2.70 fileobject.c *** fileobject.c 2000/02/29 13:59:28 2.70 --- fileobject.c 2000/03/10 14:55:47 *************** *** 884,923 **** PyFileObject *f; PyObject *args; { ! int i, n; if (f->f_fp == NULL) return err_closed(); ! if (args == NULL || !PyList_Check(args)) { PyErr_SetString(PyExc_TypeError, ! "writelines() requires list of strings"); return NULL; } ! n = PyList_Size(args); ! f->f_softspace = 0; ! Py_BEGIN_ALLOW_THREADS ! errno = 0; ! for (i = 0; i < n; i++) { ! 
PyObject *line = PyList_GetItem(args, i); ! int len; ! int nwritten; ! if (!PyString_Check(line)) { ! Py_BLOCK_THREADS ! PyErr_SetString(PyExc_TypeError, ! "writelines() requires list of strings"); return NULL; } ! len = PyString_Size(line); ! nwritten = fwrite(PyString_AsString(line), 1, len, f->f_fp); ! if (nwritten != len) { ! Py_BLOCK_THREADS ! PyErr_SetFromErrno(PyExc_IOError); ! clearerr(f->f_fp); ! return NULL; } } ! Py_END_ALLOW_THREADS Py_INCREF(Py_None); ! return Py_None; } static PyMethodDef file_methods[] = { --- 884,975 ---- PyFileObject *f; PyObject *args; { ! #define CHUNKSIZE 1000 ! PyObject *list, *line; ! PyObject *result; ! int i, j, index, len, nwritten, islist; ! if (f->f_fp == NULL) return err_closed(); ! if (args == NULL || !PySequence_Check(args)) { PyErr_SetString(PyExc_TypeError, ! "writelines() requires sequence of strings"); return NULL; } ! islist = PyList_Check(args); ! ! /* Strategy: slurp CHUNKSIZE lines into a private list, ! checking that they are all strings, then write that list ! without holding the interpreter lock, then come back for more. */ ! index = 0; ! if (islist) ! list = NULL; ! else { ! list = PyList_New(CHUNKSIZE); ! if (list == NULL) return NULL; + } + result = NULL; + + for (;;) { + if (islist) { + Py_XDECREF(list); + list = PyList_GetSlice(args, index, index+CHUNKSIZE); + if (list == NULL) + return NULL; + j = PyList_GET_SIZE(list); } ! else { ! for (j = 0; j < CHUNKSIZE; j++) { ! line = PySequence_GetItem(args, index+j); ! if (line == NULL) { ! if (PyErr_ExceptionMatches(PyExc_IndexError)) { ! PyErr_Clear(); ! break; ! } ! /* Some other error occurred. ! Note that we may lose some output. */ ! goto error; ! } ! if (!PyString_Check(line)) { ! PyErr_SetString(PyExc_TypeError, ! "writelines() requires sequences of strings"); ! goto error; ! } ! PyList_SetItem(list, j, line); ! } ! } ! if (j == 0) ! break; ! ! Py_BEGIN_ALLOW_THREADS ! f->f_softspace = 0; ! errno = 0; ! for (i = 0; i < j; i++) { ! 
line = PyList_GET_ITEM(list, i); ! len = PyString_GET_SIZE(line); ! nwritten = fwrite(PyString_AS_STRING(line), ! 1, len, f->f_fp); ! if (nwritten != len) { ! Py_BLOCK_THREADS ! PyErr_SetFromErrno(PyExc_IOError); ! clearerr(f->f_fp); ! Py_DECREF(list); ! return NULL; ! } } + Py_END_ALLOW_THREADS + + if (j < CHUNKSIZE) + break; + index += CHUNKSIZE; } ! Py_INCREF(Py_None); ! result = Py_None; ! error: ! Py_XDECREF(list); ! return result; } static PyMethodDef file_methods[] = { From skip@mojam.com (Skip Montanaro) Fri Mar 10 15:28:13 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 09:28:13 -0600 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler Message-ID: <200003101528.JAA15951@beluga.mojam.com> Consider the following snippet of code from MySQLdb.py: try: self._query(query % escape_row(args, qc)) except TypeError: self._query(query % escape_dict(args, qc)) It's not quite right. There are at least four reasons I can think of why the % operator might raise a TypeError: 1. query has not enough format specifiers 2. query has too many format specifiers 3. argument type mismatch between individual format specifier and corresponding argument 4. query expects dist-style interpolation The except clause only handles the last case. That leaves the other three cases mishandled. The above construct pretends that all TypeErrors possible are handled by calling escape_dict() instead of escape_row(). I stumbled on case 2 yesterday and got a fairly useless error message when the code in the except clause also bombed. Took me a few minutes of head scratching to see that I had an extra %s in my format string. 
A note to Andy Dustman, MySQLdb's author, yielded the following modified version: try: self._query(query % escape_row(args, qc)) except TypeError, m: if m.args[0] == "not enough arguments for format string": raise if m.args[0] == "not all arguments converted": raise self._query(query % escape_dict(args, qc)) This will do the trick for me for the time being. Note, however, that the only way for Andy to decide which of the cases occurred (case 3 still isn't handled above, but should occur very rarely in MySQLdb since it only uses the more accommodating %s as a format specifier) is to compare the string value of the message to see which of the four cases was raised. This strong coupling via the error message text between the exception being raised (in C code, in this case) and the place where it's caught seems bad to me and encourages authors to either not recover from errors or to recover from them in the crudest fashion. If Guido decides to tweak the TypeError message in any fashion, perhaps to include the count of arguments in the format string and argument tuple, this code will break. It makes me wonder if there's not a better mechanism waiting to be discovered. Would it be possible to publish an interface of some sort via the exceptions module that would allow symbolic names or dictionary references to be used to decide which case is being handled? I envision something like the following in exceptions.py: UNKNOWN_ERROR_CATEGORY = 0 TYP_SHORT_FORMAT = 1 TYP_LONG_FORMAT = 2 ... IND_BAD_RANGE = 1 message_map = { # leave (TypeError, ("not enough arguments for format string",)): TYP_SHORT_FORMAT, (TypeError, ("not all arguments converted",)): TYP_LONG_FORMAT, ... (IndexError, ("list index out of range",)): IND_BAD_RANGE, ... } This would isolate the raw text of exception strings to just a single place (well, just one place on the exception handling side of things). 
It would be used something like try: self._query(query % escape_row(args, qc)) except TypeError, m: from exceptions import * exc_case = message_map.get((TypeError, m.args), UNKNOWN_ERROR_CATEGORY) if exc_case in [UNKNOWN_ERROR_CATEGORY,TYP_SHORT_FORMAT, TYP_LONG_FORMAT]: raise self._query(query % escape_dict(args, qc)) This could be added to exceptions.py without breaking existing code. Does this (or something like it) seem like a reasonable enhancement for Py2K? If we can narrow things down to an implementable solution I'll create a patch. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From guido@python.org Fri Mar 10 16:17:56 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 11:17:56 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: Your message of "Fri, 10 Mar 2000 09:28:13 CST." <200003101528.JAA15951@beluga.mojam.com> References: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <200003101617.LAA28722@eric.cnri.reston.va.us> > Consider the following snippet of code from MySQLdb.py: Skip, I'm not familiar with MySQLdb.py, and I have no idea what your example is about. From the rest of the message I feel it's not about MySQLdb at all, but about string formatting, but the point escapes me because you never quite show what's in the format string and what error that gives. Could you give some examples based on first principles? A simple interactive session showing the various errors would be helpful...
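[For the record, the first-principles demonstration is short: all four mistakes raise a bare TypeError, distinguishable only by message text -- and the exact wording varies between interpreter versions, which is part of the problem:]

```python
# Four different formatting mistakes, one undifferentiated exception type.
cases = [
    ("%s", ("a", "b")),      # too many arguments for the format
    ("%s %s", "a"),          # not enough arguments
    ("%(a)s", ("a",)),       # format wants a mapping, got a tuple
    ("%d", {"a": 1}),        # argument type mismatch
]
for fmt, operand in cases:
    try:
        fmt % operand
    except TypeError as exc:
        # The *only* way to tell the cases apart is the message text:
        print("TypeError:", exc)
```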
--Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Fri Mar 10 19:05:04 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Fri, 10 Mar 2000 14:05:04 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>; from guido@python.org on Fri, Mar 10, 2000 at 11:17:56AM -0500 References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <20000310140503.A8619@cnri.reston.va.us> On 10 March 2000, Guido van Rossum said: > Skip, I'm not familiar with MySQLdb.py, and I have no idea what your > example is about. From the rest of the message I feel it's not about > MySQLdb at all, but about string formatting, but the point escapes me > because you never quite show what's in the format string and what > error that gives. Could you give some examples based on first > principles? A simple interactive session showing the various errors > would be helpful... I think Skip's point was just this: "TypeError" isn't expressive enough. If you catch TypeError on a statement with multiple possible type errors, you don't know which one you caught. Same holds for any exception type, really: a given statement could blow up with ValueError for any number of reasons. Etc., etc. One possible solution, and I think this is what Skip was getting at, is to add an "error code" to the exception object that identifies the error more reliably than examining the error message. It's just the errno/strerror dichotomy: strerror is for users, errno is for code. I think Skip is just saying that Python exception objects need an errno (although it doesn't have to be a number). It would probably only make sense to define error codes for exceptions that can be raised by Python itself, though.
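[The errno/strerror dichotomy is already visible on the exceptions raised for failed system calls, and a code-carrying exception is easy to sketch. Written in present-day syntax; FormatError and its code values are hypothetical, not an existing API:]

```python
import errno

# The existing precedent: OSError separates the code from the text.
try:
    open("/no/such/file/hopefully")
except OSError as exc:
    assert exc.errno == errno.ENOENT   # symbolic code, for programs
    assert exc.strerror                # message text, for humans

# The same split applied to a formatting error, per Greg's suggestion
# (FormatError and its code constants are made up for illustration):
class FormatError(TypeError):
    TOO_FEW, TOO_MANY = range(2)
    def __init__(self, code, message):
        TypeError.__init__(self, message)
        self.code = code

try:
    raise FormatError(FormatError.TOO_MANY, "not all arguments converted")
except TypeError as exc:               # old handlers still catch it
    assert exc.code == FormatError.TOO_MANY
```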
Greg From skip@mojam.com (Skip Montanaro) Fri Mar 10 20:17:30 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 14:17:30 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <14537.22618.656740.296408@beluga.mojam.com> Guido> Skip, I'm not familiar with MySQLdb.py, and I have no idea what Guido> your example is about. From the rest of the message I feel it's Guido> not about MySQLdb at all, but about string formatting, My apologies. You're correct, it's really not about MySQLdb. It's about handling multiple cases raised by the same exception. First, a more concrete example that just uses simple string formats: code exception "%s" % ("a", "b") TypeError: 'not all arguments converted' "%s %s" % "a" TypeError: 'not enough arguments for format string' "%(a)s" % ("a",) TypeError: 'format requires a mapping' "%d" % {"a": 1} TypeError: 'illegal argument type for built-in operation' Let's presume hypothetically that it's possible to recover from some subset of the TypeErrors that are raised, but not all of them. Now, also presume that the format strings and the tuple, string or dict literals I've given above can be stored in variables (which they can). If we wrap the code in a try/except statement, we can catch the TypeError exception and try to do something sensible. This is precisely the trick that Andy Dustman uses in MySQLdb: first try expanding the format string using a tuple as the RH operand, then try with a dict if that fails. Unfortunately, as you can see from the above examples, there are four cases that need to be handled. To distinguish them currently, you have to compare the message you get with the exception to string literals that are generally defined in C code in the interpreter. 
Here's what Andy's original code looked like stripped of the MySQLdb-ese: try: x = format % tuple_generating_function(...) except TypeError: x = format % dict_generating_function(...) That doesn't handle the first two cases above. You have to inspect the message that raise sends out: try: x = format % tuple_generating_function(...) except TypeError, m: if m.args[0] == "not all arguments converted": raise if m.args[0] == "not enough arguments for format string": raise x = format % dict_generating_function(...) This comparison of except arguments with hard-coded strings (especially ones the programmer has no direct control over) seems fragile to me. If you decide to reword the error message strings, you break someone's code. In my previous message I suggested collecting this fragility in the exceptions module where it can be better isolated. My solution is a bit cumbersome, but could probably be cleaned up somewhat, but basically looks like try: x = format % tuple_generating_function(...) except TypeError, m: import exceptions msg_case = exceptions.message_map.get((TypeError, m.args), exceptions.UNKNOWN_ERROR_CATEGORY) # punt on the cases we can't recover from if msg_case == exceptions.TYP_SHORT_FORMAT: raise if msg_case == exceptions.TYP_LONG_FORMAT: raise if msg_case == exceptions.UNKNOWN_ERROR_CATEGORY: raise # handle the one we can x = format % dict_generating_function(...) In private email that crossed my original message, Andy suggested defining more standard exceptions, e.g.: class FormatError(TypeError): pass class TooManyElements(FormatError): pass class TooFewElements(FormatError): pass then raising the appropriate error based on the circumstance. Code that catches TypeError exceptions would still work. So there are two possible changes on the table: 1. define more standard exceptions so you can distinguish classes of errors on a more fine-grained basis using just the first argument of the except clause. 2. 
provide some machinery in exceptions.py to allow programmers a measure of uncoupling from using hard-coded strings to distinguish cases. Skip From skip@mojam.com (Skip Montanaro) Fri Mar 10 20:21:11 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 14:21:11 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <20000310140503.A8619@cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> <20000310140503.A8619@cnri.reston.va.us> Message-ID: <14537.22839.664131.373727@beluga.mojam.com> Greg> One possible solution, and I think this is what Skip was getting Greg> at, is to add an "error code" to the exception object that Greg> identifies the error more reliably than examining the error Greg> message. It's just the errno/strerror dichotomy: strerror is for Greg> users, errno is for code. I think Skip is just saying that Greg> Pythone exception objets need an errno (although it doesn't have Greg> to be a number). It would probably only make sense to define Greg> error codes for exceptions that can be raised by Python itself, Greg> though. I'm actually allowing the string to be used as the error code. If you raise TypeError with "not all arguments converted" as the argument, then that string literal will appear in the definition of exceptions.message_map as part of a key. The programmer would only refer to the args attribute of the object being raised. 
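[Change 1 -- the finer-grained exception hierarchy Andy suggested -- can be sketched as follows. The checked_format wrapper and the message tests inside it are hypothetical glue (today the interpreter raises plain TypeError); the point is that the fragile string matching happens in exactly one place, and old handlers that catch TypeError keep working:]

```python
class FormatError(TypeError): pass
class TooManyElements(FormatError): pass
class TooFewElements(FormatError): pass

def checked_format(fmt, args):
    """Hypothetical wrapper: re-raise %-formatting errors as the
    finer-grained classes, so handlers can dispatch on the class
    instead of comparing message strings themselves."""
    try:
        return fmt % args
    except TypeError as exc:
        text = str(exc)
        if "not all arguments converted" in text:
            raise TooManyElements(text)
        if "not enough arguments" in text:
            raise TooFewElements(text)
        raise

# Dispatch on the class, not the message:
try:
    checked_format("%s %s", ("a",))
except TooFewElements:
    handled = "too few"
assert handled == "too few"

# Existing code that only knows about TypeError keeps working:
try:
    checked_format("%s", ("a", "b"))
except TypeError:
    handled = "legacy handler"
assert handled == "legacy handler"
```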
either-or-makes-no-real-difference-to-me-ly y'rs, Skip From bwarsaw@cnri.reston.va.us Fri Mar 10 20:56:45 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 10 Mar 2000 15:56:45 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> <200003101346.IAA27847@eric.cnri.reston.va.us> Message-ID: <14537.24973.579056.533282@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: >> Given sufficient resolution of your system >> clock, you should never have two objects with the same >> timestamp. GvR> Forget the clock -- just use a counter that is incremented on GvR> each allocation. Good idea. GvR> Suppose I have a tree with parent and child links. And GvR> suppose I have a rule that children need to be finalized GvR> before their parents (maybe they represent a Unix directory GvR> tree, where you must rm the files before you can rmdir the GvR> directory). This suggests that we should choose LIFO: you GvR> must create the parents first (you have to create a directory GvR> before you can create files in it). However, now we add GvR> operations to move nodes around in the tree. Suddenly you GvR> can have a child that is older than its parent! Conclusion: GvR> the creation time is useless; the application logic and GvR> actual link relationships are needed. One potential way to solve this is to provide an interface for refreshing the counter; for discussion purposes, I'll call this sys.gcrefresh(obj). Throws a TypeError if obj isn't a finalizable instance. Otherwise, it sets the "timestamp" to the current counter value and increments the counter. Thus, in your example, when the child node is reparented, you sys.gcrefresh(child) and now the parent is automatically older. Of course, what if the child has its own children? 
You've now got an age graph like this parent > child < grandchild with the wrong age relationship between the parent and grandchild. So when you refresh, you've got to walk down the containment tree making sure your grandkids are "younger" than yourself. E.g.: class Node: ... def __del__(self): ... def reparent(self, node): self.parent = node self.refresh() def refresh(self): sys.gcrefresh(self) for c in self.children: c.refresh() The point to all this is that it gives explicit control of the finalizable cycle reclamation order to the user, via a fairly easy to understand, and manipulate mechanism. twas-only-a-flesh-wound-but-waiting-for-the-next-stroke-ly y'rs, -Barry From jim@interet.com Fri Mar 10 21:14:45 2000 From: jim@interet.com (James C. Ahlstrom) Date: Fri, 10 Mar 2000 16:14:45 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000801bf8a3b$aa0c4e60$58a2143f@tim> Message-ID: <38C965C4.B164C2D5@interet.com> Tim Peters wrote: > > [Fred L. Drake, Jr.] > > Tim (& others), > > Would this additional text be sufficient for the os.popen() > > documentation? > > > > \strong{Note:} This function behaves unreliably under Windows > > due to the native implementation of \cfunction{popen()}. > > Yes, that's good! If Mark/Bill's alternatives don't make it in, would also > be good to point to the PythonWin extensions (although MarkH will have to > give us the Official Name for that). Well, it looks like this thread has fizzled out. But what did we decide? Changing the docs to say popen() "doesn't work reliably" is a little weak. Maybe removing popen() is better, and demanding that Windows users use win32pipe. I played around with a patch to posixmodule.c which eliminates _popen() and implements os.popen() using CreatePipe(). It sort of works on NT and fails on 95. Anyway, I am stuck on how to make a Python file object from a pipe handle. 
Would it be a good idea to extract the Wisdom from win32pipe and re-implement os.popen() either in C or by using win32pipe directly? Using C is simple and to the point. I feel Tim's original complaint that popen() is a Problem still hasn't been fixed. JimA From Moshe Zadka Fri Mar 10 21:29:05 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 10 Mar 2000 23:29:05 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: On Fri, 10 Mar 2000 bwarsaw@cnri.reston.va.us wrote: > One potential way to solve this is to provide an interface for > refreshing the counter; for discussion purposes, I'll call this > sys.gcrefresh(obj). Barry, there are other problems with your scheme, but I won't even try to point those out: having to call a function whose purpose can only be described in terms of a concrete implementation of a garbage collection scheme is simply unacceptable. I can almost see you shouting "Come back here, I'll bite your legs off" . > The point to all this is that it gives explicit control of the > finalizable cycle reclamation order to the user, via a fairly easy to > understand, and manipulate mechanism. Oh? This sounds like the most horrendous mechanism alive.... you-probably-jammed-a-*little*-too-loud-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From bwarsaw@cnri.reston.va.us Fri Mar 10 22:15:27 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 10 Mar 2000 17:15:27 -0500 (EST) Subject: [Python-Dev] finalization again References: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: <14537.29695.532507.197580@anthem.cnri.reston.va.us> Just throwing out ideas.
From DavidA@ActiveState.com Fri Mar 10 22:20:45 2000 From: DavidA@ActiveState.com (David Ascher) Date: Fri, 10 Mar 2000 14:20:45 -0800 Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: Moshe, some _arguments_ backing your feelings might give them more weight... As they stand, they're just insults, and if I were Barry I'd ignore them. --david ascher Moshe Zadka: > Barry, there are other problems with your scheme, but I won't even try to > point those out: having to call a function whose purpose can only be > described in terms of a concrete implementation of a garbage collection > scheme is simply unacceptable. I can almost see you shouting "Come back > here, I'll bite your legs off" . > [...] > Oh? This sounds like the most horrendus mechanism alive.... From skip@mojam.com (Skip Montanaro) Fri Mar 10 22:40:02 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 16:40:02 -0600 Subject: [Python-Dev] on the suitability of ideas tossed out to python-dev Message-ID: <200003102240.QAA07881@beluga.mojam.com> Folks, let's not forget that python-dev is a place where oftentimes half-baked ideas will get advanced. I came up with an idea about decoupling error handling from exception message strings. I don't expect my idea to be adopted as is. Similarly, Barry's ideas about object timestamps were admittedly conceived late at night in the thrill following an apparently good gig. (I like the idea that every object has a modtime, but for other reasons than Barry suggested.) My feeling is that bad ideas will get winnowed out or drastically modified quickly enough anyway. Think of these early ideas as little more than brainstorms. 
A lot of times if I have an idea, I feel I need to put it down on my virtual whiteboard quickly, because a) I often don't have a lot of time to pursue stuff (do it now or it won't get done), b) because bad ideas can be the catalyst for better ideas, and c) if I don't do it immediately, I'll probably forget the idea altogether, thus missing the opportunity for reason b altogether. Try and collect a bunch of ideas before shooting any down and see what falls out. The best ideas will survive. When people start proving things and using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-) Have a relaxing, thought provoking weekend. I'm going to go see a movie this evening with my wife and youngest son, appropriately enough titled, "My Dog Skip". Enough Pythoneering for one day... bow-wow-ly y'rs, Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From guido@python.org Sat Mar 11 00:20:01 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 19:20:01 -0500 Subject: [Python-Dev] Unicode patches checked in Message-ID: <200003110020.TAA17777@eric.cnri.reston.va.us> I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! 
I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which the Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Sat Mar 11 02:03:47 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:47 -0500 Subject: [Python-Dev] Finalization in Eiffel Message-ID: <000701bf8afe$0a0fd800$a42d153f@tim> Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took extreme care in its design, and has written extensively and articulately about the design -- agree with him or not, he's always worth reading! I used Eiffel briefly a few years ago, just out of curiosity. I didn't recall even bumping into a notion of destructors. Turns out it does have them, but they're appallingly (whether relative to Eiffel's usual clarity, or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified. 
An Eiffel class can register a destructor by inheriting from the system MEMORY class and overriding the latter's "dispose()". This appears to be viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The Language" say much about its semantics. Within dispose, you're explicitly discouraged from invoking methods on *any* other object, and resurrection is right out the window. But the language doesn't appear to check for any of that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from people who should know suggest that all but one Eiffel implementation pay no attention at all to reachability during gc, and that none support resurrection. If you need ordering during finalization, the advice is to write that part in C/C++. Violations of the vague rules appear to lead to random system damage(!). Looking at various Eiffel pkgs on the web, the sole use of dispose was in one-line bodies that released external resources (like memory & db connections) via calling an external C/C++ function. jealous-&-appalled-at-the-same-time-ly y'rs - tim From tim_one@email.msn.com Sat Mar 11 02:03:50 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:50 -0500 Subject: [Python-Dev] Conventional wisdom on finalization Message-ID: <000801bf8afe$0b3df7c0$a42d153f@tim> David Chase maintains a well-regarded GC FAQ, at http://www.iecc.com/gclist/GC-faq.html Interested folks should look it up. A couple highlights: On cycles with finalizers: In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. 
People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common).

On Java's "finalizer called only once" rule: if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.)

Until today, I had no idea I was so resolutely conventional .

seems-we're-trying-to-do-more-than-anyone-other-than-us-expects-ly y'rs - tim

From shichang@icubed.com

I would love to test the Python 1.6 (Unicode support) in the Chinese language aspect, but I don't know where I can get a copy of an OS that supports Chinese. Can anyone point me in a direction?

-----Original Message-----
From: Guido van Rossum [SMTP:guido@python.org]
Sent: Saturday, March 11, 2000 12:20 AM
To: Python mailing list; python-announce@python.org; python-dev@python.org; i18n-sig@python.org; string-sig@python.org
Cc: Marc-Andre Lemburg
Subject: Unicode patches checked in

[Guido's announcement, quoted here in full, trimmed -- see the original above.]

--
http://www.python.org/mailman/listinfo/python-list

From Moshe Zadka Sat Mar 11 09:10:12 2000
From: Moshe Zadka (Moshe Zadka)
Date: Sat, 11 Mar 2000 11:10:12 +0200 (IST)
Subject: [Python-Dev] Unicode: When Things Get Hairy
Message-ID:

The following "problem" is easy to fix. However, what I wanted to know is if people (Skip and Guido most importantly) think it is a problem:

>>> "a" in u"bbba"
1
>>> u"a" in "bbba"
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: string member test needs char left operand

Suggested fix: in stringobject.c, explicitly allow a unicode char left operand.

-- 
Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html

From mal@lemburg.com Sat Mar 11 10:24:26 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 11 Mar 2000 11:24:26 +0100
Subject: [Python-Dev] Unicode: When Things Get Hairy
References: Message-ID: <38CA1EDA.423F8A2C@lemburg.com>

Moshe Zadka wrote:
> 
> The following "problem" is easy to fix. However, what I wanted to know is
> if people (Skip and Guido most importantly) think it is a problem:
> 
> >>> "a" in u"bbba"
> 1
> >>> u"a" in "bbba"
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
> TypeError: string member test needs char left operand
> 
> Suggested fix: in stringobject.c, explicitly allow a unicode char left
> operand.

Hmm, this must have been introduced by your contains code... it did work before.

The normal action taken by the Unicode and the string code in these mixed type situations is to first convert everything to Unicode and then retry the operation. Strings are interpreted as UTF-8 during this conversion.

To simplify this task, I added method APIs to the Unicode object which do the conversion for you (they apply all the necessary coercion business to all arguments). I guess adding another PyUnicode_Contains() wouldn't hurt :-)

Perhaps I should also add a tp_contains slot to the Unicode object which then uses the above API as well.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Moshe Zadka Sat Mar 11 11:05:48 2000
From: Moshe Zadka (Moshe Zadka)
Date: Sat, 11 Mar 2000 13:05:48 +0200 (IST)
Subject: [Python-Dev] Unicode: When Things Get Hairy
In-Reply-To: <38CA1EDA.423F8A2C@lemburg.com>
Message-ID:

On Sat, 11 Mar 2000, M.-A. Lemburg wrote:

> Hmm, this must have been introduced by your contains code...
> it did work before.
Nope: the string "in" semantics were forever special-cased. Guido beat me soundly for trying to change the semantics...

> The normal action taken by the Unicode and the string
> code in these mixed type situations is to first
> convert everything to Unicode and then retry the operation.
> Strings are interpreted as UTF-8 during this conversion.

Hmmm....PySequence_Contains doesn't do any conversion of the arguments. Should it? (Again, it didn't before). If it does, then the order of testing for seq_contains and seq_getitem and conversions

> Perhaps I should also add a tp_contains slot to the
> Unicode object which then uses the above API as well.

But that wouldn't help at all for

u"a" in "abbbb"

PySequence_Contains only dispatches on the container argument :-(

(BTW: I discovered it while contemplating adding a seq_contains (not tp_contains) to unicode objects to optimize the searching for a bit.)

PS: MAL: thanks for a great birthday present! I'm enjoying the unicode patch a lot.

-- 
Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html

From guido@python.org Sat Mar 11 12:16:06 2000
From: guido@python.org (Guido van Rossum)
Date: Sat, 11 Mar 2000 07:16:06 -0500
Subject: [Python-Dev] Unicode: When Things Get Hairy
In-Reply-To: Your message of "Sat, 11 Mar 2000 13:05:48 +0200."
References: Message-ID: <200003111216.HAA12651@eric.cnri.reston.va.us>

[Moshe discovers that u"a" in "bbba" raises TypeError]

[Marc-Andre]
> > Hmm, this must have been introduced by your contains code...
> > it did work before.
>
> Nope: the string "in" semantics were forever special-cased. Guido beat me
> soundly for trying to change the semantics...

But I believe that Marc-Andre added a special case for Unicode in PySequence_Contains. I looked for evidence, but the last snapshot that I actually saved and built before Moshe's code was checked in is from 2/18 and it isn't in there. Yet I believe Marc-Andre. The special case needs to be added back to string_contains in stringobject.c.
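Moshe's complaint that PySequence_Contains only dispatches on the container can be seen in a toy Python model of that function (names and structure are mine, not CPython's actual C code):

```python
def sequence_contains(container, element):
    # Toy model of PySequence_Contains: only the *container* type is
    # consulted -- first its own contains slot, else a linear scan.
    contains = getattr(type(container), "__contains__", None)
    if contains is not None:
        return 1 if contains(container, element) else 0
    for item in container:        # fallback: getitem-style scan
        if item == element:
            return 1
    return 0

# The element's type never gets a say, which is why the fix for
# u"a" in "bbba" has to live inside the string type's own slot.
assert sequence_contains("bbba", "a") == 1
assert sequence_contains([1, 2, 3], 4) == 0
```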
> > The normal action taken by the Unicode and the string
> > code in these mixed type situations is to first
> > convert everything to Unicode and then retry the operation.
> > Strings are interpreted as UTF-8 during this conversion.
>
> Hmmm....PySequence_Contains doesn't do any conversion of the arguments.
> Should it? (Again, it didn't before). If it does, then the order of
> testing for seq_contains and seq_getitem and conversions

Or it could be done this way.

> > Perhaps I should also add a tp_contains slot to the
> > Unicode object which then uses the above API as well.

Yes.

> But that wouldn't help at all for
>
> u"a" in "abbbb"

It could if PySequence_Contains would first look for a string and a unicode argument (in either order) and in that case convert the string to unicode.

> PySequence_Contains only dispatches on the container argument :-(
>
> (BTW: I discovered it while contemplating adding a seq_contains (not
> tp_contains) to unicode objects to optimize the searching for a bit.)

You may beat Marc-Andre to it, but I'll have to let him look at the code anyway -- I'm not sufficiently familiar with the Unicode stuff myself yet.

BTW, I added a tag "pre-unicode" to the CVS tree to the revisions before the Unicode changes were made.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal@lemburg.com Sat Mar 11 13:32:57 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 11 Mar 2000 14:32:57 +0100
Subject: [Python-Dev] Unicode: When Things Get Hairy
References: <200003111216.HAA12651@eric.cnri.reston.va.us>
Message-ID: <38CA4B08.7B13438D@lemburg.com>

Guido van Rossum wrote:
> 
> [Moshe discovers that u"a" in "bbba" raises TypeError]
> 
> [Marc-Andre]
> > > Hmm, this must have been introduced by your contains code...
> > > it did work before.
> >
> > Nope: the string "in" semantics were forever special-cased. Guido beat me
> > soundly for trying to change the semantics...
> 
> But I believe that Marc-Andre added a special case for Unicode in
> PySequence_Contains. I looked for evidence, but the last snapshot that
> I actually saved and built before Moshe's code was checked in is from
> 2/18 and it isn't in there. Yet I believe Marc-Andre. The special
> case needs to be added back to string_contains in stringobject.c.

Moshe was right: I had probably not checked the code because the obvious combinations worked out of the box... the only combination which doesn't work is "unicode in string". I'll fix it next week.

BTW, there's a good chance that the string/Unicode integration is not complete yet: just keep looking for them.

> > > The normal action taken by the Unicode and the string
> > > code in these mixed type situations is to first
> > > convert everything to Unicode and then retry the operation.
> > > Strings are interpreted as UTF-8 during this conversion.
> >
> > Hmmm....PySequence_Contains doesn't do any conversion of the arguments.
> > Should it? (Again, it didn't before). If it does, then the order of
> > testing for seq_contains and seq_getitem and conversions
>
> Or it could be done this way.
>
> > > Perhaps I should also add a tp_contains slot to the
> > > Unicode object which then uses the above API as well.
>
> Yes.
>
> > But that wouldn't help at all for
> >
> > u"a" in "abbbb"
>
> It could if PySequence_Contains would first look for a string and a
> unicode argument (in either order) and in that case convert the string
> to unicode.

I think the right way to do this is to add a special case to seq_contains in the string implementation. That's how most other auto-coercions work too. Instead of raising an error, the implementation would then delegate the work to PyUnicode_Contains().

> > PySequence_Contains only dispatches on the container argument :-(
> >
> > (BTW: I discovered it while contemplating adding a seq_contains (not
> > tp_contains) to unicode objects to optimize the searching for a bit.)
> > You may beat Marc-Andre to it, but I'll have to let him look at the > code anyway -- I'm not sufficiently familiar with the Unicode stuff > myself yet. I'll add that one too. BTW, Happy Birthday, Moshe :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sat Mar 11 13:57:34 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:57:34 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> <38CA4B08.7B13438D@lemburg.com> Message-ID: <38CA50CE.BEEFAB5E@lemburg.com> This is a multi-part message in MIME format. --------------56A130F1FCAC300009B200AD Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I couldn't resist :-) Here's the patch... BTW, how should we proceed with future patches ? Should I wrap them together about once a week, or send them as soon as they are done ? 
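For readers skimming the C patch that follows, the core of the new PyUnicode_Contains can be transliterated into Python roughly like this (the helper name and the use of str() to stand in for the PyUnicode_FromObject coercion are mine):

```python
def unicode_contains(container, element):
    # Coerce both operands to Unicode, as the C code does via
    # PyUnicode_FromObject on each argument.
    u, v = str(container), str(element)
    # 1.6-era rule: the left operand of "in" must be a single character.
    if len(v) != 1:
        raise TypeError("string member test needs char left operand")
    # Linear scan, mirroring the while (p < e) loop in the C version.
    for ch in u:
        if ch == v:
            return 1
    return 0

assert unicode_contains(u"bdba", u"a") == 1
assert unicode_contains(u"bdb", u"a") == 0
```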
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

--------------56A130F1FCAC300009B200AD
Content-Type: text/plain; charset=us-ascii; name="Unicode-Implementation-2000-03-11.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Unicode-Implementation-2000-03-11.patch"

diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h
--- CVS-Python/Include/unicodeobject.h	Fri Mar 10 23:33:05 2000
+++ Python+Unicode/Include/unicodeobject.h	Sat Mar 11 14:45:59 2000
@@ -683,6 +683,17 @@
 	PyObject *args		/* Argument tuple or dictionary */
 	);
 
+/* Checks whether element is contained in container and return 1/0
+   accordingly.
+
+   element has to coerce to an one element Unicode string. -1 is
+   returned in case of an error. */
+
+extern DL_IMPORT(int) PyUnicode_Contains(
+    PyObject *container,	/* Container string */
+    PyObject *element		/* Element string */
+    );
+
 /* === Characters Type APIs =============================================== */
 
 /* These should not be used directly.	Use the Py_UNICODE_IS* and
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py
--- CVS-Python/Lib/test/test_unicode.py	Sat Mar 11 00:23:20 2000
+++ Python+Unicode/Lib/test/test_unicode.py	Sat Mar 11 14:52:29 2000
@@ -219,6 +219,19 @@
 test('translate', u"abababc", u'iiic', {ord('a'):None, ord('b'):ord('i')})
 test('translate', u"abababc", u'iiix', {ord('a'):None, ord('b'):ord('i'), ord('c'):u'x'})
 
+# Contains:
+print 'Testing Unicode contains method...',
+assert ('a' in 'abdb') == 1
+assert ('a' in 'bdab') == 1
+assert ('a' in 'bdaba') == 1
+assert ('a' in 'bdba') == 1
+assert ('a' in u'bdba') == 1
+assert (u'a' in u'bdba') == 1
+assert (u'a' in u'bdb') == 0
+assert (u'a' in 'bdb') == 0
+assert (u'a' in 'bdba') == 1
+print 'done.'
+
 # Formatting:
 print 'Testing Unicode formatting strings...',
 assert u"%s, %s" % (u"abc", "abc") == u'abc, abc'
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt
--- CVS-Python/Misc/unicode.txt	Sat Mar 11 00:14:11 2000
+++ Python+Unicode/Misc/unicode.txt	Sat Mar 11 14:53:37 2000
@@ -743,8 +743,9 @@
 stream codecs as available through the codecs module should be used.
 
-XXX There should be a short-cut open(filename,mode,encoding) available which
-    also assures that mode contains the 'b' character when needed.
+The codecs module should provide a short-cut open(filename,mode,encoding)
+available which also assures that mode contains the 'b' character when
+needed.
 
 File/Stream Input:
 
@@ -810,6 +811,10 @@
 Introduction to Unicode (a little outdated by still nice to read):
 	http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html
+For comparison:
+	Introducing Unicode to ECMAScript --
+	http://www-4.ibm.com/software/developer/library/internationalization-support.html
+
 
 Encodings:
 
 Overview:
@@ -832,7 +837,7 @@
 History of this Proposal:
 -------------------------
-1.2: 
+1.2: Removed POD about codecs.open()
 1.1: Added note about comparisons and hash values. Added note about
      case mapping algorithms. Changed stream codecs .read() and
      .write() method to match the standard file-like object methods
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c
--- CVS-Python/Objects/stringobject.c	Sat Mar 11 10:55:09 2000
+++ Python+Unicode/Objects/stringobject.c	Sat Mar 11 14:47:45 2000
@@ -389,7 +389,9 @@
 {
 	register char *s, *end;
 	register char c;
-	if (!PyString_Check(el) || PyString_Size(el) != 1) {
+	if (!PyString_Check(el))
+		return PyUnicode_Contains(a, el);
+	if (PyString_Size(el) != 1) {
 		PyErr_SetString(PyExc_TypeError,
 		    "string member test needs char left operand");
 		return -1;
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c
--- CVS-Python/Objects/unicodeobject.c	Fri Mar 10 23:53:23 2000
+++ Python+Unicode/Objects/unicodeobject.c	Sat Mar 11 14:48:52 2000
@@ -2737,6 +2737,49 @@
     return -1;
 }
 
+int PyUnicode_Contains(PyObject *container,
+		       PyObject *element)
+{
+    PyUnicodeObject *u = NULL, *v = NULL;
+    int result;
+    register const Py_UNICODE *p, *e;
+    register Py_UNICODE ch;
+
+    /* Coerce the two arguments */
+    u = (PyUnicodeObject *)PyUnicode_FromObject(container);
+    if (u == NULL)
+	goto onError;
+    v = (PyUnicodeObject *)PyUnicode_FromObject(element);
+    if (v == NULL)
+	goto onError;
+
+    /* Check v in u */
+    if (PyUnicode_GET_SIZE(v) != 1) {
+	PyErr_SetString(PyExc_TypeError,
+			"string member test needs char left operand");
+	goto onError;
+    }
+    ch = *PyUnicode_AS_UNICODE(v);
+    p = PyUnicode_AS_UNICODE(u);
+    e = p + PyUnicode_GET_SIZE(u);
+    result = 0;
+    while (p < e) {
+	if (*p++ == ch) {
+	    result = 1;
+	    break;
+	}
+    }
+
+    Py_DECREF(u);
+    Py_DECREF(v);
+    return result;
+
+onError:
+    Py_XDECREF(u);
+    Py_XDECREF(v);
+    return -1;
+}
+
 /* Concat to string or Unicode object giving a new Unicode object. */
 
 PyObject *PyUnicode_Concat(PyObject *left,
@@ -3817,6 +3860,7 @@
     (intintargfunc) unicode_slice, 	/* sq_slice */
     0, 					/* sq_ass_item */
     0, 					/* sq_ass_slice */
+    (objobjproc)PyUnicode_Contains, 	/*sq_contains*/
 };
 
 static int
--------------56A130F1FCAC300009B200AD--

From tim_one@email.msn.com Sat Mar 11 20:10:23 2000
From: tim_one@email.msn.com (Tim Peters)
Date: Sat, 11 Mar 2000 15:10:23 -0500
Subject: [Python-Dev] finalization again
In-Reply-To: <14536.30810.720836.886023@anthem.cnri.reston.va.us>
Message-ID: <000e01bf8b95$d52939e0$c72d153f@tim>

[Barry A. Warsaw, jamming after hours]
> ...
> What if you timestamp instances when you create them? Then when you
> have trash cycles with finalizers, you sort them and finalize in
> chronological order.

Well, I strongly agree that would be better than finalizing them in increasing order of storage address .

> ...
> - FIFO order /seems/ more natural to me than FILO, Forget cycles for a moment, and consider just programs that manipulate *immutable* containers (the simplest kind to think about): at the time you create an immutable container, everything *contained* must already be in existence, so every pointer goes from a newer object (container) to an older one (containee). This is the "deep" reason for why, e.g., you can't build a cycle out of pure tuples in Python (if every pointer goes new->old, you can't get a loop, else each node in the loop would be (transitively) older than itself!). Then, since a finalizer can see objects pointed *to*, a finalizer can see only older objects. Since it's desirable that a finalizer see only wholly intact (unfinalized) objects, it is in fact the oldest object ("first in") that needs to be cleaned up last ("last out"). So, under the assumption of immutability, FILO is sufficient, but FIFO dangerous. So your muse inflamed you with an interesting tune, but you fingered the riff backwards . One problem is that it all goes out the window as soon as mutation is allowed. It's *still* desirable that a finalizer see only unfinalized objects, but in the presence of mutation that no longer bears any relationship to relative creation time. Another problem is in Guido's directory example, which we can twist to view as an "immutable container" problem that builds its image of the directory bottom-up, and where a finalizer on each node tries to remove the file (or delete the directory, whichever the node represents). In this case the physical remove/delete/unlink operations have to follow a *postorder* traversal of the container tree, so that "finalizer sees only unfinalized objects" is the opposite of what the app needs! The lesson to take from that is that the implementation can't possibly guess what ordering an app may need in a fancy finalizer. 
At best it can promise to follow a "natural" ordering based on the points-to relationship, and while "finalizer sees only unfinalized objects" is at least clear, it's quite possibly unhelpful (in Guido's particular case, it *can* be exploited, though, by adding a postorder remove/delete/unlink method to nodes, and explicitly calling it from __del__ -- "the rules" guarantee that the root of the tree will get finalized first, and the code can rely on that in its own explicit postorder traversal). > but then I rarely create cyclic objects, and almost never use __del__, > so this whole argument has been somewhat academic to me :). Well, not a one of us creates cycles often in CPython today, simply because we don't want to track down leaks <0.5 wink>. It seems that nobody here uses __del__ much, either; indeed, my primary use of __del__ is simply to call an explicit break_cycles() function from the header node of a graph! The need for that goes away as soon as Python reclaims cycles by itself, and I may never use __del__ at all then in the vast bulk of my code. It's because we've seen no evidence here (and also that I've seen none elsewhere either) that *anyone* is keen on mixing cycles with finalizers that I've been so persistent in saying "screw it -- let it leak, but let the user get at it if they insist on doing it". Seems we're trying to provide slick support for something nobody wants to do. If it happens by accident anyway, well, people sometimes divide by 0 by accident too <0.0 wink>: give them a way to know about it, but don't move heaven & earth trying to treat it like a normal case. 
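Tim's FILO argument can be put in runnable form. Here is a toy model (entirely invented here, not CPython's collector) in which every container is created after its contents, so that finalizing in reverse creation order guarantees a finalizer only ever sees intact, not-yet-finalized objects:

```python
creation_order = []

class Node:
    """Immutable-style container: children exist before the container."""
    def __init__(self, *children):
        self.children = children
        self.alive = True
        creation_order.append(self)

    def finalize(self):
        # A finalizer may look at anything it points to; under FILO
        # ordering those objects are older and not yet finalized.
        assert all(child.alive for child in self.children)
        self.alive = False

a = Node()          # oldest ("first in")
b = Node(a)
c = Node(a, b)      # newest: every pointer goes newer -> older

for obj in reversed(creation_order):    # "last out": newest first
    obj.finalize()

assert not any(obj.alive for obj in creation_order)
```

Note that Guido's directory example wants exactly the opposite (postorder: children removed before parents), which is Tim's point that no single built-in ordering can suit every app.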
if-it-were-easy-to-implement-i-wouldn't-care-ly y'rs - tim From Moshe Zadka Sat Mar 11 20:35:43 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 11 Mar 2000 22:35:43 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <000e01bf8b95$d52939e0$c72d153f@tim> Message-ID: In a continuation (yes, a dangerous word in these parts) of the timbot's looks at the way other languages handle finalization, let me add something from the Sather manual I'm now reading (when I'm done with it, you'll see me begging for iterators here, and having some weird ideas in the types-sig): =============================== Finalization will only occur once, even if new references are created to the object during finalization. Because few guarantees can be made about the environment in which finalization occurs, finalization is considered dangerous and should only be used in the rare cases that conventional coding will not suffice. =============================== (Sather is garbage-collected, BTW) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one@email.msn.com Sat Mar 11 20:51:47 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:47 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <001001bf8b9b$9e09d720$c72d153f@tim> [Skip Montanaro, with an expression that may raise TypeError for any of several distinct reasons, and wants to figure out which one after the fact] The existing exception machinery is sufficiently powerful for building a solution, so nothing new is needed in the language. What you really need here is an exhaustive list of all exceptions the language can raise, and when, and why, and a formally supported "detail" field (whether numeric id or string or whatever) that you can rely on to tell them apart at runtime. There are at least a thousand cases that need to be so documented and formalized. 
That's why not a one of them is now <0.9 wink>. If P3K is a rewrite from scratch, a rational scheme could be built in from the start. Else it would seem to require a volunteer with even less of a life than us . From tim_one@email.msn.com Sat Mar 11 20:51:49 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:49 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C965C4.B164C2D5@interet.com> Message-ID: <001101bf8b9b$9f37f6e0$c72d153f@tim> [James C. Ahlstrom] > Well, it looks like this thread has fizzled out. But what did we > decide? Far as I could tell, nothing specific. > ... > I feel Tim's original complaint that popen() is a Problem > still hasn't been fixed. I was passing it on from MikeF's c.l.py posting. This isn't a new problem, of course, it just drags on year after year -- which is the heart of MikeF's gripe. People have code that *does* work, but for whatever reasons it never gets moved to the core. In the meantime, the Library Ref implies the broken code that is in the core does work. One or the other has to change, and it looks most likely to me that Fred will change the docs for 1.6. While not ideal, that would be a huge improvement over the status quo. luckily-few-people-expect-windows-to-work-anyway<0.9-wink>-ly y'rs - tim From mhammond@skippinet.com.au Mon Mar 13 03:50:35 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Mon, 13 Mar 2000 14:50:35 +1100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. 
Message-ID:

Hi,

After applying the Unicode changes string.replace() seems to have changed its behaviour:

Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> string.replace("foo\nbar", "\n", "")
'foobar'
>>>

But since the Unicode update:

Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> string.replace("foo\nbar", "\n", "")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "L:\src\python-cvs\lib\string.py", line 407, in replace
    return s.replace(old, new, maxsplit)
ValueError: empty replacement string
>>>

The offending check is stringmodule.c, line 1578:

	if (repl_len <= 0) {
		PyErr_SetString(PyExc_ValueError, "empty replacement string");
		return NULL;
	}

Changing the check to "< 0" fixes the immediate problem, but it is unclear why the check was added at all, so I didn't bother submitting a patch...

Mark.

From mal@lemburg.com Mon Mar 13 09:13:50 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 13 Mar 2000 10:13:50 +0100
Subject: [Python-Dev] string.replace behaviour change since Unicode patch.
References: Message-ID: <38CCB14D.C07ACC26@lemburg.com>

Mark Hammond wrote:
> 
> Hi,
> After applying the Unicode changes string.replace() seems to have changed
> its behaviour:
> 
> Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import string
> >>> string.replace("foo\nbar", "\n", "")
> 'foobar'
> >>>
> 
> But since the Unicode update:
> 
> Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import string
> >>> string.replace("foo\nbar", "\n", "")
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
>   File "L:\src\python-cvs\lib\string.py", line 407, in replace
>     return s.replace(old, new, maxsplit)
> ValueError: empty replacement string
> >>>
> 
> The offending check is stringmodule.c, line 1578:
> 	if (repl_len <= 0) {
> 		PyErr_SetString(PyExc_ValueError, "empty replacement string");
> 		return NULL;
> 	}
> 
> Changing the check to "< 0" fixes the immediate problem, but it is unclear
> why the check was added at all, so I didn't bother submitting a patch...

Dang. Must have been my mistake -- it should read:

	if (sub_len <= 0) {
		PyErr_SetString(PyExc_ValueError, "empty pattern string");
		return NULL;
	}

Thanks for reporting this... I'll include the fix in the next patch set.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fdrake@acm.org Mon Mar 13 15:43:09 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 13 Mar 2000 10:43:09 -0500 (EST)
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <001101bf8b9b$9f37f6e0$c72d153f@tim>
References: <38C965C4.B164C2D5@interet.com> <001101bf8b9b$9f37f6e0$c72d153f@tim>
Message-ID: <14541.3213.590243.359394@weyr.cnri.reston.va.us>

Tim Peters writes:
> code that is in the core does work. One or the other has to change, and it
> looks most likely to me that Fred will change the docs for 1.6. While not
> ideal, that would be a huge improvement over the status quo.

Actually, I just checked in my proposed change for the 1.5.2 doc update that I'm releasing soon. I'd like to remove it for 1.6, if the appropriate implementation is moved into the core.

-Fred

-- 
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From gvwilson@nevex.com Mon Mar 13 21:10:52 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 16:10:52 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request Message-ID:

Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python:

i = 0
while i &lt; 10:
    print i &amp; 1
    i = i + 1

which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on.

Greg

From skip@mojam.com (Skip Montanaro) Mon Mar 13 21:23:17 2000 From: skip@mojam.com (Skip Montanaro) Date: Mon, 13 Mar 2000 15:23:17 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23621.89087.357783@beluga.mojam.com>

Greg> Once 1.6 is out the door, would people be willing to consider
Greg> extending Python's token set to make HTML/XML-ish spellings using
Greg> entity references legal? This would make the following 100% legal
Greg> Python:

Greg> i = 0
Greg> while i &lt; 10:
Greg>     print i &amp; 1
Greg>     i = i + 1

What makes it difficult to pump your Python code through cgi.escape when embedding it? There doesn't seem to be an inverse function to cgi.escape (at least not in the cgi module), but I suspect it could rather easily be written.

-- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/

From akuchlin@mems-exchange.org Mon Mar 13 21:23:29 2000 From: akuchlin@mems-exchange.org (Andrew M.
Kuchling) Date: Mon, 13 Mar 2000 16:23:29 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23633.873411.86833@amarok.cnri.reston.va.us>

gvwilson@nevex.com writes:
> Once 1.6 is out the door, would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal? This would make the following 100% legal Python:
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1

I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere? What about other places where Python and XML syntax collide, as in this contrived example:

# Python code starts here
if a[index[1]]>b:
    print ...

Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code. IMHO whatever tool is outputting the XML should handle escaping wacky characters in the Python code, which will be undone by the parser when the XML gets parsed. Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange.

--
A.M. Kuchling    http://starship.python.net/crew/amk/
Art history is the nightmare from which art is struggling to awake. -- Robert Fulford

From gvwilson@nevex.com Mon Mar 13 21:58:27 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 16:58:27 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <14541.23633.873411.86833@amarok.cnri.reston.va.us> Message-ID:

> Greg Wilson wrote:
> ...would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal?
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1

> Skip Montanaro wrote:
> What makes it difficult to pump your Python code through cgi.escape when
> embedding it?

Most non-programmers use WYSIWYG editors, and many of these are moving toward XML-compliant formats.
Parsing the standard character entities seemed like a good first step toward catering to this (large) audience.

> Andrew Kuchling wrote:
> I don't think that would be sufficient. What about user-defined
> entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.)
> Would Python have to also parse a DTD from somewhere?

Longer term, I believe that someone is going to come out with a programming language that (finally) leaves the flat-ASCII world behind, and lets people use the structuring mechanisms (e.g. XML) that we have developed for everyone else's data. I think it would be to Python's advantage to be first, and if I'm wrong, there's little harm done. User-defined entities, DTD's, and the like are probably part of that, but I don't think I know enough to know what to ask for. Escaping the standard entities seems like an easy start.

> Andrew Kuchling also wrote:
> What about other places when Python and XML syntax collide, as in this
> contrived example:
>
> # Python code starts here
> if a[index[1]]>b:
>     print ...
>
> Oops! The ]]> looks like the end of the CDATA section, but it's legal
> Python code.

Yup; that's one of the reasons I'd like to be able to write:

# Python code starts here
if a[index[1]]&gt;b:
    print ...

> Users certainly won't be writing this XML by hand; writing 'if (i &lt;
> 10)' is very strange.

I'd expect my editor to put '&lt;' in the file when I press the '<' key, and to display '<' on the screen when viewing the file.

thanks, Greg

From beazley@rustler.cs.uchicago.edu Mon Mar 13 22:35:24 2000 From: beazley@rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 16:35:24 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <200003132235.QAA08031@rustler.cs.uchicago.edu>

gvwilson@nevex.com writes:
> Once 1.6 is out the door, would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal?
This would make the following 100% legal Python:
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1
>
> which would in turn make it easier to embed Python in XML such as
> config-files-for-whatever-Software-Carpentry-produces-to-replace-make,
> PMZ, and so on.

Sure, and while we're at it, maybe we can add support for C trigraph sequences as well. Maybe I'm missing the point, but why can't you just use a filter (cgi.escape() or something comparable)? I, for one, am *NOT* in favor of complicating the Python parser in this most bogus manner. Furthermore, with respect to the editor argument, I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about. Therefore, I don't think that the Python parser should be modified in any way to account for XML tags, entities, or other extraneous markup that's not part of the core language. I know that I, for one, would be extremely pissed if I fired up emacs and had to maintain someone else's code that had all of this garbage in it. Just my 0.02.

-- Dave

From gvwilson@nevex.com Mon Mar 13 22:48:33 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 17:48:33 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID:

> David M. Beazley wrote:
> ...and while we're at it, maybe we can add support for C trigraph
> sequences as well.

I don't know of any mass-market editors that generate C trigraphs.

> ...I can't think of a single reason why any sane programmer would be
> writing programs in Microsoft Word or whatever it is that you're
> talking about.

'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program...
I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up.

Thanks, Greg

From Fredrik Lundh Message-ID: <00ca01bf8d42$6a154500$34aab5d4@hagrid>

Greg wrote:
> > ...I can't think of a single reason why any sane programmer would be
> > writing programs in Microsoft Word or whatever it is that you're
> > talking about.
>
> 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs... or why they should
> have to, just to program... I just think that someone's going to do this
> for some language, some time soon, and I'd rather Python be in the lead
> than play catch-up.

I don't get it. the XML specification contains a lot of stuff, and I completely fail to see how adding support for a very small part of XML would make it possible to use XML editors to write Python code. what am I missing?

From DavidA@ActiveState.com Mon Mar 13 23:15:25 2000 From: DavidA@ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:15:25 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID:

> 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs... or why they should
> have to, just to program... I just think that someone's going to do this
> for some language, some time soon, and I'd rather Python be in the lead
> than play catch-up.

But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave-the-diplomat's response. While storage of Python files in XML documents is a good thing, it's hard to see why XML should be viewed as the only storage format for Python files. I think a much richer XML schema could be useful in some distant future: ...
What might be more useful in the short term IMO is to define a _standard_ mechanism for Python-in-XML encoding/decoding, so that all code which encodes Python in XML is done the same way, and so that XML editors can figure out once and for all how to decode Python-in-CDATA.

Strawman Encoding # 1: replace < with &lt; and > with &gt; when not in strings, and vice versa on the decoding side.

Strawman Encoding # 2:
- do Strawman 1, AND
- replace space-determined indentation with { and } tokens or other INDENT and DEDENT markers using some rare Unicode characters to work around inevitable bugs in whitespace handling of XML processors.

--david

From gvwilson@nevex.com Mon Mar 13 23:14:43 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 18:14:43 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID:

> David Ascher wrote:
> But the scheme you put forth causes major problems for current Python
> users who *are* using glass TTYs, so I don't think it'll fly for very
> basic political reasons nicely illustrated by Dave's response.

Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-)

Greg

From beazley@rustler.cs.uchicago.edu Mon Mar 13 23:12:55 2000 From: beazley@rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 17:12:55 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: <200003132312.RAA08107@rustler.cs.uchicago.edu>

gvwilson@nevex.com writes:
> > 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs...
or why they should > have to, just to program... Look, I'm all for CP4E and making programming more accessible to the masses, but as a professional programmer, I frankly do not care what non-programmers think about the tools that I (and most of the programming world) use to write software. Furthermore, if all of your non-programmer friends don't want to care about the underlying details, they certainly won't care how programs are represented---including a nice and *simple* text representation without markup, entities, and other syntax that is not an essential part of the language. However, as a professional, I most certainly DO care about how programs are represented--specifically, I want to be able to move them around between machines. Edit them with essentially any editor, transform them as I see fit, and be able to easily read them and have a sense of what is going on. Markup is just going to make this a huge pain in the butt. No, I'm not for this idea one bit. Sorry. > I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. What gives you the idea that Python is behind? What is it playing catch up to? -- Dave From DavidA@ActiveState.com Mon Mar 13 23:36:54 2000 From: DavidA@ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:36:54 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. 
I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) > > Greg If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing. Forcing one into the other is denying the history and current practices of both domains and user populations. You cannot ignore the fact that "I can read anyone's Python" is a key selling point of Python among its current practitioners, or that its cleanliness and lack of magic characters ($ is usually invoked, but < is just as magic/ugly) are part of its appeal/success. No XML editor is going to edit all XML documents without custom editors anyway! I certainly don't expect to be drawing SVG diagrams with a keyboard! That's what schemas and custom editors are for. Define a schema for 'encoded Python' (well, first, find a schema notation that will survive), write a plugin to your favorite XML editor, and then your (theoretical? =) users can use the same 'editor' to edit PythonXML or any other XML. Most XML probably won't be edited with a keyboard but with a pointing device or a speech recognizer anyway... IMO, you're being seduced by the apparent closeness between XML and Python-in-ASCII. It's only superficial... Think of Python-in-ASCII as a rendering of Python-in-XML, Dave will think of Python-in-XML as a rendering of Python-in-ASCII, and everyone will be happy (as long as everyone agrees on the one-to-one transformation). --david From paul@prescod.net Mon Mar 13 23:43:48 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:43:48 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7D34.6569C1AA@prescod.net> You should use your entities in the XML files, and then whatever application actually launches Python (PMZ, your make engine, XMetaL) could decode the data and launch Python. This is already how it works in XMetaL. 
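[The decode-then-launch step Paul describes above -- and the inverse of cgi.escape that Skip noted the cgi module never provided -- can be sketched in a few lines. This is only an illustration of the idea, not code from XMetaL or the stdlib: the helper name `unescape` is invented, and the sketch uses present-day Python spelling so it actually runs.]

```python
def unescape(text):
    """Inverse of cgi.escape: decode the predefined XML entity references."""
    # "&amp;" must be decoded last, so that "&amp;lt;" round-trips to the
    # literal text "&lt;" instead of being decoded twice.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"),
                         ("&quot;", '"'), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text

# Python embedded in an XML document, with < escaped as an entity:
embedded = """\
i = 0
while i &lt; 10:
    i = i + 1
"""

source = unescape(embedded)   # the "decode the data" step...
exec(source)                  # ...and the "launch Python" step
```

The ordering of the replacements is the only subtle point: decoding `&amp;` first would corrupt doubly-escaped text.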
I've just reinstalled recently so I don't have my macro file. Therefore, please excuse the Javascript (not Python) example. This is in "journalist.mcr" in the "Macros" folder of XMetaL.

This already works fine for Python. You change lang="Python" and, thanks to the benevolence of Bill Gates and the hard work of Mark Hammond, you can use Python for XMetaL macros. It doesn't work perfectly: exceptions crash XMetaL, last I tried. As long as you don't make mistakes, everything works nicely. :) You can write XMetaL macros in Python and the whole thing is stored as XML. Still, XMetaL is not very friendly as a Python editor. It doesn't have nice whitespace handling!

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant

From paul@prescod.net Mon Mar 13 23:59:23 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:59:23 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD80DB.39150F33@prescod.net>

gvwilson@nevex.com wrote:
>
> 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs... or why they should
> have to, just to program... I just think that someone's going to do this
> for some language, some time soon, and I'd rather Python be in the lead
> than play catch-up.

Your goal is worth pursuing but I agree with the others that the syntax change is not the right way. It _is_ possible to teach XMetaL to edit Python programs -- structurally -- just as it does XML. What you do is hook into the macro engine (which already supports Python) and use the Python tokenizer to build a parse tree. You copy that into a DOM using the same elements and attributes you would use if you were doing some kind of batch conversion. Then on "save" you reverse the process. Implementation time: ~3 days.
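[The pipeline Paul outlines -- tokenize the source, build a tree, reverse it on "save" -- can be roughed out with the standard tokenize module. A hedged sketch in present-day Python: the <python>/<tok> element names are a made-up schema for illustration, not anything XMetaL or Documentor defines.]

```python
import io
import tokenize
from xml.sax.saxutils import escape

def python_to_xml(source):
    """Encode Python source as a flat XML token stream (illustrative schema)."""
    parts = ["<python>"]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Each token becomes one element; escape() handles < > & in operators.
        parts.append('  <tok type="%s">%s</tok>'
                     % (tokenize.tok_name[tok.type], escape(tok.string)))
    parts.append("</python>")
    return "\n".join(parts)

print(python_to_xml("while i < 10:\n    i = i + 1\n"))
```

The reverse direction is a matter of unescaping each <tok> body and re-joining the token strings; the standard library even ships an inverse for the token stream itself, tokenize.untokenize().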
The XMetaL competitor, Documentor, has an API specifically designed to make this sort of thing easy.

Making either of them into a friendly programmer's editor is a much larger task. I think this is where the majority of the R&D should occur, not at the syntax level. If one invents a fundamentally better way of working with the structures behind Python code, then it would be relatively easy to write code that maps that to today's Python syntax.

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant

From Moshe Zadka Tue Mar 14 01:14:09 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 14 Mar 2000 03:14:09 +0200 (IST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID:

On Mon, 13 Mar 2000 gvwilson@nevex.com wrote:
> Once 1.6 is out the door, would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal? This would make the following 100% legal Python:
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1
>
> which would in turn make it easier to embed Python in XML such as
> config-files-for-whatever-Software-Carpentry-produces-to-replace-make,
> PMZ, and so on.

Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so the Python that comes out of the XML parser is quite all right. Why change Python to do an XML parser job?

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From mhammond@skippinet.com.au Tue Mar 14 01:18:45 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 14 Mar 2000 12:18:45 +1100 Subject: [Python-Dev] unicode objects and C++ Message-ID:

I struck a bit of a snag with the Unicode support when trying to use the most recent update in a C++ source file.
The problem turned out to be that unicodeobject.h did a #include "wchar.h", but did it while an 'extern "C"' block was open. This upset the MSVC6 wchar.h, as it has special C++ support.

Attached below is a patch I made to unicodeobject.h that solved my problem and allowed my compilations to succeed. Theoretically the same problem could exist for wctype.h, and probably lots of other headers, but this is the immediate problem :-)

An alternative patch would be to #include "wchar.h" in PC\config.h outside of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for multiple includes, so the unicodeobject.h include of that file will succeed, but not have the side-effect it has now.

I'm not sure what the preferred solution is - quite possibly the PC\config.h change, but I've included the unicodeobject.h patch anyway :-)

Mark.

*** unicodeobject.h	2000/03/13 23:22:24	2.2
--- unicodeobject.h	2000/03/14 01:06:57
***************
*** 85,91 ****
--- 85,101 ----
  #endif

  #ifdef HAVE_WCHAR_H
+
+ #ifdef __cplusplus
+ }	/* Close the 'extern "C"' before bringing in system headers */
+ #endif
+
  # include "wchar.h"
+
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+
  #endif

  #ifdef HAVE_USABLE_WCHAR_T

From mal@lemburg.com Mon Mar 13 23:31:30 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 00:31:30 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7A52.5709DF5F@lemburg.com>

gvwilson@nevex.com wrote:
>
> > David Ascher wrote:
> > But the scheme you put forth causes major problems for current Python
> > users who *are* using glass TTYs, so I don't think it'll fly for very
> > basic political reasons nicely illustrated by Dave's response.
>
> Understood. I thought that handling standard entities might be a
> useful first step toward storage of Python as XML, which in turn would
> help make Python more accessible to people who don't want to switch
> editors just to program.
> I felt that an all-or-nothing approach would be
> even less likely to get a favorable response than handling entities... :-)

This should be easy to implement provided a hook for compile() is added to e.g. the sys-module which then gets used instead of calling the byte code compiler directly... Then you could redirect the compile() arguments to whatever codec you wish (e.g. an SGML entity codec) and the builtin compiler would only see the output of that codec.

Well, just a thought... I don't think encoding programs would make life as a programmer easier, but instead harder. It adds one more level of confusion on top of it all.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From mal@lemburg.com Tue Mar 14 09:45:49 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 10:45:49 +0100 Subject: [Python-Dev] unicode objects and C++ References: Message-ID: <38CE0A4D.1209B830@lemburg.com>

Mark Hammond wrote:
>
> I struck a bit of a snag with the Unicode support when trying to use the
> most recent update in a C++ source file.
>
> The problem turned out to be that unicodeobject.h did a #include "wchar.h",
> but did it while an 'extern "C"' block was open. This upset the MSVC6
> wchar.h, as it has special C++ support.

Thanks for reporting this.

> Attached below is a patch I made to unicodeobject.h that solved my problem
> and allowed my compilations to succeed. Theoretically the same problem
> could exist for wctype.h, and probably lots of other headers, but this is
> the immediate problem :-)
>
> An alternative patch would be to #include "wchar.h" in PC\config.h outside
> of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for
> multiple includes, so the unicodeobject.h include of that file will succeed,
> but not have the side-effect it has now.
> I'm not sure what the preferred solution is - quite possibly the PC\config.h
> change, but I've included the unicodeobject.h patch anyway :-)
>
> Mark.
>
> *** unicodeobject.h	2000/03/13 23:22:24	2.2
> --- unicodeobject.h	2000/03/14 01:06:57
> ***************
> *** 85,91 ****
> --- 85,101 ----
>   #endif
>
>   #ifdef HAVE_WCHAR_H
> +
> + #ifdef __cplusplus
> + }	/* Close the 'extern "C"' before bringing in system headers */
> + #endif
> +
>   # include "wchar.h"
> +
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
> +
>   #endif
>
>   #ifdef HAVE_USABLE_WCHAR_T

I've included this patch (should solve the problem for all included system header files, since it wraps only the Unicode APIs in extern "C"):

--- /home/lemburg/clients/cnri/CVS-Python/Include/unicodeobject.h	Fri Mar 10 23:33:05 2000
+++ unicodeobject.h	Tue Mar 14 10:38:08 2000
@@ -1,10 +1,7 @@
 #ifndef Py_UNICODEOBJECT_H
 #define Py_UNICODEOBJECT_H
-#ifdef __cplusplus
-extern "C" {
-#endif
 
 /* Unicode implementation based on original code by Fredrik Lundh,
    modified by Marc-Andre Lemburg (mal@lemburg.com) according to the
@@ -167,10 +165,14 @@ typedef unsigned short Py_UNICODE;
 #define Py_UNICODE_MATCH(string, offset, substring)\
     (!memcmp((string)->str + (offset), (substring)->str,\
              (substring)->length*sizeof(Py_UNICODE)))
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* --- Unicode Type ------------------------------------------------------- */
 
 typedef struct {
     PyObject_HEAD
     int length;		/* Length of raw Unicode data in buffer */

I'll post a complete Unicode update patch by the end of the week for inclusion in CVS.
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping@lfw.org Tue Mar 14 11:19:59 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:19:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Tue, 14 Mar 2000, Moshe Zadka wrote: > On Mon, 13 Mar 2000 gvwilson@nevex.com wrote: > > legal? This would make the following 100% legal Python: > > > > i = 0 > > while i < 10: > > print i & 1 > > i = i + 1 > > Why? Whatever XML parser you use will output "i<1" as "i<1", so > the Python that comes out of the XML parser is quite all right. Why change > Python to do an XML parser job? I totally agree. To me, this is the key issue: it is NOT the responsibility of the programming language to accommodate any particular encoding format. While we're at it, why don't we change Python to accept quoted-printable source code? Or base64-encoded source code? XML already defines a perfectly reasonable mechanism for escaping a plain stream of text -- adding this processing to Python adds nothing but confusion. The possible useful benefit from adding the proposed "feature" is exactly zero. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From ping@lfw.org Tue Mar 14 11:21:59 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:21:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000, David Ascher wrote: > > If you propose a transformation between Python Syntax and XML, then you > potentially have something which all parties can agree to as being a good > thing. Indeed. 
I know that i wouldn't have any use for it at the moment, but i can see the potential for usefulness of a structured representation for Python source code (like an AST in XML) which could be directly edited in an XML editor, and processed (by an XSL stylesheet?) to produce actual runnable Python. But attempting to mix the two doesn't get you anywhere.

-- ?!ng

"This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu

From Fredrik Lundh Message-ID: <002201bf8dcb$ba9a11c0$34aab5d4@hagrid>

Greg:
> Understood. I thought that handling standard entities might be a
> useful first step toward storage of Python as XML, which in turn would
> help make Python more accessible to people who don't want to switch
> editors just to program. I felt that an all-or-nothing approach would be
> even less likely to get a favorable response than handling entities... :-)

well, I would find it easier to support a more aggressive proposal: make sure Python 1.7 can deal with source code written in Unicode, using any supported encoding. with that in place, you can plug in your favourite unicode encoding via the Unicode framework.

From Fredrik Lundh Message-ID: <000901bf8e03$abf88420$34aab5d4@hagrid>

> I've just checked in a massive patch from Marc-Andre Lemburg which
> adds Unicode support to Python.

massive, indeed.

didn't notice this before, but I just realized that after the latest round of patches, the python15.dll is now 700k larger than it was for 1.5.2 (more than twice the size). my original unicode DLL was 13k. hmm...

From akuchlin@mems-exchange.org Tue Mar 14 22:19:44 2000 From: akuchlin@mems-exchange.org (Andrew M.
Kuchling) Date: Tue, 14 Mar 2000 17:19:44 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <000901bf8e03$abf88420$34aab5d4@hagrid> References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> Message-ID: <14542.47872.184978.985612@amarok.cnri.reston.va.us>

Fredrik Lundh writes:
> didn't notice this before, but I just realized that after the
> latest round of patches, the python15.dll is now 700k larger
> than it was for 1.5.2 (more than twice the size).

Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source code, and produces a 632168-byte .o file on my Sparc. (Will some compiler systems choke on a file that large? Could we read database info from a file instead, or mmap it into memory?)

-- A.M. Kuchling http://starship.python.net/crew/amk/ "Are you OK, dressed like that? You don't seem to notice the cold." "I haven't come ten thousand miles to discuss the weather, Mr Moberly." -- Moberly and the Doctor, in "The Seeds of Doom"

From mal@lemburg.com Wed Mar 15 08:32:29 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 09:32:29 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> Message-ID: <38CF4A9D.13A0080@lemburg.com>

"Andrew M. Kuchling" wrote:
>
> Fredrik Lundh writes:
> > didn't notice this before, but I just realized that after the
> > latest round of patches, the python15.dll is now 700k larger
> > than it was for 1.5.2 (more than twice the size).
>
> Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source
> code, and produces a 632168-byte .o file on my Sparc. (Will some
> compiler systems choke on a file that large? Could we read database
> info from a file instead, or mmap it into memory?)

That is due to the unicodedata module being compiled into the DLL statically.
On Unix you can build it shared too -- there are no direct references to it in the implementation. I suppose that on Windows the same should be done... the question really is whether this is intended or not -- moving the module into a DLL is at least technically no problem (someone would have to supply a patch for the MSVC project files though).

Note that unicodedata is only needed by programs which do a lot of Unicode manipulations and in the future probably by some codecs too.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From pf@artcom-gmbh.de Wed Mar 15 10:42:26 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 15 Mar 2000 11:42:26 +0100 (MET) Subject: [Python-Dev] Unicode in Python and Tcl/Tk compared (was Unicode patches checked in...) In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at "Mar 15, 2000 9:32:29 am" Message-ID:

Hi!

> > Fredrik Lundh writes:
> > > didn't notice this before, but I just realized that after the
> > > latest round of patches, the python15.dll is now 700k larger
> > > than it was for 1.5.2 (more than twice the size).

> "Andrew M. Kuchling" wrote:
> > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source
> > code, and produces a 632168-byte .o file on my Sparc. (Will some
> > compiler systems choke on a file that large? Could we read database
> > info from a file instead, or mmap it into memory?)

> M.-A. Lemburg wrote:
> That is due to the unicodedata module being compiled
> into the DLL statically. On Unix you can build it shared too
> -- there are no direct references to it in the implementation.
> I suppose that on Windows the same should be done... the
> question really is whether this is intended or not -- moving
> the module into a DLL is at least technically no problem
> (someone would have to supply a patch for the MSVC project
> files though).
> > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Now as the unicode patches were checked in and as Fredrik Lundh noticed a considerable increase of the size of the python-DLL, which was obviously mostly caused by those tables, I had some fear that a Python/Tcl/Tk based application could eat up much more memory, if we update from Python 1.5.2 and Tcl/Tk 8.0.5 to Python 1.6 and Tcl/Tk 8.3.0. As some of you certainly know, some kind of unicode support has also been added to Tcl/Tk since 8.1. So I did some research and would like to share what I have found out so far. Here are the compared sizes of the tcl/tk shared libs on Linux:

old:                   | new:                   | bloat increase in %:
-----------------------+------------------------+---------------------
libtcl8.0.so    533414 | libtcl8.3.so    610241 | 14.4 %
libtk8.0.so     714908 | libtk8.3.so     811916 | 13.6 %

The addition of unicode wasn't the only change to Tcl/Tk, so this seems reasonable. Unfortunately there is no python shared library, so a direct comparison of increased memory consumption is impossible. Nevertheless I have the following figures (stripped binary sizes of the Python interpreter):

1.5.2         382616
CVS_10-02-00  393668  (a month before unicode)
CVS_12-03-00  507448  (just after unicode)

That is an increase of "only" 111 kBytes. Not so bad, but nevertheless a "bloat increase" of 32.6 %. And additionally there is now

unicodedata.so    634940
_codecsmodule.so   38955

which (I guess) will also be loaded if the application starts using some of the new features. Since I didn't take care of unicode in the past, I feel unable to compare the implementations of unicode in both systems and what impact they will have on the real memory performance and even more important on the functionality of the combined use of both packages together with Tkinter.
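The "bloat increase" percentages in the tables above are easy to double-check; a quick sketch (modern Python, figures taken straight from the tables):

```python
# Double-check of the "bloat increase" percentages quoted above.
def bloat(old, new):
    """Size increase of `new` over `old`, in percent."""
    return (new - old) / float(old) * 100.0

print(round(bloat(533414, 610241), 1))  # libtcl 8.0 -> 8.3: 14.4
print(round(bloat(714908, 811916), 1))  # libtk 8.0 -> 8.3: 13.6
print(round(bloat(382616, 507448), 1))  # python 1.5.2 -> post-unicode CVS: 32.6
```

All three results agree with the percentages quoted in the mail.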
Tcl/Tk keeps around a sub-directory called 'encoding', which --I guess-- contains information somehow similar or related to that in 'unicodedata.so', but separated into several files? So below I included shortened excerpts from the 200k+ tcl8.3.0/changes and the tk8.3.0/changes files about unicode. Maybe someone else more involved with unicode can shed some light on this topic? Do we need some changes to Tkinter.py or _tkinter or both? ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- [...] ======== Changes for 8.1 go below this line ======== 6/18/97 (new feature) Tcl now supports international character sets: - All C APIs now accept UTF-8 strings instead of iso8859-1 strings, wherever you see "char *", unless explicitly noted otherwise. - All Tcl strings represented in UTF-8, which is a convenient multi-byte encoding of Unicode. Variable names, procedure names, and all other values in Tcl may include arbitrary Unicode characters. For example, the Tcl command "string length" returns how many Unicode characters are in the argument string. - For Java compatibility, embedded null bytes in C strings are represented as \xC080 in UTF-8 strings, but the null byte at the end of a UTF-8 string remains \0. Thus Tcl strings once again do not contain null bytes, except for termination bytes. - For Java compatibility, "\uXXXX" is used in Tcl to enter a Unicode character. "\u0000" through "\uffff" are acceptable Unicode characters. - "\xXX" is used to enter a small Unicode character (between 0 and 255) in Tcl. - Tcl automatically translates between UTF-8 and the normal encoding for the platform during interactions with the system. - The fconfigure command now supports a -encoding option for specifying the encoding of an open file or socket. Tcl will automatically translate between the specified encoding and UTF-8 during I/O.
See the directory library/encoding to find out what encodings are supported (eventually there will be an "encoding" command that makes this information more accessible). - There are several new C APIs that support UTF-8 and various encodings. See Utf.3 for procedures that translate between Unicode and UTF-8 and manipulate UTF-8 strings. See Encoding.3 for procedures that create new encodings and translate between encodings. See ToUpper.3 for procedures that perform case conversions on UTF-8 strings. [...] 1/16/98 (new feature) Tk now supports international characters sets: - Font display mechanism overhauled to display Unicode strings containing full set of international characters. You do not need Unicode fonts on your system in order to use tk or see international characters. For those familiar with the Japanese or Chinese patches, there is no "-kanjifont" option. Characters from any available fonts will automatically be used if the widget's originally selected font is not capable of displaying a given character. - Textual widgets are international aware. For instance, cursor positioning commands would now move the cursor forwards/back by 1 international character, not by 1 byte. - Input Method Editors (IMEs) work on Mac and Windows. Unix is still in progress. [...] 10/15/98 (bug fix) Changed regexp and string commands to properly handle case folding according to the Unicode character tables. (stanton) 10/21/98 (new feature) Added an "encoding" command to facilitate translations of strings between different character encodings. See the encoding.n manual entry for more details. (stanton) 11/3/98 (bug fix) The regular expression character classification syntax now includes Unicode characters in the supported classes. (stanton) [...] 11/17/98 (bug fix) "scan" now correctly handles Unicode characters. (stanton) [...] 11/19/98 (bug fix) Fixed menus and titles so they properly display Unicode characters under Windows. [Bug: 819] (stanton) [...] 
4/2/99 (new apis) Made various Unicode utility functions public. Tcl_UtfToUniCharDString, Tcl_UniCharToUtfDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsDigit, Tcl_UniCharIsLower, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar, Tcl_WinUtfToTChar, Tcl_WinTCharToUtf (stanton) [...] 4/5/99 (bug fix) Fixed handling of Unicode in text searches. The -count option was returning byte counts instead of character counts. [...] 5/18/99 (bug fix) Fixed clipboard code so it handles Unicode data properly on Windows NT and 95. [Bug: 1791] (stanton) [...] 6/3/99 (bug fix) Fixed selection code to handle Unicode data in COMPOUND_TEXT and STRING selections. [Bug: 1791] (stanton) [...] 6/7/99 (new feature) Optimized string index, length, range, and append commands. Added a new Unicode object type. (hershey) [...] 6/14/99 (new feature) Merged string and Unicode object types. Added new public Tcl API functions: Tcl_NewUnicodeObj, Tcl_SetUnicodeObj, Tcl_GetUnicode, Tcl_GetUniChar, Tcl_GetCharLength, Tcl_GetRange, Tcl_AppendUnicodeToObj. (hershey) [...] 6/23/99 (new feature) Updated Unicode character tables to reflect Unicode 2.1 data. (stanton) [...] --- Released 8.3.0, February 10, 2000 --- See ChangeLog for details --- ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- Sorry if this was boring old stuff for some of you. Best Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Vladimir.Marangozov@inrialpes.fr Wed Mar 15 11:40:21 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 12:40:21 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 09:32:29 AM Message-ID: <200003151140.MAA30301@python.inrialpes.fr> M.-A. 
Lemburg wrote: > > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Perhaps it would make sense to move the Unicode database on the Python side (write it in Python)? Or init the database dynamically in the unicodedata module on import? It's quite big, so if it's possible to avoid the static declaration (and if the unicodedata module is enabled by default), I'd vote for a dynamic initialization of the database from reference (Python ?) file(s). M-A, is something in this spirit doable? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer@tismer.com Wed Mar 15 12:57:04 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 13:57:04 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> Message-ID: <38CF88A0.CF876A74@tismer.com> "M.-A. Lemburg" wrote: ... > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Would it be possible to make the Unicode support configurable? My problem is that patches in the CVS are of different kinds. Some are error corrections and enhancements which I would definitely like to use. Others are brand new features like the Unicode support. Absolutely great stuff! But this will most probably change a number of times again, and I think it is a bad idea when I include it into my Stackless distribution. I'd appreciate it very much if I could use the same CVS tree for testing new stuff, and to build my distribution, with new features switched off. Please :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jim@digicool.com Wed Mar 15 13:35:48 2000 From: jim@digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 08:35:48 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) Message-ID: <38CF91B4.A36C8C5@digicool.com> Here's my $0.02. I agree with the sentiments that use of finalizers should be discouraged. They are extremely helpful in cases like tempfile.TemporaryFileWrapper, so I think that they should be supported. I do think that the language should not promise a high level of service. Some observations: - I spent a little bit of time on the ANSI Smalltalk committee, where I naively advocated adding finalizers to the language. I was resoundingly told no. :) - Most of the Python objects I deal with these days are persistent. Their lifetimes are a lot more complicated that most Python objects. They get created once, but they get loaded into and out of memory many times. In fact, they can be in memory many times simultaneously. :) A couple of years ago I realized that it only made sense to call __init__ when an object was first created, not when it is subsequently (re)loaded into memory. This led to a change in Python pickling semantics and the deprecation of the loathsome __getinitargs__ protocol. :) For me, a similar case can be made against use of __del__ for persistent objects. For persistent objects, a __del__ method should only be used for cleaning up the most volatile of resources. A persistent object __del__ should not perform any semantically meaningful operations because __del__ has no semantic meaning. - Zope has a few uses of __del__. These are all for non-persistent objects. Interesting, in grepping for __del__, I found a lot of cases where __del__ was used and then commented out. 
Finalizers seem to be the sort of thing that people want initially and then get over. I'm inclined to essentially keep the current rules and simply not promise that __del__ will be able to run correctly. That is, Python should call __del__ and ignore exceptions raised (or provide some *optional* logging or other debugging facility). There is no reason for __del__ to fail unless it depends on cyclicly-related objects, which should be viewed as a design mistake. OTOH, __del__ should never fail because module globals go away. IMO, the current circular references involving module globals are unnecessary, but that's a different topic. ;) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal@lemburg.com Wed Mar 15 15:00:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 16:00:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> Message-ID: <38CFA57E.21A3B3EF@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > ... > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Would it be possible to make the Unicode support configurable? This is currently not planned as the Unicode integration touches many different parts of the interpreter to enhance string/Unicode integration... sorry. 
Also, I'm not sure whether adding #ifdefs throughout the code would increase its elegance ;-) > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. Why not ? All you have to do is rebuild the distribution every time you push a new version -- just like I did for the Unicode version before the CVS checkin was done. > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 15 14:57:13 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 15:57:13 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151140.MAA30301@python.inrialpes.fr> Message-ID: <38CFA4C9.E6B8EB5D@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Perhaps it would make sense to move the Unicode database on the > Python side (write it in Python)? Or init the database dynamically > in the unicodedata module on import? It's quite big, so if it's > possible to avoid the static declaration (and if the unicodedata module > is enabled by default), I'd vote for a dynamic initialization of the > database from reference (Python ?) file(s). The unicodedatabase module contains the Unicode database as static C data - this makes it shareable among (Python) processes.
Python modules don't provide this feature: instead a dictionary would have to be built on import which would increase the heap size considerably. Those dicts would *not* be shareable. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Wed Mar 15 15:20:06 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 16:20:06 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> Message-ID: <38CFAA26.2B2F0D01@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: ... > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > Why not ? All you have to do is rebuild the distribution > every time you push a new version -- just like I did > for the Unicode version before the CVS checkin was done. But how can I then publish my source code, when I always pull Unicode into it? I don't like to be exposed to side effects like 700kb code bloat, just by chance, since it is in the dist right now (and will vanish again). I don't say there must be #ifdefs all and everywhere, but can I build without *using* Unicode? I don't want to introduce something new to my users that they didn't ask for. And I don't want to take care about their installations. Finally I will for sure not replace a 500k DLL by a 1.2M monster, so this is definitely not what I want at the moment. How do I build a dist that doesn't need to change a lot of stuff in the user's installation? Note that Stackless Python is a drop-in replacement, not a Python distribution. Or should it be?
ciao - chris (who really wants to get SLP 1.1 out) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Fredrik Lundh" <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <014001bf8e98$35644480$34aab5d4@hagrid> CT: > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? somewhere in this thread, Guido wrote: > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > before the Unicode changes were made. maybe you could base SLP on that one? From Vladimir.Marangozov@inrialpes.fr Wed Mar 15 16:27:36 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 17:27:36 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 03:57:13 PM Message-ID: <200003151627.RAA32543@python.inrialpes.fr> > [me] > > > > Perhaps it would make sense to move the Unicode database on the > > Python side (write it in Python)? Or init the database dynamically > > in the unicodedata module on import? It's quite big, so if it's > > possible to avoid the static declaration (and if the unicodata module > > is enabled by default), I'd vote for a dynamic initialization of the > > database from reference (Python ?) file(s). [Marc-Andre] > > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. The static data is shared if the module is a shared object (.so). 
If unicodedata is not a .so, then you'll have a separate copy of the database in each process. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I haven't mentioned dicts, have I? I suggested that the entries in the C version of the database be rewritten in Python (or a text file). The unicodedata module would, in its init function, allocate memory for the database and would populate it before returning "import okay" to Python -- this is one way to init the db dynamically, among others. As to sharing the database among different processes, this is a classic IPC pb, which has nothing to do with the static C declaration of the db. Or, hmmm, one of us is royally confused. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer@tismer.com Wed Mar 15 16:22:42 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 17:22:42 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <38CFB8D2.537FCAD9@tismer.com> Fredrik Lundh wrote: > > CT: > > How do I build a dist that doesn't need to change a lot of > > stuff in the user's installation? > > somewhere in this thread, Guido wrote: > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > before the Unicode changes were made. > > maybe you could base SLP on that one? I have no idea how this works. Would this mean that I cannot get patches which come after unicode? Meanwhile, I've looked into the sources.
It is easy for me to get rid of the problem by supplying my own unicodedata.c, where I replace all functions by some unimplemented exception. Furthermore, I wondered about the data format. Is the unicode database used in your re package as well? Otherwise, I see only references from unicodedata.c, and that means the data structure can be massively enhanced. At the moment, that baby is 64k entries long, with four bytes and an optional string. This is a big waste. The strings are almost all some distinct prefixes, together with a list of hex smallwords. This is done as strings, probably this makes 80 percent of the space. The only function that uses the "decomposition" field (namely the string) is unicodedata_decomposition. It does nothing more than to wrap it into a PyObject. We can do a little better here. I guess I can bring it down to a third of this space without much effort, just by using - binary encoding for the tags as enumeration - binary encoding of the hexed entries - omission of the spaces Instead of 64k of structures which contain pointers anyway, I can use a 64k pointer array with offsets into one packed table. The unicodedata access functions would change *slightly*, just building some hex strings and so on. I guess this is not a time critical section? Should I try this evening? :-) cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal@lemburg.com Wed Mar 15 16:04:43 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 15 Mar 2000 17:04:43 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <38CFB49B.885B8B16@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > Christian Tismer wrote: > ... > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > Why not ? All you have to do is rebuild the distribution > > every time you push a new version -- just like I did > > for the Unicode version before the CVS checkin was done. > > But how can I then publish my source code, when I always > pull Unicode into it. I don't like to be exposed to > side effects like 700kb code bloat, just by chance, since it > is in the dist right now (and will vanish again). All you have to do is build the unicodedata module shared and not statically bound into python.dll. This one module causes most of the code bloat... > I don't say there must be #ifdefs all and everywhere, but > can I build without *using* Unicode? I don't want to > introduce something new to my users that they didn't ask for. > And I don't want to take care about their installations. > Finally I will for sure not replace a 500k DLL by a 1.2M > monster, so this is definitely not what I want at the moment. > > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? I don't think that the Unicode stuff will disable the running environment... (haven't tried this though).
The unicodedata module is not used by the interpreter and the rest is imported on-the-fly, not during init time, so at least in theory, not using Unicode will result in Python not looking for e.g. the encodings package. > Note that Stackless Python is a drop-in replacement, > not a Python distribution. Or should it be? Probably... I think it's simply easier to install and probably also easier to maintain because it doesn't cause dependencies on other "default" installations. The user will then explicitly know that she is installing something a little different from the default distribution... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 15 17:26:15 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:26:15 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> Message-ID: <38CFC7B7.A1ABD51C@lemburg.com> Christian Tismer wrote: > > Fredrik Lundh wrote: > > > > CT: > > > How do I build a dist that doesn't need to change a lot of > > > stuff in the user's installation? > > > > somewhere in this thread, Guido wrote: > > > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > > before the Unicode changes were made. > > > > maybe you could base SLP on that one? > > I have no idea how this works. Would this mean that I cannot > get patctes which come after unicode? > > Meanwhile, I've looked into the sources. 
It is easy for me > to get rid of the problem by supplying my own unicodedata.c, > where I replace all functions by some unimplemented exception. No need (see my other posting): simply disable the module altogether... this shouldn't hurt any part of the interpreter as the module is a user-land only module. > Furthermore, I wondered about the data format. Is the unicode > database used in your re package as well? Otherwise, I see > only references from unicodedata.c, and that means the data > structure can be massively enhanced. > At the moment, that baby is 64k entries long, with four bytes > and an optional string. > This is a big waste. The strings are almost all some distinct > prefixes, together with a list of hex smallwords. This > is done as strings, probably this makes 80 percent of the space. I have made no attempt to optimize the structure... (due to lack of time mostly) the current implementation is really not much different from a rewrite of the UnicodeData.txt file available at the unicode.org site. If you want to, I can mail you the marshalled Python dict version of that database to play with. > The only function that uses the "decomposition" field (namely > the string) is unicodedata_decomposition. It does nothing > more than to wrap it into a PyObject. > We can do a little better here. I guess I can bring it down > to a third of this space without much effort, just by using > - binary encoding for the tags as enumeration > - binary encoding of the hexed entries > - omission of the spaces > Instead of 64k of structures which contain pointers anyway, > I can use a 64k pointer array with offsets into one packed > table. > > The unicodedata access functions would change *slightly*, > just building some hex strings and so on. I guess this > is not a time critical section? It may be if these functions are used in codecs, so you should pay attention to speed too... > Should I try this evening? :-) Sure :-) go ahead...
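The packing scheme discussed in this exchange -- one packed byte table plus a 64k offset array, instead of per-character structs each pointing at its own decomposition string -- can be illustrated in a few lines. This is a toy sketch, not the actual unicodedata.c layout; `decomps`, `blob`, `offset`, and `decomposition` are illustrative names, and only two sample entries are shown:

```python
import struct

# Two sample UnicodeData.txt-style entries: code point -> decomposition string.
decomps = {0x00C0: "0041 0300", 0x00C1: "0041 0301"}

blob = bytearray()                # one packed table for all entries
offset = [0] * 0x10000            # one slot per 16-bit code point; 0 == "none"
for code in sorted(decomps):
    offset[code] = len(blob) + 1  # store offset+1 so that 0 can mean "no entry"
    parts = [int(h, 16) for h in decomps[code].split()]
    blob.append(len(parts))       # a length byte replaces spaces/terminators
    for cp in parts:
        blob += struct.pack(">H", cp)  # 2 binary bytes instead of 5 chars "0041 "

def decomposition(code):
    """Rebuild the original hex-string form on access."""
    off = offset[code]
    if not off:
        return ""
    n = blob[off - 1]
    values = struct.unpack_from(">%dH" % n, bytes(blob), off)
    return " ".join("%04X" % v for v in values)

print(decomposition(0x00C0))  # 0041 0300
```

The access function does exactly what Christian predicts: it just rebuilds "some hex strings" from the binary entries.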
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 15 17:39:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:39:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: <38CFCAC2.7690DF55@lemburg.com> Vladimir Marangozov wrote: > > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Uhm, comparing the two versions Python 1.5 and the current CVS Python I get these figures on Linux: Executing: ./python -i -c '1/0' Python 1.5: 1208kB / 728 kB (resident/shared) Python CVS: 1280kB / 808 kB ("/") Not much of a change if you ask me and the CVS version has the unicodedata module linked statically... so there's got to be some sharing and load-on-demand going on behind the scenes: this is what I was referring to when I mentioned static C data. The OS can much better deal with these sharing techniques and delayed loads than anything we could implement on top of it in C or Python. But perhaps this is Linux-specific...
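Resident/shared figures like the ones above can be read on Linux from `/proc/self/statm` (it is an assumption that they were obtained this way; the sketch below simply returns None where /proc is unavailable):

```python
import os

try:
    PAGE = os.sysconf("SC_PAGE_SIZE")
except (AttributeError, ValueError, OSError):
    PAGE = 4096  # common page size, used only as a fallback

def resident_shared_kb():
    """(resident, shared) memory of this process in kB, or None without /proc."""
    try:
        with open("/proc/self/statm") as f:
            fields = f.read().split()
    except OSError:
        return None
    resident, shared = int(fields[1]), int(fields[2])  # statm counts pages
    return resident * PAGE // 1024, shared * PAGE // 1024

print(resident_shared_kb())
```

Pages mapped from a shared library's read-only data segment show up in the shared column, which is why statically compiled tables can still be shared between processes, as observed above.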
> > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file). > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. I'm leaving this as an exercise to the interested reader ;-) Really, if you have better ideas for the unicodedata module, please go ahead. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused. Could you check this on other platforms ? Perhaps Linux is doing more than other OSes are in this field. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh" <38CFCAC2.7690DF55@lemburg.com> Message-ID: <01f901bf8eab$a353e780$34aab5d4@hagrid> I just uploaded the first public SRE snapshot to: http://w1.132.telia.com/~u13208596/sre.htm -- this kit contains windows binaries only (make sure you have built the interpreter from a recent CVS version) -- the engine fully supports unicode target strings. (not sure about the pattern compiler, though...) -- it's probably buggy as hell. for things I'm working on at this very moment, see: http://w1.132.telia.com/~u13208596/sre/status.htm I hope to get around to fix the core dump (it crashes halfway through sre_fulltest.py, for no apparent reason) and the backreferencing problem later today. stay tuned. PS. note that "public" doesn't really mean "suitable for the c.l.python crowd", or "suitable for production use".
in other words, let's keep this one on this list for now. thanks! From tismer@tismer.com Wed Mar 15 18:15:27 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 19:15:27 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> <38CFC7B7.A1ABD51C@lemburg.com> Message-ID: <38CFD33F.3C02BF43@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: [the old data compression guy has been reanimated] > If you want to, I can mail you the marshalled Python dict version of > that database to play with. ... > > Should I try this evening? :-) > > Sure :-) go ahead... Thank you. Meanwhile I've heard that there is some well-known bot working on that under the hood, with a much better approach than mine. So I'll take your advice, and continue to write silly stackless enhancements. They say this is my destiny :-) ciao - continuous -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From DavidA@ActiveState.com Wed Mar 15 18:21:40 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 15 Mar 2000 10:21:40 -0800 Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> Message-ID: > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes.
> > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I know it's complicating things, but wouldn't an mmap'ed buffer allow inter-process sharing while keeping DLL size down and everything on-disk until needed? Yes, I know, mmap calls aren't uniform across platforms and mmap isn't supported on all platforms -- I still think that it's silly not to use it on those platforms where it is available, and I'd like to see mmap unification move forward, so this is as good a motivation as any to bite the bullet. Just a thought, --david From jim@digicool.com Wed Mar 15 18:24:53 2000 From: jim@digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 13:24:53 -0500 Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) Message-ID: <38CFD575.A0536439@digicool.com> I find asyncore to be quite useful; however, it is currently geared to having a single main loop. It uses a global socket map that all asyncore dispatchers register with. I have an application in which I want to have multiple socket maps. I propose that we start moving toward a model in which selection of a socket map and control of the asyncore loop is a bit more explicit. If no one objects, I'll work up some initial patches. Who should I submit these to? Sam? Should the medusa public CVS form the basis? Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
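[Editorial aside: Jim's proposal amounts to passing the socket map around explicitly instead of consulting a module-level global. The sketch below is a hypothetical minimal select-based version of that idea, not asyncore's actual API.]

```python
import select
import socket

def poll_once(socket_map, timeout=0.1):
    # Dispatch only over the sockets registered in *this* map -- the
    # explicit analogue of asyncore's single global socket map.
    readable, _, _ = select.select(list(socket_map), [], [], timeout)
    for sock in readable:
        socket_map[sock](sock)          # call the registered handler

# Two independent maps: servicing one never touches the other.
a, b = socket.socketpair()
received = []
map_one = {a: lambda s: received.append(s.recv(16))}
map_two = {}                            # e.g. a second loop's sockets

b.sendall(b"ping")
poll_once(map_one)                      # received is now [b"ping"]
a.close(); b.close()
```

With an explicit map, an application can run several loops (or one loop over a chosen map) without dispatchers interfering across subsystems.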
From jcw@equi4.com Wed Mar 15 19:39:37 2000 From: jcw@equi4.com (Jean-Claude Wippler) Date: Wed, 15 Mar 2000 20:39:37 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38CFE6F9.3E8E9385@equi4.com> David Ascher wrote: [shareable unicodedatabase] > I know it's complicating things, but wouldn't an mmap'ed buffer allow > inter-process sharing while keeping DLL size down and everything > on-disk until needed? AFAIK, on platforms which support mmap, static data already gets mmap'ed in by the OS (just like all code), so this might have little effect. I'm more concerned by the distribution size increase. -jcw From bwarsaw@cnri.reston.va.us Wed Mar 15 18:41:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 15 Mar 2000 13:41:00 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <14543.55612.969101.206695@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> somewhere in this thread, Guido wrote: >> BTW, I added a tag "pre-unicode" to the CVS tree to the >> revisions before the Unicode changes were made. FL> maybe you could base SLP on that one? /F's got it exactly right. Check out a new directory using a stable tag (maybe you want to base your changes on pre-unicode tag, or python 1.52?). Patch in that subtree and then eventually you'll have to merge your changes into the head of the branch. 
-Barry From rushing@nightmare.com Thu Mar 16 01:52:22 2000 From: rushing@nightmare.com (Sam Rushing) Date: Wed, 15 Mar 2000 17:52:22 -0800 (PST) Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) In-Reply-To: <38CFD575.A0536439@digicool.com> References: <38CFD575.A0536439@digicool.com> Message-ID: <14544.15958.546712.466506@seattle.nightmare.com> Jim Fulton writes: > I find asyncore to be quite useful, however, it is currently > geared to having a single main loop. It uses a global socket > map that all asyncore dispatchers register with. > > I have an application in which I want to have multiple > socket maps. But still only a single event loop, yes? Why do you need multiple maps? For a priority system of some kind? > I propose that we start moving toward a model in which selection of > a socket map and control of the asyncore loop is a bit more > explicit. > > If no one objects, I'll work up some initial patches. If it can be done in a backward-compatible fashion, that sounds fine; but it sounds tricky. Even the simple {:object...} change broke so many things that we're still using the old stuff at eGroups. > Who should I submit these to? Sam? > Should the medusa public CVS form the basis? Yup, yup. -Sam From tim_one@email.msn.com Thu Mar 16 07:06:23 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 16 Mar 2000 02:06:23 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <38CF91B4.A36C8C5@digicool.com> Message-ID: <000201bf8f16$237e5e80$662d153f@tim> [Jim Fulton] > ... > There is no reason for __del__ to fail unless it depends on > cyclicly-related objects, which should be viewed as a design > mistake. > > OTOH, __del__ should never fail because module globals go away. > IMO, the current circular references involving module globals are > unnecessary, but that's a different topic. ;) IOW, you view "the current circular references involving module globals" as "a design mistake" . And perhaps they are! 
I wouldn't call it a different topic, though: so long as people are *viewing* shutdown __del__ problems as just another instance of finalizers in cyclic trash, it makes the latter *seem* inescapably "normal", and so something that has to be catered to. If you have a way to take the shutdown problems out of the discussion, it would help clarify both topics, at the very least by deconflating them. it's-a-mailing-list-so-no-need-to-stay-on-topic-ly y'rs - tim From gstein@lyra.org Thu Mar 16 12:01:36 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:01:36 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF88A0.CF876A74@tismer.com> Message-ID: On Wed, 15 Mar 2000, Christian Tismer wrote: >... > Would it be possible to make the Unicode support configurable? This might be interesting from the standpoint of those guys who are doing the tiny Python interpreter thingy for embedded systems. > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. > > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) But! I find this reason completely off the mark. In essence, you're arguing that we should not put *any* new feature into the CVS repository because it might mess up what *you* are doing. Sorry, but that just irks me. If you want a stable Python, then don't use the CVS version. Or base it off a specific tag in CVS. Or something. Just don't ask for development to be stopped.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Mar 16 12:08:43 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:08:43 -0800 (PST) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: On Wed, 15 Mar 2000, Vladimir Marangozov wrote: > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Nope. A shared module means that multiple executables can share the code. Whether the const data resides in an executable or a .so, the OS will map it into readonly memory and share it across all processes. > > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file) > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. This would place all that data into the per-process heap.
Definitely not shared, and definitely a big hit for each Python process. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused . This isn't IPC. It is sharing of some constant data. The most effective way to manage this is through const C data. The OS will properly manage it. And sorry, David, but mmap'ing a file will simply add complexity. As jcw mentioned, the OS is pretty much doing this anyhow when it deals with a const data segment in your executable. I don't believe this is Linux specific. This kind of stuff has been done for a *long* time on the platforms, too. Side note: the most effective way of exposing this const data up to Python (without shoving it onto the heap) is through buffers created via: PyBuffer_FromMemory(ptr, size) This allows the data to reside in const, shared memory while it is also exposed up to Python. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov@inrialpes.fr Thu Mar 16 12:39:42 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 16 Mar 2000 13:39:42 +0100 (CET) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: from "Greg Stein" at Mar 16, 2000 04:08:43 AM Message-ID: <200003161239.NAA01671@python.inrialpes.fr> Greg Stein wrote: > > [me] > > The static data is shared if the module is a shared object (.so). > > If unicodedata is not a .so, then you'll have a seperate copy of the > > database in each process. > > Nope. A shared module means that multiple executables can share the code. > Whether the const data resides in an executable or a .so, the OS will map > it into readonly memory and share it across all procsses. I must have been drunk yesterday. You're right. > I don't believe this is Linux specific. This kind of stuff has been done > for a *long* time on the platforms, too. Yes. 
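[Editorial aside: PyBuffer_FromMemory() was the 1.5/2.x-era C API Greg refers to. As a rough pure-Python analogue of the same zero-copy idea (not the C call itself, and with a made-up stand-in table), a buffer view exposes existing constant bytes without copying them onto the heap:]

```python
# Stand-in for a large const table compiled into the interpreter.
DATABASE = bytes(range(256)) * 16

view = memoryview(DATABASE)     # zero-copy, read-only window onto the data
sub = view[10:14]               # sub-views still copy nothing

assert view.readonly
assert bytes(sub) == b"\x0a\x0b\x0c\x0d"
```

The point either way is that Python-level access need not duplicate the underlying storage, so the OS-shared pages stay shared.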
> > Side note: the most effective way of exposing this const data up to Python > (without shoving it onto the heap) is through buffers created via: > PyBuffer_FromMemory(ptr, size) > This allows the data to reside in const, shared memory while it is also > exposed up to Python. And to avoid the size increase of the Python library, perhaps unicodedata needs to be commented out by default in Setup.in (for the release, not now). As M-A pointed out, the module isn't necessary for the normal operation of the interpreter. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein@lyra.org Thu Mar 16 12:56:21 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:56:21 -0800 (PST) Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <000201bf8f16$237e5e80$662d153f@tim> Message-ID: On Thu, 16 Mar 2000, Tim Peters wrote: >... > IOW, you view "the current circular references involving module globals" as > "a design mistake" . And perhaps they are! I wouldn't call it a > different topic, though: so long as people are *viewing* shutdown __del__ > problems as just another instance of finalizers in cyclic trash, it makes > the latter *seem* inescapably "normal", and so something that has to be > catered to. If you have a way to take the shutdown problems out of the > discussion, it would help clarify both topics, at the very least by > deconflating them. Bah. Module globals are easy. My tp_clean suggestion handles them quite easily at shutdown. No more special code in import.c. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer@tismer.com Thu Mar 16 12:53:46 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 13:53:46 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38D0D95A.B13EC17E@tismer.com> Greg Stein wrote: > > On Wed, 15 Mar 2000, Christian Tismer wrote: > >...
> > Would it be possible to make the Unicode support configurable? > > This might be interesting from the standpoint of those guys who are doing > the tiny Python interpreter thingy for embedded systems. > > > My problem is that patches in the CVS are of different kinds. > > Some are error corrections and enhancements which I would > > definitely like to use. > > Others are brand new features like the Unicode support. > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > > > I'd appreciate it very much if I could use the same CVS tree > > for testing new stuff, and to build my distribution, with > > new features switched off. Please :-) > > But! I find this reason completely off the mark. In essence, you're > arguing that we should not put *any* new feature into the CVS repository > because it might mess up what *you* are doing. No, this is your interpretation, and a reduction which I can't follow. There are improvements and features in the CVS version which I need. I prefer to build against it, instead of the old 1.5.2. What's wrong with that? I want to find a way that gives me the least trouble in doing so. > Sorry, but that just irks me. If you want a stable Python, then don't use > the CVS version. Or base it off a specific tag in CVS. Or something. Just > don't ask for development to be stopped. No, I ask for development to be stopped. Code freeze until Y3k :-) Why are you trying to put such nonsense into my mouth? You know that I know that you know better. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tismer@tismer.com Thu Mar 16 13:25:48 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 14:25:48 +0100 Subject: [Python-Dev] const data (was: Unicode patches checked in) References: <200003161239.NAA01671@python.inrialpes.fr> Message-ID: <38D0E0DC.B997F836@tismer.com> Vladimir Marangozov wrote: > > Greg Stein wrote: > > Side note: the most effective way of exposing this const data up to Python > > (without shoving it onto the heap) is through buffers created via: > > PyBuffer_FromMemory(ptr, size) > > This allows the data to reside in const, shared memory while it is also > > exposed up to Python. > > And to avoid the size increase of the Python library, perhaps unicodedata > needs to be uncommented by default in Setup.in (for the release, not now). > As M-A pointed out, the module isn't isn't necessary for the normal > operation of the interpreter. Sounds like a familiar idea. :-) BTW., yesterday evening I wrote an analysis script, to see how far this data is compactable without going into real compression, just redundancy folding and byte/short indexing was used. If I'm not wrong, this reduces the size of the database to less than 25kb. That small amount of extra data would make the uncommenting feature quite unimportant, except for the issue of building tiny Pythons. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein@lyra.org Thu Mar 16 13:06:46 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 05:06:46 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38D0D95A.B13EC17E@tismer.com> Message-ID: On Thu, 16 Mar 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Sorry, but that just irks me. If you want a stable Python, then don't use > > the CVS version. Or base it off a specific tag in CVS. Or something. Just > > don't ask for development to be stopped. > > No, I ask for development to be stopped. Code freeze until Y3k :-) > Why are you trying to put such a nonsense into my mouth? > You know that I know that you know better. Simply because that is what it sounds like on this side of my monitor :-) I'm seeing your request as asking for people to make special considerations in their patches for your custom distribution. While I don't have a problem with making Python more flexible to distro maintainers, it seemed like you were approaching it from the "wrong" angle. Like I said, making Unicode optional for the embedded space makes sense; making it optional so it doesn't bloat your distro didn't :-) Not a big deal... it is mostly a perception on my part. I also tend to dislike things that hold development back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Fri Mar 17 18:53:39 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 19:53:39 +0100 Subject: [Python-Dev] Unicode Update 2000-03-17 Message-ID: <38D27F33.4055A942@lemburg.com> This is a multi-part message in MIME format. --------------A764B515049AA0B5F7643A5B Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Attached you find an update of the Unicode implementation. 
The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------A764B515049AA0B5F7643A5B Content-Type: text/plain; charset=us-ascii; name="Unicode-Implementation-2000-03-17.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="Unicode-Implementation-2000-03-17.patch" Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 17 15:24:30 2000 +++ Python+Unicode/Include/unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,8 +1,5 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* @@ -109,8 +106,9 @@ /* --- Internal Unicode Operations ---------------------------------------- */ /* If you want Python to use the compiler's wctype.h functions instead - of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS. - This reduces the interpreter's code size. */ + of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS or + configure Python using --with-ctype-functions. This reduces the + interpreter's code size. 
*/ #if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS) @@ -169,6 +167,10 @@ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { @@ -647,7 +649,7 @@ int direction /* Find direction: +1 forward, -1 backward */ ); -/* Count the number of occurances of substr in str[start:end]. */ +/* Count the number of occurrences of substr in str[start:end]. */ extern DL_IMPORT(int) PyUnicode_Count( PyObject *str, /* String */ @@ -656,7 +658,7 @@ int end /* Stop index */ ); -/* Replace at most maxcount occurances of substr in str with replstr +/* Replace at most maxcount occurrences of substr in str with replstr and return the resulting Unicode object. */ extern DL_IMPORT(PyObject *) PyUnicode_Replace( diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Sat Mar 11 00:20:43 2000 +++ Python+Unicode/Lib/codecs.py Mon Mar 13 14:33:54 2000 @@ -55,7 +55,7 @@ """ def encode(self,input,errors='strict'): - """ Encodes the object intput and returns a tuple (output + """ Encodes the object input and returns a tuple (output object, length consumed). errors defines the error handling to apply. 
It defaults to diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/encodings/__init__.py Python+Unicode/Lib/encodings/__init__.py --- CVS-Python/Lib/encodings/__init__.py Sat Mar 11 00:17:18 2000 +++ Python+Unicode/Lib/encodings/__init__.py Mon Mar 13 14:30:33 2000 @@ -30,13 +30,13 @@ import string,codecs,aliases _cache = {} -_unkown = '--unkown--' +_unknown = '--unknown--' def search_function(encoding): # Cache lookup - entry = _cache.get(encoding,_unkown) - if entry is not _unkown: + entry = _cache.get(encoding,_unknown) + if entry is not _unknown: return entry # Import the module diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_string.py Python+Unicode/Lib/test/test_string.py --- CVS-Python/Lib/test/test_string.py Sat Mar 11 10:52:43 2000 +++ Python+Unicode/Lib/test/test_string.py Mon Mar 13 10:12:46 2000 @@ -143,6 +143,7 @@ test('translate', 'xyz', 'xyz', table) test('replace', 'one!two!three!', 'one@two!three!', '!', '@', 1) +test('replace', 'one!two!three!', 'onetwothree', '!', '') test('replace', 'one!two!three!', 'one@two@three!', '!', '@', 2) test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 3) test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Fri Mar 17 15:24:31 2000 +++ 
Python+Unicode/Lib/test/test_unicode.py Mon Mar 13 10:13:05 2000 @@ -108,6 +108,7 @@ test('translate', u'xyz', u'xyz', table) test('replace', u'one!two!three!', u'one@two!three!', u'!', u'@', 1) +test('replace', u'one!two!three!', u'onetwothree', '!', '') test('replace', u'one!two!three!', u'one@two@three!', u'!', u'@', 2) test('replace', u'one!two!three!', u'one@two@three@', u'!', u'@', 3) test('replace', u'one!two!three!', u'one@two@three@', u'!', u'@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 17 16:55:11 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Modules/stropmodule.c Python+Unicode/Modules/stropmodule.c --- CVS-Python/Modules/stropmodule.c Wed Mar 1 10:22:53 2000 +++ Python+Unicode/Modules/stropmodule.c Mon Mar 13 14:33:23 2000 @@ -1054,7 +1054,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/stringobject.c Mon Mar 13 14:33:24 2000 @@ -1395,7 +1395,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. 
@@ -1578,7 +1578,7 @@ return NULL; if (sub_len <= 0) { - PyErr_SetString(PyExc_ValueError, "empty replacement string"); + PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } new_s = mymemreplace(str,len,sub,sub_len,repl,repl_len,count,&out_len); Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/unicodeobject.c Wed Mar 15 10:49:19 2000 @@ -83,7 +83,7 @@ all objects on the free list having a size less than this limit. This reduces malloc() overhead for small Unicode objects. - At worse this will result in MAX_UNICODE_FREELIST_SIZE * + At worst this will result in MAX_UNICODE_FREELIST_SIZE * (sizeof(PyUnicodeObject) + STAYALIVE_SIZE_LIMIT + malloc()-overhead) bytes of unused garbage. 
@@ -180,7 +180,7 @@ unicode_freelist = *(PyUnicodeObject **)unicode_freelist; unicode_freelist_size--; unicode->ob_type = &PyUnicode_Type; - _Py_NewReference(unicode); + _Py_NewReference((PyObject *)unicode); if (unicode->str) { if (unicode->length < length && _PyUnicode_Resize(unicode, length)) { @@ -199,16 +199,19 @@ unicode->str = PyMem_NEW(Py_UNICODE, length + 1); } - if (!unicode->str) { - PyMem_DEL(unicode); - PyErr_NoMemory(); - return NULL; - } + if (!unicode->str) + goto onError; unicode->str[length] = 0; unicode->length = length; unicode->hash = -1; unicode->utf8str = NULL; return unicode; + + onError: + _Py_ForgetReference((PyObject *)unicode); + PyMem_DEL(unicode); + PyErr_NoMemory(); + return NULL; } static @@ -224,7 +227,6 @@ *(PyUnicodeObject **)unicode = unicode_freelist; unicode_freelist = unicode; unicode_freelist_size++; - _Py_ForgetReference(unicode); } else { free(unicode->str); @@ -489,7 +491,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-8 decoding error; unkown error handling code: %s", + "UTF-8 decoding error; unknown error handling code: %s", errors); return -1; } @@ -611,7 +613,7 @@ else { PyErr_Format(PyExc_ValueError, "UTF-8 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -733,7 +735,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-16 decoding error; unkown error handling code: %s", + "UTF-16 decoding error; unknown error handling code: %s", errors); return -1; } @@ -921,7 +923,7 @@ else { PyErr_Format(PyExc_ValueError, "Unicode-Escape decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1051,6 +1053,10 @@ */ +static const Py_UNICODE *findchar(const Py_UNICODE *s, + int size, + Py_UNICODE ch); + static PyObject *unicodeescape_string(const Py_UNICODE *s, int size, @@ -1069,9 +1075,6 @@ p = q = PyString_AS_STRING(repr); if (quotes) { - static const Py_UNICODE *findchar(const Py_UNICODE *s, - int size, 
- Py_UNICODE ch); *p++ = 'u'; *p++ = (findchar(s, size, '\'') && !findchar(s, size, '"')) ? '"' : '\''; @@ -1298,7 +1301,7 @@ else { PyErr_Format(PyExc_ValueError, "Latin-1 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1369,7 +1372,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1431,7 +1434,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1502,7 +1505,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1618,7 +1621,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1750,7 +1753,7 @@ else { PyErr_Format(PyExc_ValueError, "translate error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/codecs.c Python+Unicode/Python/codecs.c --- CVS-Python/Python/codecs.c Fri Mar 10 23:57:27 2000 +++ Python+Unicode/Python/codecs.c Wed Mar 15 11:27:54 2000 @@ -93,9 +93,14 @@ PyObject *_PyCodec_Lookup(const char *encoding) { - PyObject *result, *args = NULL, *v; + PyObject *result, *args = NULL, *v = NULL; int i, len; + if (_PyCodec_SearchCache == NULL || _PyCodec_SearchPath == NULL) { + PyErr_SetString(PyExc_SystemError, + "codec module not properly initialized"); + goto onError; + } if (!import_encodings_called) import_encodings(); @@ -109,6 +114,7 @@ 
result = PyDict_GetItem(_PyCodec_SearchCache, v); if (result != NULL) { Py_INCREF(result); + Py_DECREF(v); return result; } @@ -121,6 +127,7 @@ if (args == NULL) goto onError; PyTuple_SET_ITEM(args,0,v); + v = NULL; for (i = 0; i < len; i++) { PyObject *func; @@ -146,7 +153,7 @@ if (i == len) { /* XXX Perhaps we should cache misses too ? */ PyErr_SetString(PyExc_LookupError, - "unkown encoding"); + "unknown encoding"); goto onError; } @@ -156,6 +163,7 @@ return result; onError: + Py_XDECREF(v); Py_XDECREF(args); return NULL; } @@ -378,5 +386,7 @@ void _PyCodecRegistry_Fini() { Py_XDECREF(_PyCodec_SearchPath); + _PyCodec_SearchPath = NULL; Py_XDECREF(_PyCodec_SearchCache); + _PyCodec_SearchCache = NULL; } --------------A764B515049AA0B5F7643A5B-- From bwarsaw@cnri.reston.va.us Fri Mar 17 19:16:02 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 14:16:02 -0500 (EST) Subject: [Python-Dev] Unicode Update 2000-03-17 References: <38D27F33.4055A942@lemburg.com> Message-ID: <14546.33906.771022.916209@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> The patch is against the current CVS version. I would M> appreciate if someone with CVS checkin permissions could check M> the changes in. Hi MAL, I just tried to apply your patch against the tree, however patch complains that the Lib/codecs.py patch is reversed. I haven't looked closely at it, but do you have any ideas? Or why don't you just send me Lib/codecs.py and I'll drop it in place. Everything else patched cleanly. -Barry From ping@lfw.org Fri Mar 17 14:06:13 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:13 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? Message-ID: I wondered to myself today while reading through the Python tutorial whether it would be a good idea to have a separate boolean type in Py3K. Would this help catch common mistakes? 
I won't presume to truly understand the new-to-Python experience, but one might *guess* that

>>> 5 > 3
true

would make a little more sense to a beginner than

>>> 5 > 3
1

Of course this means introducing "true" and "false" as keywords (or built-in values like None -- perhaps they should be spelled True and False?) and completely changing the way a lot of code runs by introducing a bunch of type checking, so it may be too radical a change, but -- And i don't know if it's already been discussed a lot, but -- I thought it wouldn't hurt just to raise the question. -- ?!ng From ping@lfw.org Fri Mar 17 14:06:55 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:55 -0600 (CST) Subject: [Python-Dev] Should None be a keyword? Message-ID: Related to my last message: should None become a keyword in Py3K? -- ?!ng From bwarsaw@cnri.reston.va.us Fri Mar 17 20:49:24 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:49:24 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? References: Message-ID: <14546.39508.312796.221069@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes: KY> I wondered to myself today while reading through the Python KY> tutorial whether it would be a good idea to have a separate KY> boolean type in Py3K. Would this help catch common mistakes? Almost a year ago, I mused about a boolean type in c.l.py, and came up with this prototype in Python.
-------------------- snip snip --------------------
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __str__(self):
        return self.__flag and 'true' or 'false'

    def __repr__(self):
        return self.__str__()

    def __nonzero__(self):
        return self.__flag == 1

    def __cmp__(self, other):
        if (self.__flag and other) or (not self.__flag and not other):
            return 0
        else:
            return 1

    def __rcmp__(self, other):
        return -self.__cmp__(other)

true = Boolean(1)
false = Boolean()
-------------------- snip snip --------------------

I think it makes sense to augment Python's current truth rules with a built-in boolean type and True and False values. But unless it's tied in more deeply (e.g. comparisons return one of these instead of integers -- and what are the implications of that?) then it's pretty much just syntactic sugar <0.75 lick>. -Barry From bwarsaw@cnri.reston.va.us Fri Mar 17 20:50:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:50:00 -0500 (EST) Subject: [Python-Dev] Should None be a keyword? References: Message-ID: <14546.39544.673335.378797@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes: KY> Related to my last message: should None become a keyword in KY> Py3K? Why? Just to reserve it? -Barry From Moshe Zadka Fri Mar 17 20:52:29 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 17 Mar 2000 22:52:29 +0200 (IST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID: On Fri, 17 Mar 2000, Barry A. Warsaw wrote: > Almost a year ago, I mused about a boolean type in c.l.py, and came up > with this prototype in Python. Cool prototype! However, I think I have a problem with the proposed semantics:

> def __cmp__(self, other):
>     if (self.__flag and other) or (not self.__flag and not other):
>         return 0
>     else:
>         return 1

This means:

    true == 1
    true == 2

But 1 != 2. I have some difficulty with == not being an equivalence relation...
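Moshe's point is easy to demonstrate concretely. Below is a minimal sketch of the prototype's comparison rule restated for a current interpreter (an assumption throughout: `__eq__` and `__bool__` stand in for `__cmp__`, `__rcmp__`, and `__nonzero__`, which modern Python no longer calls; everything else follows Barry's class):

```python
class Boolean:
    # Sketch of Barry's prototype, restated for a modern interpreter.
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __repr__(self):
        return 'true' if self.__flag else 'false'

    def __bool__(self):
        return self.__flag

    def __eq__(self, other):
        # Equal whenever both sides have the same truth value --
        # exactly the rule encoded by the __cmp__ above.
        return self.__flag == bool(other)

true = Boolean(1)

# Moshe's objection: == stops being an equivalence relation.
assert true == 1
assert true == 2
assert 1 != 2      # ...yet both 1 and 2 compare equal to `true`
```

Run as-is, the three assertions all pass, which is precisely the non-transitivity being objected to.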
> I think it makes sense to augment Python's current truth rules with a > built-in boolean type and True and False values. Right on! Except for the built-in...why not have it like exceptions.py, Python code necessary for the interpreter? Languages which compile themselves are not unheard of. > But unless it's tied > in more deeply (e.g. comparisons return one of these instead of > integers -- and what are the implications of that?) Breaking loads of horrible code. Unacceptable for the 1.x series, but perfectly fine in Py3K. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Fredrik Lundh" <14546.39544.673335.378797@anthem.cnri.reston.va.us> Message-ID: <004e01bf9055$79012000$34aab5d4@hagrid> Barry A. Warsaw wrote:
> >>>>> "KY" == Ka-Ping Yee writes:
> 
> KY> Related to my last message: should None become a keyword in
> KY> Py3K?
> 
> Why? Just to reserve it?

to avoid errors like:

    def foo():
        result = None
        # two screenfuls of code
        None, a, b = mytuple # perlish unpacking

which gives an interesting error on the first line, instead of a syntax error on the last. From guido@python.org Fri Mar 17 21:20:05 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:05 -0500 Subject: [Python-Dev] Should None be a keyword? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:55 CST." References: Message-ID: <200003172120.QAA09045@eric.cnri.reston.va.us> Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 17 21:20:36 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:36 -0500 Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:13 CST." References: Message-ID: <200003172120.QAA09115@eric.cnri.reston.va.us> Yes. True and False make sense.
--Guido van Rossum (home page: http://www.python.org/~guido/) From python-dev@python.org Fri Mar 17 21:17:06 2000 From: python-dev@python.org (Peter Funk) Date: Fri, 17 Mar 2000 22:17:06 +0100 (MET) Subject: [Python-Dev] Should None be a keyword? In-Reply-To: <14546.39544.673335.378797@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 17, 2000 3:50: 0 pm" Message-ID: > >>>>> "KY" == Ka-Ping Yee writes: > > KY> Related to my last message: should None become a keyword in > KY> Py3K? Barry A. Warsaw schrieb: > Why? Just to reserve it? This is related to the general type checking discussion. IMO the suggested

>>> 1 > 0
True

wouldn't buy us much, as long as the following behaviour stays in Py3K:

>>> a = '2' ; b = 3
>>> a < b
0
>>> a > b
1

This is irritating to newcomers (at least from my rather short experience as a member of python-help)! And this is especially irritating, since you can't do

>>> c = a + b
Traceback (innermost last):
  File "", line 1, in ?
TypeError: illegal argument type for built-in operation

IMO this difference is far more difficult to catch for newcomers than the far more often discussed 5/3 == 1 behaviour. Have a nice weekend and don't forget to hunt for remaining bugs in Fred's upcoming 1.5.2p2 docs ;-), Peter. -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From ping@lfw.org Fri Mar 17 15:53:38 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 09:53:38 -0600 (CST) Subject: [Python-Dev] list.shift() Message-ID: Has list.shift() been proposed?

    # pretend lists are implemented in Python and 'self' is a list
    def shift(self):
        item = self[0]
        del self[:1]
        return item

This would make queues read nicely... use "append" and "pop" for a stack, "append" and "shift" for a queue. (This is while on the thought-train of "making built-in types do more, rather than introducing more special types", as you'll see in my next message.)
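For reference, the proposed behaviour can be sketched like this (an assumption: `shift` is written as a stand-alone function mirroring the pretend implementation above, since no such list method exists):

```python
def shift(items):
    # same body as the pretend implementation, acting on a passed-in list
    item = items[0]
    del items[:1]
    return item

# queue: append at the back, shift from the front (FIFO)
queue = []
queue.append('a')
queue.append('b')
queue.append('c')
assert shift(queue) == 'a'
assert queue == ['b', 'c']

# stack: append at the back, pop from the back (LIFO)
stack = []
stack.append(1)
stack.append(2)
assert stack.pop() == 2
assert stack == [1]
```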
-- ?!ng From gvanrossum@beopen.com Fri Mar 17 22:00:18 2000 From: gvanrossum@beopen.com (Guido van Rossum) Date: Fri, 17 Mar 2000 17:00:18 -0500 Subject: [Python-Dev] list.shift() References: Message-ID: <38D2AAF2.CFBF3A2@beopen.com> Ka-Ping Yee wrote:
> 
> Has list.shift() been proposed?
> 
> # pretend lists are implemented in Python and 'self' is a list
> def shift(self):
>     item = self[0]
>     del self[:1]
>     return item
> 
> This would make queues read nicely... use "append" and "pop" for
> a stack, "append" and "shift" for a queue.
> 
> (This is while on the thought-train of "making built-in types do
> more, rather than introducing more special types", as you'll see
> in my next message.)

You can do this using list.pop(0). I don't think the name "shift" is very intuitive (smells of sh and Perl :-). Do we need a new function? --Guido From ping@lfw.org Fri Mar 17 16:08:37 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:08:37 -0600 (CST) Subject: [Python-Dev] Using lists as sets Message-ID: A different way to provide sets in Python, which occurred to me on Wednesday at Guido's talk in Mountain View (hi Guido!), is to just make lists work better. Someone asked Guido a question about the ugliness of using dicts in a certain way, and it was clear that what he wanted was a real set. Guido's objection to introducing more core data types is that it makes it more difficult to choose which data type to use, and opens the possibility of using entirely the wrong one -- a very well-taken point, i thought. (That recently-mentioned study of scripting vs. system language performance seems relevant here: a few of the C programs submitted were much *slower* than the ones in Python or Perl just because people had to choose and implement their own data structures, and so they were able to completely shoot themselves in both feet and lose a leg or two in the process.) So...
Hypothesis: The only real reason people might want a separate set type, or have to use dicts as sets, is that linear search on a list is too slow.

Therefore: All we have to do is speed up "in" on lists, and now we have a set type that is nice to read and write, and already has nice spellings for set semantics like "in".

Implementation possibilities:

+ Whip up a hash table behind the scenes if "in" gets used a lot on a particular list and all its members are hashable. This makes "in" no longer O(n), which is most of the battle. remove() can also be cheap -- though you have to do a little more bookkeeping to take care of multiple copies of elements.

+ Or, add a couple of methods, e.g. take() appends an item to a list if it's not there already, drop() removes all copies of an item from a list. These tip us off: the first time one of these methods gets used, we make the hash table then.

I think the semantics would be pretty understandable and simple to explain, which is the main thing. Any thoughts? -- ?!ng From ping@lfw.org Fri Mar 17 16:12:22 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:12:22 -0600 (CST) Subject: [Python-Dev] list.shift() In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: On Fri, 17 Mar 2000, Guido van Rossum wrote:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

Oh -- sorry, that's my ignorance showing. I didn't know pop() took an argument (of course it would -- duh...). No need to add anything more, then, i think. Sorry! Fred et al. on doc-sig: it would be really good for the tutorial to show a queue example and a stack example in the section where list methods are introduced.
In-Reply-To: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID: Guido: (re None being a keyword)
> Yes.

Guido: (re booleans)
> Yes. True and False make sense.

Astounding. I don't think i've ever seen such quick agreement on anything! And twice in one day! I think i'm going to go lie down. :) :) -- ?!ng From DavidA@ActiveState.com Fri Mar 17 22:23:53 2000 From: DavidA@ActiveState.com (David Ascher) Date: Fri, 17 Mar 2000 14:23:53 -0800 Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: > I think the semantics would be pretty understandable and simple to > explain, which is the main thing. > > Any thoughts? Would

    (a,b) in Set

return true if (a,b) was a subset of Set, or if (a,b) was an element of Set? --david From mal@lemburg.com Fri Mar 17 22:41:46 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 23:41:46 +0100 Subject: [Python-Dev] Boolean type for Py3K? References: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID: <38D2B4AA.2EE933BD@lemburg.com> Guido van Rossum wrote: > > Yes. True and False make sense. mx.Tools defines these as new builtins... and they correspond to the C level singletons Py_True and Py_False.

    # Truth constants
    True = (1==1)
    False = (1==0)

I'm not sure whether breaking the idiom of True == 1 and False == 0 (or in other words: truth values are integers) would be such a good idea. Nothing against adding name bindings in __builtins__ though... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping@lfw.org Fri Mar 17 16:53:12 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:53:12 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID: On Fri, 17 Mar 2000, Barry A. Warsaw wrote: > Almost a year ago, I mused about a boolean type in c.l.py, and came up > with this prototype in Python.
> > -------------------- snip snip -------------------- > class Boolean: [...] > > I think it makes sense to augment Python's current truth rules with a > built-in boolean type and True and False values. But unless it's tied > in more deeply (e.g. comparisons return one of these instead of > integers -- and what are the implications of that?) then it's pretty > much just syntactic sugar <0.75 lick>. Yeah, and the whole point *is* the change in semantics, not the syntactic sugar. I'm hoping we can gain some safety from the type checking... though i can't seem to think of a good example off the top of my head. It's easier to think of examples if things like 'if', 'and', 'or', etc. only accept booleans as conditional arguments -- but i can't imagine going that far, as that would just be really annoying. Let's see. Specifically, the following would probably return booleans:

    magnitude comparisons:       <, >, <=, >=   (and __cmp__)
    value equality comparisons:  ==, !=
    identity comparisons:        is, is not
    containment tests:           in, not in     (and __contains__)

... and booleans would be different from integers in that arithmetic would be illegal... but that's about it. (?) Booleans are still storable immutable values; they could be keys to dicts but not lists; i don't know what else. Maybe this wouldn't actually buy us anything except for the nicer spelling of "True" and "False", which might not be worth it. ... Hmm. Can anyone think of common cases where this could help? -- n!?g From ping@lfw.org Fri Mar 17 16:59:17 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:59:17 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, David Ascher wrote: > > I think the semantics would be pretty understandable and simple to > > explain, which is the main thing. > > > > Any thoughts? > > Would > > (a,b) in Set > > return true of (a,b) was a subset of Set, or if (a,b) was an element of Set?
This would return true if (a, b) was an element of the set -- exactly the same semantics as we currently have for lists. Ideally it would also be kind of nice to use < > <= >= as subset/superset operators, but that requires revising the way we do comparisons, and you know, it might not really be used all that often anyway. -, |, and & could operate on lists sensibly when we use them as sets -- just define a few simple rules for ordering and you should be fine. e.g.

    c = a - b   is equivalent to    c = a
                                    for item in b: c.drop(item)

    c = a | b   is equivalent to    c = a
                                    for item in b: c.take(item)

    c = a & b   is equivalent to    c = []
                                    for item in a:
                                        if item in b: c.take(item)

where

    c.take(item)   is equivalent to    if item not in c: c.append(item)

    c.drop(item)   is equivalent to    while item in c: c.remove(item)

The above is all just semantics, of course, to make the point that the semantics can be simple. The implementation could do different things that are much faster when there's a hash table helping out. -- ?!ng From gvwilson@nevex.com Fri Mar 17 23:28:05 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Fri, 17 Mar 2000 18:28:05 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Message-ID: > Guido: (re None being a keyword) > > Yes. > Guido: (re booleans) > > Yes. True and False make sense. > Ka-Ping: > Astounding. I don't think i've ever seen such quick agreement on > anything! And twice in one day! I'm think i'm going to go lie down.
Perhaps this also suggests "exclude" instead of "drop". -- ?!ng From klm@digicool.com Sat Mar 18 00:32:56 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 17 Mar 2000 19:32:56 -0500 (EST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > On Fri, 17 Mar 2000, David Ascher wrote: > > > I think the semantics would be pretty understandable and simple to > > > explain, which is the main thing. > > > > > > Any thoughts? > > > > Would > > > > (a,b) in Set > > > > return true of (a,b) was a subset of Set, or if (a,b) was an element of Set? > > This would return true if (a, b) was an element of the set -- > exactly the same semantics as we currently have for lists. I really like the idea of using dynamically-tuned lists provide set functionality! I often wind up needing something like set functionality, and implementing little convenience routines (unique, difference, etc) repeatedly. I don't mind that so much, but the frequency signifies that i, at least, would benefit from built-in support for sets... I guess the question is whether it's practical to come up with a reasonably adequate, reasonably general dynamic optimization strategy. Seems like an interesting challenge - is there prior art? As ping says, maintaining the existing list semantics handily answers challenges like david's question. New methods, like [].subset('a', 'b'), could provide the desired additional functionality - and contribute to biasing the object towards set optimization, etc. Neato! Ken klm@digicool.com From ping@lfw.org Fri Mar 17 19:02:13 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 13:02:13 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ken Manheimer wrote: > > I really like the idea of using dynamically-tuned lists provide set > functionality! 
I often wind up needing something like set functionality, > and implementing little convenience routines (unique, difference, etc) > repeatedly. I don't mind that so much, but the frequency signifies that > i, at least, would benefit from built-in support for sets... Greg asked about how to ensure that a given item only appears once in each list when used as a set, and whether i would flag the list as "i'm now operating as a set". My answer is no -- i don't want there to be any visible state on the list. (It can internally decide to optimize its behaviour for a particular purpose, but in no event should this decision ever affect the semantics of its manifested behaviour.) Externally visible state puts us back right where we started -- now the user has to decide what type of thing she wants to use, and that's more decisions and loaded guns pointing at feet that we were trying to avoid in the first place. There's something very nice about there being just two mutable container types in Python. As Guido said, the first two types you learn are lists and dicts, and it's pretty obvious which one to pick for your purposes, and you can't really go wrong. I'd like to copy my reply to Greg here because it exposes some of the philosophy i'm attempting with this proposal: You'd trust the client to use take() (or should i say include()) instead of append(). But, in the end, this wouldn't make any difference to the result of "in". In fact, you could do multisets since lists already have count(). What i'm trying to do is to put together a few very simple pieces to get all the behaviour necessary to work with sets, if you want it. I don't want the object itself to have any state that manifests itself as "now i'm a set", or "now i'm a list". You just pick the methods you want to use. It's just like stacks and queues. There's no state on the list that says "now i'm a stack, so read from the end" or "now i'm a queue, so read from the front". 
You decide where you want to read items by picking the appropriate method, and this lets you get the best of both worlds -- flexibility and simplicity. Back to Ken:

> I guess the question is whether it's practical to come up with a
> reasonably adequate, reasonably general dynamic optimization strategy.
> Seems like an interesting challenge - is there prior art?

I'd be quite happy with just turning on set optimization when include() and exclude() get used (nice and predictable). Maybe you could provide a set() built-in that would construct you a list with set optimization turned on, but i'm not too sure if we really want to expose it that way. -- ?!ng From Moshe Zadka Sat Mar 18 05:27:13 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 18 Mar 2000 07:27:13 +0200 (IST) Subject: [Python-Dev] list.shift() In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote:
> 
> Has list.shift() been proposed?
> 
> # pretend lists are implemented in Python and 'self' is a list
> def shift(self):
>     item = self[0]
>     del self[:1]
>     return item
> 
> This would make queues read nicely... use "append" and "pop" for
> a stack, "append" and "shift" for a queue.

Actually, I once thought about writing a Deque in Python for a couple of hours (I later wrote it, and then threw it away because I had nothing to do with it, but that isn't my point). So I did write "shift" (though I'm certain I didn't call it that). It's not as easy to write a maintainable yet efficient "shift": I got stuck with a pointer to the beginning of the "real list" which I incremented on a "shift", and a complex heuristic for when lists de- and re-allocate. I think the tradeoffs are shaky enough that it is better to write it in pure Python rather than having more functions in C (whether in an old builtin type rather than a new one). Anyone needing to treat a list as a Deque would just construct one:

    l = Deque(l)

built-in-functions:-just-say-no-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From artcom0!pf@artcom-gmbh.de Fri Mar 17 22:43:35 2000 From: artcom0!pf@artcom-gmbh.de (artcom0!pf@artcom-gmbh.de) Date: Fri, 17 Mar 2000 23:43:35 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> from Guido van Rossum at "Mar 17, 2000 5: 0:18 pm" Message-ID: Ka-Ping Yee wrote: [...]
> > # pretend lists are implemented in Python and 'self' is a list
> > def shift(self):
> >     item = self[0]
> >     del self[:1]
> >     return item
[...] Guido van Rossum:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

I think no. But what about this one?:

    # pretend self and dict are dictionaries:
    def supplement(self, dict):
        for k, v in dict.items():
            if not self.data.has_key(k):
                self.data[k] = v

Note the similarities to {}.update(dict), but update replaces existing entries in self, which is sometimes not desired. I know that supplement can also be simulated with:

    tmp = dict.copy()
    tmp.update(self)
    self.data = d

But this is still a little ugly. IMO a builtin method to supplement (complete?) a dictionary with default values from another dictionary would sometimes be a useful tool. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From ping@lfw.org Sat Mar 18 18:48:10 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 18 Mar 2000 10:48:10 -0800 (PST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Fri, 17 Mar 2000 artcom0!pf@artcom-gmbh.de wrote:
> 
> I think no. But what about this one?:
> 
>     # pretend self and dict are dictionaries:
>     def supplement(self, dict):
>         for k, v in dict.items():
>             if not self.data.has_key(k):
>                 self.data[k] = v

I'd go for that.
It would be nice to have a non-overwriting update(). The only issue is the choice of verb; "supplement" sounds pretty reasonable to me. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson From python-dev@python.org Sat Mar 18 19:23:37 2000 From: python-dev@python.org (Peter Funk) Date: Sat, 18 Mar 2000 20:23:37 +0100 (MET) Subject: [Python-Dev] dict.supplement() In-Reply-To: from Ka-Ping Yee at "Mar 18, 2000 10:48:10 am" Message-ID: Hi!

> > # pretend self and dict are dictionaries:
> > def supplement(self, dict):
> >     for k, v in dict.items():
> >         if not self.data.has_key(k):
> >             self.data[k] = v

Ka-Ping Yee schrieb:
> I'd go for that. It would be nice to have a non-overwriting update().
> The only issue is the choice of verb; "supplement" sounds pretty
> reasonable to me.

In German we have the verb "ergänzen" which translates either into "supplement" or "complete" (from my dictionary). "supplement" has the disadvantage of being rather long for the name of a builtin method. Nevertheless I've used this in my class derived from UserDict.UserDict. Now let's switch topics to the recent discussion about a Set type: you all certainly know that something similar has been done before by Aaron Watters? see: Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson@nevex.com Mon Mar 20 14:52:12 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 20 Mar 2000 09:52:12 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets Message-ID: [After discussion with Ping, and weekend thought] I would like to vote against using lists as sets:

1. It blurs Python's categorization of containers. The rest of the world thinks of sets as unordered, associative, and binary-valued (a term I just made up to mean "containing 0 or 1 instance of X").
Lists, on the other hand, are ordered, positionally-indexed, and multi-valued. While a list is always a legal queue or stack (although lists permit state transitions that are illegal for queues or stacks), most lists are not legal sets.

2. Python has, in dictionaries, a much more logical starting point for sets. A set is exactly a dictionary whose keys matter, and whose values don't. Adding operations to dictionaries to insert keys, etc., without having to supply a value, naively appears no harder than adding operations to lists, and would probably be much easier to explain when teaching a class.

3. (Long-term speculation) Even if P3K isn't written in C++, many modules for it will be. It would therefore seem sensible to design P3K in a C++-friendly way --- in particular, to align Python's container hierarchy with that used in the Standard Template Library. Using lists as a basis for sets would give Python a very different container type hierarchy than the STL, which could make it difficult for automatic tools like SWIG to map STL-based things to Python and vice versa. Using dictionaries as a basis for sets would seem to be less problematic. (Note that if Wadler et al's Generic Java proposal becomes part of that language, an STL clone will almost certainly become part of that language, and require JPython interfacing.)

On a semi-related note, can someone explain why programs are not allowed to iterate directly through the elements of a dictionary:

    for (key, value) in dict:
        ...body...

Thanks, Greg "No XML entities were harmed in the production of this message."
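The dictionary-based approach in point 2 can be sketched in a handful of lines. This is a hypothetical wrapper (the class and method names are illustrative, not an existing module): set members are stored as dictionary keys, and the values are ignored.

```python
class Set:
    # Hypothetical sketch: a set as a dictionary whose keys matter
    # and whose values don't.
    def __init__(self, items=()):
        self._d = {}
        for item in items:
            self._d[item] = 1      # the value is a dummy

    def insert(self, item):
        self._d[item] = 1

    def remove(self, item):
        del self._d[item]

    def __contains__(self, item):
        return item in self._d     # O(1), unlike a linear scan of a list

    def __len__(self):
        return len(self._d)

    def __iter__(self):
        return iter(self._d)       # iterating the keys = iterating the members

s = Set(['a', 'b', 'a'])
assert len(s) == 2                 # duplicates collapse, as a set should
assert 'a' in s and 'c' not in s
```

Membership, insertion, and removal all inherit the dictionary's hashed lookup, which is exactly the property the lists-as-sets proposal has to work to recover.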
From Moshe Zadka Mon Mar 20 15:03:47 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 20 Mar 2000 17:03:47 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: Message-ID: On Mon, 20 Mar 2000 gvwilson@nevex.com wrote: > [After discussion with Ping, and weekend thought] > > I would like to vote against using lists as sets: I'd like to object too, but for slightly different reasons: 20-something lines of Python can implement a set (I just checked it) with the new __contains__. We can just supply it in the standard library (Set module?) and be over and done with. Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jcw@equi4.com Mon Mar 20 15:37:19 2000 From: jcw@equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 16:37:19 +0100 Subject: [Python-Dev] re: Using lists as sets References: Message-ID: <38D645AF.661CA335@equi4.com> gvwilson@nevex.com wrote: > > [After discussion with Ping, and weekend thought] [good stuff] Allow me to offer yet another perspective on this. I'll keep it short. Python has sequences (indexable collections) and maps (associative collections). C++'s STL has vectors, sets, multi-sets, maps, and multi-maps. I find the distinction between these puzzling, and hereby offer another, somewhat relational-database minded, categorization as food for thought:

- collections consist of objects, each of them with attributes
- the first N attributes form the "key", the rest is the "residue"
- there is also an implicit position attribute, which I'll call "#"
- so an object consists of attributes: (K1,K2,...KN,#,R1,R2,...,RM)
- one more bit of specification is needed: whether # is part of the key

Let me mark the position between key attributes and residue with ":", so everything before the colon marks the uniquely identifying attributes.
A vector (sequence) is:  #:R1,R2,...,RM
A set is:                K1,K2,...KN:
A multi-set is:          K1,K2,...KN,#:
A map is:                K1,K2,...KN:#,R1,R2,...,RM
A multi-map is:          K1,K2,...KN,#:R1,R2,...,RM

And a somewhat esoteric member of this classification:

A singleton is:          :R1,R2,...,RM

I have no idea what this means for Python, but merely wanted to show how a relational, eh, "view" on all this might perhaps simplify the issues. -jcw From fdrake@acm.org Mon Mar 20 16:55:59 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 11:55:59 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: <14550.22559.550660.403909@weyr.cnri.reston.va.us> artcom0!pf@artcom-gmbh.de writes:
> Note the similarities to {}.update(dict), but update replaces existing
> entries in self, which is sometimes not desired. I know that supplement
> can also be simulated with:

Peter, I like this!

> tmp = dict.copy()
> tmp.update(self)
> self.data = d

I presume you mean "self.data = tmp"; "self.data.update(tmp)" would be just a little more robust, at the cost of an additional update. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tismer@tismer.com Mon Mar 20 17:10:34 2000 From: tismer@tismer.com (Christian Tismer) Date: Mon, 20 Mar 2000 18:10:34 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> Message-ID: <38D65B8A.50B81D08@tismer.com> Jean-Claude Wippler wrote: [relational notation]
> A vector (sequence) is: #:R1,R2,...,RM
> A set is:               K1,K2,...KN:
> A multi-set is:         K1,K2,...KN,#:
> A map is:               K1,K2,...KN:#,R1,R2,...,RM
> A multi-map is:         K1,K2,...KN,#:R1,R2,...,RM

This is a nice classification! To my understanding, why not

A map is: K1,K2,...KN:R1,R2,...,RM

Where is a # in a map? And what do you mean by N and M? Is K1..KN one key, made up of N sub keys, or do you mean the whole set of keys, where each one is mapped somehow.
I guess not, the notation looks like I should think of tuples. No, that would imply that N and M were fixed, but they are not. But you say "- collections consist of objects, each of them with attributes". Ok, N and M seem to be individual for each object, right? But when defining a map for instance, and we're talking of the objects, then the map is the set of these objects, and I have to think of K[0]..K(N(o)):R[0]..R(M(o)) where N and M are functions of the individual object o, right? Isn't it then better to think different of these objects, saying they can produce some key object and some value object of any shape, and a position, where each of these can be missing? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jeremy@cnri.reston.va.us Mon Mar 20 17:28:28 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 12:28:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: Message-ID: <14550.24508.341533.908941@goon.cnri.reston.va.us> >>>>> "GVW" == gvwilson writes: GVW> On a semi-related note, can someone explain why programs are GVW> not allowed to iterate directly through the elements of a GVW> dictionary: GVW> for (key, value) in dict: ...body... Pythonic design rules #2: Explicit is better than implicit. There are at least three "natural" ways to interpret "for ... in dict:" In addition to the version that strikes you as most natural, some people also imagine that a for loop should iterate over the keys or the values. Instead of guessing, Python provides explicit methods for each possibility: items, keys, values. 
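Those three explicit spellings are easy to show in a quick sketch (modern Python shown; bare iteration over a dict was eventually defined to mean the keys, but the explicit methods remain):

```python
d = {"a": 1, "b": 2}

keys = [k for k in d.keys()]            # iterate over the keys
values = [v for v in d.values()]        # iterate over the values
pairs = [(k, v) for k, v in d.items()]  # iterate over (key, value) pairs

print(sorted(keys))    # -> ['a', 'b']
print(sorted(pairs))   # -> [('a', 1), ('b', 2)]
```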
Yet another possibility, implemented in early versions of JPython and later removed, was to treat a dictionary exactly like a list: Call __getitem__(0), then 1, ..., until a KeyError was raised. In other words, a dictionary could behave like a list provided that it had integer keys. Jeremy From jcw@equi4.com Mon Mar 20 17:56:44 2000 From: jcw@equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 18:56:44 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> <38D65B8A.50B81D08@tismer.com> Message-ID: <38D6665C.ECDE09DE@equi4.com> Christian, > A map is: K1,K2,...KN:R1,R2,...,RM Yes, my list was inconsistent. > Is K1..KN one key, made up of N sub keys, or do you mean the > whole set of keys, where each one is mapped somehow. [...] > Ok, N and M seem to be individual for each object, right? [...] > Isn't it then better to think different of these objects, saying > they can produce some key object and some value object of any > shape, and a position, where each of these can be missing? Depends on your perspective. In the relational world, the (K1,...,KN) attributes identify the object, but they are not themselves considered an object. In OO-land, (K1,...,KN) is an object, and a map takes such as an object as input and delivers (R1,...,RM) as result. This tension shows the boundary of both relational and OO models, IMO. I wish it'd be possible to unify them, but I haven't figured it out. -jcw, concept maverick / fool on the hill - pick one :) From pf@artcom-gmbh.de Mon Mar 20 18:28:17 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 20 Mar 2000 19:28:17 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <14550.22559.550660.403909@weyr.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Mar 20, 2000 11:55:59 am" Message-ID: I wrote: > > Note the similarities to {}.update(dict), but update replaces existing > > entries in self, which is sometimes not desired. 
I know, that supplement
> > can also be simulated with:

> Fred L. Drake, Jr.:
> Peter,
> I like this!
>
> > tmp = dict.copy()
> > tmp.update(self)
> > self.data = d
>
> I presume you mean "self.data = tmp"; "self.data.update(tmp)" would
> be just a little more robust, at the cost of an additional update.

Ouppss... I should have tested this before posting. But currently I use the more explicit (and probably slower) version in my code:

class ConfigDict(UserDict.UserDict):
    def supplement(self, defaults):
        for k, v in defaults.items():
            if not self.data.has_key(k):
                self.data[k] = v

Works fine so far, although it usually requires an additional copy operation. Consider another example, where arbitrary instance attributes should be specified as keyword arguments to the constructor:

>>> class Example:
...     _defaults = {'a': 1, 'b': 2}
...     _config = _defaults
...     def __init__(self, **kw):
...         if kw:
...             self._config = self._defaults.copy()
...             self._config.update(kw)
...
>>> A = Example(a=12345)
>>> A._config
{'b': 2, 'a': 12345}
>>> B = Example(c=3)
>>> B._config
{'b': 2, 'c': 3, 'a': 1}

If 'supplement' were a dictionary builtin method, this would become simply:

    kw.supplement(self._defaults)
    self._config = kw

Unfortunately this can't be achieved using a wrapper class like UserDict, since the **kw argument is always a builtin dictionary object. Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60 From ping@lfw.org Mon Mar 20 12:36:34 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 06:36:34 -0600 (CST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Peter Funk wrote: > Consider another example, where arbitrary instance attributes should be > specified as keyword arguments to the constructor: > > >>> class Example: > ... _defaults = {'a': 1, 'b': 2} > ... _config = _defaults > ... def __init__(self, **kw): > ... if kw: > ...
self._config = self._defaults.copy() > ... self._config.update(kw) Yes! I do this all the time. I wrote a user-interface module to take care of exactly this kind of hassle when creating lots of UI components. When you're making UI, you can easily drown in keyword arguments and default values if you're not careful. -- ?!ng From fdrake@acm.org Mon Mar 20 19:02:48 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 14:02:48 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <14550.22559.550660.403909@weyr.cnri.reston.va.us> Message-ID: <14550.30168.129259.356581@weyr.cnri.reston.va.us> Peter Funk writes: > Ouppss... I should have tested this before posting. But currently I use > the more explicit (and probably slower) version in my code: The performance is based entirely on the size of each; in the (probably typical) case of smallish dictionaries (<50 entries), it's probably cheaper to use a temporary dict and do the update. For large dicts (on the defaults side), it may make more sense to reduce the number of objects that need to be created:

    target = ...
    has_key = target.has_key
    for key in defaults.keys():
        if not has_key(key):
            target[key] = defaults[key]

This saves the construction of len(defaults) 2-tuples. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Moshe Zadka Mon Mar 20 19:23:01 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 20 Mar 2000 21:23:01 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > Yet another possibility, implemented in early versions of JPython and > later removed, was to treat a dictionary exactly like a list: Call > __getitem__(0), then 1, ..., until a KeyError was raised. In other > words, a dictionary could behave like a list provided that it had > integer keys.
Two remarks: Jeremy meant "consecutive natural keys starting with 0", (yes, I've managed to learn mind-reading from the timbot) and that (the following is considered a misfeature):

    import UserDict
    a = UserDict.UserDict()
    a[0] = "hello"
    a[1] = "world"
    for word in a: print word

Will print "hello", "world", and then die with KeyError. I realize why this is happening, and realize it could only be fixed in Py3K. However, a temporary (though not 100% backwards compatible) fix is that "for" will catch LookupError, rather than IndexError. Any comments? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond@skippinet.com.au Mon Mar 20 19:39:31 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Mon, 20 Mar 2000 11:39:31 -0800 Subject: [Python-Dev] Unicode and Windows Message-ID: I would like to discuss Unicode on the Windows platform, and how it relates to MBCS that Windows uses. My main goal here is to ensure that Unicode on Windows can make a round-trip to and from native Unicode stores. As an example, let's take the registry - a Windows user should be able to read a Unicode value from the registry then write it back. The value written back should be _identical_ to the value read. Ditto for the file system: If the filesystem is Unicode, then I would expect the following code:

    for fname in os.listdir():
        f = open(fname + ".tmp", "w")

To create filenames on the filesystem with the exact base name even when the basename contains non-ascii characters. However, the Unicode patches do not appear to make this possible. open() uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded string to the C runtime fopen function. The end result of all this is that we end up with UTF-8 encoded names in the registry/on the file system.
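The round-trip failure described above can be demonstrated in a few lines; here cp1252 stands in, purely as an assumption, for whatever MBCS code page the active Windows locale selects:

```python
name = "caf\u00e9"                    # a filename containing one non-ASCII char

utf8_bytes = name.encode("utf-8")     # what the implicit UTF-8 conversion produces
# A consumer running under a cp1252 locale sees two characters, not one:
garbled = utf8_bytes.decode("cp1252")

print(garbled)                        # -> cafÃ©
print(garbled == name)                # -> False: the name no longer round-trips
```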
It does not seem possible to get a true Unicode string onto either the file system or in the registry. Unfortunately, I'm not experienced enough to know the full ramifications, but it _appears_ that on Windows the default "unicode to string" translation should be done via the WideCharToMultiByte() API. This will then pass an MBCS encoded ascii string to Windows, and the "right thing" should magically happen. Unfortunately, MBCS encoding is dependent on the current locale (ie, one MBCS sequence will mean completely different things depending on the locale). I don't see a portability issue here, as the documentation could state that "Unicode->ASCII conversions use the most appropriate conversion for the platform. If the platform is not Unicode aware, then UTF-8 will be used." This issue is the final one before I release the win32reg module. It seems _critical_ to me that if Python supports Unicode and the platform supports Unicode, then Python unicode values must be capable of being passed to the platform. For the win32reg module I could quite possibly hack around the problem, but the more general problem (categorized by the open() example above) still remains... Any thoughts? Mark. From jeremy@cnri.reston.va.us Mon Mar 20 19:51:28 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 14:51:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: <14550.33088.110785.78631@goon.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote: >> Yet another possibility, implemented in early versions of JPython >> and later removed, was to treat a dictionary exactly like a list: >> Call __getitem__(0), then 1, ..., until a KeyError was raised. >> In other words, a dictionary could behave like a list provided >> that it had integer keys.
MZ> Two remarks: Jeremy meant "consecutive natural keys starting MZ> with 0", (yes, I've managed to learn mind-reading from the MZ> timbot) I suppose I meant that (perhaps you can read my mind as well as I can); I also meant using values of Python's integer datatype :-). and that (the following is considered a misfeature): MZ> import UserDict MZ> a = UserDict.UserDict() MZ> a[0]="hello" MZ> a[1]="world" MZ> for word in a: print word MZ> Will print "hello", "world", and then die with KeyError. I MZ> realize why this is happening, and realize it could only be MZ> fixed in Py3K. However, a temporary (though not 100% backwards MZ> compatible) fix is that "for" will catch LookupError, rather MZ> then IndexError. I'm not sure what you mean by "fix." (Please read your mind for me .) I think by fix you mean, "allow the broken code above to execute without raising an exception." Yuck! As far as I can tell, the problem is caused by the special way that a for loop uses the __getitem__ protocol. There are two related issues that lead to confusion. In cases other than for loops, __getitem__ is invoked when the syntactic construct x[i] is used. This means either lookup in a list or in a dict depending on the type of x. If it is a list, the index must be an integer and IndexError can be raised. If it is a dict, the index can be anything (even an unhashable type; TypeError is only raised by insertion for this case) and KeyError can be raised. In a for loop, the same protocol (__getitem__) is used, but with the special convention that the object should be a sequence. Python will detect when you try to use a builtin type that is not a sequence, e.g. a dictionary. If the for loop iterates over an instance type rather than a builtin type, there is no way to check whether the __getitem__ protocol is being implemented by a sequence or a mapping. 
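The clash Jeremy describes can be made concrete with a sketch (the class names Seq and Map are hypothetical): the for-loop fallback calls __getitem__(0), 1, ... and stops only on IndexError, so a mapping-style __getitem__ that raises KeyError instead dies mid-loop.

```python
class Seq:
    """Sequence-style __getitem__: IndexError terminates the for loop."""
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError(i)
        return i * 10

print([x for x in Seq()])       # -> [0, 10, 20]

class Map:
    """Mapping-style __getitem__: KeyError is NOT the loop's stop signal."""
    def __getitem__(self, key):
        return {0: "hello", 1: "world"}[key]

try:
    for word in Map():
        print(word)             # prints "hello", then "world" ...
except KeyError:
    print("... then the loop dies with KeyError")
```

This reproduces exactly the UserDict misfeature Moshe showed earlier in the thread.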
The right solution, I think, is to allow a means for stating explicitly whether a class with an __getitem__ method is a sequence or a mapping (or both?). Then UserDict can declare itself to be a mapping and using it in a for loop will raise the TypeError, "loop over non-sequence" (which has a standard meaning defined in Skip's catalog <0.8 wink>). I believe this is where types-vs.-classes meets subtyping-vs.-inheritance. I suspect that the right solution, circa Py3K, is that classes must explicitly state what types they are subtypes of or what interfaces they implement. Jeremy From Moshe Zadka Mon Mar 20 20:13:20 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 20 Mar 2000 22:13:20 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > I'm not sure what you mean by "fix." I mean any sane behaviour -- either failing on TypeError at the beginning, like "for" does, or executing without raising an exception. Raising an exception in the middle which is imminent is definitely (for the right values of definitely) a surprising behaviour (I know it surprised me!). > I think by fix you mean, "allow the broken code above to > execute without raising an exception." Yuck! I agree it is yucky -- it is all a weird echo of the yuckiness of the type/class dichotomy. What I suggested is a temporary patch... > As far as I can tell, the problem is caused by the special > way that a for loop uses the __getitem__ protocol. Well, my take is that it is caused by the fact __getitem__ is used both for the sequence protocol and the mapping protocol (well, I'm cheating through my teeth here, but you understand what I mean ) Agreed though, that the whole iteration protocol should be revisited -- but that is a subject for another post.
> The right solution, I think, is to allow a means for stating > explicitly whether a class with an __getitem__ method is a sequence or > a mapping (or both?). And this is the fix I wanted for Py3K (details to be debated, still). See? You read my mind perfectly. > I suspect that the right solution, circa > Py3K, is that classes must explicitly state what types they are > subtypes of or what interfaces they implement. Exactly. And have subclassable built-in classes in the same fell swoop. getting-all-excited-for-py3k-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Mon Mar 20 14:34:12 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 08:34:12 -0600 (CST) Subject: [Python-Dev] Set options Message-ID: I think that at this point the possibilities for doing sets come down to four options:

1. use lists
   visible changes:   new methods l.include, l.exclude
   invisible changes: faster 'in'
   usage: s = [1, 2], s.include(3), s.exclude(3),
          if item in s, for item in s

2. use dicts
   visible changes:   for/if x in dict means keys
                      accept dicts without values (e.g. {1, 2})
                      new special non-printing value "Present"
                      new method d.insert(x) means d[x] = Present
   invisible changes: none
   usage: s = {1, 2}, s.insert(3), del s[3],
          if item in s, for item in s

3. new type
   visible changes:   set() built-in, new type with methods .insert, .remove
   invisible changes: none
   usage: s = set(1, 2), s.insert(3), s.remove(3),
          if item in s, for item in s

4. do nothing
   visible changes:   none
   invisible changes: none
   usage: s = {1: 1, 2: 1}, s[3] = 1, del s[3],
          if s.has_key(item), for item in s.keys()

Let me say a couple of things about #1 and #2. I'm happy with both. I quite like the idea of using dicts this way (#2), in fact -- i think it was the first idea i remember chatting about.
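Option 3's interface can be prototyped in pure Python on top of a dictionary (a rough sketch of Moshe's Set-module suggestion, not a proposed implementation):

```python
class Set:
    """Minimal set prototype backed by a dict; the members are the keys."""
    def __init__(self, *items):
        self._members = {}
        for item in items:
            self._members[item] = 1
    def insert(self, item):
        self._members[item] = 1
    def remove(self, item):
        del self._members[item]
    def __contains__(self, item):       # drives "if item in s"
        return item in self._members
    def __iter__(self):                 # drives "for item in s"
        return iter(self._members)
    def __len__(self):
        return len(self._members)
    def __repr__(self):                 # prints like {1, 3}, hiding the values
        return "{" + ", ".join(map(repr, self._members)) + "}"

s = Set(1, 2)
s.insert(3)
s.remove(2)
print(3 in s, len(s))    # -> True 2
```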
If i remember correctly, Guido's objection to #2 was that "in" on a dictionary would work on the keys, which isn't consistent with the fact that "in" on a list works on the values. However, this doesn't really bother me at all. It's a very simple rule, especially when you think of how people understand dictionaries. If you hand someone a *real* dictionary, and ask them Is the word "python" in the dictionary? they'll go look up "python" in the *keys* of the dictionary (the words), not the values (the definitions). So i'm quite all right with saying for x in dict: and having that loop over the keys, or saying if x in dict: and having that check whether x is a valid key. It makes perfect sense to me. My main issue with #2 was that sets would print like {"Alice": 1, "Bob": 1, "Ted": 1} and this would look weird. However, as Greg explained to me, it would be possible to introduce a default value to go with set members that just says "i'm here", such as 'Present' (read as: "Alice" is present in the set) or 'Member' or even 'None', and this value wouldn't print out -- thus s = {"Bob"} s.include("Alice") print s would produce {"Alice", "Bob"} representing a dictionary that actually contained {"Alice": Present, "Bob": Present} You'd construct set constants like this too: {2, 4, 7} Using dicts this way (rather than having a separate set type that just happened to be spelled with {}) avoids the parsing issue: no need for look-ahead; you just toss in "Present" when the text doesn't supply a colon, and move on. I'd be okay with this, though i'm not sure everyone would; and together with Guido's initial objection, that's what motivated me to propose the lists-as-sets thing: fewer changes all around, no ambiguities introduced -- just two new methods, and we're done. Hmm. I know someone who's just learning Python. I will attempt to ask some questions about what she would find natural, and see if that reveals anything interesting. 
-- ?!ng From bwarsaw@cnri.reston.va.us Mon Mar 20 22:01:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 20 Mar 2000 17:01:00 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets References: <14550.24508.341533.908941@goon.cnri.reston.va.us> <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: <14550.40860.72418.648591@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> As far as I can tell, the problem is caused by the special way JH> that a for loop uses the __getitem__ protocol. There are two JH> related issues that lead to confusion. >>>>> "MZ" == Moshe Zadka writes: MZ> Well, my look is that it is caused by the fact __getitem__ is MZ> used both for the sequence protocol and the mapping protocol Right. MZ> Agreed though, that the whole iteration protocol should be MZ> revisited -- but that is a subject for another post. Yup. JH> The right solution, I think, is to allow a means for stating JH> explicitly whether a class with an __getitem__ method is a JH> sequence or a mapping (or both?). Or should the two protocol use different method names (code breakage!). JH> I believe this is where types-vs.-classes meets JH> subtyping-vs.-inheritance. meets protocols-vs.-interfaces. From Moshe Zadka Tue Mar 21 05:16:00 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 21 Mar 2000 07:16:00 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.40860.72418.648591@anthem.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Barry A. Warsaw wrote: > MZ> Agreed though, that the whole iteration protocol should be > MZ> revisited -- but that is a subject for another post. > > Yup. (Go Stackless, go!?) > JH> I believe this is where types-vs.-classes meets > JH> subtyping-vs.-inheritance. > > meets protocols-vs.-interfaces. It took me 5 minutes of intensive thinking just to understand what Barry meant. Just wait until we introduce Sather-like "supertypes" (which are pretty Pythonic, IMHO) -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Tue Mar 21 05:21:24 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 21 Mar 2000 07:21:24 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > I think that at this point the possibilities for doing sets > come down to four options: > > > 1. use lists > 2. use dicts > 3. new type > 4. do nothing 5. new Python module with a class "Set" (The issues are similar to #3, but this has the advantage of not changing the interpreter) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Tue Mar 21 00:25:09 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 01:25:09 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D6C165.EEF58232@lemburg.com> Mark Hammond wrote: > > I would like to discuss Unicode on the Windows platform, and how it relates > to MBCS that Windows uses. > > My main goal here is to ensure that Unicode on Windows can make a round-trip > to and from native Unicode stores. As an example, let's take the registry - > a Windows user should be able to read a Unicode value from the registry then > write it back. The value written back should be _identical_ to the value > read. Ditto for the file system: If the filesystem is Unicode, then I would > expect the following code: > for fname in os.listdir(): > f = open(fname + ".tmp", "w") > > To create filenames on the filesystem with the exact base name even when the > basename contains non-ascii characters. > > However, the Unicode patches do not appear to make this possible. open() > uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically > convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded > string to the C runtime fopen function. Right. 
The idea with open() was to write a special version (using #ifdefs) for use on Windows platforms which does all the needed magic to convert Unicode to whatever the native format and locale is... Using parser markers for this is obviously *not* the right way to get to the core of the problem. Basically, you will have to write a helper which takes a string, Unicode or some other "t" compatible object as name object and then converts it to the system's view of things. I think we had a private discussion about this a few months ago: there was some way to convert Unicode to a platform independent format which then got converted to MBCS -- don't remember the details though. > The end result of all this is that we end up with UTF-8 encoded names in the > registry/on the file system. It does not seem possible to get a true > Unicode string onto either the file system or in the registry. > > Unfortunately, Im not experienced enough to know the full ramifications, but > it _appears_ that on Windows the default "unicode to string" translation > should be done via the WideCharToMultiByte() API. This will then pass an > MBCS encoded ascii string to Windows, and the "right thing" should magically > happen. Unfortunately, MBCS encoding is dependant on the current locale > (ie, one MBCS sequence will mean completely different things depending on > the locale). I dont see a portability issue here, as the documentation > could state that "Unicode->ASCII conversions use the most appropriate > conversion for the platform. If the platform is not Unicode aware, then > UTF-8 will be used." No, no, no... :-) The default should be (and is) UTF-8 on all platforms -- whether the platform supports Unicode or not. If a platform uses a different encoding, an encoder should be used which applies the needed transformation. > This issue is the final one before I release the win32reg module. 
It seems > _critical_ to me that if Python supports Unicode and the platform supports > Unicode, then Python unicode values must be capable of being passed to the > platform. For the win32reg module I could quite possibly hack around the > problem, but the more general problem (categorized by the open() example > above) still remains... > > Any thoughts? Can't you use the wchar_t interfaces for the task (see the unicodeobject.h file for details) ? Perhaps you can first transfer Unicode to wchar_t and then on to MBCS using a win32 API ?! -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Mar 21 09:27:56 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 10:27:56 +0100 Subject: [Python-Dev] Set options References: Message-ID: <38D7409C.169B0C42@lemburg.com> Moshe Zadka wrote: > > On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > > > I think that at this point the possibilities for doing sets > > come down to four options: > > > > > > 1. use lists > > 2. use dicts > > 3. new type > > 4. do nothing > > 5. new Python module with a class "Set" > (The issues are similar to #3, but this has the advantage of not changing > the interpreter) Perhaps someone could take Aaron's kjbuckets and write a Python emulation for it (I think he's even already done something like this for gadfly). Then the emulation could go into the core and if people want speed they can install his extension (the emulation would have to detect this and use the real thing then). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Tue Mar 21 11:54:30 2000 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 21 Mar 2000 12:54:30 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. 
Lemburg" , Tue, 21 Mar 2000 01:25:09 +0100 , <38D6C165.EEF58232@lemburg.com> Message-ID: <20000321115430.88A11370CF2@snelboot.oratrix.nl> I guess we need another format specifier than "s" here. "s" does the conversion to standard-python-utf8 for wide strings, and we'd need another format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I assume that that would also come in handy for MacOS, where we'll have the same problem (filenames are in Apple's proprietary 8bit encoding). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Tue Mar 21 12:14:54 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 13:14:54 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> Message-ID: <38D767BE.C45F8286@lemburg.com> Jack Jansen wrote: > > I guess we need another format specifier than "s" here. "s" does the > conversion to standard-python-utf8 for wide strings, Actually, "t" does the UTF-8 conversion... "s" will give you the raw internal UTF-16 representation in platform byte order. > and we'd need another > format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I'd suggest adding some kind of generic

    PyOS_FilenameFromObject(PyObject *v, void *buffer, int buffer_len)

API for the conversion of strings, Unicode and text buffers to an OS dependent filename buffer. And/or perhaps specific APIs for each OS... e.g.

    PyOS_MBCSFromObject()  (only on WinXX)
    PyOS_AppleFromObject() (only on Mac ;)

> I assume that that would also come in handy for MacOS, where we'll have the > same problem (filenames are in Apple's proprietary 8bit encoding). Is that encoding already supported by the encodings package ? If not, could you point me to a map file for the encoding ?
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake@acm.org Tue Mar 21 14:56:47 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 21 Mar 2000 09:56:47 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D767BE.C45F8286@lemburg.com> References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> Message-ID: <14551.36271.33825.841965@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > And/or perhaps sepcific APIs for each OS... e.g. > > PyOS_MBCSFromObject() (only on WinXX) > PyOS_AppleFromObject() (only on Mac ;) Another approach may be to add some format modifiers: te -- text in an encoding specified by a C string (somewhat similar to O&) tE -- text, encoding specified by a Python object (probably a string passed as a parameter or stored from some other call) (I'd prefer the [eE] before the t, but the O modifiers follow, so consistency requires this ugly construct.) This brings up the issue of using a hidden conversion function which may create a new object that needs the same lifetime guarantees as the real parameters; we discussed this issue a month or two ago. Somewhere, there's a call context that includes the actual parameter tuple. PyArg_ParseTuple() could have access to a "scratch" area where it could place objects constructed during parameter parsing. This area could just be a hidden tuple. When the C call returns, the scratch area can be discarded. The difficulty is in giving PyArg_ParseTuple() access to the scratch area, but I don't know how hard that would be off the top of my head. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy@cnri.reston.va.us Tue Mar 21 17:14:07 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 21 Mar 2000 12:14:07 -0500 (EST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.44511.805860.808811@goon.cnri.reston.va.us> >>>>> "MAL" == M -A Lemburg writes: MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.) The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application. Jeremy From skip@mojam.com (Skip Montanaro) Tue Mar 21 17:25:57 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 21 Mar 2000 11:25:57 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.45221.447838.534003@beluga.mojam.com> Marc> Perhaps someone could take Aaron's kjbuckets and write a Python Marc> emulation for it ... Any reason why kjbuckets and friends have never been placed in the core? 
If, as it seems from the discussion, a set type is a good thing to add to the core, it seems to me that Aaron's code would be a good candidate implementation/foundation. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From bwarsaw@cnri.reston.va.us Tue Mar 21 17:47:49 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 21 Mar 2000 12:47:49 -0500 (EST) Subject: [Python-Dev] Set options References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> Message-ID: <14551.46533.918688.13801@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Any reason why kjbuckets and friends have never been placed in SM> the core? If, as it seems from the discussion, a set type is SM> a good thing to add to the core, it seems to me that Aaron's SM> code would be a good candidate implementation/foundation. It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we? -Barry From mhammond@skippinet.com.au Tue Mar 21 17:48:06 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 21 Mar 2000 09:48:06 -0800 Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D6C165.EEF58232@lemburg.com> Message-ID: > > Right. The idea with open() was to write a special version (using > #ifdefs) for use on Windows platforms which does all the needed > magic to convert Unicode to whatever the native format and locale > is... That works for open() - but what about other extension modules? This seems to imply that any Python extension on Windows that wants to pass a Unicode string to an external function can not use PyArg_ParseTuple() with anything other than "O", and perform the magic themselves. This just seems a little back-to-front to me. Platforms that have _no_ native Unicode support have useful utilities for working with Unicode. 
Platforms that _do_ have native Unicode support can not make use of these utilities. Is this by design, or simply a sad side-effect of the design? So - it is trivial to use Unicode on platforms that dont support it, but quite difficult on platforms that do. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. Basically, you will have to > > write a helper which takes a string, Unicode or some other > > "t" compatible object as name object and then converts it to > > the system's view of things. Why "obviously"? What on earth does the existing mechanism buy me on Windows, other than grief that I can not use it? > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. There is a Win32 API function for this. However, as you succinctly pointed out, not many people are going to be aware of its name, or how to use the multitude of flags offered by these conversion functions, or know how to deal with the memory management, etc. > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! Sure - I can. But can everyone who writes interfaces to Unicode functions? You wrote the Python Unicode support but dont know its name - pity the poor Joe Average trying to write an extension. It seems to me that, on Windows, the Python Unicode support as it stands is really internal. I can not think of a single time that an extension writer on Windows would ever want to use the "t" markers - am I missing something? I dont believe that a single Unicode-aware function in the Windows extensions (of which there are _many_) could be changed to use the "t" markers.
It still seems to me that the Unicode support works well on platforms with no Unicode support, and is fairly useless on platforms with the support. I dont believe that any extension on Windows would want to use the "t" marker - so, as Fred suggested, how about providing something for us that can help us interface to the platform's Unicode? This is getting too hard for me - I will release my windows registry module without Unicode support, and hope that in the future someone cares enough to address it, and to add a large number of LOC that will be needed simply to get Unicode talking to Unicode... Mark. From skip@mojam.com (Skip Montanaro) Tue Mar 21 18:04:11 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 21 Mar 2000 12:04:11 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: <14551.47515.648064.969034@beluga.mojam.com> BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we? Yes (though I would quibble with your use of the word "quibbling" ;-). If new syntax is in the offing as some have proposed, why not go for a more efficient implementation at the same time? I believe Aaron has maintained that kjbuckets is generally more efficient than Python's dictionary object. Skip From mal@lemburg.com Tue Mar 21 17:44:11 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 18:44:11 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> <14551.36271.33825.841965@weyr.cnri.reston.va.us> Message-ID: <38D7B4EB.66DAEBF3@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. 
Lemburg writes: > > And/or perhaps sepcific APIs for each OS... e.g. > > > > PyOS_MBCSFromObject() (only on WinXX) > > PyOS_AppleFromObject() (only on Mac ;) > > Another approach may be to add some format modifiers: > > te -- text in an encoding specified by a C string (somewhat > similar to O&) > tE -- text, encoding specified by a Python object (probably a > string passed as a parameter or stored from some other > call) > > (I'd prefer the [eE] before the t, but the O modifiers follow, so > consistency requires this ugly construct.) > This brings up the issue of using a hidden conversion function which > may create a new object that needs the same lifetime guarantees as the > real parameters; we discussed this issue a month or two ago. > Somewhere, there's a call context that includes the actual parameter > tuple. PyArg_ParseTuple() could have access to a "scratch" area where > it could place objects constructed during parameter parsing. This > area could just be a hidden tuple. When the C call returns, the > scratch area can be discarded. > The difficulty is in giving PyArg_ParseTuple() access to the scratch > area, but I don't know how hard that would be off the top of my head. Some time ago, I considered adding "U+" with builtin auto-conversion to the tuple parser... after some discussion about the error handling issues involved with this I quickly dropped that idea again and used the standard "O" approach plus a call to a helper function which then applied the conversion. (Note the "+" behind "U": this was intended to indicate that the returned object has had the refcount incremented and that the caller must take care of decrementing it again.) The "O" + helper approach is a little clumsy, but works just fine. Plus it doesn't add any more overhead to the already convoluted PyArg_ParseTuple(). BTW, what other external char formats are we talking about ? E.g. how do you handle MBCS or DBCS under WinXX ? 
Are there routines to have wchar_t buffers converted into the two ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm@hypernet.com Tue Mar 21 18:25:43 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 21 Mar 2000 13:25:43 -0500 Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <1258459347-36172889@hypernet.com> Jeremy wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Nah. Sets are pretty unambiguous. They're also easy, and boring. The interesting stuff is graphs and operations like composition, closure and transpositions. That's also where stuff gets ambiguous. E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God. I would love this stuff, and as a faithful worshipper of Our Lady of Corrugated Ironism, I could probably live with whatever rules are arrived at; but I'm afraid I would have to considerably enlarge my kill file. - Gordon From gstein@lyra.org Tue Mar 21 18:40:20 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 21 Mar 2000 10:40:20 -0800 (PST) Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> Message-ID: On Tue, 21 Mar 2000, Jeremy Hylton wrote: > >>>>> "MAL" == M -A Lemburg writes: > MAL> Perhaps someone could take Aaron's kjbuckets and write a Python > MAL> emulation for it (I think he's even already done something like > MAL> this for gadfly). Then the emulation could go into the core and > MAL> if people want speed they can install his extension (the > MAL> emulation would have to detect this and use the real thing > MAL> then). 
> > I've been waiting for Tim Peters to say something about sets, but I'll > chime in with what I recall him saying last time a discussion like > this came up on c.l.py. (I may misremember, in which case I'll at > least draw him into the discussion in order to correct me <0.5 wink>.) > > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. > A set is pretty simple to build from a list or a dictionary, so we > leave it to application writers to write the one that is appropriate > for their application. Yah... +1 on what Jeremy said. Leave them out of the distro since we can't do them Right for all people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Tue Mar 21 18:34:56 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 21 Mar 2000 20:34:56 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > BAW> It would seem to me that distutils is a better way to go for > BAW> kjbuckets. The core already has basic sets (via dictionaries). > BAW> We're pretty much just quibbling about efficiency, API, and syntax, > BAW> aren't we? > > If new syntax is in the offing as some have proposed, FWIW, I'm against new syntax. The core-language has changed quite a lot between 1.5.2 and 1.6 -- * strings have grown methods * there are unicode strings * "in" operator overloadable The second change even includes a syntax change (u"some string") whose variants I'm still not familiar enough to comment on (ru"some\string"? ur"some\string"? Both legal?). 
I feel too many changes destabilize the language (this might seem a bit extreme, considering I pushed towards one of the changes), and we should try to improve on things other than the core -- one of these is a more hierarchical standard library, and a standard distribution mechanism, to rival CPAN -- then anyone could import data.sets.kjbuckets With only a trivial >>> import dist >>> dist.install("data.sets.kjbuckets") > why not go for a more efficient implementation at the same time? Because Python dicts are "pretty efficient", and it is not a trivial question to check optimality in this area: tests can be rigged to prove almost anything with the right test-cases, and there's no promise we'll choose the "right ones". -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip@mojam.com (Skip Montanaro) Tue Mar 21 18:42:55 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 21 Mar 2000 12:42:55 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: References: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: <14551.49839.377385.99637@beluga.mojam.com> Skip> If new syntax is in the offing as some have proposed, Moshe> FWIW, I'm against new syntax.
The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 -- I thought we were talking about Py3K, where syntax changes are somewhat more expected. Just to make things clear, the syntax change I was referring to was the value-less dict syntax that someone proposed a few days ago: myset = {"a", "b", "c"} Note that I wasn't necessarily supporting the proposal, only acknowledging that it had been made. In general, I think we need to keep straight where people feel various proposals are going to fit. When a thread goes for more than a few messages it's easy to forget. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From ping@lfw.org Tue Mar 21 13:07:51 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 21 Mar 2000 07:07:51 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: Jeremy Hylton wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. For me, anyway, this is not about trying to engineer a universally perfect solution into Python -- it's about providing some simple, basic, easy-to-understand functionality that takes care of the common case. For example, dictionaries are simple, their workings are easy enough to understand, and they aren't written to efficiently support things like inversion and composition because most of the time no one needs to do these things. The same holds true for sets. All i would want is something i can put things into, and take things out of, and ask about what's inside. Barry Warsaw wrote: > It would seem to me that distutils is a better way to go for > kjbuckets. The core already has basic sets (via dictionaries). We're > pretty much just quibbling about efficiency, API, and syntax, aren't we? 
Efficiency: Hashtables have proven quite adequate for dicts, so i think they're quite adequate for sets. API and syntax: I believe the goal is obvious, because Python already has very nice notation ("in", "not in") -- it just doesn't work quite the way one would want. It works semantically right on lists, but they're a little slow. It doesn't work on dicts, but we can make it so. Here is where my "explanation metric" comes into play. How much additional explaining do you have to do in each case to answer the question "what do i do when i need a set"? 1. Use lists. Explain that "include()" means "append if not already present", and "exclude()" means "remove if present". You are done. 2. Use dicts. Explain that "for x in dict" iterates over the keys, and "if x in dict" looks for a key. Explain what happens when you write "{1, 2, 3}", and the special non-printing value constant. Explain how to add elements to a set and remove elements from a set. 3. Create a new type. Explain that there exists another type "set" with methods "insert" and "remove". Explain how to construct sets. Explain how "in" and "not in" work, where this type fits in with the other types, and when to choose this type over other types. 4. Do nothing. Explain that dictionaries can be used as sets if you assign keys a dummy value, use "del" to remove keys, iterate over "dict.keys()", and use "dict.has_key()" to test membership. This is what motivated my proposal for using lists: it requires by far the least explanation. This is no surprise because a lot of things about lists have been explained already. My preference in terms of elegance is about equal for 1, 2, 3, with 4 distinctly behind; but my subjective ranking of "explanation complexity" (as in "how to get there from here") is 1 < 4 < 3 < 2. 
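[Editorial note: Ka-Ping's option 1 can be made concrete in a few lines. The include()/exclude() names are the hypothetical helpers proposed in the thread, not an existing list API.]

```python
# Option 1 above: plain lists as sets, with the proposed
# include()/exclude() helpers (hypothetical names from this thread).

def include(lst, x):
    """Append x only if it is not already present."""
    if x not in lst:
        lst.append(x)

def exclude(lst, x):
    """Remove x if present; do nothing otherwise."""
    if x in lst:
        lst.remove(x)

s = []
for item in ["a", "b", "a", "c"]:
    include(s, item)
print(s)            # ['a', 'b', 'c'] -- duplicates collapse

exclude(s, "b")
exclude(s, "z")     # absent element: silently ignored
print(s)            # ['a', 'c']
print("a" in s)     # True -- the familiar "in" notation just works
```

As the message argues, nothing new needs explaining here beyond the two helpers; membership tests are linear-time, which is the trade-off against the dict-based options.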
-- ?!ng From tismer@tismer.com Tue Mar 21 20:13:38 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 21 Mar 2000 21:13:38 +0100 Subject: [Python-Dev] Unicode Database Compression Message-ID: <38D7D7F2.14A2FBB5@tismer.com> Hi, I have spent the last four days on compressing the Unicode database. With little decoding effort, I can bring the data down to 25kb. This would still be very fast, since codes are randomly accessible, although there are some simple shifts and masks. With a bit more effort, this can be squeezed down to 15kb by some more aggressive techniques like common prefix elimination. Speed would be *slightly* worse, since a small loop (average 8 cycles) is performed to obtain a character from a packed nybble. This is just all the data which is in Marc's unicodedatabase.c file. I checked efficiency by creating a delimited file like the original database text file with only these columns and ran PkZip over it. The result was 40kb. This says that I found a lot of correlations which automatic compressors cannot see. Now, before generating the final C code, I'd like to ask some questions: What is more desirable: Low compression and blinding speed? Or high compression and less speed, since we always want to unpack a whole code page? Then, what about the other database columns? There are a couple of extra attributes which I find coded as switch statements elsewhere. Should I try to pack these codes into my squeezy database, too? And last: There are also two quite elaborate columns with textual descriptions of the codes (the uppercase blah version of character x). Do we want these at all? And if so, should I try to compress them as well? Should these perhaps go into a different source file as a dynamic module, since they will not be used so often? waiting for directives - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Moshe Zadka Wed Mar 22 05:44:00 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 22 Mar 2000 07:44:00 +0200 (IST) Subject: [1.x] Re: [Python-Dev] Set options In-Reply-To: <14551.49839.377385.99637@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > Skip> If new syntax is in the offing as some have proposed, > > Moshe> FWIW, I'm against new syntax. The core-language has changed quite > Moshe> a lot between 1.5.2 and 1.6 -- > > I thought we were talking about Py3K My argument was strictly a 1.x argument. I'm hoping to get sets in 1.7 or 1.8. > In general, I think we need to keep straight where people feel various > proposals are going to fit. You're right. I'll start prefixing my posts accordingly. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Wed Mar 22 10:11:25 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 11:11:25 +0100 Subject: [Python-Dev] Re: Unicode Database Compression References: <38D7D7F2.14A2FBB5@tismer.com> Message-ID: <38D89C4D.370C19D@lemburg.com> Christian Tismer wrote: > > Hi, > > I have spent the last four days on compressing the > Unicode database. Cool :-) > With little decoding effort, I can bring the data down to 25kb. > This would still be very fast, since codes are randomly > accessible, although there are some simple shifts and masks. > > With a bit more effort, this can be squeezed down to 15kb > by some more aggressive techniques like common prefix > elimination. Speed would be *slightly* worse, since a small > loop (average 8 cycles) is performed to obtain a character > from a packed nybble. > > This is just all the data which is in Marc's unicodedatabase.c
I checked efficiency by creating a delimited file like > the original database text file with only these columns and > ran PkZip over it. The result was 40kb. This says that I found > a lot of correlations which automatic compressors cannot see. Not bad ;-) > Now, before generating the final C code, I'd like to ask some > questions: > > What is more desirable: Low compression and blinding speed? > Or high compression and less speed, since we always want to > unpack a whole code page? I'd say high speed and less compression. The reason is that the Asian codecs will need fast access to the database. With their large mapping tables size the few more kB don't hurt, I guess. > Then, what about the other database columns? > There are a couple of extra atrributes which I find coded > as switch statements elsewhere. Should I try to pack these > codes into my squeezy database, too? You basically only need to provide the APIs (and columns) defined in the unicodedata Python API, e.g. the character description column is not needed. > And last: There are also two quite elaborated columns with > textual descriptions of the codes (the uppercase blah version > of character x). Do we want these at all? And if so, should > I try to compress them as well? Should these perhaps go > into a different source file as a dynamic module, since they > will not be used so often? I guess you are talking about the "Unicode 1.0 Name" and the "10646 comment field" -- see above, there's no need to include these descriptions in the database... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 22 11:04:32 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 12:04:32 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D8A8C0.66123F2C@lemburg.com> Mark Hammond wrote: > > > > > Right. 
The idea with open() was to write a special version (using > > #ifdefs) for use on Windows platforms which does all the needed > > magic to convert Unicode to whatever the native format and locale > > is... > > That works for open() - but what about other extension modules? > > This seems to imply that any Python extension on Windows that wants to pass > a Unicode string to an external function can not use PyArg_ParseTuple() with > anything other than "O", and perform the magic themselves. > > This just seems a little back-to-front to me. Platforms that have _no_ > native Unicode support have useful utilities for working with Unicode. > Platforms that _do_ have native Unicode support can not make use of these > utilities. Is this by design, or simply a sad side-effect of the design? > > So - it is trivial to use Unicode on platforms that dont support it, but > quite difficult on platforms that do. The problem is that Windows seems to use a completely different internal Unicode format than most of the rest of the world. As I've commented on in a different post, the only way to have PyArg_ParseTuple() perform auto-conversion is by allowing it to return objects which are garbage collected by the caller. The problem with this is error handling, since PyArg_ParseTuple() will have to keep track of all objects it created until the call returns successfully. An alternative approach is sketched below. Note that *all* platforms will have to use this approach... not only Windows or other platforms with Unicode support. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. Basically, you will have to > > write a helper which takes a string, Unicode or some other > > "t" compatible object as name object and then converts it to > > the system's view of things. > > Why "obviously"? What on earth does the existing mechamism buy me on > Windows, other than grief that I can not use it? 
Sure, you can :-) Just fetch the object, coerce it to Unicode and then encode it according to your platform needs (PyUnicode_FromObject() takes care of the coercion part for you). > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. > > There is a Win32 API function for this. However, as you succinctly pointed > out, not many people are going to be aware of its name, or how to use the > multitude of flags offered by these conversion functions, or know how to > deal with the memory management, etc. > > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! > > Sure - I can. But can everyone who writes interfaces to Unicode functions? > You wrote the Python Unicode support but dont know its name - pity the poor > Joe Average trying to write an extension. Hey, Mark... I'm not a Windows geek. How can I know which APIs are available and which of them to use ? And that's my point: add conversion APIs and codecs for the different OSes which make the extension writer life easier. > It seems to me that, on Windows, the Python Unicode support as it stands is > really internal. I can not think of a single time that an extension writer > on Windows would ever want to use the "t" markers - am I missing something? > I dont believe that a single Unicode-aware function in the Windows > extensions (of which there are _many_) could be changed to use the "t" > markers. "t" is intended to return a text representation of a buffer interface aware type... this happens to be UTF-8 for Unicode objects -- what other encoding would you have expected ? 
> It still seems to me that the Unicode support works well on platforms with > no Unicode support, and is fairly useless on platforms with the support. I > dont believe that any extension on Windows would want to use the "t" > marker - so, as Fred suggested, how about providing something for us that > can help us interface to the platform's Unicode? That's exactly what I'm talking about all the time... there currently are PyUnicode_AsWideChar() and PyUnicode_FromWideChar() to interface to the compiler's wchar_t type. I have no problem adding more of these APIs for the various OSes -- but they would have to be coded by someone with Unicode skills on each of those platforms, e.g. PyUnicode_AsMBCS() and PyUnicode_FromMBCS() on Windows. > This is getting too hard for me - I will release my windows registry module > without Unicode support, and hope that in the future someone cares enough to > address it, and to add a large number of LOC that will be needed simply to > get Unicode talking to Unicode... Mark. I think you're getting this wrong: I'm not arguing against adding better support for Windows. The only way I can think of using parser markers in this context would be by having PyArg_ParseTuple() *copy* data into a given data buffer rather than only passing a reference to it. This would enable PyArg_ParseTuple() to apply whatever conversion is needed while still keeping the temporary objects internal. Hmm, sketching a little: "es#",&encoding,&buffer,&buffer_len -- could mean: coerce the object to Unicode, then encode it using the given encoding and then copy at most buffer_len bytes of data into buffer and update buffer_len to the number of bytes copied This costs some cycles for copying data, but gets rid of the problems involved in cleaning up after errors. The caller will have to ensure that the buffer is large enough and that the encoding fits the application's needs.
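[Editorial note: the "es#" semantics sketched above can be modelled at the Python level. A rough sketch; convert_es() is a stand-in for the proposed C parser-marker behaviour, not a real API.]

```python
# Python-level model of the proposed "es#" parser marker: coerce the
# argument to a (unicode) string, encode it with the caller-supplied
# encoding, and copy at most buffer_len bytes into a caller-owned
# buffer, treating truncation as an error. convert_es() is purely
# illustrative; the real proposal is a PyArg_ParseTuple() format unit.

def convert_es(obj, encoding, buffer, buffer_len):
    data = str(obj).encode(encoding)    # coerce, then encode
    if len(data) > buffer_len:
        raise ValueError("encoded data would be truncated")
    buffer[:len(data)] = data
    return len(data)                    # the updated buffer_len

buf = bytearray(8)
n = convert_es(u"h\xe4llo", "utf-8", buf, len(buf))
print(n, bytes(buf[:n]))                # 6 b'h\xc3\xa4llo'

try:
    convert_es(u"h\xe4llo", "utf-8", buf, 3)  # buffer too small
except ValueError as exc:
    print(exc)                          # error rather than silent truncation
```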
Error handling will be poor since the caller can't take any action other than to pass on the error generated by PyArg_ParseTuple(). Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 22 13:40:23 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 14:40:23 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322113129.5E67C370CF2@snelboot.oratrix.nl> Message-ID: <38D8CD47.E573A246@lemburg.com> Jack Jansen wrote: > > > "es#",&encoding,&buffer,&buffer_len > > -- could mean: coerce the object to Unicode, then > > encode it using the given encoding and then > > copy at most buffer_len bytes of data into > > buffer and update buffer_len to the number of bytes > > copied > > This is a possible solution, but I think I would really prefer to also have > "eS", &encoding, &buffer_ptr > -- coerce the object to Unicode, then encode it using the given > encoding, malloc() a buffer to put the result in and return that. > > I don't mind doing something like
>
>     {
>         char *filenamebuffer = NULL;
>
>         if ( PyArg_ParseTuple(args, "eS", &macencoding, &filenamebuffer)
>             ...
>         open(filenamebuffer, ....);
>         PyMem_XDEL(filenamebuffer);
>         ...
>     }
>
> I think this would be much less error-prone than having fixed-length buffers > all over the place. PyArg_ParseTuple() should probably raise an error in case the data doesn't fit into the buffer. > And if this is indeed going to be used mainly in open() > calls and such the cost of the extra malloc()/free() is going to be dwarfed by > what the underlying OS call is going to use. Good point. You'll still need the buffer_len output parameter though -- otherwise you wouldn't be able to tell the size of the allocated buffer (the returned data may not be terminated).
How about this: "es#", &encoding, &buffer, &buffer_len -- both buffer and buffer_len are in/out parameters -- if **buffer is non-NULL, copy the data into it (at most buffer_len bytes) and update buffer_len on output; truncation produces an error -- if **buffer is NULL, malloc() a buffer of size buffer_len and return it through *buffer; if buffer_len is -1, the allocated buffer should be large enough to hold all data; again, truncation is an error -- apply coercion and encoding as described above (could be that I've got the '*'s wrong, but you get the picture...:) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Wed Mar 22 13:46:50 2000 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 22 Mar 2000 14:46:50 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Wed, 22 Mar 2000 14:40:23 +0100 , <38D8CD47.E573A246@lemburg.com> Message-ID: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> > > [on the user-supplies-buffer interface] > > I think this would be much less error-prone than having fixed-length buffers > > all over the place. > > PyArg_ParseTuple() should probably raise an error in case the > data doesn't fit into the buffer. Ah, that's right, that solves most of that problem. > > [on the malloced interface] > Good point. You'll still need the buffer_len output parameter > though -- otherwise you wouldn't be able to tell the size of the > allocated buffer (the returned data may not be terminated). Are you sure? I would expect the "eS" format to be used to obtain 8-bit data in some local encoding, and I would expect that all 8-bit encodings of unicode data would still allow for null-termination. Or are there 8-bit encodings out there where a zero byte is a normal occurrence and where it can't be used as terminator?
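Whatever the answer for strictly 8-bit encodings, wider encodings make the point about the length parameter vividly: a modern-Python sketch showing that embedded zero bytes are routine in some encoded output, so NUL-termination alone can't recover the data's size.

```python
# Zero bytes appear inside perfectly ordinary encoded text here,
# so only an explicit length (a buffer_len-style out parameter)
# tells you where the data really ends.
data = "abc".encode("utf-16-le")
assert b"\x00" in data   # embedded NULs are normal occurrences
assert len(data) == 6    # the real size; strlen() would stop early
```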
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Wed Mar 22 16:31:26 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 17:31:26 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> Message-ID: <38D8F55E.6E324281@lemburg.com> Jack Jansen wrote: > > > > [on the user-supplies-buffer interface] > > > I think this would be much less error-prone than having fixed-length buffers > > > all over the place. > > > > PyArg_ParseTuple() should probably raise an error in case the > > data doesn't fit into the buffer. > > Ah, that's right, that solves most of that problem. > > > > [on the malloced interface] > > Good point. You'll still need the buffer_len output parameter > > though -- otherwise you wouldn't be able to tell the size of the > > allocated buffer (the returned data may not be terminated). > > Are you sure? I would expect the "eS" format to be used to obtain 8-bit data > in some local encoding, and I would expect that all 8-bit encodings of unicode > data would still allow for null-termination. Or are there 8-bit encodings out > there where a zero byte is a normal occurrence and where it can't be used as > terminator? Not sure whether these exist or not, but they are certainly a possibility to keep in mind. Perhaps adding "es#" and "es" (with 0-byte check) would be ideal ?! -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Wed Mar 22 16:54:42 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 22 Mar 2000 17:54:42 +0100 (MET) Subject: [Python-Dev] Nitpicking on UserList implementation Message-ID: Hi!
Please have a look at the following method cited from Lib/UserList.py:

    def __radd__(self, other):
        if isinstance(other, UserList):                    # <-- ?
            return self.__class__(other.data + self.data)  # <-- ?
        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(list(other) + self.data)

The reference manual tells about the __r*__ methods: """These functions are only called if the left operand does not support the corresponding operation.""" So if the left operand is a UserList instance, it should always have a __add__ method, which will be called instead of the right operand's __radd__. So I think the condition 'isinstance(other, UserList)' in __radd__ above will always evaluate to False and so the two lines marked with '# <-- ?' seem to be superfluous. But 'UserList' is so mature: Please tell me what I've overlooked before I make a fool of myself and submit a patch removing these two lines. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson@nevex.com Thu Mar 23 17:10:16 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 12:10:16 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods Message-ID: [The following passed the Ping test, so I'm posting it here] If None becomes a keyword, I would like to ask whether it could be used to signal that a method is a class method, as opposed to an instance method:

    class Ping:

        def __init__(self, arg):
            ...as usual...

        def method(self, arg):
            ...no change...

        def classMethod(None, arg):
            ...equivalent of C++ 'static'...
    p = Ping("thinks this is cool")   # as always
    p.method("who am I to argue?")    # as always
    Ping.classMethod("hey, cool!")    # no 'self'
    p.classMethod("hey, cool!")       # also selfless

I'd also like to ask (separately) that assignment to None be defined as a no-op, so that programmers can write: year, month, None, None, None, None, weekday, None, None = gmtime(time()) instead of having to create throw-away variables to fill in slots in tuples that they don't care about. I think both behaviors are readable; the first provides genuinely new functionality, while I often found the second handy when I was doing logic programming. Greg From jim@digicool.com Thu Mar 23 17:18:29 2000 From: jim@digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:18:29 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA51E5.B39D3E7B@digicool.com> gvwilson@nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: > > def __init__(self, arg): > ...as usual... > > def method(self, arg): > ...no change... > > def classMethod(None, arg): > ...equivalent of C++ 'static'... (snip) As a point of jargon, please let's call this thing a "static method" (or an instance function, or something) rather than a "class method". The distinction between "class methods" and "static methods" has been discussed at length in the types sig (over a year ago). If this proposal goes forward and the name "class method" is used, I'll have to argue strenuously, and I really don't want to do that. :] So, if you can live with the term "static method", you could save us a lot of trouble by just saying "static method". Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (888) 344-4332 Python Powered!
Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gvwilson@nevex.com Thu Mar 23 17:21:48 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 12:21:48 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA51E5.B39D3E7B@digicool.com> Message-ID: > As a point of jargon, please let's call this thing a "static method" > (or an instance function, or something) rather than a "class method". I'd call it a penguin if that was what it took to get something like this implemented... :-) greg From jim@digicool.com Thu Mar 23 17:28:25 2000 From: jim@digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:28:25 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA5439.F5FE8FE6@digicool.com> gvwilson@nevex.com wrote: > > > As a point of jargon, please let's call this thing a "static method" > > (or an instance function, or something) rather than a "class method". > > I'd call it a penguin if that was what it took to get something like this > implemented... :-) That's a great name. Let's go with penguin. :) Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
From mhammond@skippinet.com.au Thu Mar 23 17:29:53 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 23 Mar 2000 09:29:53 -0800 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: ... > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > def classMethod(None, arg): > ...equivalent of C++ 'static'... ... > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = > gmtime(time()) In the vernacular of a certain Mr Stein... +2 on both of these :-) [Although I do believe "static method" is a better name than "penguin" :-] Mark. From ping@lfw.org Thu Mar 23 17:47:47 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 23 Mar 2000 09:47:47 -0800 (PST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson@nevex.com wrote: > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: [...] Ack! I've been reduced to a class with just three methods. Oh well, i never really considered it such a bad thing to be called "simple-minded". :) > def classMethod(None, arg): > ...equivalent of C++ 'static'... Yeah, i agree with Jim; you might as well call this a "static method" as opposed to a "class method". I like the way "None" is explicitly stated here, so there's no confusion about what the method does. (Without it, there's the question of whether the first argument will get thrown in, or what...) Hmm... i guess this also means one should ask what def function(None, arg): ... does outside a class definition. I suppose that should simply be illegal.
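The selfless method being debated here is, give or take the spelling, what later Pythons provide as staticmethod (added in 2.2; the decorator syntax came in 2.4) — without making None a keyword in the argument list. A sketch of the behaviour gvwilson asks for, in today's terms:

```python
class Ping:
    def method(self, arg):
        # ordinary instance method; the instance is passed as self
        return ("instance", arg)

    @staticmethod
    def static_method(arg):
        # no instance (and no class) is passed in -- "selfless"
        return ("static", arg)

p = Ping()
assert Ping.static_method("hey, cool!") == ("static", "hey, cool!")  # no 'self'
assert p.static_method("hey, cool!") == ("static", "hey, cool!")     # also selfless
```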
> I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. For what it's worth, i sometimes use "_" for this purpose (shades of Prolog!) but i can't make much of an argument for its readability... -- ?!ng I never dreamt that i would get to be The creature that i always meant to be But i thought, in spite of dreams, You'd be sitting somewhere here with me. From fdrake@acm.org Thu Mar 23 18:11:39 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 23 Mar 2000 13:11:39 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.24155.948286.451340@weyr.cnri.reston.va.us> gvwilson@nevex.com writes: > p.classMethod("hey, cool!") # also selfless This is the example that I haven't seen before (I'm not on the types-sig, so it may have been presented there), and I think this is what makes it interesting; a method in a module isn't quite sufficient here, since a subclass can override or extend the penguin this way. (Er, if we *do* go with penguin, does this mean it only works on Linux? ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From pf@artcom-gmbh.de Thu Mar 23 18:25:57 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 19:25:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from "gvwilson@nevex.com" at "Mar 23, 2000 12:10:16 pm" Message-ID: Hi! 
gvwilson@nevex.com: > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) You can already do this today with 1.5.2, if you use a 'del None' statement:

    Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> from time import time, gmtime
    >>> year, month, None, None, None, None, weekday, None, None = gmtime(time())
    >>> print year, month, None, weekday
    2000 3 0 3
    >>> del None
    >>> print year, month, None, weekday
    2000 3 None 3
    >>>

If None becomes a keyword in Py3K, this idiom would better be written as

    year, month, None, None, None, None, ... = ...
    if sys.version[0] == '1':
        del None

or

    try:
        del None
    except SyntaxError:
        pass # Wow running Py3K here!

I wonder how much existing code the None --> keyword change would break. Regards, Peter From paul@prescod.net Thu Mar 23 18:47:55 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:47:55 -0800 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA66DB.635E8731@prescod.net> gvwilson@nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: +1 Idea is good, but I'm not really happy with any of the proposed terminology...Python doesn't really have static anything. I would vote at the same time to make self a keyword and signal an error if the first argument is not one of None or self. Even now, one of my most common Python mistakes is in forgetting self. I expect it happens to anyone who shifts between other languages and Python. Why does None have an upper case "N"? Maybe the keyword version should be lower-case...
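As it turned out, assignment to None never became a no-op; the throwaway-slot convention that stuck is the "_" name Ping mentions elsewhere in this thread, which needs no special casing at all:

```python
from time import gmtime, time

# The idiom that won: a conventional throwaway name, not magic None.
year, month, _, _, _, _, weekday, _, _ = gmtime(time())
assert year >= 2000
assert 0 <= weekday <= 6
```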
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw@cnri.reston.va.us Thu Mar 23 18:57:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 13:57:00 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.26876.514559.320219@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> If None becomes a keyword, I would like to ask whether gvwilson> it could be used to signal that a method is a class gvwilson> method, as opposed to an instance method: It still seems mildly weird that None would be a special kind of keyword, one that has a value and is used in ways that no other keyword is used. Greg gives an example, and here's a few more:

    def baddaboom(x, y, z=None):
        ...
        if z is None:
            ...

try substituting `else' for `None' in these examples. ;) Putting that issue aside, Greg's suggestion for static method definitions is interesting.

    class Ping:
        # would this be a SyntaxError?
        def __init__(None, arg):
            ...
        def staticMethod(None, arg):
            ...

    p = Ping()
    Ping.staticMethod(p, 7)   # TypeError
    Ping.staticMethod(7)      # This is fine
    p.staticMethod(7)         # So's this
    Ping.staticMethod(p)      # and this !!

-Barry From paul@prescod.net Thu Mar 23 18:52:25 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:52:25 -0800 Subject: [Python-Dev] dir() Message-ID: <38DA67E9.AA593B7A@prescod.net> Can someone explain why dir(foo) does not return all of foo's methods? I know it's documented that way, I just don't know why it *is* that way. I'm also not clear why instances don't have auto-populated __methods__ and __members__ members? If there isn't a good reason (there probably is) then I would advocate that these functions and members should be more comprehensive.
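Paul's wish was eventually granted: in later Pythons, dir() on an instance walks the class (and its bases) too, so methods do show up. A quick check in a modern interpreter:

```python
class Foo:
    def bar(self):
        return 42

f = Foo()
# Modern dir() reports class attributes and methods as well,
# not just the instance __dict__ as in Python 1.5.
assert "bar" in dir(f)
assert "bar" in dir(Foo)
```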
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw@cnri.reston.va.us Thu Mar 23 19:00:57 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 14:00:57 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.27113.546575.170565@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes:

    | try:
    |     del None
    | except SyntaxError:
    |     pass # Wow running Py3K here!

I know how to break your Py3K code: stick None=None somewhere higher up :) PF> I wonder how much existing code the None --> keyword change PF> would break. Me too. -Barry From gvwilson@nevex.com Thu Mar 23 19:01:06 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 14:01:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.26876.514559.320219@anthem.cnri.reston.va.us> Message-ID: > class Ping: > # would this be a SyntaxError? > def __init__(None, arg): > ... Absolutely a syntax error; ditto any of the other special names (e.g. __add__). Greg From akuchlin@mems-exchange.org Thu Mar 23 19:06:33 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 14:06:33 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.27449.69043.924322@amarok.cnri.reston.va.us> Barry A. Warsaw writes: >>>>>> "PF" == Peter Funk writes: > PF> I wonder how much existing code the None --> keyword change > PF> would break. >Me too. I can't conceive of anyone using None as a function name or a variable name, except through a bug or thinking that 'None, useful, None = 1,2,3' works. Even though None isn't a fixed constant, it might as well be.
How much C code have you seen lately that starts with int function(void *NULL) ? Being able to do "None = 2" also smacks a bit of those legendary Fortran compilers that let you accidentally change 2 into 4. +1 on this change for Py3K, and I doubt it would cause breakage even if introduced into 1.x. -- A.M. Kuchling http://starship.python.net/crew/amk/ Principally I played pedants, idiots, old fathers, and drunkards. As you see, I had a narrow escape from becoming a professor. -- Robertson Davies, "Shakespeare over the Port" From paul@prescod.net Thu Mar 23 19:02:33 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 11:02:33 -0800 Subject: [Python-Dev] Unicode character names Message-ID: <38DA6A49.A60E405B@prescod.net> Here's a feature I like from Perl's Unicode support: """ Support for interpolating named characters The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a unicode smiley face at the end. """ I get really tired of looking up the Unicode character for "ndash" or "right dagger". Does our Unicode database have enough information to make something like this possible? Obviously using the official (English) name is only really helpful for people who speak English, so we should not remove the numeric option. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From tismer@tismer.com Thu Mar 23 19:27:53 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 20:27:53 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7039.B7CDC6FF@tismer.com> Mark Hammond wrote: > > ... > > If None becomes a keyword, I would like to ask whether it could be used to > > signal that a method is a class method, as opposed to an instance method: > > > > def classMethod(None, arg): > > ...equivalent of C++ 'static'...
> ... > > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = > > gmtime(time()) > > In the vernacular of a certain Mr Stein... > > +2 on both of these :-) me 2, uh, 1.5... The assignment no-op seems to be ok. Having None as a placeholder for static methods creates the problem that we lose compatibility with ordinary functions. What I would propose instead is: make the parameter name "self" mandatory for methods, and turn everything else into a static method. This does not change function semantics, but just the way the method binding works. > [Although I do believe "static method" is a better name than "penguin" :-] pynguin -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gvwilson@nevex.com Thu Mar 23 19:33:47 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 14:33:47 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA7039.B7CDC6FF@tismer.com> Message-ID: Hi, Christian; thanks for your mail. > What I would propose instead is: > make the parameter name "self" mandatory for methods, and turn > everything else into a static method. In my experience, significant omissions (i.e. something being important because it is *not* there) often give beginners trouble. For example, in C++, you can't tell whether:

    int foo::bar(int bah)
    {
        return 0;
    }

belongs to instances, or to the class as a whole, without referring back to the header file [1]. To quote the immortal Jeremy Hylton: Pythonic design rules #2: Explicit is better than implicit.
Also, people often ask why 'self' is required as a method argument in Python, when it is not in C++ or Java; this proposal would (retroactively) answer that question... Greg [1] I know this isn't a problem in Java or Python; I'm just using it as an illustration. From skip@mojam.com (Skip Montanaro) Thu Mar 23 20:09:00 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 23 Mar 2000 14:09:00 -0600 (CST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> <14554.27449.69043.924322@amarok.cnri.reston.va.us> Message-ID: <14554.31196.387213.472302@beluga.mojam.com> AMK> +1 on this change for Py3K, and I doubt it would cause breakage AMK> even if introduced into 1.x. Or if it did, it's probably code that's marginally broken already... -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From tismer@tismer.com Thu Mar 23 20:21:09 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 21:21:09 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7CB5.87D62E14@tismer.com> Yo, gvwilson@nevex.com wrote: > > Hi, Christian; thanks for your mail. > > > What I would propose instead is: > > make the parameter name "self" mandatory for methods, and turn > > everything else into a static method. > > In my experience, significant omissions (i.e. something being important > because it is *not* there) often give beginners trouble. For example, > in C++, you can't tell whether:
>
>     int foo::bar(int bah)
>     {
>         return 0;
>     }
>
> belongs to instances, or to the class as a whole, without referring back > to the header file [1]. To quote the immortal Jeremy Hylton: > > Pythonic design rules #2: > Explicit is better than implicit. Sure. I am explicitly *not* using self if I want no self.
:-) > Also, people often ask why 'self' is required as a method argument in > Python, when it is not in C++ or Java; this proposal would (retroactively) > answer that question... You prefer to use the explicit keyword None? How would you then deal with

    def outside(None, blah):
        pass # stuff

I believe one answer about the explicit "self" is that it should be simple and compatible with ordinary functions. Guido just had to add the semantics that in methods the first parameter automatically binds to the instance. The None gives me a bit of trouble, but not much. What I would like to spell is

    ordinary functions (as it is now)
    functions which are instance methods (with the immortal self)
    functions which are static methods ???
    functions which are class methods !!!

Static methods can work either with the "1st param==None" rule or with the "1st paramname!=self" rule or whatever. But how would you do class methods, which IMHO should have their class passed in as first parameter? Do you see a clean syntax for this? I thought of some weirdness like

    def meth(self, ...
    def static(self=None, ...   # eek
    def classm(self=class, ...  # ahem

but this breaks the rule of default argument order. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin@mems-exchange.org Thu Mar 23 20:27:41 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) Subject: [Python-Dev] Unicode character names In-Reply-To: <38DA6A49.A60E405B@prescod.net> References: <38DA6A49.A60E405B@prescod.net> Message-ID: <14554.32317.730574.967165@amarok.cnri.reston.va.us> Paul Prescod writes: >The new \N escape interpolates named characters within strings. For >example, "Hi!
\N{WHITE SMILING FACE}" evaluates to a string with a >unicode smiley face at the end. Cute idea, and it certainly means you can avoid looking up Unicode numbers. (You can look up names instead. :) ) Note that this means the Unicode database is no longer optional if this is done; it has to be around at code-parsing time. Python could import it automatically, as exceptions.py is imported. Christian's work on compressing unicodedatabase.c is therefore really important. (Is Perl5.6 actually dragging around the Unicode database in the binary, or is it read out of some external file or data structure?) -- A.M. Kuchling http://starship.python.net/crew/amk/ About ten days later, it being the time of year when the National collected down and outs to walk on and understudy I arrived at the head office of the National Theatre in Aquinas Street in Waterloo. -- Tom Baker, in his autobiography From bwarsaw@cnri.reston.va.us Thu Mar 23 20:39:43 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 15:39:43 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33039.4390.591036@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> belongs to instances, or to the class as a whole, gvwilson> without referring back to the header file [1]. 
To quote gvwilson> the immortal Jeremy Hylton: Not to take anything away from Jeremy, who has contributed some wonderfully Pythonic quotes of his own, but this one is taken from Tim Peters' Zen of Python http://www.python.org/doc/Humor.html#zen timbot-is-the-only-one-who's-gonna-outlive-his-current-chip-set- around-here-ly y'rs, -Barry From jeremy-home@cnri.reston.va.us Thu Mar 23 20:55:25 2000 From: jeremy-home@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 23 Mar 2000 15:55:25 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33590.844200.145871@walden> >>>>> "GVW" == gvwilson writes: GVW> To quote the immortal Jeremy Hylton: GVW> Pythonic design rules #2: GVW> Explicit is better than implicit. I wish I could take credit for that :-). Tim Peters posted a list of 20 Pythonic theses to comp.lang.python under the title "The Python Way." I'll collect them all here in hopes of future readers mistaking me for Tim again.

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

See http://x27.deja.com/getdoc.xp?AN=485548918&CONTEXT=953844380.1254555688&hitnum=9 for the full post.
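Tim's theses later shipped with the interpreter itself as the `this` module, which stores the canonical text ROT13-encoded, so the exact wording is always one decode away:

```python
import codecs
import this  # printing the Zen on import is this module's only side effect

# this.s holds the ROT13-encoded text of the Zen.
zen = codecs.decode(this.s, "rot13")
assert "Explicit is better than implicit." in zen
assert "Namespaces are one honking great idea" in zen
```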
to-be-immortal-i'd-need-to-be-a-bot-ly y'rs Jeremy From hylton@jagunet.com Thu Mar 23 21:01:01 2000 From: hylton@jagunet.com (Jeremy Hylton) Date: Thu, 23 Mar 2000 16:01:01 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.34037.232728.670271@walden> >>>>> "GVW" == gvwilson writes: GVW> I'd also like to ask (separately) that assignment to None be GVW> defined as a no-op, so that programmers can write: GVW> year, month, None, None, None, None, weekday, None, None = GVW> gmtime(time()) GVW> instead of having to create throw-away variables to fill in GVW> slots in tuples that they don't care about. I think both GVW> behaviors are readable; the first provides genuinely new GVW> functionality, while I often found the second handy when I was GVW> doing logic programming. -1 on this proposal Pythonic design rule #8: Special cases aren't special enough to break the rules. I think it's confusing to have assignment mean pop the top of the stack for the special case that the name is None. If Py3K makes None a keyword, then it would also be the only keyword that can be used in an assignment. Finally, we'd need to explain to the rare newbie who used None as a variable name why they assigned 12 to None but that its value was its name when it was later referenced. (Think 'print None'.) When I need to ignore some of the return values, I use the name nil. year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) I think that's just as clear, only a whisker less efficient, and requires no special cases. Heck, it's even less typing <0.5 wink>.
Jeremy From gvwilson@nevex.com Thu Mar 23 20:59:41 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 15:59:41 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.33590.844200.145871@walden> Message-ID: > GVW> To quote the immortal Jeremy Hylton: > GVW> Pythonic design rules #2: > GVW> Explicit is better than implicit. > > I wish I could take credit for that :-). Tim Peters posted a list of > 20 Pythonic theses to comp.lang.python under the title "The Python > Way."

Traceback (innermost last):
  File "", line 1, in ?
AttributionError: insight incorrectly ascribed

From paul@prescod.net Thu Mar 23 21:26:42 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 13:26:42 -0800 Subject: [Python-Dev] None as a keyword / class methods References: <14554.34037.232728.670271@walden> Message-ID: <38DA8C12.DFFD63D5@prescod.net> Jeremy Hylton wrote: > > ... > year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) So you're proposing nil as a new keyword? I like it. +2 -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "No, I'm not QUITE that stupid", Paul Prescod From pf@artcom-gmbh.de Thu Mar 23 21:46:49 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 22:46:49 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 23, 2000 2: 0:57 pm" Message-ID: Hi Barry!

> >>>>> "PF" == Peter Funk writes:
> > | try:
> |     del None
> | except SyntaxError:
> |     pass # Wow running Py3K here!

Barry A. Warsaw: > I know how to break your Py3K code: stick None=None somewhere higher > up :) Hmm.... I must admit that I don't understand your argument. In Python <= 1.5.2 'del None' works fine, iff it follows any assignment to None in the same scope, regardless of whether there has been a None=None in the surrounding scope or in the same scope before this.
Since something like 'del for' or 'del import' raises a SyntaxError exception in Py152, I expect 'del None' to raise the same exception in Py3K, after None has become a keyword. Right? Regards, Peter From andy@reportlab.com Thu Mar 23 21:54:23 2000 From: andy@reportlab.com (Andy Robinson) Date: Thu, 23 Mar 2000 21:54:23 GMT Subject: [Python-Dev] Unicode Character Names In-Reply-To: <20000323202533.ABDB31CEF8@dinsdale.python.org> References: <20000323202533.ABDB31CEF8@dinsdale.python.org> Message-ID: <38da90b4.756297@post.demon.co.uk> >Message: 20 >From: "Andrew M. Kuchling" >Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) >To: "python-dev@python.org" >Subject: Re: [Python-Dev] Unicode character names > >Paul Prescod writes: >>The new \N escape interpolates named characters within strings. For >>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>unicode smiley face at the end. > >Cute idea, and it certainly means you can avoid looking up Unicode >numbers. (You can look up names instead. :) ) Note that this means the >Unicode database is no longer optional if this is done; it has to be >around at code-parsing time. Python could import it automatically, as >exceptions.py is imported. Christian's work on compressing >unicodedatabase.c is therefore really important. (Is Perl5.6 actually >dragging around the Unicode database in the binary, or is it read out >of some external file or data structure?) I agree - the names are really useful. If you are doing conversion work, often you want to know what a character is, but don't have a complete Unicode font handy. Being able to get the description for a Unicode character is useful, as well as being able to use the description as a constructor for it. Also, there are some language specific things that might make it useful to have the full character descriptions in Christian's database.
For example, we'll have an (optional, not in the standard library) Japanese module with functions like isHalfWidthKatakana(), isFullWidthKatakana() to help normalize things. Parsing the database and looking for strings in the descriptions is one way to build this - not the only one, but it might be useful. So I'd vote to put names in at first, and give us a few weeks to see how useful they are before a final decision. - Andy Robinson From paul@prescod.net Thu Mar 23 22:09:42 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 14:09:42 -0800 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA9626.8B62DB77@prescod.net> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) More important, though, the code is "self-documenting". You never have to go from the number back to the name. > Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. I don't like the idea enough to exclude support for small machines or anything like that. We should weigh the costs of requiring the Unicode database at compile time. > (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) I have no idea.
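Andy's name-scanning idea above -- building classifiers like his hypothetical isHalfWidthKatakana() by searching the character descriptions -- can be sketched against the unicodedata module as it later shipped with Python (the helper name and approach are his suggestion, not a real library API):

```python
import unicodedata

def is_halfwidth_katakana(ch):
    # Hypothetical helper in the spirit of the proposed Japanese module:
    # classify a character by scanning its Unicode database name.
    try:
        return unicodedata.name(ch).startswith("HALFWIDTH KATAKANA")
    except ValueError:  # characters with no name in the database
        return False

assert is_halfwidth_katakana("\uff76")   # HALFWIDTH KATAKANA LETTER KA
assert not is_halfwidth_katakana("A")
```

Name-based lookup also works in the other direction, e.g. `unicodedata.lookup("WHITE SMILING FACE")`.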
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From pf@artcom-gmbh.de Thu Mar 23 22:12:25 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:12:25 +0100 (MET) Subject: [Python-Dev] Py3K: True and False builtin or keyword? Message-ID: Regarding the discussion about None becoming a keyword in Py3K: Recently the truth values True and False have been mentioned. Should they become builtin values --like None is now-- or should they become keywords? Nevertheless: for the time being I came up with the following weird idea: If you put this in front of the main module of a Python app:

#!/usr/bin/env python
if __name__ == "__main__":
    import sys
    if sys.version[0] <= '1':
        __builtins__.True = 1
        __builtins__.False = 0
    del sys
# --- continue with your app from here: ---
import foo, bar, ...
....

Now you can start to use False and True in any imported module as if they were already builtins. Of course this is no surprise here and Python is really fun, Peter. From mal@lemburg.com Thu Mar 23 21:07:35 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 23 Mar 2000 22:07:35 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA8797.F16301E4@lemburg.com> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. Python could import it automatically, as > exceptions.py is imported.
Christian's work on compressing > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) Sorry to disappoint you guys, but the Unicode name and comments are *not* included in the unicodedatabase.c file Christian is currently working on. The reason is simple: it would add huge amounts of string data to the file. So this is a no-no for the core distribution... Still, the above is easily possible by inventing a new encoding, say unicode-with-smileys, which then reads in a file containing the Unicode names and applies the necessary magic to decode/encode data as Paul described above. Would probably make a cool fun-project for someone who wants to dive into writing codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw@cnri.reston.va.us Thu Mar 23 23:02:06 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Thu, 23 Mar 2000 18:02:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Hi Peter! >>>>> "PF" == Peter Funk writes: PF> Since something like 'del for' or 'del import' raises a PF> SyntaxError exception in Py152, I expect 'del None' to raise PF> the same exception in Py3K, after None has become a keyword. PF> Right? I misread your example the first time through, but it still doesn't quite parse on my second read. 
-------------------- snip snip --------------------
pyvers = '2k'
try:
    del import
except SyntaxError:
    pyvers = '3k'
-------------------- snip snip --------------------
% python /tmp/foo.py
  File "/tmp/foo.py", line 3
    del import
            ^
SyntaxError: invalid syntax
-------------------- snip snip --------------------

See, you can't catch that SyntaxError because it doesn't happen at run-time. Maybe you meant to wrap the try suite in an exec? Here's a code sample that ought to work with 1.5.2 and the mythical Py3K-with-a-None-keyword.

-------------------- snip snip --------------------
pyvers = '2k'
try:
    exec "del None"
except SyntaxError:
    pyvers = '3k'
except NameError:
    pass
print pyvers
-------------------- snip snip --------------------

Cheers, -Barry From klm@digicool.com Thu Mar 23 23:05:08 2000 From: klm@digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:05:08 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 pf@artcom-gmbh.de wrote: > Hi Barry! > > > >>>>> "PF" == Peter Funk writes: > > > > | try: > > | del None > > | except SyntaxError: > > | pass # Wow running Py3K here! > > Barry A. Warsaw: > > I know how to break your Py3K code: stick None=None some where higher > > up :) Huh. Does anyone really think we're going to catch SyntaxError at runtime, ever? Seems like the code fragment above wouldn't work in the first place. But i suppose, with most of a millennium to emerge, py3k could have more fundamental changes than i could even imagine...-) Ken klm@digicool.com From pf@artcom-gmbh.de Thu Mar 23 22:53:34 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:53:34 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Mar 23, 2000 2: 6:33 pm" Message-ID: Hi! > Barry A.
Warsaw writes: > >>>>>> "PF" == Peter Funk writes: > > PF> I wonder, how much existing code the None --> keyword change > > PF> would break. > >Me too. Andrew M. Kuchling: > I can't conceive of anyone using None as a function name or a variable > name, except through a bug or thinking that 'None, useful, None = > 1,2,3' works. Even though None isn't a fixed constant, it might as > well be. How much C code have you seen lately that starts with int > function(void *NULL) ? I agree. urban legend: Once upon a time someone found the following neat snippet of C source hidden in some header file of a very, very huge software system, after he had spent some nights trying to figure out why some simple edits he made in order to make the code more readable broke the system:

#ifdef TRUE /* eat this: you arrogant Quiche Eaters */
#undef TRUE
#undef FALSE
#define TRUE (0)
#define FALSE (1)
#endif

Obviously the poor guy would have found this particular small piece of evil code much earlier, if he had simply 'grep'ed for comments... there were not so many in this system. ;-) > Being able to do "None = 2" also smacks a bit of those legendary > Fortran compilers that let you accidentally change 2 into 4. +1 on > this change for Py3K, and I doubt it would cause breakage even if > introduced into 1.x. We'll see: those "Real Programmers" never die. Fortunately they prefer Perl over Python. <0.5 grin> Regards, Peter From klm@digicool.com Thu Mar 23 23:15:42 2000 From: klm@digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:15:42 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Message-ID: On Thu, 23 Mar 2000 bwarsaw@cnri.reston.va.us wrote: > See, you can't catch that SyntaxError because it doesn't happen at > run-time. Maybe you meant to wrap the try suite in an exec? Here's a Huh.
Guess i should have read barry's re-response before i posted mine: Desperately desiring to redeem myself, and contribute something to the discussion, i'll settle the class/static method naming quandary with the obvious alternative: > > p.classMethod("hey, cool!") # also selfless These should be called buddha methods - no self, samadhi, one with everything, etc. There, now i feel better. :-) Ken klm@digicool.com A Zen monk walks up to a hotdog vendor and says "make me one with everything." Ha. But that's not all. He gets the hot dog and pays with a ten. After several moments waiting, he says to the vendor, "i was expecting change", and the vendor says, "you of all people should know, change comes from inside." That's all. From bwarsaw@cnri.reston.va.us Thu Mar 23 23:19:28 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 18:19:28 -0500 (EST) Subject: [Python-Dev] Py3K: True and False builtin or keyword? References: Message-ID: <14554.42624.213027.854942@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> Now you can start to use False and True in any imported PF> module as if they were already builtins. Of course this is no PF> surprise here and Python is really fun, Peter. You /can/ do this, but that doesn't mean you /should/ :) Mucking with builtins is fun the way huffing dry erase markers is fun. Things are very pretty at first, but eventually the brain cell lossage will more than outweigh that cheap thrill. I've seen a few legitimate uses for hacking builtins. In Zope, I believe Jim hacks get_transaction() or somesuch into builtins because that way it's easy to get at without passing it through the call tree. And in Zope it makes sense since this is a fancy database application and your current transaction is a central concept. I've occasionally wrapped an existing builtin because I needed to extend its functionality while keeping its semantics and API unchanged.
An example of this was my pre-Python-1.5.2 open_ex() in Mailman's CGI driver script. Before builtin open() would print the failing file name, my open_ex() -- shown below -- would hack that into the exception object. But one of the things about Python that I /really/ like is that YOU KNOW WHERE THINGS COME FROM. If I suddenly start seeing True and False in your code, I'm going to look for function locals and args, then module globals, then from ... import *'s. If I don't see it in any of those, I'm going to put down my dry erase markers, look again, and then utter a loud "huh?" :) -Barry

realopen = open
def open_ex(filename, mode='r', bufsize=-1, realopen=realopen):
    from Mailman.Utils import reraise
    try:
        return realopen(filename, mode, bufsize)
    except IOError, e:
        strerror = e.strerror + ': ' + filename
        e.strerror = strerror
        e.filename = filename
        e.args = (e.args[0], strerror)
        reraise(e)

import __builtin__
__builtin__.__dict__['open'] = open_ex

From pf@artcom-gmbh.de Thu Mar 23 23:23:57 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 24 Mar 2000 00:23:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from Ken Manheimer at "Mar 23, 2000 6: 5: 8 pm" Message-ID: Hi! > > > | try: > > > | del None > > > | except SyntaxError: > > > | pass # Wow running Py3K here! > > > > Barry A. Warsaw: > > > I know how to break your Py3K code: stick None=None some where higher > > > up :) > Ken Manheimer: > Huh. Does anyone really think we're going to catch SyntaxError at > runtime, ever? Seems like the code fragment above wouldn't work in the > first place. Ouuppps... Unfortunately I had no chance to test this with Py3K before making a fool of myself by posting this silly example. Now I understand what Barry meant. So if None really becomes a keyword in Py3K we can be sure to catch all those imaginary 'del None' statements very quickly.
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From billtut@microsoft.com Fri Mar 24 02:46:06 2000 From: billtut@microsoft.com (Bill Tutt) Date: Thu, 23 Mar 2000 18:46:06 -0800 Subject: [Python-Dev] Re: Unicode character names Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> MAL wrote: >Andrew M. Kuchling" wrote: >> >> Paul Prescod writes: >>>The new \N escape interpolates named characters within strings. For >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>>unicode smiley face at the end. >> >> Cute idea, and it certainly means you can avoid looking up Unicode >> numbers. (You can look up names instead. :) ) Note that this means the >> Unicode database is no longer optional if this is done; it has to be >> around at code-parsing time. Python could import it automatically, as >> exceptions.py is imported. Christian's work on compressing >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually >> dragging around the Unicode database in the binary, or is it read out >> of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... > Ok, now you're just being silly. It's possible to put the character names in a separate structure so that they don't automatically get paged in with the normal unicode character property data. If you never use it, it won't get paged in, it's that simple.... Looking up the Unicode code value from the Unicode character name smells like a good time to use gperf to generate a perfect hash function for the character names. Esp. for the Unicode 3.0 character namespace.
Then you can just store the hashkey -> Unicode character mapping, and hardly ever need to page in the actual full character name string itself. I haven't looked at what the comment field contains, so I have no idea how useful that info is. *waits while gperf crunches through the ~10,550 Unicode characters where this would be useful* Bill From akuchlin@mems-exchange.org Fri Mar 24 02:51:25 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 21:51:25 -0500 (EST) Subject: [Python-Dev] 1.6 job list Message-ID: <200003240251.VAA19921@newcnri.cnri.reston.va.us> I've written up a list of things that need to get done before 1.6 is finished. This is my vision of what needs to be done, and doesn't have an official stamp of approval from GvR or anyone else. So it's very probably wrong. http://starship.python.net/crew/amk/python/1.6-jobs.html Here's the list formatted as text. The major outstanding things at the moment seem to be sre and Distutils; once they go in, you could probably release an alpha, because the other items are relatively minor.

Still to do

* XXX Revamped import hooks (or is this a post-1.6 thing?)
* Update the documentation to match 1.6 changes.
* Document more undocumented modules
* Unicode: Add Unicode support for open() on Windows
* Unicode: Compress the size of unicodedatabase
* Unicode: Write \N{SMILEY} codec for Unicode
* Unicode: the various XXX items in Misc/unicode.txt
* Add module: Distutils
* Add module: Jim Ahlstrom's zipfile.py
* Add module: PyExpat interface
* Add module: mmapfile
* Add module: sre
* Drop cursesmodule and package it separately. (Any other obsolete modules that should go?)
* Delete obsolete subdirectories in Demo/ directory
* Refurbish Demo subdirectories to be properly documented, match modern coding style, etc.
* Support Unicode strings in PyExpat interface
* Fix ./ld_so_aix installation problem on AIX
* Make test.regrtest.py more usable outside of the Python test suite
* Conservative garbage collection of cycles (maybe?)
* Write friendly "What's New in 1.6" document/article

Done

Nothing at the moment.

After 1.7

* Rich comparisons
* Revised coercions
* Parallel for loop (for i in L; j in M: ...),
* Extended slicing for all sequences.
* GvR: "I've also been thinking about making classes be types (not as huge a change as you think, if you don't allow subclassing built-in types), and adding a built-in array type suitable for use by NumPy."

--amk From esr@thyrsus.com Fri Mar 24 03:30:53 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 22:30:53 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 09:51:25PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <20000323223053.J28880@thyrsus.com> Andrew Kuchling : > * Drop cursesmodule and package it separately. (Any other obsolete > modules that should go?) Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel configuration system I'm writing. Why is it on the hit list? -- Eric S. Raymond Still, if you will not fight for the right when you can easily win without bloodshed, if you will not fight when your victory will be sure and not so costly, you may come to the moment when you will have to fight with all the odds against you and only a precarious chance for survival. There may be a worse case. You may have to fight when there is no chance of victory, because it is better to perish than to live as slaves. --Winston Churchill From dan@cgsoftware.com Fri Mar 24 03:52:54 2000 From: dan@cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 22:52:54 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: "Eric S.
Raymond"'s message of "Thu, 23 Mar 2000 22:30:53 -0500" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> Message-ID: <4s9x6n3d.fsf@dan.resnet.rochester.edu> "Eric S. Raymond" writes: > Andrew Kuchling : > > * Drop cursesmodule and package it separately. (Any other obsolete > > modules that should go?) > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > configuration system I'm writing. Why is it on the hit list? IIRC, it's because nobody really maintains it, and those that care about it, use a different one (either ncurses module, or a newer cursesmodule). So from what i understand, you get complaints, but no real advantage to having it there. I'm just trying to summarize, not fall on either side (some people get touchy about issues like this). --Dan From esr@thyrsus.com Fri Mar 24 04:11:37 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:11:37 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <4s9x6n3d.fsf@dan.resnet.rochester.edu>; from Daniel Berlin+list.python-dev on Thu, Mar 23, 2000 at 10:52:54PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> Message-ID: <20000323231137.U28880@thyrsus.com> Daniel Berlin+list.python-dev : > > Andrew Kuchling : > > > * Drop cursesmodule and package it separately. (Any other obsolete > > > modules that should go?) > > > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > > configuration system I'm writing. Why is it on the hit list? > > IIRC, it's because nobody really maintains it, and those that care > about it, use a different one (either ncurses module, or a newer cursesmodule). > So from what i understand, you get complaints, but no real advantage > to having it there. OK. 
Then what I guess I'd like is for a maintained equivalent of this to join the core -- the ncurses module you referred to, for choice. I'm not being random. I'm trying to replace the mess that currently constitutes the kbuild system -- but I'll need to support an equivalent of menuconfig. -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From akuchlin@mems-exchange.org Fri Mar 24 04:33:24 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 23:33:24 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <20000323231137.U28880@thyrsus.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> Message-ID: <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Eric S. Raymond writes: >OK. Then what I guess I'd like is for a maintained equivalent of this >to join the core -- the ncurses module you referred to, for choice. See the "Whither cursesmodule" thread in the python-dev archives: http://www.python.org/pipermail/python-dev/2000-February/003796.html One possibility was to blow off backward compatibility; are there any systems that only have BSD curses, not SysV curses / ncurses? Given that Pavel Curtis announced he was dropping BSD curses maintainance some years ago, I expect even the *BSDs use ncurses these days. However, Oliver Andrich doesn't seem interested in maintaining his ncurses module, and someone just started a SWIG-generated interface (http://pyncurses.sourceforge.net), so it's not obvious which one you'd use. (I *would* be willing to take over maintaining Andrich's code; maintaining the BSD curses version just seems pointless these days.) 
--amk From dan@cgsoftware.com Fri Mar 24 04:43:51 2000 From: dan@cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 23:43:51 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Andrew Kuchling's message of "Thu, 23 Mar 2000 23:33:24 -0500 (EST)" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: Andrew Kuchling writes: > Eric S. Raymond writes: > >OK. Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. Yes, they do.

ls /usr/src/lib/libncurses/
Makefile  ncurses_cfg.h  pathnames.h  termcap.c
grep 5\.0 /usr/src/contrib/ncurses/*

At least, this is FreeBSD. So there is no need for BSD curses anymore, on FreeBSD's account. > --amk > From esr@thyrsus.com Fri Mar 24 04:47:56 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:47:56 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14554.61460.311650.599253@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 11:33:24PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: <20000323234756.A29775@thyrsus.com> Andrew Kuchling : > Eric S. Raymond writes: > >OK.
Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. BSD curses was officially declared dead by its maintainer, Keith Bostic, in early 1995. Keith and I conspired to kill it off in favor of ncurses :-). -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation. That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From andy@reportlab.com Fri Mar 24 10:14:44 2000 From: andy@reportlab.com (Andy Robinson) Date: Fri, 24 Mar 2000 10:14:44 GMT Subject: [Python-Dev] Unicode character names In-Reply-To: <20000324024913.B8C3A1CF22@dinsdale.python.org> References: <20000324024913.B8C3A1CF22@dinsdale.python.org> Message-ID: <38db3fc6.7370137@post.demon.co.uk> On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote: >Sorry to disappoint you guys, but the Unicode name and comments >are *not* included in the unicodedatabase.c file Christian >is currently working on. The reason is simple: it would add >huge amounts of string data to the file. So this is a no-no >for the core distribution...
You're right about what is compiled into the core. I have to keep reminding myself to distinguish three places functionality can live:

1. What is compiled into the Python core
2. What is in the standard Python library relating to encodings.
3. Completely separate add-on packages, maintained outside of Python, to provide extra functionality for (e.g.) Asian encodings.

It is clear that both the Unicode database, and the mapping tables and other files at unicode.org, are a great resource; but they could be placed in (2) or (3) easily, along with scripts to unpack them. It probably makes sense for the i18n-sig to kick off a separate 'CodecKit' project for now, and we can see what good emerges from it before thinking about what should go into the library. >Still, the above is easily possible by inventing a new >encoding, say unicode-with-smileys, which then reads in >a file containing the Unicode names and applies the necessary >magic to decode/encode data as Paul described above. >Would probably make a cool fun-project for someone who wants >to dive into writing codecs. Yup. Prime candidate for CodecKit. - Andy From mal@lemburg.com Fri Mar 24 08:52:36 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 09:52:36 +0100 Subject: [Python-Dev] Re: Unicode character names References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <38DB2CD4.CAD9F0E2@lemburg.com> Bill Tutt wrote: > > MAL wrote: > > >Andrew M. Kuchling" wrote: > >> > >> Paul Prescod writes: > >>>The new \N escape interpolates named characters within strings. For > >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >>>unicode smiley face at the end. > >> > >> Cute idea, and it certainly means you can avoid looking up Unicode > >> numbers. (You can look up names instead. :) ) Note that this means the > >> Unicode database is no longer optional if this is done; it has to be > >> around at code-parsing time.
Python could import it automatically, as > >> exceptions.py is imported. Christian's work on compressing > >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually > >> dragging around the Unicode database in the binary, or is it read out > >> of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > > > Ok, now you're just being silly. It's possible to put the character names in > a separate structure so that they don't automatically get paged in with the > normal unicode character property data. If you never use it, it won't get > paged in, it's that simple.... Sure, but it would still cause the interpreter binary or DLL to increase in size considerably... that caused some major noise a few days ago due to the fact that the unicodedata module adds some 600kB to the interpreter -- even though it would only get swapped in when needed (the interpreter itself doesn't use it). > Looking up the Unicode code value from the Unicode character name smells > like a good time to use gperf to generate a perfect hash function for the > character names. Esp. for the Unicode 3.0 character namespace. Then you can > just store the hashkey -> Unicode character mapping, and hardly ever need to > page in the actual full character name string itself. Great idea, but why not put this into a separate codec module? > I haven't looked at what the comment field contains, so I have no idea how > useful that info is. Probably not worth looking at...
> *waits while gperf crunches through the ~10,550 Unicode characters where > this would be useful* -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Mar 24 10:37:53 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 11:37:53 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> <38D8F55E.6E324281@lemburg.com> Message-ID: <38DB4581.EB5315E0@lemburg.com> Ok, I've just added two new parser markers to PyArg_ParseTuple() which will hopefully make life a little easier for extension writers. The new code will be in the next patch set which I will release early next week. Here are the docs:

Internal Argument Parsing:
--------------------------

These markers are used by the PyArg_ParseTuple() APIs:

"U": Check for Unicode object and return a pointer to it

"s": For Unicode objects: auto convert them to the default encoding and return a pointer to the object's buffer.

"s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format).

"t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the default encoding).

"es": Takes two parameters: encoding (const char **) and buffer (char **). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. On output, a buffer of the needed size is allocated and returned through *buffer as a NULL-terminated string. The encoded string may not contain embedded NULL characters. The caller is responsible for free()ing the allocated *buffer after usage.
"es#": Takes three parameters: encoding (const char **), buffer (char **) and buffer_len (int *). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. If *buffer is non-NULL, *buffer_len must be set to the size of the buffer on input. Output is then copied to *buffer. If *buffer is NULL, a buffer of the needed size is allocated and output copied into it. *buffer is then updated to point to the allocated memory area. The caller is responsible for free()ing *buffer after usage. In both cases *buffer_len is updated to the number of characters written (excluding the trailing NULL-byte). The output buffer is assured to be NULL-terminated.

Examples:

Using "es#" with auto-allocation:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;
        int buffer_len = 0;

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        free(buffer);
        return str;
    }

Using "es" with auto-allocation returning a NULL-terminated string:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;

        if (!PyArg_ParseTuple(args, "es:test_parser",
                              &encoding, &buffer))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromString(buffer);
        free(buffer);
        return str;
    }

Using "es#" with a pre-allocated buffer:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char _buffer[10];
        char *buffer = _buffer;
        int buffer_len = sizeof(_buffer);

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        return str;
    }

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Fri Mar 24 10:54:02 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 02:54:02 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB4581.EB5315E0@lemburg.com> Message-ID: On Fri, 24 Mar 2000, M.-A. Lemburg wrote: >... > "s": For Unicode objects: auto convert them to the default encoding > and return a pointer to the object's buffer. Guess that I didn't notice this before, but it seems weird that "s" and "s#" return different encodings. Why? > "es": > Takes two parameters: encoding (const char **) and > buffer (char **). >... > "es#": > Takes three parameters: encoding (const char **), > buffer (char **) and buffer_len (int *). I see no reason to make the encoding (const char **) rather than (const char *). We are never returning a value, so this just makes it harder to pass the encoding into ParseTuple. There is precedent for passing in single-ref pointers. For example: PyArg_ParseTuple(args, "O!", &s, PyString_Type) I would recommend using just one pointer level for the encoding. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Fri Mar 24 11:29:12 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 12:29:12 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38DB5188.AA580652@lemburg.com> Greg Stein wrote: > > On Fri, 24 Mar 2000, M.-A. Lemburg wrote: > >... > > "s": For Unicode objects: auto convert them to the default encoding > > and return a pointer to the object's buffer. > > Guess that I didn't notice this before, but it seems weird that "s" and > "s#" return different encodings. > > Why? This is due to the buffer interface being used for "s#". Since "s#" refers to the getreadbuf slot, it returns raw data.
In this case this is UTF-16 in platform dependent byte order. "s" relies on NULL-terminated strings and doesn't use the buffer interface at all. Thus "s" returns NULL-terminated UTF-8 (UTF-16 is full of NULLs). "t#" uses the getcharbuf slot and thus should return character data. UTF-8 is the right encoding here. > > "es": > > Takes two parameters: encoding (const char **) and > > buffer (char **). > >... > > "es#": > > Takes three parameters: encoding (const char **), > > buffer (char **) and buffer_len (int *). > > I see no reason to make the encoding (const char **) rather than > (const char *). We are never returning a value, so this just makes it > harder to pass the encoding into ParseTuple. > > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > > I would recommend using just one pointer level for the encoding. You have a point there... even though it breaks the notion of prepending all parameters with an '&' (ok, except the type check one). OTOH, it would allow passing the encoding right with the PyArg_ParseTuple() call which probably makes more sense in this context. I'll change it... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Fri Mar 24 13:13:02 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 24 Mar 2000 14:13:02 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> Message-ID: <38DB69DE.6D04B084@tismer.com> "M.-A. Lemburg" wrote: > > "Andrew M. Kuchling" wrote: > > > > Paul Prescod writes: > > >The new \N escape interpolates named characters within strings. For > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > >unicode smiley face at the end. 
> > > > Cute idea, and it certainly means you can avoid looking up Unicode > > numbers. (You can look up names instead. :) ) Note that this means the > > Unicode database is no longer optional if this is done; it has to be > > around at code-parsing time. Python could import it automatically, as > > exceptions.py is imported. Christian's work on compressing > > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > > dragging around the Unicode database in the binary, or is it read out > > of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... This is not settled, still an open question. What I have for non-textual data: 25 kb with dumb compression 15 kb with enhanced compression What amounts of data am I talking about? - The whole unicode database text file has size 632 kb. - With PkZip this goes down to 96 kb. Now, I produced another text file with just the currently used data in it, and this sounds so: - the stripped unicode text file has size 216 kb. - PkZip melts this down to 40 kb. Please compare that to my results above: I can do at least twice as good. I hope I can compete for the text sections as well (since this is something where zip is *good* at), but just let me try. Let's target 60 kb for the whole crap, and I'd be very pleased. Then, there is still the question where to put the data. Having one file in the dll and another externally would be an option. I could also imagine to use a binary external file all the time, with maximum possible compression. By loading this structure, this would be partially expanded to make it fast. An advantage is that the compressed Unicode database could become a stand-alone product. 
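The name data being sized up here is what makes lookups like the following possible; sketched with the unicodedata module and the \N{...} escape as they exist in current Python, not with the 2000-era compressed database under discussion:

```python
# What the name/comment columns enable: name lookups in both directions,
# plus Paul's \N{...} escape.  Uses the unicodedata module shipped with
# modern Python.
import unicodedata

s = "Hi! \N{WHITE SMILING FACE}"
print(repr(s[-1]))                       # the smiley, U+263A
print(unicodedata.name(s[-1]))           # WHITE SMILING FACE
print(unicodedata.lookup("WHITE SMILING FACE") == "\u263a")  # True
```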
The size is in fact so crazy small, that I'd like to make this available to any other language. > Still, the above is easily possible by inventing a new > encoding, say unicode-with-smileys, which then reads in > a file containing the Unicode names and applies the necessary > magic to decode/encode data as Paul described above. That sounds reasonable. Compression makes sense as well here, since the expanded stuff makes quite an amount of kb, compared to what it is "worth", compared to, say, the Python dll. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal@lemburg.com Fri Mar 24 13:41:27 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 14:41:27 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> <38DB69DE.6D04B084@tismer.com> Message-ID: <38DB7087.1B105AC7@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > "Andrew M. Kuchling" wrote: > > > > > > Paul Prescod writes: > > > >The new \N escape interpolates named characters within strings. For > > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > > >unicode smiley face at the end. > > > > > > Cute idea, and it certainly means you can avoid looking up Unicode > > > numbers. (You can look up names instead. :) ) Note that this means the > > > Unicode database is no longer optional if this is done; it has to be > > > around at code-parsing time. Python could import it automatically, as > > > exceptions.py is imported. Christian's work on compressing > > > unicodedatabase.c is therefore really important. 
(Is Perl5.6 actually > > > dragging around the Unicode database in the binary, or is it read out > > > of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > This is not settled, still an open question. Well, ok, depends on how much you can squeeze out of the text columns ;-) I still think that it's better to leave these gimmicks out of the core and put them into some add-on, though. > What I have for non-textual data: > 25 kb with dumb compression > 15 kb with enhanced compression Looks good :-) With these sizes I think we could even integrate the unicodedatabase.c + API into the core interpreter and only have the unicodedata module to access the database from within Python. > What amounts of data am I talking about? > - The whole unicode database text file has size > 632 kb. > - With PkZip this goes down to > 96 kb. > > Now, I produced another text file with just the currently > used data in it, and this sounds so: > - the stripped unicode text file has size > 216 kb. > - PkZip melts this down to > 40 kb. > > Please compare that to my results above: I can do at least > twice as good. I hope I can compete for the text sections > as well (since this is something where zip is *good* at), > but just let me try. > Let's target 60 kb for the whole crap, and I'd be very pleased. > > Then, there is still the question where to put the data. > Having one file in the dll and another externally would > be an option. I could also imagine to use a binary external > file all the time, with maximum possible compression. > By loading this structure, this would be partially expanded > to make it fast. > An advantage is that the compressed Unicode database > could become a stand-alone product.
The size is in fact > so crazy small, that I'd like to make this available > to any other language. You could take the unicodedatabase.c file (+ header file) and use it everywhere... I don't think it needs to contain any Python specific code. The API names would have to follow the Python naming schemes though. > > Still, the above is easily possible by inventing a new > > encoding, say unicode-with-smileys, which then reads in > > a file containing the Unicode names and applies the necessary > > magic to decode/encode data as Paul described above. > > That sounds reasonable. Compression makes sense as well here, > since the expanded stuff makes quite an amount of kb, compared > to what it is "worth", compared to, say, the Python dll. With 25kB for the non-text columns, I'd suggest simply adding the file to the core. Text columns could then go into a separate module. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Fri Mar 24 14:14:51 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 09:14:51 -0500 Subject: [Python-Dev] Hi -- I'm back! Message-ID: <200003241414.JAA11740@eric.cnri.reston.va.us> I'm back from ten days on the road. I'll try to dig through the various mailing list archives over the next few days, but it would be more efficient if you are waiting for me to take action or express an opinion on a particular issue (in *any* Python-related mailing list) to mail me a summary or at least a pointer. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@oratrix.nl Fri Mar 24 15:01:25 2000 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 24 Mar 2000 16:01:25 +0100 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message by Ka-Ping Yee , Thu, 23 Mar 2000 09:47:47 -0800 (PST) , Message-ID: <20000324150125.7144A370CF2@snelboot.oratrix.nl> > Hmm... 
i guess this also means one should ask what > > def function(None, arg): > ... > > does outside a class definition. I suppose that should simply > be illegal. No, it forces you to call the function with keyword arguments! (initially meant jokingly, but thinking about it for a couple of seconds there might actually be cases where this is useful) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip@mojam.com (Skip Montanaro) Fri Mar 24 15:14:11 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 24 Mar 2000 09:14:11 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <14555.34371.749039.946891@beluga.mojam.com> AMK> I've written up a list of things that need to get done before 1.6 AMK> is finished. This is my vision of what needs to be done, and AMK> doesn't have an official stamp of approval from GvR or anyone else. AMK> So it's very probably wrong. Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules of general usefulness (this is at least generally useful for anyone writing web spiders ;-) shouldn't live in Tools, because it's not always available and users need to do extra work to make them available. I'd be happy to write up some documentation for it and twiddle the module to include doc strings. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From fdrake@acm.org Fri Mar 24 15:20:03 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 10:20:03 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: References: <38DB4581.EB5315E0@lemburg.com> Message-ID: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Greg Stein writes: > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) ^^^^^^^^^^^^^^^^^ Feeling ok? I *suspect* these are reversed. :) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Fri Mar 24 15:24:13 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 10:24:13 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB5188.AA580652@lemburg.com> References: <38DB5188.AA580652@lemburg.com> Message-ID: <14555.34973.303273.716146@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > You have a point there... even though it breaks the notion > of prepending all parameters with an '&' (ok, except the I've never heard of this notion; I hope I didn't just miss it in the docs! The O& also doesn't require a & in front of the name of the conversion function, you just pass the right value. So there are at least two cases where you *typically* don't use a &. (Other cases in the 1.5.2 API are probably just plain weird if they don't!) Changing it to avoid the extra machinery is the Right Thing; you get to feel good today. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Fri Mar 24 16:38:06 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 17:38:06 +0100 Subject: [Python-Dev] Unicode and Windows References: <38DB5188.AA580652@lemburg.com> <14555.34973.303273.716146@weyr.cnri.reston.va.us> Message-ID: <38DB99EE.F5949889@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > You have a point there... 
even though it breaks the notion > > of prepending all parameters with an '&' (ok, except the > > I've never heard of this notion; I hope I didn't just miss it in the > docs! If you scan the parameters list in getargs.c you'll come to this conclusion and thus my notion: I've been programming like this for years now :-) > The O& also doesn't require a & in front of the name of the > conversion function, you just pass the right value. So there are at > least two cases where you *typically* don't use a &. (Other cases in > the 1.5.2 API are probably just plain weird if they don't!) > Changing it to avoid the extra machinery is the Right Thing; you get > to feel good today. ;) Ok, feeling good now ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Fri Mar 24 20:44:02 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 15:44:02 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 09:14:11 CST." <14555.34371.749039.946891@beluga.mojam.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> Message-ID: <200003242044.PAA00677@eric.cnri.reston.va.us> > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. > > I'd be happy to write up some documentation for it and twiddle the module to > include doc strings. Deal. Soon as we get the docs we'll move it to Lib. 
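(The move Skip proposed did eventually happen; in today's Python the module lives at urllib.robotparser, and in the 2.x line it was the top-level robotparser. A minimal sketch of its use, feeding it rules directly rather than fetching a robots.txt:)

```python
# Sketch of the robotparser module under discussion, using the name it
# carries in current Python (urllib.robotparser).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("MySpider", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("MySpider", "http://example.com/index.html"))         # True
```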
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Fri Mar 24 20:50:43 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 12:50:43 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Fred L. Drake, Jr. wrote: > Greg Stein writes: > > There is precedent for passing in single-ref pointers. For example: > > > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > ^^^^^^^^^^^^^^^^^ > > Feeling ok? I *suspect* these are reversed. :) I just checked the code to ensure that it took a single pointer rather than a double-pointer. I guess that I didn't verify the order :-) Concept is valid, tho... the params do not necessarily require an ampersand. oop! Actually... this does require an ampersand: PyArg_ParseTuple(args, "O!", &PyString_Type, &s) Don't want to pass the whole structure... Well, regardless: I would much prefer to see the encoding passed as a constant string, rather than having to shove the sucker into a variable first, just so that I can insert a useless address-of operator. Cheers, -g -- Greg Stein, http://www.lyra.org/ From akuchlin@mems-exchange.org Fri Mar 24 20:51:56 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 15:51:56 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242044.PAA00677@eric.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> <200003242044.PAA00677@eric.cnri.reston.va.us> Message-ID: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Guido van Rossum writes: >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules >Deal. Soon as we get the docs we'll move it to Lib. What about putting it in a package like 'www' or 'web'? Packagizing the existing library is hard because of backward compatibility, but there's no such constraint for new modules. -- A.M. 
Kuchling http://starship.python.net/crew/amk/ One need not be a chamber to be haunted; / One need not be a house; / The brain has corridors surpassing / Material place. -- Emily Dickinson, "Time and Eternity" From gstein@lyra.org Fri Mar 24 21:00:25 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:00:25 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Andrew M. Kuchling wrote: > Guido van Rossum writes: > >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > >Deal. Soon as we get the docs we'll move it to Lib. > > What about putting it in a package like 'www' or 'web'? Packagizing > the existing library is hard because of backward compatibility, but > there's no such constraint for new modules. Or in the "network" package that was suggested a month ago? And why *can't* we start on repackaging old modules? I think the only reason that somebody came up with to NOT do it was "well, if we don't repackage the whole thing, then we should repackage nothing." Which, IMO, is totally bogus. We'll never get anywhere operating under that principle. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake@acm.org Fri Mar 24 21:00:19 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:00:19 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: <14555.55139.484135.602894@weyr.cnri.reston.va.us> Greg Stein writes: > Or in the "network" package that was suggested a month ago? +1 > And why *can't* we start on repackaging old modules? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle.
That doesn't bother me, but I tend to be a little conservative (though usually not as conservative as Guido on such matters). I *would* like to decide that 1.7 will be fully packagized, and not wait until 2.0. As long as 1.7 is a "testing the evolutionary path" release, I think that's the right thing to do. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Fri Mar 24 21:03:54 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:03:54 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead Message-ID: <200003242103.QAA03288@eric.cnri.reston.va.us> Someone noticed that socket.connect() and a few related functions (connect_ex() and bind()) take either a single (host, port) tuple or two separate arguments, but that only the tuple is documented. Similar to append(), I'd like to close this gap, and I've made the necessary changes. This will probably break lots of code. Similar to append(), I'd like people to fix their code rather than whine -- two-arg connect() has never been documented, although it's found in much code (even the socket module test code :-( ). Similar to append(), I may revert the change if it is shown to cause too much pain during beta testing... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 24 21:05:57 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:05:57 -0500 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Your message of "Fri, 24 Mar 2000 12:50:43 PST." References: Message-ID: <200003242105.QAA03543@eric.cnri.reston.va.us> > Well, regardless: I would much prefer to see the encoding passed as a > constant string, rather than having to shove the sucker into a variable > first, just so that I can insert a useless address-of operator. Of course. Use & for output args, not as a matter of principle.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 24 21:11:25 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:11:25 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:00:25 PST." References: Message-ID: <200003242111.QAA04208@eric.cnri.reston.va.us> [Greg] > And why *can't* we start on repackaging old modules? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. The reason is backwards compatibility. Assume we create a package "web" and move all web related modules into it: httplib, urllib, htmllib, etc. Now for backwards compatibility, we add the web directory to sys.path, so one can write either "import web.urllib" or "import urllib". But that loads the same code twice! And in this (carefully chosen :-) example, urllib actually has some state which shouldn't be replicated. Plus, it's too much work -- I'd rather focus on getting 1.6 out of the door, and there's a lot of other stuff I need to do besides moving modules around. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 24 21:15:00 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:15:00 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 16:00:19 EST." <14555.55139.484135.602894@weyr.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> Message-ID: <200003242115.QAA04648@eric.cnri.reston.va.us> > Greg Stein writes: > > Or in the "network" package that was suggested a month ago? [Fred] > +1 Which reminds me of another reason to wait: coming up with the right package hierarchy is hard. (E.g.
I find network too long; plus, does htmllib belong there?) > That doesn't bother me, but I tend to be a little conservative > (though usually not as conservative as Guido on such matters). I > *would* like to decided theat 1.7 will be fully packagized, and not > wait until 2.0. As long as 1.7 is a "testing the evolutionary path" > release, I think that's the right thing to do. Agreed. At the SD conference I gave a talk about the future of Python, and there was (again) a good suggestion about forwards compatibility. Starting with 1.7 (if not sooner), several Python 3000 features that necessarily have to be incompatible (like 1/2 yielding 0.5 instead of 0) could issue warnings when (or unless?) Python is invoked with a compatibility flag. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw@cnri.reston.va.us Fri Mar 24 21:21:54 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:21:54 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <14555.56434.974884.832078@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Someone noticed that socket.connect() and a few related GvR> functions (connect_ex() and bind()) take either a single GvR> (host, port) tuple or two separate arguments, but that only GvR> the tuple is documented. GvR> Similar to append(), I'd like to close this gap, and I've GvR> made the necessary changes. This will probably break lots of GvR> code. I don't agree that socket.connect() and friends need this fix. Yes, obviously append() needed fixing because of the application of Tim's Twelfth Enlightenment to the semantic ambiguity. But socket.connect() has no such ambiguity; you may spell it differently, but you know exactly what you mean. My suggestion would be to not break any code, but extend connect's interface to allow an optional second argument. 
Thus all of these calls would be legal:

    sock.connect(addr)
    sock.connect(addr, port)
    sock.connect((addr, port))

One nit on the documentation of the socket module. The second entry
says:

    bind (address)
        Bind the socket to address. The socket must not already be
        bound. (The format of address depends on the address family --
        see above.)

Huh? What "above" part should I see? Note that I'm reading this doc
off the web!

-Barry

From gstein@lyra.org Fri Mar 24 21:27:57 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 13:27:57 -0800 (PST)
Subject: [Python-Dev] 1.6 job list
In-Reply-To: <200003242111.QAA04208@eric.cnri.reston.va.us>
Message-ID:

On Fri, 24 Mar 2000, Guido van Rossum wrote:
> [Greg]
> > And why *can't* we start on repackaging old modules? I think the only
> > reason that somebody came up with to NOT do it was "well, if we don't
> > repackage the whole thing, then we should repackage nothing." Which, IMO,
> > is totally bogus. We'll never get anywhere operating under that principle.
>
> The reason is backwards compatibility. Assume we create a package
> "web" and move all web related modules into it: httplib, urllib,
> htmllib, etc. Now for backwards compatibility, we add the web
> directory to sys.path, so one can write either "import web.urllib" or
> "import urllib". But that loads the same code twice! And in this
> (carefully chosen :-) example, urllib actually has some state which
> shouldn't be replicated.

We don't add it to the path. Instead, we create new modules that look
like:

---- httplib.py ----
from web.httplib import *
----

The only backwards-compat issue with this approach is that people who
poke values into the module will have problems. I don't believe that
any of the modules were designed for that, anyhow, so it would seem
acceptable to (effectively) disallow that behavior.

> Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> door, and there's a lot of other stuff I need to do besides moving
> modules around.
Stuff that *you* need to do, sure. But there *are* a lot of us who can help here, and some who desire to spend their time moving modules. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Mar 24 21:32:14 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:32:14 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > Greg Stein writes: > > > Or in the "network" package that was suggested a month ago? > > [Fred] > > +1 > > Which reminds me of another reason to wait: coming up with the right > package hierarchy is hard. (E.g. I find network too long; plus, does > htmllib belong there?) htmllib does not go there. Where does it go? Dunno. Leave it unless/until somebody comes up with a place for it. We package up obvious ones. We don't have to design a complete hierarchy. There seemed to be a general "good feeling" around some kind of network (protocol) package. Call it "net" if "network" is too long. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Fri Mar 24 21:27:51 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:27:51 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 16:21:54 EST." <14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <200003242127.QAA06269@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Someone noticed that socket.connect() and a few related > GvR> functions (connect_ex() and bind()) take either a single > GvR> (host, port) tuple or two separate arguments, but that only > GvR> the tuple is documented. > > GvR> Similar to append(), I'd like to close this gap, and I've > GvR> made the necessary changes. This will probably break lots of > GvR> code. 
>
> I don't agree that socket.connect() and friends need this fix. Yes,
> obviously append() needed fixing because of the application of Tim's
> Twelfth Enlightenment to the semantic ambiguity. But socket.connect()
> has no such ambiguity; you may spell it differently, but you know
> exactly what you mean.
>
> My suggestion would be to not break any code, but extend connect's
> interface to allow an optional second argument. Thus all of these
> calls would be legal:
>
>     sock.connect(addr)
>     sock.connect(addr, port)
>     sock.connect((addr, port))

You probably meant:

    sock.connect(addr)
    sock.connect(host, port)
    sock.connect((host, port))

since (host, port) is equivalent to (addr).

> One nit on the documentation of the socket module. The second entry
> says:
>
>     bind (address)
>         Bind the socket to address. The socket must not already be
>         bound. (The format of address depends on the address family --
>         see above.)
>
> Huh? What "above" part should I see? Note that I'm reading this doc
> off the web!

Fred typically directs latex2html to break all sections apart. It's in
the previous section:

    Socket addresses are represented as a single string for the
    AF_UNIX address family and as a pair (host, port) for the AF_INET
    address family, where host is a string representing either a
    hostname in Internet domain notation like 'daring.cwi.nl' or an IP
    address like '100.50.200.5', and port is an integral port number.
    Other address families are currently not supported. The address
    format required by a particular socket object is automatically
    selected based on the address family specified when the socket
    object was created.

This also explains the reason for requiring a single argument: when
using AF_UNIX, the second argument makes no sense!

Frankly, I'm not sure what to do here -- it's more correct to require
a single address argument always, but it's more convenient to allow
two sometimes.
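[The two calling conventions being debated above can be reconciled with a small normalization shim. This sketch is not code from the thread and not the real socket module's implementation; the function name is illustrative, and it is written in modern Python for clarity.]

```python
# Accept either the documented single-address form connect(addr) or the
# undocumented legacy form connect(host, port), and canonicalize both to
# a single address object, as the thread discusses.
def normalize_address(*args):
    if len(args) == 1:
        # connect(addr) -- the documented form; for AF_INET the addr is
        # itself a (host, port) tuple, for AF_UNIX it is a string.
        return args[0]
    if len(args) == 2:
        # legacy connect(host, port) -- fold into the tuple form
        host, port = args
        return (host, port)
    raise TypeError("connect() takes an address "
                    "(or a legacy host, port pair)")

print(normalize_address(("localhost", 80)))  # ('localhost', 80)
print(normalize_address("localhost", 80))    # ('localhost', 80)
print(normalize_address("/tmp/sock"))        # '/tmp/sock' (AF_UNIX-style)
```

Note how the AF_UNIX case is exactly what makes the two-argument form awkward: a string address has no second half to split out.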
Note that sendto(data, addr) only accepts the tuple form: you cannot
write sendto(data, host, port).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org Fri Mar 24 21:28:32 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 16:28:32 -0500 (EST)
Subject: [Python-Dev] 1.6 job list
In-Reply-To:
References: <200003242111.QAA04208@eric.cnri.reston.va.us>
Message-ID: <14555.56832.336242.378838@weyr.cnri.reston.va.us>

Greg Stein writes:
> Stuff that *you* need to do, sure. But there *are* a lot of us who can
> help here, and some who desire to spend their time moving modules.

Would it make sense for one of these people with time on their hands
to propose a specific mapping from old->new names? I think that would
be a good first step, regardless of the implementation timing.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From guido@python.org Fri Mar 24 21:29:44 2000
From: guido@python.org (Guido van Rossum)
Date: Fri, 24 Mar 2000 16:29:44 -0500
Subject: [Python-Dev] 1.6 job list
In-Reply-To: Your message of "Fri, 24 Mar 2000 13:27:57 PST."
References:
Message-ID: <200003242129.QAA06510@eric.cnri.reston.va.us>

> We don't add it to the path. Instead, we create new modules that look
> like:
>
> ---- httplib.py ----
> from web.httplib import *
> ----
>
> The only backwards-compat issue with this approach is that people who poke
> values into the module will have problems. I don't believe that any of the
> modules were designed for that, anyhow, so it would seem acceptable to
> (effectively) disallow that behavior.

OK, that's reasonable. I'll have to invent a different reason why I
don't want this -- because I really don't!

> > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> > door, and there's a lot of other stuff I need to do besides moving
> > modules around.
>
> Stuff that *you* need to do, sure.
But there *are* a lot of us who can
> help here, and some who desire to spend their time moving modules.

Hm. Moving modules requires painful and arcane CVS manipulations that
can only be done by the few of us here at CNRI -- and I'm the only one
left who's full time on Python. I'm still not convinced that it's a
good plan.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org Fri Mar 24 21:32:39 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 16:32:39 -0500 (EST)
Subject: [Python-Dev] Heads up: socket.connect() breakage ahead
In-Reply-To: <14555.56434.974884.832078@anthem.cnri.reston.va.us>
References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us>
Message-ID: <14555.57079.187670.916002@weyr.cnri.reston.va.us>

Barry A. Warsaw writes:
> I don't agree that socket.connect() and friends need this fix. Yes,
> obviously append() needed fixing because of the application of Tim's
> Twelfth Enlightenment to the semantic ambiguity. But socket.connect()
> has no such ambiguity; you may spell it differently, but you know
> exactly what you mean.

Crock. The address representations have been fairly well defined for
quite a while. Be explicit.

> sock.connect(addr)

This is the only legal signature. (host, port) is simply the form of
addr for a particular address family.

> One nit on the documentation of the socket module. The second entry
> says:
>
>     bind (address)
>         Bind the socket to address. The socket must not already be
>         bound. (The format of address depends on the address family --
>         see above.)
>
> Huh? What "above" part should I see? Note that I'm reading this doc
> off the web!

Definitely written for the paper document! Remind me about this again
in a month and I'll fix it, but I don't want to play games with this
little stuff until the 1.5.2p2 and 1.6 trees have been merged.
Harrumph.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From gstein@lyra.org Fri Mar 24 21:37:41 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 13:37:41 -0800 (PST)
Subject: [Python-Dev] delegating (was: 1.6 job list)
In-Reply-To:
Message-ID:

On Fri, 24 Mar 2000, Greg Stein wrote:
>...
> > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> > door, and there's a lot of other stuff I need to do besides moving
> > modules around.
>
> Stuff that *you* need to do, sure. But there *are* a lot of us who can
> help here, and some who desire to spend their time moving modules.

I just want to emphasize this point some more.

Python 1.6 has a defined timeline, with a defined set of minimal
requirements. However! I don't believe that a corollary of that says
we MUST ignore everything else. If those other options fit within the
required timeline, then why not? (assuming we have adequate testing
and doc to go with the changes)

There are ample people who have time and inclination to contribute. If
those contributions add positive benefit, then I see no reason to
exclude them (other than on pure merit, of course).

Note that some of the problems stem from CVS access. Much Guido-time
could be saved by a commit-then-review model, rather than a
review-then-Guido-commits model. Fred does this very well with the
Doc/ area.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From gstein@lyra.org Fri Mar 24 21:38:48 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 13:38:48 -0800 (PST)
Subject: [Python-Dev] 1.6 job list
In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us>
Message-ID:

On Fri, 24 Mar 2000, Guido van Rossum wrote:
>...
> > We don't add it to the path. Instead, we create new modules that look
> > like:
> >
> > ---- httplib.py ----
> > from web.httplib import *
> > ----
> >
> > The only backwards-compat issue with this approach is that people who poke
> > values into the module will have problems.
I don't believe that any of the
> > modules were designed for that, anyhow, so it would seem acceptable to
> > (effectively) disallow that behavior.
>
> OK, that's reasonable. I'll have to invent a different reason why I
> don't want this -- because I really don't!

Fair enough.

> > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> > > door, and there's a lot of other stuff I need to do besides moving
> > > modules around.
> >
> > Stuff that *you* need to do, sure. But there *are* a lot of us who can
> > help here, and some who desire to spend their time moving modules.
>
> Hm. Moving modules requires painful and arcane CVS manipulations that
> can only be done by the few of us here at CNRI -- and I'm the only one
> left who's full time on Python. I'm still not convinced that it's a
> good plan.

There are a number of ways to do this, and I'm familiar with all of
them. It is a continuing point of strife in the Apache CVS
repositories :-)

But... it is premised on accepting the desire to move them, of course.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From guido@python.org Fri Mar 24 21:38:51 2000
From: guido@python.org (Guido van Rossum)
Date: Fri, 24 Mar 2000 16:38:51 -0500
Subject: [Python-Dev] delegating (was: 1.6 job list)
In-Reply-To: Your message of "Fri, 24 Mar 2000 13:37:41 PST."
References:
Message-ID: <200003242138.QAA07621@eric.cnri.reston.va.us>

> Note that some of the problems stem from CVS access. Much Guido-time could
> be saved by a commit-then-review model, rather than review-then-Guido-
> commits model. Fred does this very well with the Doc/ area.

Actually, I'm experimenting with this already: Unicode, list.append()
and socket.connect() are done in this way! For renames it is really
painful though, even if someone else at CNRI can do it.

I'd like to see a draft package hierarchy please? Also, if you have
some time, please review the bugs in the bugs list.
Patches submitted with a corresponding PR# will be treated with priority! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Mar 24 21:40:48 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:40:48 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 Message-ID: <38DBE0E0.76A298FE@lemburg.com> This is a multi-part message in MIME format. --------------16C56446D7F83349DECA84A2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Attached you find the latest update of the Unicode implementation. The patch is against the current CVS version. It includes the fix I posted yesterday for the core dump problem in codecs.c (was introduced by my previous patch set -- sorry), adds more tests for the codecs and two new parser markers "es" and "es#". -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------16C56446D7F83349DECA84A2 Content-Type: text/plain; charset=us-ascii; name="Unicode-Implementation-2000-03-24.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="Unicode-Implementation-2000-03-24.patch" Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Thu Mar 23 23:58:41 2000 +++ Python+Unicode/Lib/codecs.py Fri Mar 17 23:51:01 2000 @@ -46,7 +46,7 @@ handling schemes by providing the errors argument. 
These string values are defined: - 'strict' - raise an error (or a subclass) + 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/output/test_unicode Python+Unicode/Lib/test/output/test_unicode --- CVS-Python/Lib/test/output/test_unicode Fri Mar 24 22:21:26 2000 +++ Python+Unicode/Lib/test/output/test_unicode Sat Mar 11 00:23:21 2000 @@ -1,5 +1,4 @@ test_unicode Testing Unicode comparisons... done. -Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing unicodedata module... done. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Thu Mar 23 23:58:47 2000 +++ Python+Unicode/Lib/test/test_unicode.py Fri Mar 24 00:29:43 2000 @@ -293,3 +293,33 @@ assert unicodedata.combining(u'\u20e1') == 230 print 'done.' 
+ +# Test builtin codecs +print 'Testing builtin codecs...', + +assert unicode('hello','ascii') == u'hello' +assert unicode('hello','utf-8') == u'hello' +assert unicode('hello','utf8') == u'hello' +assert unicode('hello','latin-1') == u'hello' + +assert u'hello'.encode('ascii') == 'hello' +assert u'hello'.encode('utf-8') == 'hello' +assert u'hello'.encode('utf8') == 'hello' +assert u'hello'.encode('utf-16-le') == 'h\000e\000l\000l\000o\000' +assert u'hello'.encode('utf-16-be') == '\000h\000e\000l\000l\000o' +assert u'hello'.encode('latin-1') == 'hello' + +u = u''.join(map(unichr, range(1024))) +for encoding in ('utf-8', 'utf-16', 'utf-16-le', 'utf-16-be', + 'raw_unicode_escape', 'unicode_escape', 'unicode_internal'): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(256))) +for encoding in ('latin-1',): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(128))) +for encoding in ('ascii',): + assert unicode(u.encode(encoding),encoding) == u + +print 'done.' diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Thu Mar 23 23:58:48 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 24 22:29:35 2000 @@ -715,21 +715,126 @@ These markers are used by the PyArg_ParseTuple() APIs: - 'U': Check for Unicode object and return a pointer to it + "U": Check for Unicode object and return a pointer to it - 's': For Unicode objects: auto convert them to the + "s": For Unicode objects: auto convert them to the and return a pointer to the object's buffer. 
- 's#': Access to the Unicode object via the bf_getreadbuf buffer interface + "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). - 't#': Access to the Unicode object via the bf_getcharbuf buffer interface + "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the ). + "es": + Takes two parameters: encoding (const char *) and + buffer (char **). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + On output, a buffer of the needed size is allocated and + returned through *buffer as NULL-terminated string. + The encoded may not contain embedded NULL characters. + The caller is responsible for free()ing the allocated *buffer + after usage. + + "es#": + Takes three parameters: encoding (const char *), + buffer (char **) and buffer_len (int *). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) + on input. Output is then copied to *buffer. + + If *buffer is NULL, a buffer of the needed size is + allocated and output copied into it. *buffer is then + updated to point to the allocated memory area. The caller + is responsible for free()ing *buffer after usage. + + In both cases *buffer_len is updated to the number of + characters written (excluding the trailing NULL-byte). + The output buffer is assured to be NULL-terminated. 
+ +Examples: + +Using "es#" with auto-allocation: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + int buffer_len = 0; + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + free(buffer); + return str; + } + +Using "es" with auto-allocation returning a NULL-terminated string: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + + if (!PyArg_ParseTuple(args, "es:test_parser", + encoding, &buffer)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromString(buffer); + free(buffer); + return str; + } + +Using "es#" with a pre-allocated buffer: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char _buffer[10]; + char *buffer = _buffer; + int buffer_len = sizeof(_buffer); + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + return str; + } + File/Stream Output: ------------------- @@ -837,6 +942,7 @@ History of this Proposal: ------------------------- +1.3: Added new "es" and "es#" parser markers 1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and Only in CVS-Python/Objects: .#stringobject.c.2.59 Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/getargs.c Python+Unicode/Python/getargs.c --- CVS-Python/Python/getargs.c Sat Mar 11 10:55:21 2000 +++ Python+Unicode/Python/getargs.c Fri Mar 24 20:22:26 2000 @@ -178,6 +178,8 @@ } else if (level != 0) ; /* Pass */ + else if (c == 'e') + ; /* Pass */ else if (isalpha(c)) max++; else if (c == '|') @@ -654,6 +656,122 @@ break; } + case 'e': /* encoded string */ + { + char **buffer; + const char *encoding; + PyObject *u, *s; + int size; + + /* Get 'e' parameter: the encoding name */ + encoding = (const char *)va_arg(*p_va, const char *); + if (encoding == NULL) + return "(encoding is NULL)"; + + /* Get 's' parameter: the output buffer to use */ + if (*format != 's') + return "(unkown parser marker combination)"; + buffer = (char **)va_arg(*p_va, char **); + format++; + if (buffer == NULL) + return "(buffer is NULL)"; + + /* Convert object to Unicode */ + u = PyUnicode_FromObject(arg); + if (u == NULL) + return "string, unicode or text buffer"; + + /* Encode object; use default error handling */ + s = PyUnicode_AsEncodedString(u, + encoding, + NULL); + Py_DECREF(u); + if (s == NULL) + return "(encoding failed)"; + if (!PyString_Check(s)) { + Py_DECREF(s); + return "(encoder failed to return a string)"; + } + size = PyString_GET_SIZE(s); + + /* Write output; output is guaranteed to be + 0-terminated */ + if (*format == '#') { + /* Using buffer length parameter '#': + + - if *buffer is NULL, a new buffer + of the needed size is allocated and + the data copied into it; *buffer is + updated to point to the new buffer; + the caller is responsible for + free()ing it after usage + + - if *buffer is 
not NULL, the data + is copied to *buffer; *buffer_len + has to be set to the size of the + buffer on input; buffer overflow is + signalled with an error; buffer has + to provide enough room for the + encoded string plus the trailing + 0-byte + + - in both cases, *buffer_len is + updated to the size of the buffer + /excluding/ the trailing 0-byte + + */ + int *buffer_len = va_arg(*p_va, int *); + + format++; + if (buffer_len == NULL) + return "(buffer_len is NULL)"; + if (*buffer == NULL) { + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + } else { + if (size + 1 > *buffer_len) { + Py_DECREF(s); + return "(buffer overflow)"; + } + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + *buffer_len = size; + } else { + /* Using a 0-terminated buffer: + + - the encoded string has to be + 0-terminated for this variant to + work; if it is not, an error raised + + - a new buffer of the needed size + is allocated and the data copied + into it; *buffer is updated to + point to the new buffer; the caller + is responsible for free()ing it + after usage + + */ + if (strlen(PyString_AS_STRING(s)) != size) + return "(encoded string without "\ + "NULL bytes)"; + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + } + Py_DECREF(s); + break; + } + case 'S': /* string object */ { PyObject **p = va_arg(*p_va, PyObject **); --------------16C56446D7F83349DECA84A2-- From fdrake@acm.org Fri Mar 24 21:40:38 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:40:38 -0500 (EST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: References: Message-ID: <14555.57558.939236.363358@weyr.cnri.reston.va.us> Greg Stein writes: > Note that some of the problems stem from CVS access. 
Much Guido-time could
> be saved by a commit-then-review model, rather than review-then-Guido-

This is a non-problem; I'm willing to do the arcane CVS manipulations
if the issue is Guido's time. What I will *not* do is do it piecemeal
without a cohesive plan that Guido approves of at least 95%, and I'll
be really careful to do that last 5% when he's not in the office. ;)

> commits model. Fred does this very well with the Doc/ area.

Thanks for the vote of confidence! The model that I use for the Doc/
area is more like "Fred reviews, Fred commits, and Guido can read it
on python.org like everyone else." Works for me! ;)

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From bwarsaw@cnri.reston.va.us Fri Mar 24 21:45:38 2000
From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Date: Fri, 24 Mar 2000 16:45:38 -0500 (EST)
Subject: [Python-Dev] 1.6 job list
References: <200003242115.QAA04648@eric.cnri.reston.va.us>
Message-ID: <14555.57858.824301.693390@anthem.cnri.reston.va.us>

One thing you can definitely do now which breaks no code: propose a
package hierarchy for the standard library.

From akuchlin@mems-exchange.org Fri Mar 24 21:46:28 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 24 Mar 2000 16:46:28 -0500 (EST)
Subject: [Python-Dev] Unicode charnames impl.
In-Reply-To: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
Message-ID: <14555.57908.151946.182639@amarok.cnri.reston.va.us>

Here's a strawman codec for doing the \N{NULL} thing. Questions:

0) Is the code below correct?

1) What the heck would this encoding be called?

2) What does .encode() do? (Right now it escapes \N as \N{BACKSLASH}N.)

3) How can we store all those names? The resulting dictionary makes a
361K .py file; Python dumps core trying to parse it. (Another bug...)

4) What do you do with the error \N{...... no closing right bracket.
Right now it stops at that point, and never advances any farther.
Maybe it should assume it's an error if there's no } within the next
200 chars or some similar limit?

5) Do we need StreamReader/Writer classes, too?

I've also added a script that parses the names out of the NamesList.txt
file at ftp://ftp.unicode.org/Public/UNIDATA/.

--amk

namecodec.py:
=============

import codecs

#from _namedict import namedict
namedict = {'NULL': 0, 'START OF HEADING' : 1, 'BACKSLASH':ord('\\')}

class NameCodec(codecs.Codec):
    def encode(self,input,errors='strict'):
        # XXX what should this do? Escape the
        # sequence \N as '\N{BACKSLASH}N'?
        return input.replace( '\\N', '\\N{BACKSLASH}N' )

    def decode(self,input,errors='strict'):
        output = unicode("")
        last = 0
        index = input.find( u'\\N{' )
        while index != -1:
            output = output + unicode( input[last:index] )
            used = index
            r_bracket = input.find( '}', index)
            if r_bracket == -1:
                # No closing bracket; bail out...
                break
            name = input[index + 3 : r_bracket]
            code = namedict.get( name )
            if code is not None:
                output = output + unichr(code)
            elif errors == 'strict':
                raise ValueError, 'Unknown character name %s' % repr(name)
            elif errors == 'ignore':
                pass
            elif errors == 'replace':
                output = output + unichr( 0xFFFD )
            last = r_bracket + 1
            index = input.find( '\\N{', last)
        else:
            # Finally failed gently, no longer finding a \N{...
            output = output + unicode( input[last:] )
            return len(input), output

        # Otherwise, we hit the break for an unterminated \N{...}
        return index, output

if __name__ == '__main__':
    c = NameCodec()
    for s in [ r'b\lah blah \N{NULL} asdf',
               r'b\l\N{START OF HEADING}\N{NU' ]:
        used, s2 = c.decode(s)
        print repr( s2 )

        s3 = c.encode(s)
        _, s4 = c.decode(s3)
        print repr(s3)
        assert s4 == s

    print repr( c.decode(r'blah blah \N{NULLsadf} asdf' ,
                         errors='replace' ))
    print repr( c.decode(r'blah blah \N{NULLsadf} asdf' ,
                         errors='ignore' ))

makenamelist.py
===============

# Hack to extract character names from NamesList.txt
# Output the repr() of the resulting dictionary

import re, sys, string

namedict = {}

while 1:
    L = sys.stdin.readline()
    if L == "":
        break

    m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L)
    if m is not None:
        last_char = int(m.group(1), 16)
        if m.group(2) is not None:
            name = string.upper( m.group(2) )
            if name not in ['', '']:
                namedict[ name ] = last_char
                # print name, last_char

    m = re.match('\t=\s*(.*)\s*(;.*)?', L)
    if m is not None:
        name = string.upper( m.group(1) )
        names = string.split(name, ',')
        names = map(string.strip, names)
        for n in names:
            namedict[ n ] = last_char
            # print n, last_char

# XXX and do what with this dictionary?
print namedict

From mal@lemburg.com Fri Mar 24 21:50:19 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 24 Mar 2000 22:50:19 +0100
Subject: [Python-Dev] Unicode Patch Set 2000-03-24
References: <38DBE0E0.76A298FE@lemburg.com>
Message-ID: <38DBE31B.BCB342CA@lemburg.com>

Oops, sorry, the patch file wasn't supposed to go to python-dev.
Anyway, Greg's wish is included in there and MarkH should be happy now
-- at least I hope he is ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Jasbahr@origin.EA.com Fri Mar 24 21:49:35 2000
From: Jasbahr@origin.EA.com (Asbahr, Jason)
Date: Fri, 24 Mar 2000 15:49:35 -0600
Subject: [Python-Dev] Memory Management
Message-ID: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com>

Greetings!

We're working on integrating our own memory manager into our project
and the current challenge is figuring out how to make it play nice
with Python (and SWIG). The approach we're currently taking is to
patch 1.5.2 and augment the PyMem* macros to call external memory
allocation functions that we provide. The idea is to easily allow the
addition of third party memory management facilities to Python.
Assuming 1) we get it working :-), and 2) we sync to the latest Python
CVS and patch that, would this be a useful patch to give back to the
community? Has anyone run up against this before?

Thanks,

Jason Asbahr
Origin Systems, Inc.
jasbahr@origin.ea.com

From bwarsaw@cnri.reston.va.us Fri Mar 24 21:53:01 2000
From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us)
Date: Fri, 24 Mar 2000 16:53:01 -0500 (EST)
Subject: [Python-Dev] Heads up: socket.connect() breakage ahead
References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us>
Message-ID: <14555.58301.790774.159381@anthem.cnri.reston.va.us>

>>>>> "GvR" == Guido van Rossum writes:

GvR> You probably meant:
| sock.connect(addr)
| sock.connect(host, port)
| sock.connect((host, port))
GvR> since (host, port) is equivalent to (addr).

Doh, yes. :)

GvR> Fred typically directs latex2html to break all sections
GvR> apart.
It's in the previous section: I know, I was being purposefully dense for effect :) Fred, is there some way to make the html contain a link to the previous section for the "see above" text? That would solve the problem I think. GvR> This also explains the reason for requiring a single GvR> argument: when using AF_UNIX, the second argument makes no GvR> sense! GvR> Frankly, I'm not sure what do here -- it's more correct to GvR> require a single address argument always, but it's more GvR> convenient to allow two sometimes. GvR> Note that sendto(data, addr) only accepts the tuple form: you GvR> cannot write sendto(data, host, port). Hmm, that /does/ complicate things -- it makes explaining the API more difficult. Still, in this case I think I'd lean toward liberal acceptance of input parameters. :) -Barry From bwarsaw@cnri.reston.va.us Fri Mar 24 21:57:01 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:57:01 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14555.58541.207868.496747@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> OK, that's reasonable. I'll have to invent a different GvR> reason why I don't want this -- because I really don't! Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't be persuaded to change your mind :) -Barry From fdrake@acm.org Fri Mar 24 22:10:41 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:10:41 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.58301.790774.159381@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> Message-ID: <14555.59361.460705.258859@weyr.cnri.reston.va.us> bwarsaw@cnri.reston.va.us writes: > I know, I was being purposefully dense for effect :) Fred, is there > some way to make the html contain a link to the previous section for > the "see above" text? That would solve the problem I think. No. I expect this to no longer be a problem when we push to SGML/XML, so I won't waste any time hacking around it. On the other hand, lots of places in the documentation refer to "above" and "below" in the traditional sense used in paper documents, and that doesn't work well for hypertext, even in the strongly traditional book-derivation way the Python manuals are done. As soon as it's not in the same HTML file, "above" and "below" break for a lot of people. So it still should be adjusted at an appropriate time. > Hmm, that /does/ complicate things -- it makes explaining the API more > difficult. Still, in this case I think I'd lean toward liberal > acceptance of input parameters. :) No -- all the more reason to be strict and keep the descriptions as simple as reasonable. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Fri Mar 24 22:10:32 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:10:32 -0500 Subject: [Python-Dev] Memory Management In-Reply-To: Your message of "Fri, 24 Mar 2000 15:49:35 CST." 
<11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <200003242210.RAA11434@eric.cnri.reston.va.us> > We're working on integrating our own memory manager into our project > and the current challenge is figuring out how to make it play nice > with Python (and SWIG). The approach we're currently taking is to > patch 1.5.2 and augment the PyMem* macros to call external memory > allocation functions that we provide. The idea is to easily allow > the addition of third party memory management facilities to Python. > Assuming 1) we get it working :-), and 2) we sync to the latest Python > CVS and patch that, would this be a useful patch to give back to the > community? Has anyone run up against this before? Check out the archives for patches@python.org looking for posts by Vladimir Marangozov. Vladimir has produced several rounds of patches with a very similar goal in mind. We're still working out some details -- but it shouldn't be too long, and I hope that his patches are also suitable for you. If not, discussion is required! --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw@cnri.reston.va.us Fri Mar 24 22:12:35 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 24 Mar 2000 17:12:35 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <14555.59475.802130.434345@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> No -- all the more reason to be strict and keep the Fred> descriptions as simple as reasonable. At the expense of (IMO unnecessarily) breaking existing code? 
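The two connect() spellings being debated differ only in how the address is packed. A minimal sketch in modern Python syntax -- normalize_address is a hypothetical helper used purely for illustration, not part of the socket module, which does this argument juggling internally -- shows what "liberal acceptance of input parameters" forces every such function to do:

```python
def normalize_address(*args):
    # Accept either the documented tuple form, connect((host, port)),
    # or the undocumented two-argument form, connect(host, port),
    # and reduce both to the canonical (host, port) tuple.
    if len(args) == 1:
        host, port = args[0]      # tuple form: connect((host, port))
    elif len(args) == 2:
        host, port = args         # two-arg form: connect(host, port)
    else:
        raise TypeError('expected (host, port) or host, port')
    return (host, int(port))

# Both spellings collapse to the same address:
assert normalize_address(('localhost', 8080)) == ('localhost', 8080)
assert normalize_address('localhost', 8080) == ('localhost', 8080)
```

Supporting both forms means every address-taking function needs this kind of dispatch -- and, as noted above for sendto(data, addr), the dispatch cannot even be applied uniformly. That implementation cost is what is being weighed against breaking existing callers.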
From mal@lemburg.com Fri Mar 24 22:13:04 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 24 Mar 2000 23:13:04 +0100
Subject: [Python-Dev] Unicode charnames impl.
References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
	<14555.57908.151946.182639@amarok.cnri.reston.va.us>
Message-ID: <38DBE870.D88915B5@lemburg.com>

"Andrew M. Kuchling" wrote:
> 
> Here's a strawman codec for doing the \N{NULL} thing.  Questions:
> 
> 0) Is the code below correct?

Some comments below.

> 1) What the heck would this encoding be called?

Ehm, 'unicode-with-smileys' I guess... after all that's what
motivated the thread ;-)

Seriously, I'd go with 'unicode-named'. You can then stack it on
top of 'unicode-escape' and get the best of both worlds...

> 2) What does .encode() do?  (Right now it escapes \N as
> \N{BACKSLASH}N.)

.encode() should translate Unicode to a string. Since the
named char thing is probably only useful on input, I'd say:
don't do anything, except maybe return input.encode('unicode-escape').

> 3) How can we store all those names?  The resulting dictionary makes a
> 361K .py file; Python dumps core trying to parse it.  (Another bug...)

I've made the same experience with the large Unicode mapping
tables... the trick is to split the dictionary definition in
chunks and then use dict.update() to paste them together again.

> 4) What do you with the error \N{...... no closing right bracket.
> Right now it stops at that point, and never advances any farther.
> Maybe it should assume it's an error if there's no } within the
> next 200 chars or some similar limit?

I'd suggest to take the upper bound of all Unicode name
lengths as limit.

> 5) Do we need StreamReader/Writer classes, too?

If you plan to have it registered with a codec search function,
yes.
No big deal though, because you can use the Codec class as basis
for them:

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

Then drop the scripts into the encodings package dir and it
should be useable via unicode(r'\N{SMILEY}','unicode-named')
and u":-)".encode('unicode-named').

> I've also add a script that parses the names out of the NameList.txt
> file at ftp://ftp.unicode.org/Public/UNIDATA/.
> 
> --amk
> 
> namecodec.py:
> =============
> 
> import codecs
> 
> #from _namedict import namedict
> namedict = {'NULL': 0, 'START OF HEADING' : 1,
>             'BACKSLASH':ord('\\')}
> 
> class NameCodec(codecs.Codec):
>     def encode(self,input,errors='strict'):
>         # XXX what should this do?  Escape the
>         # sequence \N as '\N{BACKSLASH}N'?
>         return input.replace( '\\N', '\\N{BACKSLASH}N' )

You should return a string on output... input will be a Unicode
object and the return value too if you don't add e.g. an
.encode('unicode-escape').

>     def decode(self,input,errors='strict'):
>         output = unicode("")
>         last = 0
>         index = input.find( u'\\N{' )
>         while index != -1:
>             output = output + unicode( input[last:index] )
>             used = index
>             r_bracket = input.find( '}', index)
>             if r_bracket == -1:
>                 # No closing bracket; bail out...
>                 break
> 
>             name = input[index + 3 : r_bracket]
>             code = namedict.get( name )
>             if code is not None:
>                 output = output + unichr(code)
>             elif errors == 'strict':
>                 raise ValueError, 'Unknown character name %s' % repr(name)

This could also be UnicodeError (it's a subclass of ValueError).

>             elif errors == 'ignore': pass
>             elif errors == 'replace':
>                 output = output + unichr( 0xFFFD )

'\uFFFD' would save a call.

>             last = r_bracket + 1
>             index = input.find( '\\N{', last)
>         else:
>             # Finally failed gently, no longer finding a \N{...
>             output = output + unicode( input[last:] )
>             return len(input), output
> 
>         # Otherwise, we hit the break for an unterminated \N{...}
>         return index, output

Note that .decode() must only return the decoded data.
The "bytes read" integer was removed in order to make
the Codec APIs compatible with the standard file object
APIs.

> if __name__ == '__main__':
>     c = NameCodec()
>     for s in [ r'b\lah blah \N{NULL} asdf',
>                r'b\l\N{START OF HEADING}\N{NU' ]:
>         used, s2 = c.decode(s)
>         print repr( s2 )
> 
>         s3 = c.encode(s)
>         _, s4 = c.decode(s3)
>         print repr(s3)
>         assert s4 == s
> 
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' ))
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' ))
> 
> makenamelist.py
> ===============
> 
> # Hack to extract character names from NamesList.txt
> # Output the repr() of the resulting dictionary
> 
> import re, sys, string
> 
> namedict = {}
> 
> while 1:
>     L = sys.stdin.readline()
>     if L == "": break
> 
>     m = re.match('([0-9a-fA-F]{4})(?:\t(.*)\s*)', L)
>     if m is not None:
>         last_char = int(m.group(1), 16)
>         if m.group(2) is not None:
>             name = string.upper( m.group(2) )
>             if name not in ['',
>                             '']:
>                 namedict[ name ] = last_char
>                 # print name, last_char
> 
>     m = re.match('\t=\s*(.*)\s*(;.*)?', L)
>     if m is not None:
>         name = string.upper( m.group(1) )
>         names = string.split(name, ',')
>         names = map(string.strip, names)
>         for n in names:
>             namedict[ n ] = last_char
>             # print n, last_char
> 
> # XXX and do what with this dictionary?
> print namedict
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://www.python.org/mailman/listinfo/python-dev

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From fdrake@acm.org Fri Mar 24 22:12:42 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 17:12:42 -0500 (EST) Subject: [Python-Dev] Memory Management In-Reply-To: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <14555.59482.61317.992089@weyr.cnri.reston.va.us> Asbahr, Jason writes: > community? Has anyone run up against this before? You should talk to Vladimir Marangozov; he's done a fair bit of work dealing with memory management in Python. You probably want to read the chapter he contributed to the Python/C API document for the release earlier this week. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip@mojam.com (Skip Montanaro) Fri Mar 24 22:19:50 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 24 Mar 2000 16:19:50 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.59910.631130.241930@beluga.mojam.com> Guido> Which reminds me of another reason to wait: coming up with the Guido> right package hierarchy is hard. (E.g. I find network too long; Guido> plus, does htmllib belong there?) Ah, another topic for python-dev. Even if we can't do the packaging right away, we should be able to hash out the structure. Skip From guido@python.org Fri Mar 24 22:25:01 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:25:01 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 17:10:41 EST." 
<14555.59361.460705.258859@weyr.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <200003242225.RAA13408@eric.cnri.reston.va.us> > bwarsaw@cnri.reston.va.us writes: > > I know, I was being purposefully dense for effect :) Fred, is there > > some way to make the html contain a link to the previous section for > > the "see above" text? That would solve the problem I think. [Fred] > No. I expect this to no longer be a problem when we push to > SGML/XML, so I won't waste any time hacking around it. > On the other hand, lots of places in the documentation refer to > "above" and "below" in the traditional sense used in paper documents, > and that doesn't work well for hypertext, even in the strongly > traditional book-derivation way the Python manuals are done. As soon > as it's not in the same HTML file, "above" and "below" break for a lot > of people. So it still should be adjusted at an appropriate time. My approach to this: put more stuff on the same page! I personally favor putting an entire chapter on one page; even if you split the top-level subsections this wouldn't have happened. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm@digicool.com Fri Mar 24 22:40:54 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:40:54 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: Guido wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! I'm glad this organize-the-library-in-packages initiative seems to be moving towards concentrating on the organization, rather than just starting to put obvious things in the obvious places. 
Personally, i *crave* sensible, discoverable organization. The only thing i like less than complicated disorganization is complicated misorganization - and i think that just diving in and doing the "obvious" placements would have the terrible effect of making it harder, not easier, to move eventually to the right arrangement. Ken klm@digicool.com From akuchlin@mems-exchange.org Fri Mar 24 22:45:20 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 17:45:20 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DBE870.D88915B5@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> Message-ID: <14555.61440.613940.50492@amarok.cnri.reston.va.us> M.-A. Lemburg writes: >.encode() should translate Unicode to a string. Since the >named char thing is probably only useful on input, I'd say: >don't do anything, except maybe return input.encode('unicode-escape'). Wait... then you can't stack it on top of unicode-escape, because it would already be Unicode escaped. >> 4) What do you with the error \N{...... no closing right bracket. >I'd suggest to take the upper bound of all Unicode name >lengths as limit. Seems like a hack. >Note that .decode() must only return the decoded data. >The "bytes read" integer was removed in order to make >the Codec APIs compatible with the standard file object >APIs. Huh? Why does Misc/unicode.txt describe decode() as "Decodes the object input and returns a tuple (output object, length consumed)"? Or are you talking about a different .decode() method? -- A.M. Kuchling http://starship.python.net/crew/amk/ "Ruby's dead?" "Yes." "Ah me. That's the trouble with mortals. They do that. Not to worry, eh?" 
-- Dream and Pharamond, in SANDMAN #46: "Brief Lives:6" From gmcm@hypernet.com Fri Mar 24 22:50:12 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 24 Mar 2000 17:50:12 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <1258184279-6957124@hypernet.com> [Guido] > Someone noticed that socket.connect() and a few related functions > (connect_ex() and bind()) take either a single (host, port) tuple or > two separate arguments, but that only the tuple is documented. > > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. This will indeed cause great wailing and gnashing of teeth. I've been criticized for using the tuple form in the Sockets HOWTO (in fact I foolishly changed it to demonstrate both forms). > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I say give 'em something to whine about. put-sand-in-the-vaseline-ly y'rs - Gordon From klm@digicool.com Fri Mar 24 22:55:43 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:55:43 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > > >>>>> "GvR" == Guido van Rossum writes: > > GvR> OK, that's reasonable. I'll have to invent a different > GvR> reason why I don't want this -- because I really don't! 
> > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) Maybe i'm just a slave to my organization mania, but i'd suggest the following order change of 5 and 6, plus an addition; from: 5 now: Flat is better than nested. 6 now: Sparse is better than dense. to: 5 Sparse is better than dense. 6 Flat is better than nested 6.5 until it gets too dense. or-is-it-me-that-gets-too-dense'ly yrs, ken klm@digicool.com (And couldn't the humor page get hooked up a bit better? That was definitely a fun part of maintaining python.org...) From gstein@lyra.org Sat Mar 25 01:19:18 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 17:19:18 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > One thing you can definitely do now which breaks no code: propose a > package hierarchy for the standard library. I already did! http://www.python.org/pipermail/python-dev/2000-February/003761.html *grumble* -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Sat Mar 25 04:19:33 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:33 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <001001bf9611$52e960a0$752d153f@tim> [GregS proposes a partial packaging of std modules for 1.6, Guido objects on spurious grounds, GregS refutes that, Guido agrees] > I'll have to invent a different reason why I don't want this -- because > I really don't! This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you to fill in . All you have to do now is come up with a pithy way to say "if it's something Guido is so interested in that he wants to be deeply involved in it himself, but it comes at a time when he's buried under prior commitments, then tough tulips, it waits". 
shades-of-the-great-renaming-ly y'rs - tim From tim_one@email.msn.com Sat Mar 25 04:19:36 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:36 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: <001101bf9611$544239e0$752d153f@tim> [Guido] > OK, that's reasonable. I'll have to invent a different > reason why I don't want this -- because I really don't! [Barry] > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) No no no no no: "namespaces are one honking great idea ..." is the controlling one here: Guido really *does* want this! It's a question of timing, in the sense of "never is often better than *right* now", but to be eventually modified by "now is better than never". These were carefully designed to support any position whatsoever, you know . although-in-any-particular-case-there's-only-one-true-interpretation-ly y'rs - tim From guido@python.org Sat Mar 25 04:19:41 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 23:19:41 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 17:19:18 PST." References: Message-ID: <200003250419.XAA25751@eric.cnri.reston.va.us> > > One thing you can definitely do now which breaks no code: propose a > > package hierarchy for the standard library. > > I already did! > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > *grumble* You've got to be kidding. That's not a package hierarchy proposal, it's just one package (network). Without a comprehensive proposal I'm against a partial reorganization: without a destination we can't start marching. Naming things is very contentious -- everybody has an opinion. To pick the right names you must see things in perspective. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Sat Mar 25 08:45:28 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 10:45:28 +0200 (IST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson@nevex.com wrote: > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: I'd like to know what you mean by "class" method. (I do know C++ and Java, so I have some idea...). Specifically, my question is: how does a class method access class variables? They can't be totally unqualified (because that's very unpythonic). If they are qualified by the class's name, I see it as a very mild improvement on the current situation. You could suggest, for example, to qualify class variables by "class" (so you'd do things like: class.x = 1), but I'm not sure I like it. On the whole, I think it is a much bigger issue on how be denote class methods. Also, one slight problem with your method of denoting class methods: currently, it is possible to add instance method at run time to a class by something like class C: pass def foo(self): pass C.foo = foo In your suggestion, how do you view the possiblity of adding class methods to a class? (Note that "foo", above, is also perfectly usable as a plain function). I want to note that Edward suggested denotation by a seperate namespace: C.foo = foo # foo is an instance method C.__methods__.foo = foo # foo is a class method The biggest problem with that suggestion is that it doesn't address the common case of defining it textually inside the class definition. > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. 
Currently, I use "_" for that purpose, after I heard the idea from Fredrik Lundh. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein@lyra.org Sat Mar 25 09:26:23 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:26:23 -0800 (PST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: <200003250419.XAA25751@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > > One thing you can definitely do now which breaks no code: propose a > > > package hierarchy for the standard library. > > > > I already did! > > > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > > > *grumble* > > You've got to be kidding. That's not a package hierarchy proposal, > it's just one package (network). > > Without a comprehensive proposal I'm against a partial reorganization: > without a destination we can't start marching. Not kidding at all. I said before that I don't think we can do everything all at once. I *do* think this is solvable with a greedy algorithm rather than waiting for some nebulous completion point. > Naming things is very contentious -- everybody has an opinion. To > pick the right names you must see things in perspective. Sure. And those diverse opinions are why I don't believe it is possible to do all at once. The task is simply too large to tackle in one shot. IMO, it must be solved incrementally. I'm not even going to attempt to try to define a hierarchy for all those modules. I count 137 on my local system. Let's say that I *do* try... some are going to end up "forced" rather than obeying some obvious grouping. If you do it a chunk at a time, then you get the obvious, intuitive groupings. Try for more, and you just bung it all up. For discussion's sake: can you provide a rationale for doing it all at once? In the current scenario, modules just appear at some point. 
After a partial reorg, some modules appear at a different point. "No big whoop." Just because module A is in a package doesn't imply that module B must also be in a package. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Mar 25 09:35:39 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:35:39 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <001001bf9611$52e960a0$752d153f@tim> Message-ID: On Fri, 24 Mar 2000, Tim Peters wrote: > [GregS proposes a partial packaging of std modules for 1.6, Guido objects on > spurious grounds, GregS refutes that, Guido agrees] > > > I'll have to invent a different reason why I don't want this -- because > > I really don't! > > This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you > to fill in . All you have to do now is come up with a pithy way to > say "if it's something Guido is so interested in that he wants to be deeply > involved in it himself, but it comes at a time when he's buried under prior > commitments, then tough tulips, it waits". No need for Pythonic Theses. I don't see anybody disagreeing with the end goal. The issue comes up with *how* to get there. I say "do it incrementally" while others say "do it all at once." Personally, I don't think it is possible to do all at once. As a corollary, if you can't do it all at once, but you *require* that it be done all at once, then you have effectively deferred the problem. To put it another way, Guido has already invented a reason to not do it: he just requires that it be done all at once. Result: it won't be done. [ not saying this was Guido's intent or desire... 
but this is how I read the result :-) ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 25 09:55:12 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 11:55:12 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.34371.749039.946891@beluga.mojam.com> Message-ID: On Fri, 24 Mar 2000, Skip Montanaro wrote: > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. You're right, but I'd like this to be a 1.7 change. It's just that I plan to suggest a great-renaming-fest for 1.7 modules, and then namespace wouldn't be cluttered when you don't need it. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sat Mar 25 10:16:23 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 12:16:23 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Here's a reason: there shouldn't be changes we'll retract later -- we need to come up with the (more or less) right hierarchy the first time, or we'll do a lot of work for nothing. > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. Hmmmmm....this is a big problem. Maybe we need to have more people with access to the CVS? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Sat Mar 25 10:47:30 2000 From: mal@lemburg.com (M.-A. 
Lemburg)
Date: Sat, 25 Mar 2000 11:47:30 +0100
Subject: [Python-Dev] Unicode charnames impl.
References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
	<14555.57908.151946.182639@amarok.cnri.reston.va.us>
	<38DBE870.D88915B5@lemburg.com>
	<14555.61440.613940.50492@amarok.cnri.reston.va.us>
Message-ID: <38DC9942.3C4E4B92@lemburg.com>

"Andrew M. Kuchling" wrote:
> 
> M.-A. Lemburg writes:
> >.encode() should translate Unicode to a string. Since the
> >named char thing is probably only useful on input, I'd say:
> >don't do anything, except maybe return input.encode('unicode-escape').
> 
> Wait... then you can't stack it on top of unicode-escape, because it
> would already be Unicode escaped.

Sorry for the mixup (I guess yesterday wasn't my day...).

I had stream codecs in mind: these are stackable, meaning that
you can wrap one codec around another. And it's also their
interface API that was changed -- not the basic stateless
encoder/decoder ones.

Stacking of .encode()/.decode() must be done "by hand" in e.g.
the way I described above. Another approach would be subclassing
the unicode-escape Codec and then calling the base class method.

> >> 4) What do you with the error \N{...... no closing right bracket.
> >I'd suggest to take the upper bound of all Unicode name
> >lengths as limit.
> 
> Seems like a hack.

It is... but what other way would there be ?

> >Note that .decode() must only return the decoded data.
> >The "bytes read" integer was removed in order to make
> >the Codec APIs compatible with the standard file object
> >APIs.
> 
> Huh? Why does Misc/unicode.txt describe decode() as "Decodes the
> object input and returns a tuple (output object, length consumed)"?
> Or are you talking about a different .decode() method?

You're right... I was thinking about .read() and .write().
.decode() should indeed return a tuple, just as documented in
unicode.txt.
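The documented contract just reaffirmed -- decode() returning a tuple of (output object, length consumed) -- can be sketched with a minimal stateless decoder. This is written in modern Python syntax for brevity, and the three-entry namedict is a stand-in for the real name table:

```python
# Minimal sketch of a stateless \N{...} decoder that follows the
# (output, length consumed) convention from Misc/unicode.txt.
namedict = {'NULL': 0, 'START OF HEADING': 1, 'BACKSLASH': ord('\\')}

def name_decode(input, errors='strict'):
    output = []
    pos = 0
    while True:
        index = input.find('\\N{', pos)
        if index == -1:
            # No more escapes: everything was consumed.
            output.append(input[pos:])
            return ''.join(output), len(input)
        output.append(input[pos:index])
        r_bracket = input.find('}', index)
        if r_bracket == -1:
            # Unterminated \N{...: report how much was consumed so a
            # stream reader can retry once more data arrives.
            return ''.join(output), index
        name = input[index + 3:r_bracket]
        if name in namedict:
            output.append(chr(namedict[name]))
        elif errors == 'strict':
            raise UnicodeError('Unknown character name %r' % name)
        elif errors == 'replace':
            output.append('\uFFFD')
        # errors == 'ignore' drops the escape entirely
        pos = r_bracket + 1

assert name_decode(r'a \N{NULL} b') == ('a \x00 b', 12)
assert name_decode(r'x\N{NU') == ('x', 1)
```

Returning the consumed length is what lets a stream reader keep trailing, possibly incomplete input (like the unterminated \N{NU in the test case) in its buffer for the next read.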
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond@skippinet.com.au Sat Mar 25 13:20:59 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 26 Mar 2000 00:20:59 +1100 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: [Greg writes] > I'm not even going to attempt to try to > define a hierarchy for all those modules. I count 137 on my local system. > Let's say that I *do* try... some are going to end up "forced" rather than > obeying some obvious grouping. If you do it a chunk at a time, then you > get the obvious, intuitive groupings. Try for more, and you just bung it > all up. ... > Just because module A is in a package doesn't imply that module B must > also be in a package. I agree with Greg - every module will not fit into a package. But I also agree with Guido - we _should_ attempt to go through the 137 modules and put the ones that fit into logical groupings. Greg is probably correct with his selection for "net", but a general evaluation is still a good thing. A view of the bigger picture will help to quell debates over the structure, and only leave us with the squabbles over the exact spelling :-) +2 on ... err .... -1 on ... errr - awww - screw that--ly, Mark. From tismer@tismer.com Sat Mar 25 13:35:50 2000 From: tismer@tismer.com (Christian Tismer) Date: Sat, 25 Mar 2000 14:35:50 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DCC0B6.2A7D0EF1@tismer.com> "Andrew M. Kuchling" wrote: ... > 3) How can we store all those names? The resulting dictionary makes a > 361K .py file; Python dumps core trying to parse it. (Another bug...) This is simply not the place to use a dictionary. 
You don't need fast lookup from names to codes, but something that supports incremental search. This would enable PythonWin to show a pop-up list after you typed the first letters. I'm working on a common substring analysis that makes each entry into 3 to 5 small integers. You then encode these in an order-preserving way. That means, the resulting code table is still lexically ordered, and access to the sentences is done via bisection. Takes me some more time to get that, but it will not be larger than 60k, or I drop it. Also note that all the names use uppercase letters and space only. An opportunity to use simple context encoding and use just 4 bits most of the time. ... > I've also added a script that parses the names out of the NameList.txt > file at ftp://ftp.unicode.org/Public/UNIDATA/. Is there any reason why you didn't use the UnicodeData.txt file, I mean do I cover everything if I continue to use that? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov@inrialpes.fr Sat Mar 25 14:59:55 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 25 Mar 2000 15:59:55 +0100 (CET) Subject: [Python-Dev] Windows and PyObject_NEW Message-ID: <200003251459.PAA09181@python.inrialpes.fr> For MarkH, Guido and the Windows experienced: I've been reading Jeffrey Richter's "Advanced Windows" last night in order to try understanding better why PyObject_NEW is implemented differently for Windows. Again, I feel uncomfortable with this, especially now, when I'm dealing with the memory aspect of Python's object constructors/destructors. Some time ago, Guido elaborated on why PyObject_NEW uses malloc() on the user's side, before calling _PyObject_New (on Windows, cf.
objimpl.h): [Guido] > I can explain the MS_COREDLL business: > > This is defined on Windows because the core is in a DLL. Since the > caller may be in another DLL, and each DLL (potentially) has a > different default allocator, and (in pre-Vladimir times) the > type-specific deallocator typically calls free(), we (Mark & I) > decided that the allocation should be done in the type-specific > allocator. We changed the PyObject_NEW() macro to call malloc() and > pass that into _PyObject_New() as a second argument. While I agree with this, from reading chapters 5-9 of (a French copy of) the book (translated backwards here): 5. Win32 Memory Architecture 6. Exploring Virtual Memory 7. Using Virtual Memory in Your Applications 8. Memory Mapped Files 9. Heaps I can't find any radical Windows specificities for memory management. On Windows, like the rest of the OSes, the (virtual & physical) memory allocated for a process is common and seems to be accessible from all DLLs involved in an executable. Things like page sharing, copy-on-write, private process mem, etc. are conceptually all the same on Windows and Unix. Now, the backwards binary compatibility argument aside (assuming that extensions get recompiled when a new Python version comes out), my concern is that with the introduction of PyObject_NEW *and* PyObject_DEL, there's no point in having separate implementations for Windows and Unix any more (or I'm really missing something and I fail to see what it is). User objects would be allocated *and* freed by the core DLL (at least the object headers). Even if several DLLs use different allocators, this shouldn't be a problem if what's obtained via PyObject_NEW is freed via PyObject_DEL. This Python memory would be allocated from the Python's core DLL regions/pages/heaps. And I believe that the memory allocated by the core DLL is accessible from the other DLL's of the process.
(I haven't seen evidence on the opposite, but tell me if this is not true) I thought that maybe Windows malloc() uses different heaps for the different DLLs, but that's fine too, as long as the _NEW/_DEL symmetry is respected and all heaps are accessible from all DLLs (which seems to be the case...), but: In the beginning of Chapter 9, Heaps, I read the following: """ ...About Win32 heaps (compared to Win16 heaps)... * There is only one kind of heap (it doesn't have any particular name, like "local" or "global" on Win16, because it's unique) * Heaps are always local to a process. The contents of a process heap is not accessible from the threads of another process. A large number of Win16 applications use the global heap as a way of sharing data between processes; this change in the Win32 heaps is often a source of problems for porting Win16 applications to Win32. * One process can create several heaps in its addressing space and can manipulate them all. * A DLL does not have its own heap. It uses the heaps as part of the addressing space of the process. However, a DLL can create a heap in the addressing space of a process and reserve it for its own use. Since several 16-bit DLLs share data between processes by using the local heap of a DLL, this change is a source of problems when porting Win16 apps to Win32... """ This last paragraph confuses me. On one hand, it's stated that all heaps can be manipulated by the process, and OTOH, a DLL can reserve a heap for personal use within that process (implying the heap is r/w protected for the other DLLs ?!?). The rest of this chapter does not explain how this "private reservation" is or can be done, so some of you would probably want to chime in and explain this to me. Going back to PyObject_NEW, if it turns out that all heaps are accessible from all DLLs involved in the process, I would probably lobby for unifying the implementation of _PyObject_NEW/_New and _PyObject_DEL/_Del for Windows and Unix. 
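[Editorial aside: the symmetry argument -- whatever PyObject_NEW allocates, only PyObject_DEL frees, through the same allocator -- can be modelled in a few lines of Python. This is a toy stand-in for the C API, not the actual implementation; obj_new/obj_del and the free list are invented for illustration.]

```python
# Toy model of the _NEW/_DEL symmetry: one module owns both the
# allocation and the deallocation path, so callers (the "other DLLs")
# never have to know which allocator produced an object.
_pool = []  # the owning allocator's private free list

def obj_new():
    # Reuse a recycled block if one exists, else make a fresh one.
    return _pool.pop() if _pool else {}

def obj_del(obj):
    obj.clear()
    _pool.append(obj)  # always returned to the pool that owns it

o = obj_new()
obj_del(o)
assert obj_new() is o  # recycled by the owning allocator, not the caller's
```

As long as every obj_new() is paired with obj_del(), no caller ever mixes two allocators -- which is the property Vladimir argues makes the Windows-specific malloc()-in-the-macro unnecessary.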
Actually on Windows, object allocation does not depend on a central, Python core memory allocator. Therefore, with the patches I'm working on, changing the core allocator would work (would be changed for real) only for platforms other than Windows. Next, if it's possible to unify the implementation, it would also be possible to expose and officialize in the C API a new function set: PyObject_New() and PyObject_Del() (without leading underscores) For now, due to the implementation difference on Windows, we're forced to use the macro versions PyObject_NEW/DEL. Clearly, please tell me what would be wrong on Windows if a) & b) & c): a) we have PyObject_New(), PyObject_Del() b) their implementation is platform independent (no MS_COREDLL diffs, we retain the non-Windows variant) c) they're both used systematically for all object types -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm@hypernet.com Sat Mar 25 15:46:01 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Sat, 25 Mar 2000 10:46:01 -0500 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: <1258123323-10623548@hypernet.com> Vladimir Marangozov > ... And I believe that the memory allocated > by the core DLL is accessible from the other DLL's of the process. > (I haven't seen evidence on the opposite, but tell me if this is not true) This is true. Or, I should say, it all boils down to HeapAlloc( heap, flags, bytes) and malloc is going to use the _crtheap. > In the beginning of Chapter 9, Heaps, I read the following: > > """ > ...About Win32 heaps (compared to Win16 heaps)... > > * There is only one kind of heap (it doesn't have any particular name, > like "local" or "global" on Win16, because it's unique) > > * Heaps are always local to a process. The contents of a process heap is not accessible from the threads of another process.
A large number of > Win16 applications use the global heap as a way of sharing data between > processes; this change in the Win32 heaps is often a source of problems > for porting Win16 applications to Win32. > > * One process can create several heaps in its addressing space and can > manipulate them all. > > * A DLL does not have its own heap. It uses the heaps as part of the > addressing space of the process. However, a DLL can create a heap in > the addressing space of a process and reserve it for its own use. > Since several 16-bit DLLs share data between processes by using the > local heap of a DLL, this change is a source of problems when porting > Win16 apps to Win32... > """ > > This last paragraph confuses me. On one hand, it's stated that all heaps > can be manipulated by the process, and OTOH, a DLL can reserve a heap for > personal use within that process (implying the heap is r/w protected for > the other DLLs ?!?). At any time, you can create a new Heap handle HeapCreate(options, initsize, maxsize) Nothing special about the "dll" context here. On Win9x, only someone who knows about the handle can manipulate the heap. (On NT, you can enumerate the handles in the process.) I doubt very much that you would break anybody's code by removing the Windows-specific behavior. But it seems to me that unless Python always uses the default malloc, those of us who write C++ extensions will have to override operator new? I'm not sure. I've used placement new to allocate objects in a memory mapped file, but I've never tried to muck with the global memory policy of a C++ program. - Gordon From akuchlin@mems-exchange.org Sat Mar 25 17:58:56 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Sat, 25 Mar 2000 12:58:56 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl.
In-Reply-To: <38DCC0B6.2A7D0EF1@tismer.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DCC0B6.2A7D0EF1@tismer.com> Message-ID: <14556.65120.22727.524616@newcnri.cnri.reston.va.us> Christian Tismer writes: >This is simply not the place to use a dictionary. >You don't need fast lookup from names to codes, >but something that supports incremental search. >This would enable PythonWin to sho a pop-up list after >you typed the first letters. Hmm... one could argue that PythonWin or IDLE should provide their own database for incremental searching; I was planning on following Bill Tutt's suggestion of generating a perfect minimal hash for the names. gperf isn't up to the job, but I found an algorithm that should be OK. Just got to implement it now... But, if your approach pays off it'll be superior to a perfect hash. >Is there any reason why you didn't use the UnicodeData.txt file, >I mean do I cover everything if I continue to use that? Oops; I saw the NameList file and just went for it; maybe it should use the full UnicodeData.txt. --amk From Moshe Zadka Sat Mar 25 18:10:44 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 20:10:44 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Mark Hammond wrote: > But I also agree with Guido - we _should_ attempt to go through the 137 Where did you come up with that number? I counted much more -- not quite sure, but certainly more. Well, here's a tentative suggestion I worked out today. This is just to have something to quibble about. In the interest of rushing it out of the door, there are a few modules (explicitly mentioned) which I have said nothing about. 
net
    httplib ftplib urllib cgi gopherlib imaplib poplib nntplib smtplib urlparse telnetlib
    server
        BaseHTTPServer CGIHTTPServer SimpleHTTPServer SocketServer asynchat asyncore
text
    sgmllib htmllib htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter mimetools mimify mailcap mimetypes base64 quopri
        mailbox mhlib
    binhex
parse
    string re regex reconvert regex_syntax regsub shlex ConfigParser linecache multifile netrc
bin
    gzip zlib aifc chunk
    image
        imghdr colorsys imageop imgfile rgbimg yuvconvert
    sound
        sndhdr toaiff audiodev sunau sunaudio wave audioop sunaudiodev
db
    anydbm whichdb bsddb dbm dbhash dumbdbm gdbm
math
    bisect fpformat random whrandom cmath math crypt fpectl fpetest array md5 mpz rotor sha
time
    calendar time tzparse sched timing
interpreter
    new py_compile code codeop compileall keyword token tokenize parser dis bdb pdb profile pyclbr tabnanny symbol pstats traceback rlcompleter
security
    Bastion rexec ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath posixpath macpath nturl2path ntpath macurl2path
    filecmp fileinput StringIO cStringIO glob fnmatch posixfile stat statcache statvfs tempfile shutil pipes popen2 commands dl fcntl
serialize
    pickle cPickle shelve xdrlib copy copy_reg
threads
    thread threading Queue mutex
ui
    curses Tkinter cmd getpass
internal
    _codecs _locale _tkinter pcre strop posix
users
    pwd grp nis
exceptions os types UserDict UserList user site locale
sgi
    al cd cl fl fm gl misc (what used to be sgimodule.c) sv
unicode
    codecs unicodedata unicodedatabase

========== Modules not handled ============
formatter getopt pprint pty repr tty errno operator pure readline resource select signal socket struct syslog termios

Well, if you got this far, you certainly deserve... congratulations-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA@ActiveState.com Sat Mar 25 18:28:30 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:28:30 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm This made me think of one issue which is worth considering -- is there a mechanism for third-party packages to hook into the standard naming hierarchy? It'd be weird not to have the oracle and sybase modules within the db toplevel package, for example. --david ascher From Moshe Zadka Sat Mar 25 18:30:26 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 20:30:26 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > This made me think of one issue which is worth considering -- is there a > mechanism for third-party packages to hook into the standard naming > hierarchy? It'd be weird not to have the oracle and sybase modules within > the db toplevel package, for example. My position is that any 3rd party module decides for itself where it wants to live -- once we formalized the framework. Consider PyGTK/PyGnome, PyQT/PyKDE -- they should live in the UI package too... From DavidA@ActiveState.com Sat Mar 25 18:50:14 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:50:14 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase > modules within > > the db toplevel package, for example. 
> > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... That sounds good in theory, but I can see possible problems down the line: 1) The current mapping between package names and directory structure means that installing a third party package hierarchy in a different place on disk than the standard library requires some work on the import mechanisms (this may have been discussed already) and a significant amount of user education. 2) We either need a 'registration' mechanism whereby people can claim a name in the standard hierarchy or expect conflicts. As far as I can gather, in the Perl world registration occurs by submission to CPAN. Correct? One alternative is to go the Java route, which would then mean, I think, that some core modules are placed very high in the hierarchy (the equivalent of the java. subtree), and some others are deprecated to lower subtree (the equivalent of com.sun). Anyway, I agree with Guido on this one -- naming is a contentious issue wrought with long-term implications. Let's not rush into a decision just yet. --david From guido@python.org Sat Mar 25 18:56:20 2000 From: guido@python.org (Guido van Rossum) Date: Sat, 25 Mar 2000 13:56:20 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Sat, 25 Mar 2000 01:35:39 PST." References: Message-ID: <200003251856.NAA09636@eric.cnri.reston.va.us> > I say "do it incrementally" while others say "do it all at once." > Personally, I don't think it is possible to do all at once. As a > corollary, if you can't do it all at once, but you *require* that it be > done all at once, then you have effectively deferred the problem. To put > it another way, Guido has already invented a reason to not do it: he just > requires that it be done all at once. Result: it won't be done. Bullshit, Greg. 
(I don't normally like to use such strong words, but since you're being confrontational here...) I'm all for doing it incrementally -- but I want the plan for how to do it made up front. That doesn't require all the details to be worked out -- but it requires a general idea about what kind of things we will have in the namespace and what kinds of names they get. An organizing principle, if you like. If we were to decide later that we go for a Java-like deep hierarchy, the network package would have to be moved around again -- what a waste. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Sat Mar 25 19:35:37 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 21:35:37 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > That sounds good in theory, but I can see possible problems down the line: > > 1) The current mapping between package names and directory structure means > that installing a third party package hierarchy in a different place on disk > than the standard library requires some work on the import mechanisms (this > may have been discussed already) and a significant amount of user education. Ummmm.... 1.a) If the work of the import-sig produces something (which I suspect it will), it's more complicated -- you could have JAR-like files with hierarchies inside. 1.b) Installation is the domain of the distutils-sig. I seem to remember Greg Ward saying something about installing packages. > 2) We either need a 'registration' mechanism whereby people can claim a name > in the standard hierarchy or expect conflicts. As far as I can gather, in > the Perl world registration occurs by submission to CPAN. Correct? Yes. 
But this is no worse than the current situation, where people pick a toplevel name . I agree a registration mechanism would be helpful. > One alternative is to go the Java route, which would then mean, I think, > that some core modules are placed very high in the hierarchy (the equivalent > of the java. subtree), and some others are deprecated to lower subtree (the > equivalent of com.sun). Personally, I *hate* the Java mechanism -- see Stallman's position on why GNU Java packages use gnu.* rather than org.gnu.* for some of the reasons. I really, really, like the Perl mechanism, and I think we would do well to think if something like that wouldn't suit us, with minor modifications. (Remember that lwall copied the Pythonic module mechanism, so Perl and Python modules are quite similar) > Anyway, I agree with Guido on this one -- naming is a contentious issue > wrought with long-term implications. Let's not rush into a decision just > yet. I agree. That's why I pushed out the straw-man proposal. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw@cnri.reston.va.us Sat Mar 25 20:07:27 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Sat, 25 Mar 2000 15:07:27 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: <14557.7295.451011.36533@anthem.cnri.reston.va.us> I guess I was making a request for a more comprehensive list. People are asking to packagize the entire directory, so I'd like to know what organization they'd propose for all the modules. -Barry From bwarsaw@cnri.reston.va.us Sat Mar 25 20:20:09 2000 From: bwarsaw@cnri.reston.va.us (Barry A.
Warsaw) Date: Sat, 25 Mar 2000 15:20:09 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14557.8057.896921.693908@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Hmmmmm....this is a big problem. Maybe we need to have more MZ> people with access to the CVS? To make changes like this, you don't just need write access to CVS, you need physical access to the repository filesystem. It's not possible to provide this access to non-CNRI'ers. -Barry From gstein@lyra.org Sat Mar 25 20:40:59 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 12:40:59 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Barry A. Warsaw wrote: > >>>>> "MZ" == Moshe Zadka writes: > > MZ> Hmmmmm....this is a big problem. Maybe we need to have more > MZ> people with access to the CVS? > > To make changes like this, you don't just need write access to CVS, > you need physical access to the repository filesystem. It's not > possible to provide this access to non-CNRI'ers. Unless the CVS repository was moved to, say, SourceForge. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw@cnri.reston.va.us Sat Mar 25 21:00:39 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 25 Mar 2000 16:00:39 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) References: Message-ID: <14557.10487.736544.336550@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Personally, I *hate* the Java mechanism -- see Stallman's MZ> position on why GNU Java packages use gnu.* rather than MZ> org.gnu.* for some of the reasons. Actually, it's Per Bothner's position: http://www.gnu.org/software/java/why-gnu-packages.txt and I agree with him. I kind of wished that JimH had chosen simply `python' as JPython's top level package hierarchy, but that's too late to change now.
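[Editorial aside, with modern hindsight: the question raised earlier in this thread -- how third-party packages hook into a shared hierarchy, e.g. oracle and sybase modules under a db package -- was eventually answered by namespace packages (PEP 420, many years after this discussion). A minimal runnable sketch; the db/oracle/sybase names are hypothetical stand-ins:]

```python
import os
import sys
import tempfile

# Two "vendors" install under the same top-level "db" namespace from
# different disk locations. With no __init__.py, "db" becomes a
# namespace package spanning both directories.
root1 = tempfile.mkdtemp()
root2 = tempfile.mkdtemp()
os.makedirs(os.path.join(root1, 'db'))
os.makedirs(os.path.join(root2, 'db'))
with open(os.path.join(root1, 'db', 'oracle.py'), 'w') as f:
    f.write('VENDOR = "oracle"\n')
with open(os.path.join(root2, 'db', 'sybase.py'), 'w') as f:
    f.write('VENDOR = "sybase"\n')

sys.path[:0] = [root1, root2]
from db import oracle, sybase  # both halves of "db" resolve

assert oracle.VENDOR == 'oracle'
assert sybase.VENDOR == 'sybase'
```

This sidesteps the registration problem David raises only partially: two vendors can share a package, but nothing stops them from colliding on a module name within it.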
-Barry From bwarsaw@cnri.reston.va.us Sat Mar 25 21:03:08 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Sat, 25 Mar 2000 16:03:08 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: <14557.10636.504088.517078@anthem.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GS> Unless the CVS repository was moved to, say, SourceForge. I didn't want to rehash that, but yes, you're absolutely right! -Barry From gstein@lyra.org Sat Mar 25 21:13:00 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:13:00 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.10636.504088.517078@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000 bwarsaw@cnri.reston.va.us wrote: > >>>>> "GS" == Greg Stein writes: > > GS> Unless the CVS repository was moved to, say, SourceForge. > > I didn't want to rehash that, but yes, you're absolutely right! Me neither, ergo the smiley :-) Just felt inclined to mention it, and I think the conversation stopped last time at that point; not sure it ever was "hashed" :-). But it is only a discussion to raise if checkins-via-CNRI-guys becomes a true bottleneck. Which it hasn't and doesn't look to be. Constrained? Yes. Bottleneck? No. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy-home@cnri.reston.va.us Sat Mar 25 21:22:09 2000 From: jeremy-home@cnri.reston.va.us (Jeremy Hylton) Date: Sat, 25 Mar 2000 16:22:09 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: References: Message-ID: <14557.4689.858620.578102@walden> >>>>> "MH" == Mark Hammond writes: MH> [Greg writes] >> I'm not even going to attempt to try to define a hierarchy for >> all those modules. I count 137 on my local system. Let's say >> that I *do* try... some are going to end up "forced" rather than >> obeying some obvious grouping. If you do it a chunk at a time, >> then you get the obvious, intuitive groupings. 
Try for more, and >> you just bung it all up. MH> I agree with Greg - every module will not fit into a package. Sure. No one is arguing with that :-). Where I disagree with Greg is that we shouldn't approach this piecemeal. A greedy algorithm can lead to a locally optimal solution that isn't right for the whole library. A name or grouping might make sense on its own, but isn't sufficiently clear when taking all 137-odd modules into account. MH> But I also agree with Guido - we _should_ attempt to go through MH> the 137 modules and put the ones that fit into logical MH> groupings. Greg is probably correct with his selection for MH> "net", but a general evaluation is still a good thing. A view MH> of the bigger picture will help to quell debates over the MH> structure, and only leave us with the squabbles over the exact MH> spelling :-) x1.5 on this. I'm not sure which direction you ended up thinking this was (+ or -), but whichever direction it was, I like it. Jeremy From gstein@lyra.org Sat Mar 25 21:40:48 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:40:48 -0800 (PST) Subject: [Python-Dev] voting numbers Message-ID: Hey... just thought I'd drop off a description of the "formal" mechanism that the ASF uses for voting since it has been seen here and there on this group :-) +1 "I'm all for it. Do it!" +0 "Seems cool and acceptable, but I can also live without it" -0 "Not sure this is the best thing to do, but I'm not against it." -1 "Veto. And here is my reasoning." Strictly speaking, there is no vetoing here, other than by Guido. For changes to Apache (as opposed to bug fixes), it depends on where the development is. Early stages, it is reasonably open and people work straight against CVS (except for really big design changes). Late stage, it requires three +1 votes during discussion of a patch before it goes in. Here on python-dev, it would seem that the votes are a good way to quickly let Guido know people's feelings about topic X or Y.
On the patches mailing list, the voting could actually be quite a useful measure for the people with CVS commit access. If a patch gets -1, then its commit should wait until reason X has been resolved. Note that it can be resolved in two ways: the person lifts their veto (after some amount of persuasion or explanation), or the patch is updated to address the concerns (well, unless the veto is against the concept of the patch entirely :-). If a patch gets a few +1 votes, then it can probably go straight in. Note that the Apache guys sometimes say things like "+1 on concept" meaning they like the idea, but haven't reviewed the code. Do we formalize on using these? Not really suggesting that. But if myself (and others) drop these things into mail notes, then we may as well have a description of just what the heck is going on :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 25 23:27:18 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 01:27:18 +0200 (IST) Subject: [Python-Dev] Q: repr.py vs. pprint.py Message-ID: Is there any reason to keep two separate modules with simple-formatting functions? I think pprint is somewhat more sophisticated, but in the worst case, we can just dump them both in the same file (the only thing would be that pprint would export "repr", in addition to "saferepr" (among others). (Just bumped into this in my reorg suggestion) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sat Mar 25 23:32:38 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 01:32:38 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 Message-ID: Here's a second version of the straw man proposal for the reorganization of modules in packages. Note that I'm treating it as a strictly 1.7 proposal, so I don't care a "lot" about backwards compatibility.
I'm down to 4 unhandled modules, which means that if no one objects (and I'm sure someone will ), this can be a plan of action. So get your objections ready guys!

net
    httplib ftplib urllib cgi gopherlib imaplib poplib nntplib smtplib urlparse telnetlib
    server
        BaseHTTPServer CGIHTTPServer SimpleHTTPServer SocketServer asynchat asyncore
text
    sgmllib htmllib htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter mimetools mimify mailcap mimetypes base64 quopri
        mailbox mhlib
    binhex
parse
    string re regex reconvert regex_syntax regsub shlex ConfigParser linecache multifile netrc
bin
    gzip zlib aifc chunk
    image
        imghdr colorsys imageop imgfile rgbimg yuvconvert
    sound
        sndhdr toaiff audiodev sunau sunaudio wave audioop sunaudiodev
db
    anydbm whichdb bsddb dbm dbhash dumbdbm gdbm
math
    bisect fpformat random whrandom cmath math crypt fpectl fpetest array md5 mpz rotor sha
time
    calendar time tzparse sched timing
interpreter
    new py_compile code codeop compileall keyword token tokenize parser dis bdb pdb profile pyclbr tabnanny symbol pstats traceback rlcompleter
security
    Bastion rexec ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath posixpath macpath nturl2path ntpath macurl2path
    filecmp fileinput StringIO cStringIO glob fnmatch posixfile stat statcache statvfs tempfile shutil pipes popen2 commands dl fcntl
    lowlevel
        socket select
    terminal
        termios pty tty readline
    syslog
serialize
    pickle cPickle shelve xdrlib copy copy_reg
threads
    thread threading Queue mutex
ui
    curses Tkinter cmd getpass
internal
    _codecs _locale _tkinter pcre strop posix
users
    pwd grp nis
sgi
    al cd cl fl fm gl misc (what used to be sgimodule.c) sv
unicode
    codecs unicodedata unicodedatabase
exceptions os types UserDict UserList user site locale pure formatter getopt signal pprint

========== Modules not handled ============
errno resource operator struct

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA@ActiveState.com Sat Mar 25 23:39:51 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 15:39:51 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > I really, really, like the Perl mechanism, and I think we would do well > to think if something like that wouldn't suit us, with minor > modifications. The biggest modification which I think is needed to a Perl-like organization is that IMO there is value in knowing what packages are 'blessed' by Guido. In other words, some sort of Q/A mechanism would be good, if it can be kept simple. [Alternatively, let's not put a Q/A mechanism in place and my employer can make money selling that information, the way they do for Perl! =)] > (Remember that lwall copied the Pythonic module mechanism, > so Perl and Python modules are quite similar) That's stretching things a bit (the part after the 'so' doesn't follow from the part before), as there is a lot more to the nature of module systems, but the point is well taken. --david From Moshe Zadka Sun Mar 26 04:44:02 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 06:44:02 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > The biggest modification which I think is needed to a Perl-like organization > is that IMO there is value in knowing what packages are 'blessed' by Guido. > In other words, some sort of Q/A mechanism would be good, if it can be kept > simple. You got a point. Anyone knows how the perl-porters decide what modules to put in source.tar.gz? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Sun Mar 26 05:01:58 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 21:01:58 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote:

> Here's a second version of the straw man proposal for the reorganization
> of modules in packages. Note that I'm treating it as a strictly 1.7
> proposal, so I don't care a "lot" about backwards compatibility.

Hey, this looks pretty good. For the most part i agree with your layout. Here are a few notes...

> net [...]
> server [...]

Good.

> text [...]
> xml
>     whatever the xml-sig puts here
> mail
>     rfc822
>     mime
>         MimeWriter
>         mimetools
>         mimify
>         mailcap
>         mimetypes
>         base64
>         quopri
>     mailbox
>     mhlib
> binhex

I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) For example, why text.binhex but text.mail.mime.base64?

> parse
>     string
>     re
>     regex
>     reconvert
>     regex_syntax
>     regsub
>     shlex
>     ConfigParser
>     linecache
>     multifile
>     netrc

The "re" module, in particular, will get used a lot, and it's not clear why these all belong under "parse". I suggest dropping "parse" and moving these up. What's "multifile" doing here instead of with the rest of the mail/mime stuff?

> bin [...]

I like this. Good idea.

> gzip
> zlib
> aifc

Shouldn't "aifc" be under "sound"?

> image [...]
> sound [...]
> db [...]

Yup.

> math [...]
> time [...]

Looks good.

> interpreter [...]

How about just "interp"?

> security [...]
> file [...]
> lowlevel
>     socket
>     select

Why the separate "lowlevel" branch? Why doesn't "socket" go under "net"?

> terminal
>     termios
>     pty
>     tty
>     readline

Why does "terminal" belong under "file"? Maybe it could go under "ui"? Hmm... "pty" doesn't really belong.

> syslog

Hmm...
> serialize
> >
> >     pickle
> >     cPickle
> >     shelve
> >     xdrlib
> >     copy
> >     copy_reg

"copy" doesn't really fit here under "serialize", and "serialize" is kind of a long name. How about a "data types" package? We could then put "struct", "UserDict", "UserList", "pprint", and "repr" here.

    data
        copy
        copy_reg
        pickle
        cPickle
        shelve
        xdrlib
        struct
        UserDict
        UserList
        pprint
        repr

On second thought, maybe "struct" fits better under "bin".

> threads [...]
> ui [...]

Uh huh.

> internal
>     _codecs
>     _locale
>     _tkinter
>     pcre
>     strop
>     posix

Not sure this is a good idea. It means the Unicode work lives under both "unicode" and "internal._codecs", Tk is split between "ui" and "internal._tkinter", regular expressions are split between "text.re" and "internal.pcre". I can see your motivation for getting "posix" out of the way, but i suspect this is likely to confuse people.

> users
>     pwd
>     grp
>     nis

Hmm. Yes, i suppose so.

> sgi [...]
> unicode [...]

Indeed.

> os
> UserDict
> UserList
> exceptions
> types
> operator
> user
> site

Yeah, these are all top-level (except maybe UserDict and UserList, see above).

> locale

I think "locale" belongs under "math" with "fpformat" and the others. It's for numeric formatting.

> pure

What the heck is "pure"?

> formatter

This probably goes under "text".

> struct

See above under "data". I can't decide whether "struct" should be part of "data" or "bin". Hmm... probably "bin" -- since, unlike the serializers under "data", "struct" does not actually specify a serialization format, it only provides fairly low-level operations.

Well, this leaves a few system-like modules that didn't really fit elsewhere for me:

    pty
    tty
    termios
    syslog
    select
    getopt
    signal
    errno
    resource

They all seem to be Unix-related. How about putting these in a "unix" or "system" package?

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it."
-- Lenore Snell From Moshe Zadka Sun Mar 26 05:58:34 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 07:58:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote:

> I'm not convinced "mime" needs a separate branch here.
> (This is the deepest part of the tree, and at three levels
> small alarm bells went off in my head.)

I've had my problems with that too, but it seemed too many modules were mime-specific.

> For example, why text.binhex but text.mail.mime.base64?

Actually, I thought about this (this isn't random at all): base64 encoding is part of the mime standard, together with quoted-printable. Binhex isn't. I don't know if you find it reason enough, and it may be smarter just having a text.encode.{quopri,uu,base64,binhex}

> > parse
> >     string
> >     re
> >     regex
> >     reconvert
> >     regex_syntax
> >     regsub
> >     shlex
> >     ConfigParser
> >     linecache
> >     multifile
> >     netrc
>
> The "re" module, in particular, will get used a lot, and

    from parse import re

Doesn't seem too painful.

> and it's not clear why these all belong under "parse".

These are all used for parsing data (which does not have some pre-written parser). I had problems with the name too...

> What's "multifile" doing here instead of with the rest
> of the mail/mime stuff?

It's also useful generally.

> Shouldn't "aifc" be under "sound"?

You're right.

> > interpreter
> [...]
>
> How about just "interp"?

I've no *strong* feelings, just a vague "don't abbrev." hunch.

> Why the separate "lowlevel" branch?

Because it is -- most Python code will use one of the higher level modules.

> Why doesn't "socket" go under "net"?

What about UNIX domain sockets? Again, no *strong* opinion, though.

> > terminal
> >     termios
> >     pty
> >     tty
> >     readline
>
> Why does "terminal" belong under "file"?
Because it is (a special kind of file)

> > serialize
> >
> >     pickle
> >     cPickle
> >     shelve
> >     xdrlib
> >     copy
> >     copy_reg
>
> "copy" doesn't really fit here under "serialize", and
> "serialize" is kind of a long name.

I beg to disagree -- "copy" is frequently close to serialization, both in the model (serializing to a "data structure") and in real life (that's the way people copy stuff in Java, and UNIX too: think tar cvf - | tar xvf -). What's more, copy_reg is used both for copy and for pickle. I do like the idea of a "data-types" package, but it needs to be ironed out a bit.

> > internal
> >     _codecs
> >     _locale
> >     _tkinter
> >     pcre
> >     strop
> >     posix
>
> Not sure this is a good idea. It means the Unicode
> work lives under both "unicode" and "internal._codecs",
> Tk is split between "ui" and "internal._tkinter",
> regular expressions are split between "text.re" and
> "internal.pcre". I can see your motivation for getting
> "posix" out of the way, but i suspect this is likely to
> confuse people.

You mistook my motivation -- I just want unadvertised modules (AKA internal-use modules) to live in a carefully segregated section of the namespace. How would this confuse people? No one imports _tkinter or pcre, so no one would notice the change.

> > locale
>
> I think "locale" belongs under "math" with "fpformat" and
> the others. It's for numeric formatting.

Only? And anyway, I doubt many people will think like that.

> > pure
>
> What the heck is "pure"?

A module that helps work with Purify.

> > formatter
>
> This probably goes under "text".

You're right.

> Well, this leaves a few system-like modules that didn't
> really fit elsewhere for me:
>
>     pty
>     tty
>     termios
>     syslog
>     select
>     getopt
>     signal
>     errno
>     resource
>
> They all seem to be Unix-related. How about putting these
> in a "unix" or "system" package?

"select", "signal" aren't UNIX specific. "getopt" is used for generic argument processing, so it isn't really UNIX specific.
And I don't like the name "system" either. But I have no constructive proposals about those either. so-i'll-just-shut-up-now-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From dan@cgsoftware.com Sun Mar 26 06:05:44 2000 From: dan@cgsoftware.com (Daniel Berlin) Date: Sat, 25 Mar 2000 22:05:44 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID:

> "select", "signal" aren't UNIX specific.

Huh? How not? Can you name a non-UNIX that is providing them? (BeOS wouldn't count, select is broken, and nobody uses signals.) And if you can, is it providing them for something other than "UNIX/POSIX compatibility"?

> "getopt" is used for generic argument processing, so it isn't really UNIX specific.

It's a POSIX.2 function. I consider that UNIX.

> And I don't like the name "system" either. But I have no
> constructive proposals about those either.
>
> so-i'll-just-shut-up-now-ly y'rs, Z.

just-picking-nits-ly y'rs, Dan

From Moshe Zadka Sun Mar 26 06:32:33 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 08:32:33 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Daniel Berlin wrote:

> > "select", "signal" aren't UNIX specific.
>
> Huh? How not? Can you name a non-UNIX that is providing them?

Win32. Both of them. I've even used select there.

> and if you can, is it providing them for something other than "UNIX/POSIX
> compatibility"

I don't know what it provides them for, but I've *used* *select* on *WinNT*. I don't see why Python should make me feel bad when I'm doing that.

> > "getopt" is used for generic argument processing, so it isn't really UNIX
> > specific.
>
> It's a POSIX.2 function.
> I consider that UNIX.

Well, the argument style it processes is not unheard of in other OSes, and it's nice to have command line apps that have a common ui. That's it!
"getopt" belongs in the ui package! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Sun Mar 26 07:23:45 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:23:45 -0800 (PST) Subject: [Python-Dev] cPickle and cStringIO Message-ID: Are there any objections to including

    try:
        from cPickle import *
    except:
        pass

in pickle and

    try:
        from cStringIO import *
    except:
        pass

in StringIO?

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell

From Moshe Zadka Sun Mar 26 07:14:10 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 09:14:10 +0200 (IST) Subject: [Python-Dev] cPickle and cStringIO In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote:

> Are there any objections to including
>
>     try:
>         from cPickle import *
>     except:
>         pass
>
> in pickle and
>
>     try:
>         from cStringIO import *
>     except:
>         pass
>
> in StringIO?

Yes, until Python types are subclassable. Currently, one can inherit from pickle.Pickler/Unpickler and StringIO.StringIO.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From ping@lfw.org Sun Mar 26 07:37:11 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:37:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Okay, here's another shot at it. Notice a few things:

- no text.mime package
- encoders moved to text.encode
- Unix stuff moved to unix package (no file.lowlevel, file.terminal)
- aifc moved to bin.sound package
- struct moved to bin package
- locale moved to math package
- linecache moved to interp package
- data-type stuff moved to data package
- modules in internal package moved to live with their friends

Modules that are deprecated or not really intended to be imported are listed in parentheses (to give a better idea of the "real" size of each package).
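The try/except fallback Ping proposes is usually better written with `except ImportError` than a bare `except`, so that genuine bugs inside the accelerator module still surface. A minimal sketch of the idiom (on an interpreter without cPickle it simply keeps the portable implementation):

```python
# Prefer the C accelerator when it exists; otherwise keep the pure-Python
# module.  Catching only ImportError lets real errors in the C module
# propagate instead of being silently swallowed.
try:
    from cPickle import dumps, loads
except ImportError:
    from pickle import dumps, loads

data = {"spam": [1, 2, 3], "eggs": (4, 5)}
assert loads(dumps(data)) == data
```

Note this sidesteps Moshe's objection below only for module-level functions; a wholesale `from cPickle import *` would still replace the subclassable pickle.Pickler class with the C type.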
cStringIO and cPickle are parenthesized in hopeful anticipation of agreement on my last message...

net
    urlparse
    urllib
    ftplib
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    telnetlib
    httplib
    cgi
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    re              # general-purpose parsing
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mailbox
        mhlib
    encode          # i'm also ok with moving text.encode.* to text.*
        binhex
        uu
        base64
        quopri
        MimeWriter
        mimify
        mimetools
        mimetypes
    multifile
    mailcap         # special-purpose file parsing
    shlex
    ConfigParser
    netrc
    formatter
    (string, strop, pcre, reconvert, regex, regex_syntax, regsub)
bin
    gzip
    zlib
    chunk
    struct
    image
        imghdr
        colorsys    # a bit unsure, but doesn't go anywhere else
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        aifc
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    math            # library functions
    cmath
    fpectl          # type-related
    fpetest
    array
    mpz
    fpformat        # formatting
    locale
    bisect          # algorithm: also unsure, but doesn't go anywhere else
    random          # randomness
    whrandom
    crypt           # cryptography
    md5
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interp
    new
    linecache       # handling .py files
    py_compile
    code            # manipulating internal objects
    codeop
    dis
    traceback
    compileall
    keyword         # interpreter constants
    token
    symbol
    tokenize        # parsing
    parser
    bdb             # development
    pdb
    profile
    pyclbr
    tabnanny
    pstats
    rlcompleter     # this might go in "ui"...
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
    nturl2path
    macurl2path
    filecmp
    fileinput
    StringIO
    glob
    fnmatch
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    (dospath, posixpath, macpath, ntpath, cStringIO)
data
    pickle
    shelve
    xdrlib
    copy
    copy_reg
    UserDict
    UserList
    pprint
    repr
    (cPickle)
threads
    thread
    threading
    Queue
    mutex
ui
    _tkinter
    curses
    Tkinter
    cmd
    getpass
    getopt
    readline
users
    pwd
    grp
    nis
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    _codecs
    codecs
    unicodedata
    unicodedatabase
unix
    errno
    resource
    signal
    posix
    posixfile
    socket
    select
    syslog
    fcntl
    termios
    pty
    tty
    _locale
exceptions
sys
os
types
user
site
pure
operator

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell

From ping@lfw.org Sun Mar 26 07:40:27 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:40:27 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Hey, while we're at it... as long as we're renaming modules, what do you all think of getting rid of that "lib" suffix? As in:

> net
>     urlparse
>     url
>     ftp
>     gopher
>     imap
>     pop
>     nntp
>     smtp
>     telnet
>     http
>     cgi
>     server
[...]
> text
>     re    # general-purpose parsing
>     sgml
>     html
>     htmlentitydefs
[...]

"import net.ftp" seems nicer to me than "import ftplib". We could also just stick htmlentitydefs.entitydefs in html and deprecate htmlentitydefs.

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell

From ping@lfw.org Sun Mar 26 07:53:06 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:53:06 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote:

> > For example, why text.binhex but text.mail.mime.base64?
> Actually, I thought about this (this isn't random at all): base64 encoding
> is part of the mime standard, together with quoted-printable. Binhex
> isn't. I don't know if you find it reason enough, and it may be smarter
> just having a text.encode.{quopri,uu,base64,binhex}

I think i'd like that better, yes.

> > and it's not clear why these all belong under "parse".
>
> These are all used for parsing data (which does not have some pre-written
> parser). I had problems with the name too...

And parsing is what the "text" package is about anyway. I say move them up. (See the layout in my other message. Notice most of the regular-expression stuff is deprecated anyway, so it's not like there are really that many.)

> > Why doesn't "socket" go under "net"?
>
> What about UNIX domain sockets? Again, no *strong* opinion, though.

Bleck, you're right. Well, i think we just have to pick one or the other here, and i think most people would guess "net" first. (You can think of it as IPC, and file IPC-related things under the "net" category...?)

> > Why does "terminal" belong under "file"?
>
> Because it is (a special kind of file)

Only in Unix. It's Unix that likes to think of all things, including terminals, as files.

> I do like the idea of "data-types" package, but it needs to be ironed
> out a bit.

See my other message for a possible suggested hierarchy...

> > > internal [...]
> You mistook my motivation -- I just want unadvertised modules (AKA
> internal-use modules) to live in a carefully segregated section of the
> namespace. How would this confuse people? No one imports _tkinter or pcre,
> so no one would notice the change.

I think it makes more sense to classify modules by their topic rather than their exposure. (For example, you wouldn't move deprecated modules to a "deprecated" package.) Keep in mind that (well, at least to me) the main point of any naming hierarchy is to avoid name collisions. "internal" doesn't really help that purpose.
You also want to be sure (or as sure as you can) that modules will be obvious to find in the hierarchy. An "internal" package creates a distinction orthogonal to the topic-matter distinction we're using for the rest of the packages, which *potentially* introduces the question "well... is this module internal or not?" for every other module. Yes, admittedly this is only "potentially", but i hope you see the abstract point i'm trying to make... > > > locale > > > > I think "locale" belongs under "math" with "fpformat" and > > the others. It's for numeric formatting. > > Only? And anyway, I doubt many people will think like that. Yeah, it is pretty much only for numeric formatting. The more generic locale stuff seems to be in _locale. > > They all seem to be Unix-related. How about putting these > > in a "unix" or "system" package? > > "select", "signal" aren't UNIX specific. Yes, but when they're available on other systems they're an attempt to emulate Unix or Posix functionality, aren't they? > Well, the argument style it processes is not unheard of in other OSes, and > it's nice to have command line apps that have a common ui. That's it! > "getopt" belongs in the ui package! I like ui.getopt. It's a pretty good idea. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From Moshe Zadka Sun Mar 26 08:05:49 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 10:05:49 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: +1. I've had minor nits, but nothing is perfect, and this is definitely "good enough". Now we'll just have to wait until the BDFL says something... -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sun Mar 26 08:06:59 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 10:06:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Hey, while we're at it... as long as we're renaming modules, > what do you all think of getting rid of that "lib" suffix? +0 -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sun Mar 26 08:19:34 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 10:19:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > > "select", "signal" aren't UNIX specific. > > Yes, but when they're available on other systems they're an > attempt to emulate Unix or Posix functionality, aren't they? I think "signal" is ANSI C, but I'm not sure. no-other-comments-ly y'rs, Z. From gstein@lyra.org Sun Mar 26 11:52:53 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 03:52:53 -0800 (PST) Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <1258123323-10623548@hypernet.com> Message-ID: On Sat, 25 Mar 2000, Gordon McMillan wrote: >... > I doubt very much that you would break anybody's code by > removing the Windows specific behavior. > > But it seems to me that unless Python always uses the > default malloc, those of us who write C++ extensions will have > to override operator new? I'm not sure. I've used placement > new to allocate objects in a memory mapped file, but I've never > tried to muck with the global memory policy of a C++ program. Actually, the big problem arises when you have debug vs. non-debug DLLs. malloc() uses different heaps based on the debug setting. As a result, it is a bad idea to call malloc() from a debug DLL and free() it from a non-debug DLL.
If the allocation pattern is fixed, then things may be okay. IF. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Mar 26 12:02:40 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:02:40 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: >... > [ tree ] This is a great start. I have two comments: 1) keep it *very* shallow. depth just makes it conceptually difficult. 2) you're pushing too hard. modules do not *have* to go into a package. there are some placements that you've made which are very questionable... it appears they are done for movement's sake rather than for being "right" I'm off to sleep, but will look into specific comments tomorrow or so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Mar 26 12:14:32 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:14:32 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003251856.NAA09636@eric.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Guido van Rossum wrote: > > I say "do it incrementally" while others say "do it all at once." > > Personally, I don't think it is possible to do all at once. As a > > corollary, if you can't do it all at once, but you *require* that it be > > done all at once, then you have effectively deferred the problem. To put > > it another way, Guido has already invented a reason to not do it: he just > > requires that it be done all at once. Result: it won't be done. > > Bullshit, Greg. (I don't normally like to use such strong words, but > since you're being confrontational here...) Fair enough, and point accepted. Sorry. I will say, tho, that you've taken this slightly out of context. The next paragraph explicitly stated that I don't believe you had this intent. I just felt that coming up with a complete plan before doing anything would be prone to failure. 
You asked to invent a new reason :-), so I said you had one already :-) Confrontational? Yes, guilty as charged. I was a bit frustrated. > I'm all for doing it incrementally -- but I want the plan for how to > do it made up front. That doesn't require all the details to be > worked out -- but it requires a general idea about what kind of things > we will have in the namespace and what kinds of names they get. An > organizing principle, if you like. If we were to decide later that we > go for a Java-like deep hierarchy, the network package would have to > be moved around again -- what a waste. All righty. So I think there is probably a single question that I have here: Moshe posted a large breakdown of how things could be packaged. He and Ping traded a number of comments, and more will be coming as soon as people wake up :-) However, if you are only looking for a "general idea", then should python-dev'ers nit pick the individual modules, or just examine the general breakdown and hierarchy? thx, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sun Mar 26 12:09:02 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 14:09:02 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > This is a great start. I have two comments: > > 1) keep it *very* shallow. depth just makes it conceptually difficult. I tried, and Ping shallowed it even more. BTW: Anyone who cares to comment, please comment on Ping's last suggestion. I pretty much agree with the changes he made. > 2) you're pushing too hard. modules do not *have* to go into a package. > there are some placements that you've made which are very > questionable... 
> it appears they are done for movement's sake rather
> than for being "right"

Well, I'm certainly sorry I gave that impression -- the reason I wasn't "right" wasn't that, it was more my desire to be "fast" -- I wanted to have some proposal out the door, since it is harder to argue about something concrete. The biggest proof of concept that we all agree is that no one seriously took objections to anything -- there were just some minor nits to pick. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sun Mar 26 12:11:10 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 14:11:10 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > Moshe posted a large breakdown of how things could be packaged. He and > Ping traded a number of comments, and more will be coming as soon as > people wake up :-) Just a general comment -- it's so much fun to live in a different zone than all of you guys. just-wasting-time-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein@lyra.org Sun Mar 26 12:23:57 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:23:57 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > On Sun, 26 Mar 2000, Greg Stein wrote: >... > > 2) you're pushing too hard. modules do not *have* to go into a package. > > there are some placements that you've made which are very > > questionable... it appears they are done for movement's sake rather > > than for being "right" > > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete.
> The biggest proof of concept that we all agree is that no one seriously
> took objections to anything -- there were just some minor nits to pick.

Not something to apologize for! :-) Well, the indicator was the line in your original post about "unhandled modules" and the conversation between you and Ping with statements along the lines of "wasn't sure where to put this." I say just leave it then :-) If a module does not make *obvious* sense to be in a package, then it should not be there. For example: locale. That is not about numbers or about text. It has general utility. If there was an i18n package, then it would go there. Otherwise, don't force it somewhere else. Other packages are similar, so don't single out my comment about locale. Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA@ActiveState.com Sun Mar 26 18:09:15 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sun, 26 Mar 2000 10:09:15 -0800 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I maintain that a general principle re: what the aim of this reorg is is needed before the partitioning of the space can make sense. What Moshe and Ping have is a good stab at partitioning of a subspace of the total space of Python modules and packages, i.e., the standard library. If we limit the aim of the reorg to cover just that subspace, then that's fine and Ping's proposal seems grossly fine to me. If we want to have a Perl-like packaging, then we _need_ to take into account all known Python modules of general utility, such as the database modules, the various GUI packages, the mx* packages, Aaron's work, PIL, etc., etc.
Ignoring those means that the dataset used to decide the partitioning function is highly biased. Given the larger dataset, locale might very well fit in a not-toplevel location. I know that any organizational scheme is going to be optimal at best at its inception, and that as history happens, it will become suboptimal. However, it's important to know what the space being partitioned is supposed to look like. A final comment: there's a history and science to this kind of organization, which is part of library science. I suspect there is quite a bit of knowledge available as to organizing principles to do it right. It would be nice if someone could research it a bit and summarize the basic principles to the rest of us. I agree with Greg that we need high-level input from Guido on this. --david 'academic today' ascher From ping@lfw.org Sun Mar 26 20:34:11 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 26 Mar 2000 12:34:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I goofed. I apologize. Moshe and Greg are right: locale isn't just about numbers. I just read the comment at the top of locale.py: "Support for number formatting using the current locale settings" and didn't notice the from _locale import * a couple of lines down. "import locale; dir(locale)" didn't work for me because for some reason there's no _locale built-in on my system (Red Hat 6.1, python-1.5.1-10). So i looked for 'def's and they all looked like they had to do with numeric formatting. My mistake. "locale", at least, belongs at the top level. 
Other candidates for top-level:

    bisect         # algorithm
    struct         # more general than "bin" or "data"
    colorsys       # not really just for image file formats
    yuvconvert     # not really just for image file formats
    rlcompleter    # not really part of the interpreter
    dl             # not really just about files

Alternatively, we could have: ui.rlcompleter, unix.dl

(It would be nice, by the way, to replace "bisect" with an "algorithm" module containing some nice pedagogical implementations of things like bisect, quicksort, heapsort, Dijkstra's algorithm etc.)

The following also could be left at the top-level, since they seem like applications (i.e. they probably won't get imported by code, only interactively). No strong opinion on this.

    bdb
    pdb
    pyclbr
    tabnanny
    profile
    pstats

Also... i was avoiding calling the "unix" package "posix" because we already have a "posix" module. But wait... the proposed tree already contains "math" and "time" packages. If there is no conflict (is there a conflict?) then the "unix" package should probably be named "posix".

-- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton

From Moshe Zadka Mon Mar 27 05:35:23 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 07:35:23 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Ka-Ping Yee wrote:

> The following also could be left at the top-level, since
> they seem like applications (i.e. they probably won't
> get imported by code, only interactively). No strong
> opinion on this.
>
>     bdb
>     pdb
>     pyclbr
>     tabnanny
>     profile
>     pstats

Let me just state my feelings about the interpreter package: since Python programs are probably the most suited to reasoning about Python programs (among other things, thanks to the strong introspection capabilities of Python), many Python modules were written to supply a convenient interface to that introspection.
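Ping's aside about a pedagogical "algorithm" module suggests the flavor such implementations might have. As a sketch, here is a hand-rolled bisection search equivalent in spirit to what the stdlib bisect module provides (the function is illustrative only, not part of any proposal):

```python
def bisect_right(a, x):
    """Index where x should be inserted into sorted list a, keeping it
    sorted and placing x after any entries equal to it."""
    lo, hi = 0, len(a)
    while lo < hi:
        # Invariant: every a[i] for i < lo satisfies a[i] <= x,
        # and every a[i] for i >= hi satisfies x < a[i].
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

assert bisect_right([1, 2, 4, 4, 8], 4) == 4   # lands after the equal run
assert bisect_right([1, 2, 4, 4, 8], 3) == 2
assert bisect_right([], 7) == 0
```

The loop does O(log n) comparisons, which is exactly the pedagogical point such a module would make against a linear scan.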
These modules are *only* needed by programs dealing with Python programs, and hence should live in a well-defined part of the namespace. I regret calling it "interpreter" though: "Python" is a better name (something like the java.lang package).

> Also... i was avoiding calling the "unix" package "posix"
> because we already have a "posix" module. But wait... the
> proposed tree already contains "math" and "time" packages.

Yes. That was a hard decision I made, and I'm sort of waiting for Guido to veto it: it would negate the easy backwards-compatible path of providing a toplevel module, for each module which is moved somewhere else, that does "from ... import *".

> If there is no conflict (is there a conflict?) then the
> "unix" package should probably be named "posix".

I don't quite agree. "dl", for example, is a common function on unices, but it is not part of the POSIX standard. I think the "posix" module should have POSIX functions, and the "unix" package should deal with functionality available on real-life unices. standards-are-fun-aren't-they-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf@artcom-gmbh.de Mon Mar 27 06:52:25 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 08:52:25 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Moshe Zadka at "Mar 27, 2000 7:35:23 am" Message-ID: Hi! Moshe Zadka wrote:
> Yes. That was a hard decision I made, and I'm sort of waiting for Guido to
> veto it: it would negate the easy backwards-compatible path of providing
> a toplevel module, for each module which is moved somewhere else, that does
> "from ... import *"
If the result of this renaming initiative will be that I can't use

    import sys, os, time, re, struct, cPickle, parser
    import Tkinter; Tk=Tkinter; del Tkinter

anymore in Python 1.x and instead I have to change this into (for example):

    from posix import time
    from text import re
    from bin import struct
    from Python import parser
    from ui import Tkinter; ...

... I would really really *HATE* this change! [side note: The 'from MODULE import ...' form is evil and I have abandoned its use in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 programs got bigger and bigger. With 20+ software developers working on a ~1,000,000 LOC Modula-2 software system, this decision proved itself well. The situation with Python is comparable. Avoiding 'from ... import' rewards itself later, when your software has grown bigger and when it comes to maintenance by people not familiar with the modules used. ] Maybe I didn't understand what this new subdivision of the standard library should achieve. The library documentation provides an existing logical subdivision into chapters, which group the library into several kinds of services. IMO this subdivision could be discussed and possibly revised. But at the moment I got the impression that it was simply ignored. Why? What's so bad with it? Why is a subdivision on the documentation level not sufficient? Why should modules be moved into packages? I don't get it. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Moshe Zadka Mon Mar 27 07:09:18 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 09:09:18 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Peter Funk wrote:
> If the result of this renaming initiative will be that I can't use
> import sys, os, time, re, struct, cPickle, parser
> import Tkinter; Tk=Tkinter; del Tkinter
> anymore in Python 1.x and instead I have to change this into (for example):
> from posix import time from time import time
> from text import re
> from bin import struct
> from Python import parser
> from ui import Tkinter; ...

Yes.

> I would really really *HATE* this change!

Well, I'm sorry to hear that -- I've been waiting for this change to happen for a long time.

> [side note:
> The 'from MODULE import ...' form is evil and I have abandoned its use
> in favor of the 'import MODULE' form in 1987 or so, as our Modula-2
> programs got bigger and bigger. With 20+ software developers working
> on a ~1,000,000 LOC Modula-2 software system, this decision
> proved itself well.

Well, yes. Though syntactically equivalent,

    from package import module

is the recommended way to use packages, unless there is a specific need.

> Maybe I didn't understand what this new subdivision of the standard
> library should achieve.

Namespace cleanup. Too many toplevel names seem evil to some of us.

> Why is a subdivision on the documentation level not sufficient?
> Why should modules be moved into packages? I don't get it.

To allow a greater number of modules to live without worrying about namespace collision. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Mon Mar 27 08:08:57 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 27 Mar 2000 00:08:57 -0800 (PST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: Hi, Peter. Your question as to the purpose of module reorganization is well worth asking, and perhaps we should stand back for a while and try to really answer it well first. I think that my answers to your question would be: 1.
To alleviate potential namespace collision. 2. To permit talking about packages as a unit. I hereby solicit other reasons from the rest of the group... Reason #1 is not a serious problem yet, but i think i've seen a few cases where it might start to be an issue. Reason #2 has to do with things like assigning people responsibility for taking care of a particular package, or making commitments about which packages will be available with which distributions or platforms. Hence, for example, the idea of the "unix" package. Neither of these reasons necessitates a deep and holy hierarchy, so we certainly want to keep it shallow and simple if we're going to do this at all.

> If the result of this renaming initiative will be that I can't use
> import sys, os, time, re, struct, cPickle, parser
> import Tkinter; Tk=Tkinter; del Tkinter
> anymore in Python 1.x and instead I have to change this into (for example):
> from posix import time
> from text import re
> from bin import struct
> from Python import parser
> from ui import Tkinter; ...

Won't

    import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser

also work? ...i hope?

> The library documentation provides an existing logical subdivision into
> chapters, which group the library into several kinds of services.
> IMO this subdivision could be discussed and possibly revised.
> But at the moment I got the impression that it was simply ignored.
> Why? What's so bad with it?

I did look at the documentation for some guidance in arranging the modules, though admittedly it didn't direct me much. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From pf@artcom-gmbh.de Mon Mar 27 08:35:50 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 10:35:50 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Ka-Ping Yee at "Mar 27, 2000 0: 8:57 am" Message-ID: Hi!
> > import sys, os, time, re, struct, cPickle, parser
[...]
Ka-Ping Yee:
> Won't
>
>     import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser
>
> also work? ...i hope?

That is even worse. So not only the 'import' sections, which I usually keep at the top of my modules, have to be changed: this way for example 're.compile(...' has to be changed into 'text.re.compile(...' all over the place, possibly breaking the 'Maximum Line Length' styleguide rule. Regards, Peter From pf@artcom-gmbh.de Mon Mar 27 10:16:48 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 12:16:48 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? Message-ID: String objects have grown methods since 1.5.2. So it makes sense to provide a class 'UserString' similar to 'UserList' and 'UserDict', so that there is a standard base class to inherit from, if someone has the desire to extend the string methods. What do you think? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From fdrake@acm.org Mon Mar 27 15:12:55 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 10:12:55 -0500 (EST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.31351.783771.472320@weyr.cnri.reston.va.us> Moshe Zadka writes:
> Well, I'm certainly sorry I gave that impression -- the reason I wasn't
> "right" wasn't that, it was more my desire to be "fast" -- I wanted to
> have some proposal out the door, since it is harder to argue about
> something concrete. The biggest proof of concept that we all agree is that
> no one raised serious objections to anything -- there were just some minor
> nits to pick.

It's *really easy* to argue about something concrete. ;) It's just harder to misunderstand the specifics of the proposal.
It's too early to say what people think; not enough people have had time to look at the proposals yet. On the other hand, I think it's great that we have a proposal to discuss. I'll make my comments after I've read through the last version posted, when I have time to read it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Mon Mar 27 16:20:43 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 11:20:43 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Peter Funk said:
> The library documentation provides an existing logical subdivision into
> chapters, which group the library into several kinds of services.
> IMO this subdivision could be discussed and possibly revised.
> But at the moment I got the impression that it was simply ignored.
> Why? What's so bad with it?

Ka-Ping Yee writes:
> I did look at the documentation for some guidance in arranging
> the modules, though admittedly it didn't direct me much.

The library reference is pretty well disorganized at this point. I want to improve that for the 1.6 docs. I received a suggestion a few months back, but haven't had a chance to dig into it, or even respond to the email. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy@cnri.reston.va.us Mon Mar 27 17:14:46 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 27 Mar 2000 12:14:46 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.38662.835289.499610@goon.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> That is even worse. So not only the 'import' sections, which I PF> usually keep at the top of my modules, have to be changed: This PF> way for example 're.compile(...' has to be changed into PF> 'text.re.compile(...'
all over the place possibly breaking the PF> 'Maximum Line Length' styleguide rule. There is nothing wrong with changing only the import statement:

    from text import re

The only problematic use of from ... import ... is

    from text.re import *

which adds an unspecified set of names to the current namespace. Jeremy From Moshe Zadka Mon Mar 27 17:59:34 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 19:59:34 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Message-ID: Peter Funk said:
> The library documentation provides an existing logical subdivision into
> chapters, which group the library into several kinds of services.
> IMO this subdivision could be discussed and possibly revised.
> But at the moment I got the impression that it was simply ignored.
> Why? What's so bad with it?

Ka-Ping Yee writes:
> I did look at the documentation for some guidance in arranging
> the modules, though admittedly it didn't direct me much.

Fred L. Drake, Jr. writes:
> The library reference is pretty well disorganized at this point. I
> want to improve that for the 1.6 docs.

Let me just mention where my inspirations came from: shame of shames, it came from Perl. It's hard to use Perl's organization as is, because it doesn't view itself as a general-purpose language: so things like CGI.pm are toplevel, and regexes are part of the syntax. However, there are a lot of good hints there. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm@digicool.com Mon Mar 27 18:31:01 2000 From: klm@digicool.com (Ken Manheimer) Date: Mon, 27 Mar 2000 13:31:01 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Jeremy Hylton wrote: > >>>>> "PF" == Peter Funk writes: > > PF> That is even worse.
So not only the 'import' sections, which I > PF> usually keep at the top of my modules, have to be changed: This > PF> way for example 're.compile(...' has to be changed into > PF> 'text.re.compile(...' all over the place possibly breaking the > PF> 'Maximum Line Length' styleguide rule. > > There is nothing wrong with changing only the import statement: > from text import re > > The only problematic use of from ... import ... is > from text.re import * > which adds an unspecified set of names to the current namespace. Actually, i think there's another important gotcha with from .. import which may be contributing to Peter's sense of concern, but which i don't think needs to in this case. I also thought we had discussed providing transparency in general, at least for the 1.x series, no? The other gotcha i mean applies when the thing you're importing is a terminal, i.e. a non-module. Then, changes to the assignments of the names in the original module aren't reflected in the names you've imported - they're decoupled from the namespace of the original module. When the thing you're importing is, itself, a module, the same kind of thing *can* happen, but you're more generally concerned with tracking revisions to the contents of those modules, which is tracked ok in the thing you "from .. import"ed. I thought the other problem Peter was objecting to, having to change the import sections in the first place, was going to be avoided in the 1.x series (if we do this kind of thing) by inherently extending the import path to include all the packages, so people need not change their code? Seems like most of this would be fairly transparent w.r.t. the operation of existing applications. Have i lost track of the discussion? Ken klm@digicool.com From Moshe Zadka Mon Mar 27 18:55:35 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 20:55:35 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Ken Manheimer wrote:
> I also thought we had discussed providing
> transparency in general, at least for the 1.x series?

Yes, but it would be clearly marked as deprecated in 1.7, print out error messages in 1.8 and won't work at all in 3000. (That's my view on the point, but I got the feeling this is where the wind is blowing). So the transparency mechanism is intended only to be "something backwards compatible" ... it's not supposed to be a reason why things are ugly (I don't think they are, though). BTW: the transparency mechanism I suggested was not pushing things into the import path, but rather having toplevel modules which "from ... import *" from the modules that were moved. E.g., re.py would contain

    # Deprecated: don't import re, it won't work in future releases
    from text.re import *

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip@mojam.com (Skip Montanaro) Mon Mar 27 19:34:39 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 27 Mar 2000 13:34:39 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.47055.604042.381126@beluga.mojam.com> Peter> The library documentation provides an existing logical subdivision Peter> into chapters, which group the library into several kinds of Peter> services. Perhaps it makes sense to revise the library reference manual's documentation to reflect the proposed package hierarchy once it becomes concrete.
-- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From skip@mojam.com (Skip Montanaro) Mon Mar 27 19:52:08 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 27 Mar 2000 13:52:08 -0600 (CST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.48104.34263.680278@beluga.mojam.com> Responding to an early item in this thread and trying to adapt to later items... Ping wrote: I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) It's not clear that mime should be beneath text/mail. Moshe moved it up a level, but not the way I would have done it. I think the mime stuff still belongs in a separate mime package. I wouldn't just sprinkle the modules under text. I see two possibilities:

    text>mime
    net>mime

I prefer net>mime, because MIME and its artifacts are used heavily in networked applications where the content being transferred isn't text. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From fdrake@acm.org Mon Mar 27 20:05:32 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:05:32 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.47055.604042.381126@beluga.mojam.com> References: <14559.47055.604042.381126@beluga.mojam.com> Message-ID: <14559.48908.354425.313775@weyr.cnri.reston.va.us> Skip Montanaro writes:
> Perhaps it makes sense to revise the library reference manual's
> documentation to reflect the proposed package hierarchy once it becomes
> concrete.

I'd go for this. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Mon Mar 27 20:43:06 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 15:43:06 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0?
Message-ID: <200003272043.PAA18445@eric.cnri.reston.va.us> The _tkinter.c source code is littered with #ifdefs that mostly center around distinguishing between Tcl/Tk 8.0 and older versions. The two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2. Would it be reasonable to assume that everybody is using at least Tcl/Tk version 8.0? This would simplify the code somewhat. Or should I ask this in a larger forum? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Mon Mar 27 20:59:04 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:59:04 -0500 (EST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Guido van Rossum writes:
> The _tkinter.c source code is littered with #ifdefs that mostly center
> around distinguishing between Tcl/Tk 8.0 and older versions. The
> two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2.
>
> Would it be reasonable to assume that everybody is using at least
> Tcl/Tk version 8.0? This would simplify the code somewhat.

Simplify! It's more important that the latest versions are supported than pre-8.0 versions. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein@lyra.org Mon Mar 27 21:31:30 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 27 Mar 2000 13:31:30 -0800 (PST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Fred L. Drake, Jr. wrote:
> Guido van Rossum writes:
> > The _tkinter.c source code is littered with #ifdefs that mostly center
> > around distinguishing between Tcl/Tk 8.0 and older versions. The
> > two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2.
> >
> > Would it be reasonable to assume that everybody is using at least
> > Tcl/Tk version 8.0? This would simplify the code somewhat.
>
> Simplify! It's more important that the latest versions are
> supported than pre-8.0 versions.

I strongly agree. My motto is, "if the latest Python version doesn't work for you, then don't upgrade!" This is also Open Source -- they can easily get the source to the old _tkinter if they want new Python + 7.x support. If you ask in a larger forum, then you are certain to get somebody to say, "yes... I need that support." Then you have yourself a quandary :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From Fredrik Lundh Message-ID: <009801bf9835$f85b87e0$34aab5d4@hagrid> Guido van Rossum wrote:
> The _tkinter.c source code is littered with #ifdefs that mostly center
> around distinguishing between Tcl/Tk 8.0 and older versions. The
> two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2.
>
> Would it be reasonable to assume that everybody is using at least
> Tcl/Tk version 8.0? This would simplify the code somewhat.

yes. if people are using older versions, they can always use the version shipped with 1.5.2. (has anyone actually tested that one with pre-8.0 versions, btw?)

> Or should I ask this in a larger forum?

maybe. maybe not. From jack@oratrix.nl Mon Mar 27 21:58:56 2000 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 27 Mar 2000 23:58:56 +0200 Subject: [Python-Dev] 1.6 job list In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 12:16:23 +0200 (IST) , Message-ID: <20000327215901.ABA08F58C1@oratrix.oratrix.nl> Recently, Moshe Zadka said:
> Here's a reason: there shouldn't be changes we'll retract later -- we
> need to come up with the (more or less) right hierarchy the first time,
> or we'll do a lot of work for nothing.

I think I disagree here (hmm, it's probably better to say that I agree, but I agree on a tangent:-).
I think we can be 100% sure that we're wrong the first time around, and we should plan for that. One of the reasons why we're wrong is because the world is moving on. A module that at this point in time will reside at some level in the hierarchy may in a few years (or shorter) be one of a large family and be better off elsewhere in the hierarchy. It would be silly if it would have to stay where it was because of backward compatibility. If we plan for being wrong we can make the mistakes less painful. I think that a simple scheme where a module can say "I'm expecting the Python 1.6 namespace layout" would make transition to a completely different Python 1.7 namespace layout a lot less painful, because some agent could do the mapping. This can either happen at runtime (through a namespace, or through an import hook, or probably through other tricks as well) or optionally by a script that would do the translations. Of course this doesn't mean we should go off and hack in a couple of namespaces (hence my "agreeing on a tangent"), but it does mean that I think Greg's idea of not wanting to change everything at once has merit. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pf@artcom-gmbh.de Mon Mar 27 22:11:39 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 00:11:39 +0200 (MEST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 27, 2000 3:43: 6 pm" Message-ID: Guido van Rossum:
> Or should I ask this in a larger forum?

Don't ask. Simply tell the people on comp.lang.python that support for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. Period.
;-) Regards, Peter From guido@python.org Mon Mar 27 22:17:33 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 17:17:33 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 00:11:39 +0200." References: Message-ID: <200003272217.RAA28910@eric.cnri.reston.va.us>
> Don't ask. Simply tell the people on comp.lang.python that support
> for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6.
> Period. ;-)

OK, I'm convinced. We will drop pre-8.0 support. Could someone submit a set of patches? It would make sense to call #error if a pre-8.0 version is detected at compile-time! --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Mon Mar 27 23:02:21 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 09:02:21 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: Sorry for the delay, but Gordon's reply was accurate so should have kept you going ;-)

> I've been reading Jeffrey Richter's "Advanced Windows" last night in order
> to try understanding better why PyObject_NEW is implemented
> differently for
> Windows.

So that is where the heaps discussion came from :-) The problem is simply "too many heaps are available".

> Again, I feel uncomfortable with this, especially now, when
> I'm dealing with the memory aspect of Python's object
> constructors/destructors.

It is for this exact reason that it was added in the first place. I believe this code predates the "_d" convention on Windows. AFAIK, this code could be removed today and everything should work (but see below why it probably won't). MSVC allows you to choose from a number of CRT versions. Only in one of these versions is the CRTL completely shared between the .EXE and all the various .DLLs in the application.
What was happening is that this macro ended up causing the "malloc" for a new object to occur in Python15.dll, but the Python type system meant that tp_dealloc() (to clean up the object) was called in the DLL implementing the new type. Unless Python15.dll and our extension DLL shared the same CRTL (and hence the same malloc heap, fileno table etc) things would die. The DLL version of "free()" would complain, as it had never seen the pointer before. This change meant the malloc() and the free() were both implemented in the same DLL/EXE. This was particularly true with Debug builds. MSVC's debug CRTL implementations have some very nice debugging features (guard-blocks, block validity checks with debugger breakpoints when things go wrong, leak tracking, etc). However, this means they use yet another heap. Mixing debug builds with release builds in Python is a recipe for disaster. Theoretically, the problem has largely gone away now that a) we have separate "_d" versions and b) the "official" position is to use the same CRTL as Python15.dll. However, it is still a minor FAQ on comp.lang.python why PyRun_ExecFile (or whatever) fails with mysterious errors - the reason is exactly the same - they are using a different CRTL, so the CRTL can't map the file pointers correctly, and we get unexplained IO errors. But now that this macro hides the malloc problem, there may be plenty of "home grown" extensions out there that do use a different CRTL and don't see any problems - mainly 'cos they aren't throwing file handles around! Finally getting to the point of all this: We now also have the PyMem_* functions. This problem also doesn't exist if extension modules use these functions instead of malloc()/free(). We only ask them to change the PyObject allocations and deallocations, not the rest of their code, so it is no real burden. IMO, we should adopt these functions for most internal object allocations and the extension samples/docs.
Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() type functions, that simply are a thin layer over the fopen/fclose functions. If extension writers used these instead of fopen/fclose we would gain a few fairly intangible things - lose the minor FAQ, platforms that don't have fopen at all (e.g., CE) would love you, etc. Mark. From mhammond@skippinet.com.au Tue Mar 28 01:04:11 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 11:04:11 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: Message-ID: [I wrote]
> Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose()

Maybe I had something like PyFile_FromString in mind!! That-damn-time-machine-again-ly, Mark. From Moshe Zadka Tue Mar 28 05:36:59 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 07:36:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <14559.48104.34263.680278@beluga.mojam.com> Message-ID: On Mon, 27 Mar 2000, Skip Montanaro wrote:
> Responding to an early item in this thread and trying to adapt to later
> items...
>
> Ping wrote:
>
> I'm not convinced "mime" needs a separate branch here. (This is the
> deepest part of the tree, and at three levels small alarm bells went off
> in my head.)
>
> It's not clear that mime should be beneath text/mail. Moshe moved it up a
> level,

Actually, Ping moved it up a level. I only decided to agree with him retroactively...

> I think the mime stuff still
> belongs in a separate mime package. I wouldn't just sprinkle the modules
> under text. I see two possibilities:
>
> text>mime
> net>mime
>
> I prefer net>mime,

I don't. MIME is not a "wire protocol" like all the other things in net -- it's used inside another wire protocol, like RFC822 or HTTP. If at all, I'd go for having a net/ mail/ mime/ package, but Ping would yell at me again for nesting 3 levels. I could live with text/mime, because the mime format basically *is* text. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Tue Mar 28 05:47:13 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 07:47:13 +0200 (IST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. I want to ask a different question: when is Python going to officially support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate having several libraries of Tcl/Tk on my machine. (I assume you know the joke about Jews always answering a question with a question ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jack@oratrix.nl Tue Mar 28 08:55:56 2000 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 10:55:56 +0200 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message by Ka-Ping Yee , Sat, 25 Mar 2000 23:37:11 -0800 (PST) , Message-ID: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> > Okay, here's another shot at it. Notice a few things: > ... > bin > ... > image ... > sound > ... These I don't like, I think image and sound should be either at toplevel, or otherwise in a separate package (mm?). I know images and sounds are customarily stored in binary files, but so are databases and other things. Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and chunk definitely belong together, but struct is a wholly different beast. 
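A quick way to see why struct makes an odd neighbor for the compression modules in a "bin" group: struct describes fixed binary record layouts, while zlib transforms opaque byte streams and knows nothing about their structure. A small sketch using both stdlib modules:

```python
import struct
import zlib

# struct: pack and unpack a fixed binary layout
# (big-endian short + unsigned int, 2 + 4 = 6 bytes)
record = struct.pack(">hI", 7, 1999)
print(len(record), struct.unpack(">hI", record))  # 6 (7, 1999)

# zlib: compress an opaque byte stream; the record structure is invisible to it
data = b"spam " * 100
print(zlib.decompress(zlib.compress(data)) == data)  # True
```

The two share nothing beyond "bytes go in, bytes come out", which is the sense in which "bin" is a catch-all.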
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Tue Mar 28 09:01:51 2000 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 11:01:51 +0200 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 20:30:26 +0200 (IST) , Message-ID: <20000328090151.86B59370CF2@snelboot.oratrix.nl> > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > the db toplevel package, for example. > > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... For separate modules, yes. For packages this is different. As a case in point, think of MacPython: it could stuff all mac-specific packages under the toplevel "mac", but it would probably be nicer if it could extend the existing namespace. It is a bit silly if mac users have to do "from mac.text.encoding import macbinary" but "from text.encoding import binhex", just because BinHex support happens to live in the core (purely for historical reasons). But maybe this holds only for the platform distributions; then it shouldn't be as much of a problem as there aren't that many. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Moshe Zadka Tue Mar 28 09:24:14 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 11:24:14 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > These I don't like, I think image and sound should be either at toplevel, or > otherwise in a separate package (mm?). I know images and sounds are > customarily stored in binary files, but so are databases and other things. Hmmm...I think of "bin" as "interface to binary files". Agreed that I don't have a good reason for separating gdbm from zlib. > Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and > chunk definitely belong together, but struct is a wholly different beast. I think Ping and I decided to move struct to toplevel. Ping, would you like to take your last proposal and fold into it the consensual changes, or should I? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Fredrik Lundh" Message-ID: <02c101bf989a$2ee35860$34aab5d4@hagrid> Guido van Rossum wrote: > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. > > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... 
proposal: if anyone changes the API for a fundamental module, and fails to update the standard library, the change is automatically "minus one'd" for each major module that no longer works :-) (in this case, that would be -5 or so...) From Fredrik Lundh" Message-ID: <02c901bf989b$be203d80$34aab5d4@hagrid> Peter Funk wrote: > Why should modules be moved into packages? I don't get it. fwiw, neither do I... I'm not so sure that Python really needs a simple reorganization of the existing set of standard library modules. just moving the modules around won't solve the real problems with the 1.5.2 std library... > IMO this subdivision could be discussed and possibly revised. here's one proposal: http://www.pythonware.com/people/fredrik/librarybook-contents.htm From gstein@lyra.org Tue Mar 28 10:09:44 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 28 Mar 2000 02:09:44 -0800 (PST) Subject: [Python-Dev] 3rd parties in the hierarchy (was: module reorg) In-Reply-To: <20000328090151.86B59370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > For separate modules, yes. For packages this is different. As a case in point, > think of MacPython: it could stuff all mac-specific packages under the > toplevel "mac", but it would probably be nicer if it could extend the existing > namespace. 
It is a bit silly if mac users have to do "from mac.text.encoding > import macbinary" but "from text.encoding import binhex", just because BinHex > support happens to live in the core (purely for historical reasons). > > But maybe this holds only for the platform distributions, then it shouldn't be > as much of a problem as there aren't that many. Assuming that you use an archive like those found in my "small" distro or Gordon's distro, then this is no problem. The archive simply recognizes and maps "text.encoding.macbinary" to its own module. Another way to say it: stop thinking in terms of the filesystem as the sole mechanism for determining placement in the package hierarchy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Tue Mar 28 13:38:12 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:38:12 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 07:47:13 +0200." References: Message-ID: <200003281338.IAA29532@eric.cnri.reston.va.us> > I want to ask a different question: when is Python going to officially > support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate > having several libraries of Tcl/Tk on my machine. This is already in the CVS tree, except for the Windows installer. Python 1.6 will not install a separate complete Tcl installation; instead, it will install the needed Tcl/Tk files (Tcl/Tk 8.3 or newer) in the Python tree, so it won't affect existing Tcl/Tk installations. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Mar 28 13:57:02 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:57:02 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 28 Mar 2000 11:44:14 +0200." 
<02c101bf989a$2ee35860$34aab5d4@hagrid> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <02c101bf989a$2ee35860$34aab5d4@hagrid> Message-ID: <200003281357.IAA29621@eric.cnri.reston.va.us> > proposal: if anyone changes the API for a fundamental module, and > fails to update the standard library, the change is automatically "minus > one'd" for each major module that no longer works :-) > > (in this case, that would be -5 or so...) Oops. Sigh. While we're pretending that this change goes in, could you point me to those five modules? Also, we need to add test cases to the standard test suite that would have found these! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Tue Mar 28 15:04:47 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 10:04:47 -0500 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: ; from ping@lfw.org on Sat, Mar 25, 2000 at 11:37:11PM -0800 References: Message-ID: <20000328100446.A2586@cnri.reston.va.us> On 25 March 2000, Ka-Ping Yee said: > Okay, here's another shot at it. Notice a few things: Damn, I started writing a response to Moshe's original proposal -- and *then* saw this massive thread. Oh well. Turns out I still have a few useful things to say: First, any organization scheme for the standard library (or anything else, for that matter) should have a few simple guidelines. Here are two: * "deep hierarchies considered harmful": ie. avoid sub-packages if at all possible * "everything should have a purpose": every top-level package should be describable with a single, clear sentence of plain language. Eg.: net - Internet protocols, data formats, and client/server infrastructure unix - Unix-specific system calls, protocols, and conventions And two somewhat open issues: * "as long as we're renaming...": maybe this would be a good time to standardize naming conventions, eg. 
"cgi" -> "cgilib" *or* "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> "mimewriter", etc. * "shared namespaces vs system namespaces": the Perl model of "nothing belongs to The System; anyone can add a module in Text:: or Net:: or whatever" works there because Perl doesn't have __init__ files or anything to distinguish module namespaces; they just are. Python's import mechanism would have to change to support this, and the fact that __init__ files may contain arbitrary code makes this feel like a very tricky change to make. Now specific comments... > net > urlparse > urllib > ftplib > gopherlib > imaplib > poplib > nntplib > smtplib > telnetlib > httplib > cgi Rename? Either cgi -> cgilib or foolib -> foo? > server > BaseHTTPServer > CGIHTTPServer > SimpleHTTPServer > SocketServer > asynchat > asyncore This is one good place for a sub-package. It's also a good place to rename: the convention for Python module names seems to be all-lowercase; and "Server" is redundant when you're in the net.server package. How about: net.server.base_http net.server.cgi_http net.server.simple_http net.server.socket Underscores negotiable. They don't seem to be popular in module names, although sometimes they would be real life-savers. > text I think "text" should mean "plain old unstructured, un-marked-up ASCII text", where "unstructured, un-marked-up" really means "not structured or marked up in a well-known standard way". Or maybe not. I'm just trying to come up with an excuse for moving xml to top-level, which I think is where it belongs. Maybe the excuse should just be, "XML is really important and visible, and anyways Paul Prescod will raise a stink if it isn't put at top-level in Python package-space". > re # general-purpose parsing Top-level: this is a fundamental module that should be treated on a par with 'string'. (Well, except for building RE methods into strings... hmmMMmm...maybe... 
[no, I'm kidding!]) > sgmllib > htmllib > htmlentitydefs Not sure what to do about these. Someone referred somewhere to a "web" top-level package, which seems to have disappeared. If it reappears, it would be a good place for the HTML modules (not to mention a big chunk of "net") -- this would mainly be for "important and visible" (ie. PR) reasons, rather than sound technical reasons. > xml > whatever the xml-sig puts here Should be top-level. > mail > rfc822 > mailbox > mhlib "mail" should either be top-level or under "net". (Yes, I *know* it's not a wire-level protocol: that's what net.smtplib is for. But last time I checked, email is pretty useless without a network. And vice-versa.) Or maybe these all belong in a top-level "data" package: I'm starting to warm to that. > bin > gzip > zlib > chunk > struct > image > imghdr > colorsys # a bit unsure, but doesn't go anywhere else > imageop > imgfile > rgbimg > yuvconvert > sound > aifc > sndhdr > toaiff > audiodev > sunau > sunaudio > wave > audioop > sunaudiodev I agree with Jack: image and sound (audio?) should be top-level. I don't think I like the idea of an intervening "mm" or "multimedia" or "media" or what-have-you package, though. The other stuff in "bin" is kind of a grab-bag: "chunk" and "struct" might belong in the mythical "data" package. > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm Yup. > math > math # library functions > cmath > fpectl # type-related > fpetest > array > mpz > fpformat # formatting > locale > bisect # algorithm: also unsure, but doesn't go anywhere else > random # randomness > whrandom > crypt # cryptography > md5 > rotor > sha Hmmm. "locale" has already been dealt with; obviously it should be top-level. I think "array" should be top-level or under the mythical "data". Six crypto-related modules seems like enough to justify a top-level "crypt" package, though. > time > calendar > time > tzparse > sched > timing Yup. > interp > new > linecache # handling .py files [...] 
> tabnanny > pstats > rlcompleter # this might go in "ui"... I like "python" for this one. (But I'm not sure if tabnanny and rlcompleter belong there.) > security > Bastion > rexec > ihooks What does ihooks have to do with security? > file > dircache > path -- a virtual module which would do a from path import * > nturl2path > macurl2path > filecmp > fileinput > StringIO Lowercase for consistency? > glob > fnmatch > stat > statcache > statvfs > tempfile > shutil > pipes > popen2 > commands > dl No problem until these last two -- 'commands' is a Unix-specific thing that has very little to do with the filesystem per se, and 'dl' is (as I understand it) deep ju-ju with sharp edges that should probably be hidden away in the 'python' ('sys'?) package. Oh yeah, "dl" should be elsewhere -- "python" maybe? Top-level? Perhaps we need a "deepmagic" package for "dl" and "new"? ;-) > data > pickle > shelve > xdrlib > copy > copy_reg > UserDict > UserList > pprint > repr > (cPickle) Oh hey, it's *not* a mythical package! Guess I didn't read far enough ahead. I like it, but would add more stuff to it (obviously): 'struct', 'chunk', 'array' for starters. Should cPickle be renamed to fastpickle? > threads > thread > threading > Queue Lowercase? > ui > _tkinter > curses > Tkinter > cmd > getpass > getopt > readline > users > pwd > grp > nis These belong in "unix". Possibly "nis" belongs in "net" -- do any non-Unix OSes use NIS? > sgi > al > cd > cl > fl > fm > gl > misc (what used to be sgimodule.c) > sv Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there are a significant number of Sun/Solaris modules. Note that the respective trademark holders might get very antsy about who gets to put names in those namespaces -- that's exactly what happened with Sun, Solaris 8, and Perl. I believe the compromise they arrived at was that the "Solaris::" namespace remains open, but Sun gets the "Sun::" namespace. 
There should probably be a win32 package, for core registry access stuff if nothing else. There might someday be a "linux" package; it's highly unlikely there would be a "pc" or "alpha" package though. All of those argue over "irix" and "solaris" instead of "sgi" and "sun". Greg From gvwilson@nevex.com Tue Mar 28 15:45:10 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Tue, 28 Mar 2000 10:45:10 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: > > Greg Wilson > > If None becomes a keyword, I would like to ask whether it could be > > used to signal that a method is a class method, as opposed to an > > instance method: > I'd like to know what you mean by "class" method. (I do know C++ and > Java, so I have some idea...). Specifically, my question is: how does > a class method access class variables? They can't be totally > unqualified (because that's very unpythonic). If they are qualified by > the class's name, I see it as a very mild improvement on the current > situation. You could suggest, for example, to qualify class variables > by "class" (so you'd do things like: > > class.x = 1 > > ), but I'm not sure I like it. On the whole, I think it is a much > bigger issue on how we denote class methods. I don't like overloading the word 'class' this way, as it makes it difficult to distinguish a parent's 'foo' member and a child's 'foo' member:

    class Parent:
        foo = 3
        ...other stuff...

    class Child(Parent):
        foo = 9
        def test():
            print class.foo  # obviously 9, but how to get 3?
I think that using the class's name instead of 'self' will be easy to explain, will look like it belongs in the language, will be unlikely to lead to errors, and will handle multiple inheritance with ease:

    class Child(Parent):
        foo = 9
        def test():
            print Child.foo   # 9
            print Parent.foo  # 3

> Also, one slight problem with your method of denoting class methods:
> currently, it is possible to add an instance method at run time to a
> class by something like
>
>     class C:
>         pass
>
>     def foo(self):
>         pass
>
>     C.foo = foo
>
> In your suggestion, how do you view the possibility of adding class
> methods to a class? (Note that "foo", above, is also perfectly usable
> as a plain function).

Hm, I hadn't thought of this... :-( > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > > > instead of having to create throw-away variables to fill in slots in > > tuples that they don't care about. > > Currently, I use "_" for that purpose, after I heard the idea from > Fredrik Lundh. I do the same thing when I need to; I just thought that making assignment to "None" special would formalize this in a readable way. From jeremy@cnri.reston.va.us Tue Mar 28 17:31:48 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 12:31:48 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: <14560.60548.74378.613188@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: >> The only problematic use of from ... import ... is >> from text.re import * >> which adds an unspecified set of names to the current >> namespace. KLM> The other gotcha i mean applies when the thing you're importing KLM> is a terminal, ie a non-module. 
Then, changes to the KLM> assignments of the names in the original module aren't KLM> reflected in the names you've imported - they're decoupled from KLM> the namespace of the original module. This isn't an import issue. Some people simply don't understand that assignment (and import, as a form of assignment) is name binding. Import binds an imported object to a name in the current namespace. It does not affect bindings in other namespaces, nor should it. KLM> I thought the other problem peter was objecting to, having to KLM> change the import sections in the first place, was going to be KLM> avoided in the 1.x series (if we do this kind of thing) by KLM> inherently extending the import path to include all the KLM> packages, so people need not change their code? Seems like KLM> most of this would be fairly transparent w.r.t. the operation KLM> of existing applications. I'm not sure if there is consensus on backwards compatibility. I'm not in favor of creating a huge sys.path that includes every package's contents. It would be a big performance hit. Jeremy From Moshe Zadka Tue Mar 28 17:36:47 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 19:36:47 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328100446.A2586@cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Greg Ward wrote: > * "deep hierarchies considered harmful": ie. avoid sub-packages if at > all possible > > * "everything should have a purpose": every top-level package should > be describable with a single, clear sentence of plain language. Good guidelines, but they aren't enough. And anyway, rules were meant to be broken <0.9 wink> > * "as long as we're renaming...": maybe this would be a good time to > standardize naming conventions, eg. "cgi" -> "cgilib" *or* > "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> > "mimewriter", etc. 
+1 > * "shared namespaces vs system namespaces": the Perl model of "nothing > belongs to The System; anyone can add a module in Text:: or Net:: or > whatever" works there because Perl doesn't have __init__ files or > anything to distinguish module namespaces; they just are. Python's > import mechanism would have to change to support this, and the fact > that __init__ files may contain arbitrary code makes this feel > like a very tricky change to make. Indeed. But I still feel that "few things should belong to the system" is quite a useful rule... (That's what I referred to when I said Perl's module system is more suited to CPAN (now there's a surprise)) > Rename? Either cgi -> cgilib or foolib -> foo? Yes. But I wanted the first proposal to be just about placing stuff, because that airs out more disagreements. > This is one good place for a sub-package. It's also a good place to > rename: the convention for Python module names seems to be > all-lowercase; and "Server" is redundant when you're in the net.server > package. How about: > > net.server.base_http > net.server.cgi_http > net.server.simple_http > net.server.socket Hmmmmm......+0 > Underscores negotiable. They don't seem to be popular in module names, > although sometimes they would be real life-savers. Personally, I prefer underscores to CamelCase. > Or maybe not. I'm just trying to come up with an excuse for moving xml > to top-level, which I think is where it belongs. Maybe the excuse > should just be, "XML is really important and visible, and anyways Paul > Prescod will raise a stink if it isn't put at top-level in Python > package-space". I still think "xml" should be a brother to "html" and "sgml". Current political trends notwithstanding. > Not sure what to do about these. Someone referred somewhere to a "web" > top-level package, which seems to have disappeared. 
If it reappars, it > would be a good place for the HTML modules (not to mention a big chunk > of "net") -- this would mainly be for "important and visible" (ie. PR) > reasons, rather than sound technical reasons. I think the "web" package should be reinstated. But you won't like it: I'd put xml in web. > "mail" should either be top-level or under "net". (Yes, I *know* it's > not a wire-level protocol: that's what net.smtplib is for. But last > time I checked, email is pretty useless without a network. And > vice-versa.) Ummmm.....I'd disagree, but I lack the strength and the moral conviction. Put it under net and we'll call it a deal > Or maybe these all belong in a top-level "data" package: I'm starting to > warm to that. Ummmm...I don't like the "data" package personally. It seems to disobey your second guideline. > I agree with Jack: image and sound (audio?) should be top-level. I > don't think I like the idea of an intervening "mm" or "multimedia" or > "media" or what-have-you package, though. Definitely multimedia. Okay, I'm bought. > Six crypto-related modules seems like enough to justify a top-level > "crypt" package, though. It seemed obvious to me that "crypt" should be under "math". But maybe that's just the mathematician in me speaking. > I like "python" for this one. (But I'm not sure if tabnanny and > rlcompleter belong there.) I agree, and I'm not sure about rlcompleter, but am sure about tabnanny. > What does ihooks have to do with security? Well, it was more or less written to support rexec. A weak argument, admittedly > No problem until these last two -- 'commands' is a Unix-specific thing > that has very little to do with the filesystem per se Hmmmmm...it is on the same level with popen. Why not move popen too? >, and 'dl' is (as I > understand it) deep ju-ju with sharp edges that should probably be > hidden away Ummmmmm.....not in the "python" package: it doesn't have anything to do with the interpreter. > Should this be "sgi" or "irix"? 
Ditto for "sun" vs "solaris" if there > are a significant number of Sun/Solaris modules. Note that the > respective trademark holders might get very antsy about who gets to put > names in those namespaces -- that's exactly what happened with Sun, > Solaris 8, and Perl. I believe the compromise they arrived at was that > the "Solaris::" namespace remains open, but Sun gets the "Sun::" > namespace. Ummmmm.....I don't see how they have any legal standing. I for one refuse to care about what Sun Microsystems thinks about names for Python packages. > There should probably be a win32 package, for core registry access stuff > if nothing else. And for all the other extensions in win32all Yep! (Just goes to show what happens when you decide to package based on a UNIX system) > All of those > argue over "irix" and "solaris" instead of "sgi" and "sun". Fine with me -- just wanted to move them out of my face -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From andy@reportlab.com Tue Mar 28 18:13:02 2000 From: andy@reportlab.com (Andy Robinson) Date: Tue, 28 Mar 2000 18:13:02 GMT Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <20000327170031.693531CDF6@dinsdale.python.org> References: <20000327170031.693531CDF6@dinsdale.python.org> Message-ID: <38e0f4cf.24247656@post.demon.co.uk> On Mon, 27 Mar 2000 12:00:31 -0500 (EST), Peter Funk wrote: > Do we need a UserString class? This will probably be useful on top of the i18n stuff in due course, so I'd like it. Something Mike Da Silva and I have discussed a lot is implementing a higher-level 'typed string' library on top of the Unicode stuff. A 'typed string' is like a string, but knows what encoding it is in - possibly Unicode, possibly a native encoding - and embodies some basic type safety and convenience notions, like not being able to add a Shift-JIS and an EUC string together. 
Iteration would always be per character, not per byte; and a certain amount of magic would say that if the string was (say) Japanese, it would acquire a few extra methods for doing some Japan-specific things like expanding half-width katakana. Of course, we can do this anyway, but I think defining the API clearly in UserString is a great idea. - Andy Robinson From guido@python.org Tue Mar 28 19:22:43 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:22:43 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 18:13:02 GMT." <38e0f4cf.24247656@post.demon.co.uk> References: <20000327170031.693531CDF6@dinsdale.python.org> <38e0f4cf.24247656@post.demon.co.uk> Message-ID: <200003281922.OAA03113@eric.cnri.reston.va.us> > > Do we need a UserString class? > > This will probably be useful on top of the i18n stuff in due course, > so I'd like it. > > Something Mike Da Silva and I have discussed a lot is implementing a > higher-level 'typed string' library on top of the Unicode stuff. > A 'typed string' is like a string, but knows what encoding it is in - > possibly Unicode, possibly a native encoding and embodies some basic > type safety and convenience notions, like not being able to add a > Shift-JIS and an EUC string together. Iteration would always be per > character, not per byte; and a certain amount of magic would say that > if the string was (say) Japanese, it would acquire a few extra methods > for doing some Japan-specific things like expanding half-width > katakana. > > Of course, we can do this anyway, but I think defining the API clearly > in UserString is a great idea. Agreed. Please somebody send a patch! 
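A minimal sketch of the kind of encoding-aware string Andy describes. Every name here (TypedString, its data/encoding attributes) is made up for illustration and is not a proposed UserString API; modern Python syntax is used so the sketch runs as-is:

```python
class TypedString:
    """A string that knows its own encoding (hypothetical class)."""

    def __init__(self, data, encoding):
        self.data = data          # raw bytes in the named encoding
        self.encoding = encoding

    def __add__(self, other):
        # The basic type-safety notion: refuse to mix encodings,
        # e.g. adding a Shift-JIS string to an EUC string.
        if self.encoding != other.encoding:
            raise TypeError("cannot add %s and %s strings"
                            % (self.encoding, other.encoding))
        return TypedString(self.data + other.data, self.encoding)

    def __len__(self):
        # Length (and iteration) is per character, not per byte.
        return len(self.data.decode(self.encoding))
```

Encoding-specific extras -- say, half-width katakana expansion for Japanese -- could then hang off subclasses selected by the encoding tag.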
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Mar 28 19:25:39 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:25:39 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 Message-ID: <200003281925.OAA03287@eric.cnri.reston.va.us> I'm hoping to release a first, rough alpha of Python 1.6 by April 1st (no joke!). Not everything needs to be finished by then, but I hope to have the current versions of distutil, expat, and sre in there. Anything else that needs to go into 1.6 and isn't ready yet? (Small stuff doesn't matter, everything currently in the patches queue can probably go in if it isn't rejected by then.) --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Tue Mar 28 19:40:24 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 11:40:24 -0800 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: > Anything else that needs to go into 1.6 and isn't ready yet? No one seems to have found time to figure out the mmap module support. --david From guido@python.org Tue Mar 28 19:33:29 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:33:29 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: Your message of "Tue, 28 Mar 2000 11:40:24 PST." References: Message-ID: <200003281933.OAA04896@eric.cnri.reston.va.us> > > Anything else that needs to go into 1.6 and isn't ready yet? > > No one seems to have found time to figure out the mmap module support. I wasn't even aware that that was a priority. If someone submits it, it will go in -- alpha 1 is not a total feature freeze, just a "testing the waters". 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Tue Mar 28 19:49:17 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 28 Mar 2000 21:49:17 +0200 Subject: [Python-Dev] First alpha release of Python 1.6 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <38E10CBD.C6B71D50@tismer.com> Guido van Rossum wrote: ... > Anything else that needs to go into 1.6 and isn't ready yet? Stackless Python of course, but it *is* ready yet. Just kidding. I will provide a compressed unicode database in a few days. That will be a non-Python-specific module, and (Marc or I) will provide a Python specific wrapper. This will probably not get ready until April 1. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin@mems-exchange.org Tue Mar 28 19:51:29 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 14:51:29 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <14561.3393.761177.776684@amarok.cnri.reston.va.us> David Ascher writes: >> Anything else that needs to go into 1.6 and isn't ready yet? >No one seems to have found time to figure out the mmap module support. The issue there is cross-platform compatibility; the Windows and Unix versions take completely different constructor arguments, so how should we paper over the differences? Unix arguments: (file descriptor, size, flags, protection) Win32 arguments:(filename, tagname, size) We could just say, "OK, the args are completely different between Win32 and Unix, despite it being the same function name". 
Maybe that's best, because there seems no way to reconcile those two different sets of arguments. -- A.M. Kuchling http://starship.python.net/crew/amk/ I'm here for the FBI, not the _Weekly World News_. -- Scully in X-FILES #1 From DavidA@ActiveState.com Tue Mar 28 20:06:09 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 12:06:09 -0800 Subject: [Python-Dev] mmapfile module In-Reply-To: <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I guess my approach would be to provide two platform-specific modules, and to figure out a high-level Python module which could provide a reasonable platform-independent interface on top of it. One problem with that approach is that I think that there is also great value in having a portable mmap interface in the C layer, where i see lots of possible uses in extension modules (much like the threads API). --david From guido@python.org Tue Mar 28 20:00:57 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 15:00:57 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Tue, 28 Mar 2000 12:06:09 PST." References: Message-ID: <200003282000.PAA11988@eric.cnri.reston.va.us> > > The issue there is cross-platform compatibility; the Windows and Unix > > versions take completely different constructor arguments, so how > > should we paper over the differences? 
> > > > Unix arguments: (file descriptor, size, flags, protection) > > Win32 arguments:(filename, tagname, size) > > > > We could just say, "OK, the args are completely different between > > Win32 and Unix, despite it being the same function name". Maybe > > that's best, because there seems no way to reconcile those two > > different sets of arguments. > > I guess my approach would be to provide two platform-specific modules, and > to figure out a high-level Python module which could provide a reasonable > platform-independent interface on top of it. One problem with that approach > is that I think that there is also great value in having a portable mmap > interface in the C layer, where i see lots of possible uses in extension > modules (much like the threads API). I don't know enough about this, but it seems that there might be two steps: *creating* a mmap object is necessarily platform-specific; but *using* a mmap object could be platform-neutral. What is the API for mmap objects? --Guido van Rossum (home page: http://www.python.org/~guido/) From klm@digicool.com Tue Mar 28 20:07:25 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 15:07:25 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14560.60548.74378.613188@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > >>>>> "KLM" == Ken Manheimer writes: > > >> The only problematic use of from ... import ... is > >> from text.re import * > >> which adds an unspecified set of names to the current > >> namespace. > > KLM> The other gotcha i mean applies when the thing you're importing > KLM> is a terminal, ie a non-module. Then, changes to the > KLM> assignments of the names in the original module aren't > KLM> reflected in the names you've imported - they're decoupled from > KLM> the namespace of the original module. > > This isn't an import issue. 
Some people simply don't understand > that assignment (and import as form of assignment) is name binding. > Import binds an imported object to a name in the current namespace. > It does not affect bindings in other namespaces, nor should it. I know that - i was addressing the asserted evilness of from ... import ... and how it applied - and didn't - w.r.t. packages. > KLM> I thought the other problem peter was objecting to, having to > KLM> change the import sections in the first place, was going to be > KLM> avoided in the 1.x series (if we do this kind of thing) by > KLM> inherently extending the import path to include all the > KLM> packages, so people need not change their code? Seems like > KLM> most of this would be fairly transparent w.r.t. the operation > KLM> of existing applications. > > I'm not sure if there is consensus on backwards compatibility. I'm > not in favor of creating a huge sys.path that includes every package's > contents. It would be a big performance hit. Yes, someone reminded me that the other (better, i think) option is stub modules in the current places that do the "from ... import *" for the right values of "...". py3k finishes the migration by eliminating the stubs. Ken klm@digicool.com From gward@cnri.reston.va.us Tue Mar 28 20:29:55 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 15:29:55 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us>; from guido@python.org on Tue, Mar 28, 2000 at 02:25:39PM -0500 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <20000328152955.A3136@cnri.reston.va.us> On 28 March 2000, Guido van Rossum said: > I'm hoping to release a first, rough alpha of Python 1.6 by April 1st > (no joke!). > > Not everything needs to be finished by then, but I hope to have the > current versions of distutil, expat, and sre in there. We just need to do a bit of CVS trickery to put Distutils under the Python tree. 
I'd *like* for Distutils to have its own CVS existence at least until 1.6 is released, but it's not essential. Two of the big Distutils to-do items that I enumerated at IPC8 have been knocked off: the "dist" command has been completely redone (and renamed "sdist", for "source distribution"), as has the "install" command. The really major to-do items left for Distutils are: * implement the "bdist" command with enough marbles to generate RPMs and some sort of Windows installer (Wise?); Solaris packages, Debian packages, and something for the Mac would be nice too. * documentation (started, but only just) And there are some almost-as-important items: * Mac OS support; this has been started, at least for the unfashionable and clunky sounding MPW compiler; CodeWarrior support (via AppleEvents, I think) would be nice * test suite -- at least the fundamental Distutils marbles should get a good exercise; it would also be nice to put together a bunch of toy module distributions and make sure that "build" and "install" on them do the right things... all automatically, of course! * reduce number of tracebacks: right now, certain errors in the setup script or on the command line can result in a traceback, when they should just result in SystemExit with "error in setup script: ..." or "error on command line: ..." * fold in Finn Bock's JPython compat. patch * fold in Michael Muller's "pkginfo" patch * finish and fold in my Python 1.5.1 compat. patch (only necessary as long as Distutils has a life of its own, outside Python) Well, I'd better get cracking ... Guido, we can do the CVS thing any time; I guess I'll mosey on downstairs. Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From Fredrik Lundh" <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: <003501bf98ee$50097a20$34aab5d4@hagrid> Andrew M. 
Kuchling wrote:
> The issue there is cross-platform compatibility; the Windows and Unix
> versions take completely different constructor arguments, so how
> should we paper over the differences?
>
> Unix arguments: (file descriptor, size, flags, protection)
> Win32 arguments: (filename, tagname, size)
>
> We could just say, "OK, the args are completely different between
> Win32 and Unix, despite it being the same function name". Maybe
> that's best, because there seems no way to reconcile those two
> different sets of arguments.

I don't get this. Why expose low-level implementation details to the user (flags, protection, tagname)? (And how come the Windows implementation doesn't support read-only vs. read/write flags?) Unless the current implementation uses something radically different from mmap/MapViewOfFile, wouldn't an interface like: (filename, mode="rb", size=entire file, offset=0) be sufficient? (where mode can be "wb" or "wb+" or "rb+", optionally without the "b") From Donald Beaudry Tue Mar 28 20:46:06 2000 From: Donald Beaudry (Donald Beaudry) Date: Tue, 28 Mar 2000 15:46:06 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <200003282046.PAA18822@zippy.init.com> ...sorry to jump into the middle of this one, but... A while back I put a lot of thought into how to support class methods and class attributes. I feel that I solved the problem in a fairly complete way, though the solution does have some warts. Here's an example:

>>> class foo(base):
...     value = 10   # this is an instance attribute called 'value'
...                  # as usual, it is shared between all instances
...                  # until explicitly set on a particular instance
...
...     def set_value(self, x):
...         print "instance method"
...         self.value = x
...
...     #
...     # here comes the weird part
...     #
...     class __class__:
...         value = 5   # this is a class attribute called value
...
...         def set_value(cl, x):
...             print "class method"
...             cl.value = x
...
...         def set_instance_default_value(cl, x):
...             cl._.value = x
...
>>> f = foo()
>>> f.value
10
>>> foo.value = 20
>>> f.value
10
>>> f.__class__.value
20
>>> foo._.value
10
>>> foo._.value = 1
>>> f.value
1
>>> foo.set_value(100)
class method
>>> foo.value
100
>>> f.value
1
>>> f.set_value(40)
instance method
>>> f.value
40
>>> foo._.value
1
>>> ff = foo()
>>> foo.set_instance_default_value(15)
>>> ff.value
15
>>> foo._.set_value(ff, 5)
instance method
>>> ff.value
5
>>>

Is anyone still with me? The crux of the problem is that in the current Python class/instance implementation, classes don't have attributes of their own. All of those things that look like class attributes are really there as defaults for the instances. To support true class attributes a new name space must be invented. Since I wanted class objects to look like any other object, I chose to move the "instance defaults" name space under the underscore attribute. This allows the class's unqualified namespace to refer to its own attributes. Clear as mud, right? In case you are wondering, yes, the code above is a working example. I released it a while back as the 'objectmodule' and just updated it to work with Python-1.5.2. The update has yet to be released. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...Will hack for sushi... From akuchlin@mems-exchange.org Tue Mar 28 20:50:18 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 15:50:18 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <003501bf98ee$50097a20$34aab5d4@hagrid> References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> <003501bf98ee$50097a20$34aab5d4@hagrid> Message-ID: <14561.6922.415063.279939@amarok.cnri.reston.va.us> Fredrik Lundh writes: >(And how come the Windows implementation doesn't support >read-only vs. read/write flags?) Good point; that should be fixed.
> (filename, mode="rb", size=entire file, offset=0) >be sufficient? (where mode can be "wb" or "wb+" or "rb+", >optionally without the "b") Hmm... maybe we can dispose of the PROT_* argument that way on Unix. But how would you specify MAP_SHARED vs. MAP_PRIVATE, or MAP_ANONYMOUS? (MAP_FIXED seems useless to a Python programmer.) Another character in the mode argument, or a flags argument? Worse, as you pointed out in the same thread, MAP_ANONYMOUS on OSF/1 doesn't want to take a file descriptor at all. Also, the tag name on Windows seems important, from Gordon McMillan's explanation of it: http://www.python.org/pipermail/python-dev/1999-November/002808.html -- A.M. Kuchling http://starship.python.net/crew/amk/ You mustn't kill me. You don't love me. You d-don't even know me. -- The Furies kill Abel, in SANDMAN #66: "The Kindly Ones:10" From guido@python.org Tue Mar 28 21:02:04 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:02:04 -0500 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Your message of "Tue, 28 Mar 2000 15:46:06 EST." <200003282046.PAA18822@zippy.init.com> References: <200003282046.PAA18822@zippy.init.com> Message-ID: <200003282102.QAA13041@eric.cnri.reston.va.us> > A while back I put a lot of thought into how to support class methods > and class attributes. I feel that I solved the problem in a fairly > complete way though the solution does have some warts. Here's an > example: [...] > Is anyone still with me? > > The crux of the problem is that in the current python class/instance > implementation, classes dont have attributes of their own. All of > those things that look like class attributes are really there as > defaults for the instances. To support true class attributes a new > name space must be invented. Since I wanted class objects to look > like any other object, I chose to move the "instance defaults" name > space under the underscore attribute. 
This allows the class's > unqualified namespace to refer to its own attributes. Clear as mud, > right? > > In case you are wondering, yes, the code above is a working example. > I released it a while back as the 'objectmodule' and just updated it > to work with Python-1.5.2. The update has yet to be released. This looks like it would break a lot of code. How do you refer to a superclass method? It seems that ClassName.methodName would refer to the class method, not to the unbound instance method. Also, moving the default instance attributes to a different namespace seems to be a semantic change that could change lots of things. I am still in favor of saying "Python has no class methods -- use module-global functions for that". Between the module, the class and the instance, there are enough namespaces -- we don't need another one. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf@artcom-gmbh.de Tue Mar 28 21:01:29 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:01:29 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003281922.OAA03113@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 2:22:43 pm" Message-ID: I wrote: > > > Do we need a UserString class? > > Andy Robinson: > > This will probably be useful on top of the i18n stuff in due course, > > so I'd like it. > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > higher-level 'typed string' library on top of the Unicode stuff. > > A 'typed string' is like a string, but knows what encoding it is in - > > possibly Unicode, possibly a native encoding and embodies some basic > > type safety and convenience notions, like not being able to add a > > Shift-JIS and an EUC string together. 
Iteration would always be per > > character, not per byte; and a certain amount of magic would say that > > if the string was (say) Japanese, it would acquire a few extra methods > > for doing some Japan-specific things like expanding half-width > > katakana. > > > > Of course, we can do this anyway, but I think defining the API clearly > > in UserString is a great idea. > Guido van Rossum: > Agreed. Please somebody send a patch! I feel unable to do what Andy proposed. What I had in mind was a simple wrapper class around the builtin string type, similar to UserDict and UserList, which can be used to derive other classes from. I use UserList and UserDict quite often and find them very useful. They are simple and powerful and easy to extend. Maybe the things Andy Robinson proposed above belong in a subclass which inherits from a simple UserString class? Do we need an additional UserUnicode class for unicode string objects? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido@python.org Tue Mar 28 21:56:49 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:56:49 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 23:01:29 +0200." References: Message-ID: <200003282156.QAA13361@eric.cnri.reston.va.us> [Peter Funk] > > > > Do we need a UserString class? > > > > Andy Robinson: > > > This will probably be useful on top of the i18n stuff in due course, > > > so I'd like it. > > > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > > higher-level 'typed string' library on top of the Unicode stuff.
> > > A 'typed string' is like a string, but knows what encoding it is in - > > > possibly Unicode, possibly a native encoding and embodies some basic > > > type safety and convenience notions, like not being able to add a > > > Shift-JIS and an EUC string together. Iteration would always be per > > > character, not per byte; and a certain amount of magic would say that > > > if the string was (say) Japanese, it would acquire a few extra methods > > > for doing some Japan-specific things like expanding half-width > > > katakana. > > > > > > Of course, we can do this anyway, but I think defining the API clearly > > > in UserString is a great idea. > > > Guido van Rossum: > > Agreed. Please somebody send a patch! [PF] > I feel unable to do, what Andy proposed. What I had in mind was a > simple wrapper class around the builtin string type similar to > UserDict and UserList which can be used to derive other classes from. Yes. I think Andy wanted his class to be a subclass of UserString. > I use UserList and UserDict quite often and find them very useful. > They are simple and powerful and easy to extend. Agreed. > May be the things Andy Robinson proposed above belong into a sub class > which inherits from a simple UserString class? Do we need > an additional UserUnicode class for unicode string objects? It would be great if there was a single UserString class which would work with either Unicode or 8-bit strings. I think that shouldn't be too hard, since it's just a wrapper. So why don't you give the UserString.py a try and leave Andy's wish alone? --Guido van Rossum (home page: http://www.python.org/~guido/) From python-dev@python.org Tue Mar 28 21:47:59 2000 From: python-dev@python.org (Peter Funk) Date: Tue, 28 Mar 2000 23:47:59 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> from Fredrik Lundh at "Mar 28, 2000 11:55:19 am" Message-ID: Hi! 
> Peter Funk wrote: > > Why should modules be moved into packages? I don't get it. > Fredrik Lundh: > fwiw, neither do I... Pheeewww... And I thought I was the only one! ;-) > I'm not so sure that Python really needs a simple reorganization > of the existing set of standard library modules. just moving the > modules around won't solve the real problems with the 1.5.2 std > library... Right. I propose to leave the namespace flat. I like to argue with Brad J. Cox ---the author of the book "Object Oriented Programming - An Evolutionary Approach" Addison Wesley, 1987--- who proposes the idea of what he calls a "Software-IC": He looks closely at the design process of electronic engineers, who usually deal with large data books with prefabricated components. There are often hundreds of them in such a databook and most of them have terse and not very mnemonic names. But the engineers using them all day *know* after a short while that a 7400 chip is a TTL-chip containing 4 NAND gates. Nearly the same holds true for software engineers using Software-ICs like 're' or 'struct' as their daily building blocks. A software engineer who is already familiar with his/her building blocks has absolutely no advantage from a deeply nested namespace. Now for something completely different: Fredrik Lundh about the library documentation: > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Whether 'md5', 'getpass' and 'traceback' fit into a category 'Commonly Used Modules' is ....ummmm.... at least a bit questionable. But we should really focus the discussion on the structure of the documentation. Since many standard library modules belong in several logical categories at once, a true tree-structured organization is simply not sufficient to describe everything. So it is important to set up pointers between related functionality.
For example 'string.replace' is somewhat related to 're.sub' or 'getpass' is related to 'crypt', however 'crypt' is related to 'md5' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From pf@artcom-gmbh.de Tue Mar 28 22:13:02 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 00:13:02 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: <200003282007.PAA12045@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 3: 7: 9 pm" Message-ID: Hi! Guido van Rossum: > Modified Files: > _tkinter.c [...] > *** 491,501 **** > > v->interp = Tcl_CreateInterp(); > - > - #if TKMAJORMINOR == 8001 > - TclpInitLibraryPath(baseName); > - #endif /* TKMAJORMINOR */ > > ! #if defined(macintosh) && TKMAJORMINOR >= 8000 > ! /* This seems to be needed since Tk 8.0 */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > --- 475,481 ---- > > v->interp = Tcl_CreateInterp(); > > ! #if defined(macintosh) > ! /* This seems to be needed */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > *************** Are you sure that the call to 'TclpInitLibraryPath(baseName);' is not required in Tcl/Tk 8.1, 8.2, 8.3 ? I would propose the following: +#if TKMAJORMINOR >= 8001 + TclpInitLibraryPath(baseName); +# endif /* TKMAJORMINOR */ Here I quote from the Tcl8.3 source distribution: /* *--------------------------------------------------------------------------- * * TclpInitLibraryPath -- * * Initialize the library path at startup. We have a minor * metacircular problem that we don't know the encoding of the * operating system but we may need to talk to operating system * to find the library directories so that we know how to talk to * the operating system. * * We do not know the encoding of the operating system. * We do know that the encoding is some multibyte encoding. 
* In that multibyte encoding, the characters 0..127 are equivalent * to ascii. * * So although we don't know the encoding, it's safe: * to look for the last slash character in a path in the encoding. * to append an ascii string to a path. * to pass those strings back to the operating system. * * But any strings that we remembered before we knew the encoding of * the operating system must be translated to UTF-8 once we know the * encoding so that the rest of Tcl can use those strings. * * This call sets the library path to strings in the unknown native * encoding. TclpSetInitialEncodings() will translate the library * path from the native encoding to UTF-8 as soon as it determines * what the native encoding actually is. * * Called at process initialization time. * * Results: * None. */ Sorry, but I don't know enough about this in connection with the unicode patches and if we should pay attention to this. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From akuchlin@mems-exchange.org Tue Mar 28 22:21:07 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 17:21:07 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Peter Funk quoted: >Fredrik Lundh: >> I'm not so sure that Python really needs a simple reorganization >> of the existing set of standard library modules. just moving the >> modules around won't solve the real problems with the 1.5.2 std >> library... >Right. I propose to leave the namespace flat. I third that comment. Arguments against reorganizing for 1.6: 1) I doubt that we have time to do a good job of it for 1.6. (1.7, maybe.) 2) Right now there's no way for third-party extensions to add themselves to a package in the standard library. 
Once Python finds foo/__init__.py, it won't look for site-packages/foo/__init__.py, so if you grab, say, "crypto" as a package name in the standard library, it's forever lost to third-party extensions. 3) Rearranging the modules is a good chance to break backward compatibility in other ways. If you want to rewrite, say, httplib in a non-compatible way to support HTTP/1.1, then the move from httplib.py to net.http.py is a great chance to do that, and leave httplib.py as-is for old programs. If you just copy httplib.py, rewriting net.http.py is now harder, since you have to either maintain compatibility or break things *again* in the next version of Python. 4) We wanted to get 1.6 out fairly quickly, and therefore limited the number of features that would get in. (Vide the "Python 1.6 timing" thread last ... November, was it?) Packagizing is feature creep that'll slow things down. Maybe we should start a separate list to discuss a package hierarchy for 1.7. But for 1.6, forget it. -- A.M. Kuchling http://starship.python.net/crew/amk/ Posting "Please send e-mail, since I don't read this group": Poster is rendered illiterate by a simple trepanation. -- Kibo, in the Happynet Manifesto From guido@python.org Tue Mar 28 22:24:46 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:24:46 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: Your message of "Wed, 29 Mar 2000 00:13:02 +0200." References: Message-ID: <200003282224.RAA13573@eric.cnri.reston.va.us> > Are you sure that the call to 'TclpInitLibraryPath(baseName);' > is not required in Tcl/Tk 8.1, 8.2, 8.3 ? > I would propose the following: > > +#if TKMAJORMINOR >= 8001 > + TclpInitLibraryPath(baseName); > +# endif /* TKMAJORMINOR */ It is an internal routine which shouldn't be called at all by the user. I believe it is called internally at the right time.
Note that we now call Tcl_FindExecutable(), which *is* intended to be called by the user (and exists in all 8.x versions) -- maybe this causes TclpInitLibraryPath() to be called. I tested it on Solaris, with Tcl/Tk versions 8.0.4, 8.1.1, 8.2.3 and 8.3.0, and it doesn't seem to make any difference, as long as that version of Tcl/Tk has actually been installed. (When it's not installed, TclpInitLibraryPath() doesn't help either.) I still have to check this on Windows -- maybe it'll have to go back in. [...] > Sorry, but I don't know enough about this in connection with the > unicode patches and if we should pay attention to this. It seems to be all right... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Mar 28 22:25:27 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:25:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Tue, 28 Mar 2000 17:21:07 EST." <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: <02c901bf989b$be203d80$34aab5d4@hagrid> <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: <200003282225.RAA13586@eric.cnri.reston.va.us> > Maybe we should start a separate list to discuss a package hierarchy > for 1.7. But for 1.6, forget it. Yes! Please! --Guido van Rossum (home page: http://www.python.org/~guido/) From Donald Beaudry Tue Mar 28 22:56:03 2000 From: Donald Beaudry (Donald Beaudry) Date: Tue, 28 Mar 2000 17:56:03 -0500 Subject: [Python-Dev] None as a keyword / class methods References: <200003282046.PAA18822@zippy.init.com> <200003282102.QAA13041@eric.cnri.reston.va.us> Message-ID: <200003282256.RAA21080@zippy.init.com> Guido van Rossum wrote, > This looks like it would break a lot of code. Only if it were to replace the current implementation. Perhaps I inadvertently made that suggestion. It was not my intention.
Another way to look at my post is to say that it was intended to point out why we cant have class methods in the current implementation... it's a name space issue. > How do you refer to a superclass method? It seems that > ClassName.methodName would refer to the class method, not to the > unbound instance method. Right. To get at the unbound instance methods you must go through the 'unbound accessor' which is accessed via the underscore. If you wanted to chain to a superclass method it would look like this: class child(parent): def do_it(self, x): z = parent._.do_it(self, x) return z > Also, moving the default instance attributes to a different > namespace seems to be a semantic change that could change lots of > things. I agree... and that's why I wouldnt suggest doing it to the current class/instance implementation. However, for those who insist on having class attributes and methods I think it would be cool to settle on a standard "syntax". > I am still in favor of saying "Python has no class methods -- use > module-global functions for that". Or use a class/instance implementation provided via an extension module rather than the built-in one. The class named 'base' shown in my example is a class designed for that purpose. > Between the module, the class and the instance, there are enough > namespaces -- we don't need another one. The topic comes up often enough to make me think some might disagree. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...So much code, so little time... From Moshe Zadka Tue Mar 28 23:24:29 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 01:24:29 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Andrew M. 
Kuchling wrote: > Peter Funk quoted: > >Fredrik Lundh: > >> I'm not so sure that Python really needs a simple reorganization > >> of the existing set of standard library modules. just moving the > >> modules around won't solve the real problems with the 1.5.2 std > >> library... > >Right. I propose to leave the namespace flat. > > I third that comment. Arguments against reorganizing for 1.6: Let me just note that my original great renaming proposal was titled "1.7". I'm certain I don't want it to affect the 1.6 release -- my god, it's almost alpha time and we don't even know how to reorganize. Strictly 1.7. > 4) We wanted to get 1.6 out fairly quickly, and therefore limited > the number of features that would get in. (Vide the "Python 1.6 > timing" thread last ... November, was it?) Packagizing is feature > creep that'll slow things down Oh yes. I'm waiting for that 1.6....I wouldn't want to stall it for the world. But this is as good a chance as any to discuss reasons before strategies. Here's why I believe we should re-organize Python modules: -- modules fall quite naturally into subpackages. Reducing the number of toplevel modules will lessen the clutter -- it would be easier to synchronize documentation and code (think "automatically generated documentation") -- it would enable us to move toward a CPAN-like module repository, together with the dist-sig efforts. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gmcm@hypernet.com Tue Mar 28 23:44:27 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 28 Mar 2000 18:44:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: Message-ID: <1257835425-27941123@hypernet.com> Andrew M. Kuchling wrote: [snip] > 2) Right now there's no way for third-party extensions to add > themselves to a package in the standard library. Once Python finds
Once Python finds > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > if you grab, say, "crypto" as a package name in the standard library, > it's forever lost to third-party extensions. That way lies madness. While I'm happy to carp at Java for requiring "com", "net" or whatever as a top level name, their intent is correct: the names grabbed by the Python standard packages belong to no one but the Python standard packages. If you *don't* do that, upgrades are an absolute nightmare. Marc-Andre grabbed "mx". If (as I rather suspect ) he wants to remake the entire standard lib in his image, he's welcome to - *under* mx. What would happen if he (and everyone else) installed themselves *into* my core packages, then I decided I didn't want his stuff? More than likely I'd have to scrub the damn installation and start all over again. - Gordon From DavidA@ActiveState.com Wed Mar 29 00:01:57 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:01:57 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg Message-ID: I'm thrilled to see the extended call syntax patches go in! One less wart in the language! Jeremy ZitBlaster Hylton and Greg Noxzema Ewing! --david From pf@artcom-gmbh.de Tue Mar 28 23:53:50 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 01:53:50 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003282156.QAA13361@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 4:56:49 pm" Message-ID: Hi! > [Peter Funk] > > > > > Do we need a UserString class? [...] Guido van Rossum: > So why don't you give the UserString.py a try and leave Andy's wish alone? Okay. Here we go. Could someone please have a close eye on this? I've haccked it up in hurry. 
---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
"""
import sys

# XXX Totally untested and hacked up until 2:00 am with too little sleep ;-)

class UserString:
    def __init__(self, string=""):
        self.data = string
    def __repr__(self): return repr(self.data)
    def __cmp__(self, string):
        if isinstance(string, UserString):
            return cmp(self.data, string.data)
        else:
            return cmp(self.data, string)
    def __len__(self): return len(self.data)

    # methods defined in alphabetical order
    def capitalize(self): return self.__class__(self.data.capitalize())
    def center(self, width): return self.__class__(self.data.center(width))
    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)
    def encode(self, encoding=None, errors=None): # XXX improve this?
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self): raise NotImplementedError
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self): return self.data.isdecimal()
    def isdigit(self): return self.data.isdigit()
    def islower(self): return self.data.islower()
    def isnumeric(self): return self.data.isnumeric()
    def isspace(self): return self.data.isspace()
    def istitle(self): return self.data.istitle()
    def isupper(self): return self.data.isupper()
    def join(self, seq): return self.data.join(seq)
    def ljust(self, width): return self.__class__(self.data.ljust(width))
    def lower(self): return self.__class__(self.data.lower())
    def lstrip(self): return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width): return self.__class__(self.data.rjust(width))
    def rstrip(self): return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self): return self.__class__(self.data.strip())
    def swapcase(self): return self.__class__(self.data.swapcase())
    def title(self): return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self): return self.__class__(self.data.upper())
    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, type(self.data)):
            return self.__class__(self.data + other)
        else:
            return self.__class__(self.data + str(other))
    def __radd__(self, other):
        if isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(str(other) + self.data)
    def __mul__(self, n):
        return self.__class__(self.data*n)
    __rmul__ = __mul__

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return [0]

if __name__ == "__main__":
    import sys
    sys.exit(_test()[0])
From Fredrik Lundh Message-ID: <012301bf990b$2a494c80$34aab5d4@hagrid> > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! but did he compile before checking in? ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : undeclared identifier (compile.c and opcode.h both mention this identifier, but nobody defines it... should it be CALL_FUNCTION_VAR, perhaps?)
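For readers following the thread: the extended call syntax in question unpacks an arbitrary sequence and dict at the call site. A small sketch, with all names (`connect`, `endpoint`, `options`) invented for illustration:

```python
def connect(host, port=80, proto="tcp"):
    # Throwaway function so the two unpacking forms can be observed.
    return (host, proto, port)

endpoint = ("example.org", 8080)
options = {"proto": "udp"}

# *endpoint supplies positional arguments, **options supplies keywords.
result = connect(*endpoint, **options)
print(result)  # -> ('example.org', 'udp', 8080)
```

The same call spelled with the old builtin would be apply(connect, endpoint, options).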
From guido@python.org Wed Mar 29 00:07:34 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:07:34 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 01:53:50 +0200." References: Message-ID: <200003290007.TAA16081@eric.cnri.reston.va.us> > > [Peter Funk] > > > > > > Do we need a UserString class? > [...] > Guido van Rossum: > > So why don't you give the UserString.py a try and leave Andy's wish alone? [Peter] > Okay. Here we go. Could someone please keep a close eye on this? > I've hacked it up in a hurry. Good job! Go get some sleep, and tomorrow morning when you're fresh, compare it to UserList. From visual inspection, you seem to be missing __getitem__ and __getslice__, and maybe more (of course not __set*__). --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Wed Mar 29 00:13:24 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:13:24 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: On Wed, 29 Mar 2000, Fredrik Lundh wrote: > > I'm thrilled to see the extended call syntax patches go in! One less wart > > in the language! > > but did he compile before checking in? You beat me to it. I read David's message and got so excited i just had to try it right away. So i updated my CVS tree, did "make", and got the same error:

make[1]: Entering directory `/home/ping/dev/python/dist/src/Python'
gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c compile.c -o compile.o
compile.c: In function `com_call_function':
compile.c:1225: `CALL_FUNCTION_STAR' undeclared (first use in this function)
compile.c:1225: (Each undeclared identifier is reported only once
compile.c:1225: for each function it appears in.)
make[1]: *** [compile.o] Error 1

> (compile.c and opcode.h both mention this identifier, but > nobody defines it... should it be CALL_FUNCTION_VAR, > perhaps?)
But CALL_FUNCTION_STAR is mentioned in the comments...

#define CALL_FUNCTION           131     /* #args + (#kwargs<<8) */
#define MAKE_FUNCTION           132     /* #defaults */
#define BUILD_SLICE             133     /* Number of items */

/* The next 3 opcodes must be contiguous and satisfy
   (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1  */
#define CALL_FUNCTION_VAR       140     /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_KW        141     /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_VAR_KW    142     /* #args + (#kwargs<<8) */

The condition (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 doesn't make much sense, though... -- ?!ng From jeremy@cnri.reston.va.us Wed Mar 29 00:18:54 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 19:18:54 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> References: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: <14561.19438.157799.810802@goon.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: >> I'm thrilled to see the extended call syntax patches go in! One >> less wart in the language! FL> but did he compile before checking in? Indeed, but not often enough :-). FL> ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : FL> undeclared identifier FL> (compile.c and opcode.h both mention this identifier, but nobody FL> defines it... should it be CALL_FUNCTION_VAR, perhaps?) This was a last minute change of names. I had previously compiled under the old names. The Makefile doesn't describe the dependency between opcode.h and compile.c. And the compile.o file I had worked, because the only change was to the name of a macro. It's too bad the Makefile doesn't have all the dependencies. It seems that it's necessary to do a make clean before checking in a change that affects many files. Jeremy From klm@digicool.com Wed Mar 29 00:30:05 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 19:30:05 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg In-Reply-To: Message-ID: On Tue, 28 Mar 2000, David Ascher wrote: > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! Me too! Even the lisps i used to know (albeit ancient, according to eric) couldn't get it as tidy as this. (Silly me, now i'm imagining we're going to see operator assignments just around the bend. "Give them a tasty morsel, they ask for your dinner..."-) Ken klm@digicool.com From ping@lfw.org Wed Mar 29 00:35:54 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:35:54 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > > It's too bad the Makefile doesn't have all the dependencies. It seems > that it's necessary to do a make clean before checking in a change > that affects many files. I updated again and rebuilt.

>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> sum(2,3,4)
9
>>> sum(*[2,3,4])
9
>>> x = (2,3,4)
>>> sum(*x)
9
>>> def func(a, b, c):
...     print a, b, c
...
>>> func(**{'a':2, 'b':1, 'c':6})
2 1 6
>>> func(**{'c':8, 'a':1, 'b':9})
1 9 8
>>>

*cool*. So does this completely obviate the need for "apply", then? apply(x, y, z) <==> x(*y, **z) -- ?!ng From guido@python.org Wed Mar 29 00:35:17 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:35:17 -0500 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Your message of "Tue, 28 Mar 2000 18:35:54 CST." References: Message-ID: <200003290035.TAA16278@eric.cnri.reston.va.us> > *cool*. > > So does this completely obviate the need for "apply", then? > > apply(x, y, z) <==> x(*y, **z) I think so (except for backwards compatibility). The 1.6 docs for apply should point this out!
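The equivalence stated above can be checked with a throwaway function; `f`, `args` and `kw` here are invented for illustration:

```python
def f(a, b, c):
    # Trivial function so both call styles can be compared.
    return (a, b, c)

args = (1, 2)
kw = {'c': 3}

# f(*args, **kw) spells out exactly what apply(f, args, kw) did.
result = f(*args, **kw)
print(result)  # -> (1, 2, 3)
```

On interpreters that still provide the builtin, apply(f, args, kw) returns the same tuple.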
--Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Wed Mar 29 00:42:20 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:42:20 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Message-ID:

> I updated again and rebuilt.
>
> >>> def sum(*args):
> ...     s = 0
> ...     for x in args: s = s + x
> ...     return s
> ...
> >>> sum(2,3,4)
> 9
> >>> sum(*[2,3,4])
> 9
> >>> x = (2,3,4)
> >>> sum(*x)
> 9
> >>> def func(a, b, c):
> ...     print a, b, c
> ...
> >>> func(**{'a':2, 'b':1, 'c':6})
> 2 1 6
> >>> func(**{'c':8, 'a':1, 'b':9})
> 1 9 8
> >>>
>
> *cool*.

But most importantly, IMO:

class SubClass(Class):
    def __init__(self, a, *args, **kw):
        self.a = a
        Class.__init__(self, *args, **kw)

Much neater. From bwarsaw@cnri.reston.va.us Wed Mar 29 00:46:11 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 19:46:11 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: <14561.21075.637108.322536@anthem.cnri.reston.va.us> Uh oh. Fresh CVS update and make clean, make:

-------------------- snip snip --------------------
Python 1.5.2+ (#20, Mar 28 2000, 19:37:38) [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> class Nums:
...     def __getitem__(self, i):
...         if i >= 10 or i < 0: raise IndexError
...         return i
...
>>> n = Nums()
>>> for i in n: print i
...
0
1
2
3
4
5
6
7
8
9
>>> sum(*n)
Traceback (innermost last):
  File "", line 1, in ?
SystemError: bad argument to internal function
-------------------- snip snip --------------------

-Barry From bwarsaw@cnri.reston.va.us Wed Mar 29 01:02:16 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 20:02:16 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Changing the definition of class Nums to

class Nums:
    def __getitem__(self, i):
        if 0 <= i < 10: return i
        raise IndexError
    def __len__(self):
        return 10

I.e. adding the __len__() method avoids the SystemError. Either the *arg call should not depend on the sequence being length-able, or it should error check that the length calculation doesn't return -1 or raise an exception. Looking at PySequence_Length() though, it seems that m->sq_length(s) can return -1 without setting a type_error. So the fix is either to include a check for return -1 in PySequence_Length() when calling sq_length, or instance_length() should set a TypeError when it has no __len__() method and returns -1. I gotta run so I can't follow this through -- I'm sure I'll see the right solution from someone in tomorrow morning's email :) -Barry From ping@lfw.org Wed Mar 29 01:17:27 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 19:17:27 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Barry A. Warsaw wrote: > > Changing the definition of class Nums to > > class Nums: > def __getitem__(self, i): > if 0 <= i < 10: return i > raise IndexError > def __len__(self): > return 10 > > I.e. adding the __len__() method avoids the SystemError. It should be noted that "apply" has the same problem, with a different counterintuitive error message:

>>> n = Nums()
>>> apply(sum, n)
Traceback (innermost last):
  File "", line 1, in ?
AttributeError: __len__

-- ?!ng From jeremy@cnri.reston.va.us Wed Mar 29 02:59:26 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 21:59:26 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg In-Reply-To: References: Message-ID: <14561.29070.940238.542509@bitdiddle.cnri.reston.va.us> >>>>> "DA" == David Ascher writes:

DA> But most importantly, IMO:
DA> class SubClass(Class):
DA>     def __init__(self, a, *args, **kw):
DA>         self.a = a
DA>         Class.__init__(self, *args, **kw)
DA> Much neater.

This version of method overloading was what I liked most about Greg's patch. Note that I also prefer:

class SubClass(Class):
    super_init = Class.__init__

    def __init__(self, a, *args, **kw):
        self.a = a
        self.super_init(*args, **kw)

I've been happy to have all the overridden methods explicitly labelled at the top of a class lately. It is much easier to change the class hierarchy later. Jeremy From gward@cnri.reston.va.us Wed Mar 29 03:15:00 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 22:15:00 -0500 Subject: [Python-Dev] __debug__ and py_compile Message-ID: <20000328221500.A3290@cnri.reston.va.us> Hi all -- a particularly active member of the Distutils-SIG brought the global '__debug__' flag to my attention, since I (and thus my code) didn't know if calling 'py_compile.compile()' would result in a ".pyc" or a ".pyo" file. It appears that, using __debug__, you can determine what you're going to get. Cool! However, it doesn't look like you can *choose* what you're going to get. Is this correct? Ie. does the presence/absence of -O when the interpreter starts up *completely* decide how code is compiled? Also, can I rely on __debug__ being there in the future? How about in the past? I still occasionally ponder making Distutils compatible with Python 1.5.1. Thanks -- Greg From guido@python.org Wed Mar 29 04:08:12 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 23:08:12 -0500 Subject: [Python-Dev] __debug__ and py_compile In-Reply-To: Your message of "Tue, 28 Mar 2000 22:15:00 EST."
<20000328221500.A3290@cnri.reston.va.us> References: <20000328221500.A3290@cnri.reston.va.us> Message-ID: <200003290408.XAA17991@eric.cnri.reston.va.us> > a particularly active member of the Distutils-SIG brought the > global '__debug__' flag to my attention, since I (and thus my code) > didn't know if calling 'py_compile.compile()' would result in a ".pyc" > or a ".pyo" file. It appears that, using __debug__, you can determine > what you're going to get. Cool! > > However, it doesn't look like you can *choose* what you're going to > get. Is this correct? Ie. does the presence/absence of -O when the > interpreter starts up *completely* decide how code is compiled? Correct. You (currently) can't change the opt setting of the compiler. (It was part of the compiler restructuring to give more freedom here; this has been pushed back to 1.7.) > Also, can I rely on __debug__ being there in the future? How about in > the past? I still occasionally ponder making Distutils compatible with > Python 1.5.1. __debug__ is as old as the assert statement, going back to at least 1.5.0. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Wed Mar 29 05:35:51 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 07:35:51 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <1257835425-27941123@hypernet.com> Message-ID: On Tue, 28 Mar 2000, Gordon McMillan wrote: > What would happen if he (and everyone else) installed > themselves *into* my core packages, then I decided I didn't > want his stuff? More than likely I'd have to scrub the damn > installation and start all over again. I think Greg Stein answered that objection, by reminding us that the filesystem isn't the only way to set up a package hierarchy. 
In particular, even with Python's current module system, there is no need to scrub installations: Python core modules go (under UNIX) in /usr/local/lib/python1.5, and 3rd party modules go in /usr/local/lib/python1.5/site-packages. Need to remove stuff? Remove whatever is in /usr/local/lib/python1.5/site-packages. Need to upgrade? Just backup /usr/local/lib/python1.5/site-packages, remove /usr/local/lib/python1.5/, install, and move 3rd party modules back from backup. This becomes even easier if the standard installation is in a JAR-like file, and 3rd party modules are also in a JAR-like file, but specified to be in their natural place. Wow! That was a long rant! Anyway, I already expressed my preference of the Perl way, over the Java way. For one thing, I don't want to have to register a domain just so I could distribute Python code -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw@cnri.reston.va.us Wed Mar 29 05:42:34 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 00:42:34 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.38858.41246.28460@anthem.cnri.reston.va.us> >>>>> "BAW" == Barry A Warsaw writes: BAW> Uh oh. Fresh CVS update and make clean, make: >>> sum(*n) | Traceback (innermost last): | File "", line 1, in ? | SystemError: bad argument to internal function Here's a proposed patch that will cause a TypeError to be raised instead. 
-Barry

-------------------- snip snip --------------------
Index: abstract.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/abstract.c,v
retrieving revision 2.33
diff -c -r2.33 abstract.c
*** abstract.c  2000/03/10 22:55:18  2.33
--- abstract.c  2000/03/29 05:36:21
***************
*** 860,866 ****
      PyObject *s;
  {
      PySequenceMethods *m;
!
      if (s == NULL) {
          null_error();
          return -1;
--- 860,867 ----
      PyObject *s;
  {
      PySequenceMethods *m;
!     int size = -1;
!
      if (s == NULL) {
          null_error();
          return -1;
***************
*** 868,877 ****
      m = s->ob_type->tp_as_sequence;
      if (m && m->sq_length)
!         return m->sq_length(s);
!
!     type_error("len() of unsized object");
!     return -1;
  }

  PyObject *
--- 869,879 ----
      m = s->ob_type->tp_as_sequence;
      if (m && m->sq_length)
!         size = m->sq_length(s);
!
!     if (size < 0)
!         type_error("len() of unsized object");
!     return size;
  }

  PyObject *
Index: ceval.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/ceval.c,v
retrieving revision 2.169
diff -c -r2.169 ceval.c
*** ceval.c  2000/03/28 23:49:16  2.169
--- ceval.c  2000/03/29 05:39:00
***************
*** 1636,1641 ****
--- 1636,1649 ----
              break;
          }
          nstar = PySequence_Length(stararg);
+         if (nstar < 0) {
+             if (!PyErr_Occurred())
+                 PyErr_SetString(
+                     PyExc_TypeError,
+                     "len() of unsized object");
+             x = NULL;
+             break;
+         }
      }
      if (nk > 0) {
          if (kwdict == NULL) {

From bwarsaw@cnri.reston.va.us Wed Mar 29 05:46:19 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 29 Mar 2000 00:46:19 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg References: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: <14561.39083.748093.694726@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes:

| It should be noted that "apply" has the same problem, with a
| different counterintuitive error message:

>> n = Nums()
>> apply(sum, n)
| Traceback (innermost last):
|   File "", line 1, in ?
| AttributeError: __len__

The patch I just posted fixes this too. The error message ain't great, but at least it's consistent with the direct call. -Barry

-------------------- snip snip --------------------
Traceback (innermost last):
  File "/tmp/doit.py", line 15, in ?
    print apply(sum, n)
TypeError: len() of unsized object

From pf@artcom-gmbh.de Wed Mar 29 06:30:22 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 08:30:22 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 7:44:42 am" Message-ID: Hi! > On Wed, 29 Mar 2000, Peter Funk wrote: > > > class UserString: > > def __init__(self, string=""): > > self.data = string > ^^^^^^^ Moshe Zadka wrote: > Why do you feel there is a need to default? Strings are immutable I had something like this in my mind:

class MutableString(UserString):
    """Python strings are immutable objects.  But of course this can
    be changed in a derived class implementing the missing methods.

    >>> s = MutableString()
    >>> s[0:5] = "HUH?"
    """
    def __setitem__(self, char):
        ....
    def __setslice__(self, i, j, substring):
        ....

> What about __int__, __long__, __float__, __str__, __hash__? > And what about __getitem__ and __contains__? > And __complex__? I was obviously too tired and too eager to get this out! Thanks for reviewing and responding so quickly. I will add them.
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Moshe Zadka Wed Mar 29 06:51:30 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 08:51:30 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > Moshe Zadka wrote: > > Why do you feel there is a need to default? Strings are immutable > > I had something like this in my mind: > > class MutableString(UserString): > """Python strings are immutable objects. But of course this can > be changed in a derived class implementing the missing methods. Then add the default in the constructor for MutableString.... eagerly-waiting-for-UserString.py-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Wed Mar 29 07:03:53 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 09:03:53 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes Message-ID: I'm starting to compile a list of changes from 1.5.2 to 1.6. Here's what I came up with so far

-- string objects now have methods (though they are still immutable)
-- unicode support: Unicode strings are marked with u"string", and there
   is support for arbitrary encoders/decoders
-- "in" operator can now be overridden in user-defined classes to mean
   anything: it calls the magic method __contains__
-- SRE is the new regular expression engine. re.py became an interface
   to the same engine. The new engine fully supports unicode regular
   expressions.
-- Some methods which would take multiple arguments and treat them as
   a tuple were fixed: list.{append, insert, remove, count},
   socket.connect
-- Some modules were made obsolete
-- filecmp.py (supersedes the old cmp.py and dircmp.py modules),
-- tabnanny.py (make sure the source file doesn't assume a specific
   tab-width)
-- win32reg (win32 registry editor)
-- unicode module, and codecs package
-- New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)
-- _tkinter now uses the object, rather than string, interface to Tcl.

Please e-mail me personally if you think of any other changes, and I'll try to integrate them into a complete "changes" document. Thanks in advance -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From esr@thyrsus.com Wed Mar 29 07:21:29 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 29 Mar 2000 02:21:29 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: ; from Moshe Zadka on Wed, Mar 29, 2000 at 09:03:53AM +0200 References: Message-ID: <20000329022129.A15539@thyrsus.com> Moshe Zadka : > -- _tkinter now uses the object, rather than string, interface to Tcl. Hm, does this mean that the annoying requirement to do explicit gets and sets to move data between the Python world and the Tcl/Tk world is gone? -- Eric S. Raymond "A system of licensing and registration is the perfect device to deny gun ownership to the bourgeoisie." -- Vladimir Ilyich Lenin From Moshe Zadka Wed Mar 29 07:22:54 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 09:22:54 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <20000329022129.A15539@thyrsus.com> Message-ID: On Wed, 29 Mar 2000, Eric S. Raymond wrote: > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. > > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone?
I doubt it. It's just that Python and Tcl have such a different outlook about variables that I don't think it can be glossed over. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf@artcom-gmbh.de Wed Mar 29 09:16:17 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 11:16:17 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 8:51:30 am" Message-ID: Hi! Moshe Zadka: > eagerly-waiting-for-UserString.py-ly y'rs, Z. Well, I've added the missing methods. Unfortunately I ran out of time now and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still missing. Regards, Peter
---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
"""
from types import StringType, UnicodeType
import sys

class UserString:
    def __init__(self, string):
        self.data = string
    def __str__(self): return str(self.data)
    def __repr__(self): return repr(self.data)
    def __int__(self): return int(self.data)
    def __long__(self): return long(self.data)
    def __float__(self): return float(self.data)
    def __hash__(self): return hash(self.data)
    def __cmp__(self, string):
        if isinstance(string, UserString):
            return cmp(self.data, string.data)
        else:
            return cmp(self.data, string)
    def __contains__(self, char):
        return char in self.data
    def __len__(self): return len(self.data)
    def __getitem__(self, index): return self.__class__(self.data[index])
    def __getslice__(self, start, end):
        start = max(start, 0); end = max(end, 0)
        return self.__class__(self.data[start:end])
    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, StringType) or isinstance(other, UnicodeType):
            return self.__class__(self.data + other)
        else:
            return self.__class__(self.data + str(other))
    def __radd__(self, other):
        if isinstance(other, StringType) or isinstance(other, UnicodeType):
            return self.__class__(other + self.data)
        else:
            return self.__class__(str(other) + self.data)
    def __mul__(self, n):
        return self.__class__(self.data*n)
    __rmul__ = __mul__

    # the following methods are defined in alphabetical order:
    def capitalize(self): return self.__class__(self.data.capitalize())
    def center(self, width): return self.__class__(self.data.center(width))
    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)
    def encode(self, encoding=None, errors=None): # XXX improve this?
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self, suffix, start=0, end=sys.maxint):
        return self.data.endswith(suffix, start, end)
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self): return self.data.isdecimal()
    def isdigit(self): return self.data.isdigit()
    def islower(self): return self.data.islower()
    def isnumeric(self): return self.data.isnumeric()
    def isspace(self): return self.data.isspace()
    def istitle(self): return self.data.istitle()
    def isupper(self): return self.data.isupper()
    def join(self, seq): return self.data.join(seq)
    def ljust(self, width): return self.__class__(self.data.ljust(width))
    def lower(self): return self.__class__(self.data.lower())
    def lstrip(self): return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width): return self.__class__(self.data.rjust(width))
    def rstrip(self): return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self): return self.__class__(self.data.strip())
    def swapcase(self): return self.__class__(self.data.swapcase())
    def title(self): return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self): return self.__class__(self.data.upper())

class MutableString(UserString):
    """mutable string objects

    Python strings are immutable objects.  This has the advantage that
    strings may be used as dictionary keys.  If this property isn't needed
    and you insist on changing string values in place instead, you may cheat
    and use MutableString.

    But the purpose of this class is an educational one: to prevent people
    from inventing their own mutable string class derived from UserString
    and then forgetting to remove (override) the __hash__ method inherited
    from UserString.  This would lead to errors that would be very hard to
    track down.

    A faster and better solution is to rewrite the program using lists."""
    def __init__(self, string=""):
        self.data = string
    def __hash__(self):
        raise TypeError, "unhashable type (it is mutable)"
    def __setitem__(self, index, sub):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + sub + self.data[index+1:]
    def __delitem__(self, index):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + self.data[index+1:]
    def __setslice__(self, start, end, sub):
        start = max(start, 0); end = max(end, 0)
        if isinstance(sub, UserString):
            self.data = self.data[:start]+sub.data+self.data[end:]
        elif isinstance(sub, StringType) or isinstance(sub, UnicodeType):
            self.data = self.data[:start]+sub+self.data[end:]
        else:
            self.data = self.data[:start]+str(sub)+self.data[end:]
    def __delslice__(self, start, end):
        start = max(start, 0); end = max(end, 0)
        self.data = self.data[:start] + self.data[end:]
    def immutable(self):
        return UserString(self.data)

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return 0

if __name__ == "__main__":
    sys.exit(_test())
From mal@lemburg.com Wed Mar 29 09:34:21 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 29 Mar 2000 11:34:21 +0200 Subject: [Python-Dev] Great Renaming? What is the goal?
References: <1257835425-27941123@hypernet.com>
Message-ID: <38E1CE1D.7899B1BC@lemburg.com>

Gordon McMillan wrote:
>
> Andrew M. Kuchling wrote:
> [snip]
> > 2) Right now there's no way for third-party extensions to add
> > themselves to a package in the standard library.  Once Python finds
> > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so
> > if you grab, say, "crypto" as a package name in the standard library,
> > it's forever lost to third-party extensions.
>
> That way lies madness.  While I'm happy to carp at Java for
> requiring "com", "net" or whatever as a top level name, their
> intent is correct: the names grabbed by the Python standard
> packages belong to no one but the Python standard
> packages.  If you *don't* do that, upgrades are an absolute
> nightmare.
>
> Marc-Andre grabbed "mx".  If (as I rather suspect) he
> wants to remake the entire standard lib in his image, he's
> welcome to - *under* mx.

Right, that's the way I see it too.  BTW, where can I register the "mx" top-level package name?  Should these be registered in the NIST registry?  Will the names registered there be honored?

> What would happen if he (and everyone else) installed
> themselves *into* my core packages, then I decided I didn't
> want his stuff?  More than likely I'd have to scrub the damn
> installation and start all over again.

That's a no-no, IMHO.  Unless explicitly allowed, packages should *not* install themselves as subpackages of other existing top-level packages.  If they do, it's their problem if the hierarchy changes...

--
Marc-Andre Lemburg
______________________________________________________________________
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From Moshe Zadka Wed Mar 29 09:59:47 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 11:59:47 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> Hi!
> > Moshe Zadka:
> > eagerly-waiting-for-UserString.py-ly y'rs, Z.
>
> Well, I've added the missing methods.  Unfortunately I ran out of time now and
> a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still
> missing.

Great work, Peter!  I really like UserString.  However, I have two issues with MutableString:

1. It shouldn't share implementation with UserString, otherwise your algorithms won't have the correct big-O properties.  It should probably use a char array (from the array module) as the internal representation.

2. It shouldn't share interface with UserString, since it doesn't have a proper implementation of __hash__.

All in all, I probably disagree with making MutableString a subclass of UserString.  If I have time later today, I'm hoping to be able to make my own MutableString.

From pf@artcom-gmbh.de Wed Mar 29 10:35:32 2000
From: pf@artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 12:35:32 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 11:59:47 am"
Message-ID: 

Hi!

> > Moshe Zadka:
> > > eagerly-waiting-for-UserString.py-ly y'rs, Z.
>
> On Wed, 29 Mar 2000, Peter Funk wrote:
> > Well, I've added the missing methods.  Unfortunately I ran out of time now and
> > a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still
> > missing.
>
> Moshe Zadka schrieb:
> Great work, Peter!  I really like UserString.  However, I have two issues
> with MutableString:
>
> 1. It shouldn't share implementation with UserString, otherwise your
> algorithms won't have the correct big-O properties.  It should
> probably use a char array (from the array module) as the internal
> representation.

Hmm.... I don't understand what you mean by 'big-O properties'.  The internal representation of any object should be considered ... umm ... internal.

> 2.
It shouldn't share interface with UserString, since it doesn't have a
> proper implementation of __hash__.

What's wrong with my implementation of __hash__ raising a TypeError with the message 'unhashable object'?  This is the same behaviour you get if you try to use some other mutable object as a key in a dictionary:

>>> l = []
>>> d = { l : 'foo' }
Traceback (innermost last):
  File "", line 1, in ?
TypeError: unhashable type

> All in all, I probably disagree with making MutableString a subclass of
> UserString.  If I have time later today, I'm hoping to be able to make my
> own MutableString

As I tried to point out in the docstring of 'MutableString', I don't want people to actually start using the 'MutableString' class.  My intention was to prevent people from inventing their own, and then probably wrong, MutableString class derived from UserString.  Only Newbies will really ever need mutable strings in Python (see FAQ).  Maybe my 'MutableString' idea belongs somewhere in the to-be-written src/Doc/libuserstring.tex.  But since Newbies tend to ignore docs ... Sigh.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From gmcm@hypernet.com Wed Mar 29 11:07:20 2000
From: gmcm@hypernet.com (Gordon McMillan)
Date: Wed, 29 Mar 2000 06:07:20 -0500
Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: 
References: <1257835425-27941123@hypernet.com>
Message-ID: <1257794452-30405909@hypernet.com>

Moshe Zadka wrote:
> On Tue, 28 Mar 2000, Gordon McMillan wrote:
> > What would happen if he (and everyone else) installed
> > themselves *into* my core packages, then I decided I didn't
> > want his stuff?  More than likely I'd have to scrub the damn
> > installation and start all over again.
>
> I think Greg Stein answered that objection, by reminding us that the
> filesystem isn't the only way to set up a package hierarchy.
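[As an aside from a present-day vantage point: the point that a package hierarchy need not mirror the filesystem can be made concrete with an import hook. The sketch below uses the modern importlib meta-path API, which did not exist in 2000; the in-memory "archive" contents and the names `demo_archive` and `ArchiveFinder` are invented for the example.]

```python
import importlib.abc
import importlib.util
import sys

# In-memory "archive": module name -> source code.  Both entries are
# made up for this sketch; no files exist on disk for them.
ARCHIVE = {
    "demo_archive": "",
    "demo_archive.greeting": "def hello():\n    return 'hi'\n",
}

class ArchiveFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from the ARCHIVE dict instead of the filesystem."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in ARCHIVE:
            return None  # let the normal filesystem machinery handle it
        return importlib.util.spec_from_loader(
            fullname, self, is_package=(fullname == "demo_archive"))

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Run the archived source in the fresh module's namespace.
        exec(ARCHIVE[module.__name__], module.__dict__)

sys.meta_path.insert(0, ArchiveFinder())

from demo_archive import greeting
print(greeting.hello())  # prints "hi" -- no demo_archive/ directory anywhere
```

The same idea underlies the archive-based distributions mentioned in this thread: the importer, not the directory layout, decides what `text.encoding.macbinary` maps to.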
You mean when Greg said:
> Assuming that you use an archive like those found in my "small" distro or
> Gordon's distro, then this is no problem.  The archive simply recognizes
> and maps "text.encoding.macbinary" to its own module.

I don't know what this has to do with it.  When we get around to the 'macbinary' part, we have already established that 'text.encoding' is the parent which should supply 'macbinary'.

> In particular, even with Python's current module system, there is no need to
> scrub installations: Python core modules go (under UNIX) in
> /usr/local/lib/python1.5, and 3rd party modules go in
> /usr/local/lib/python1.5/site-packages.

And if there's a /usr/local/lib/python1.5/text/encoding, there's no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched.

I believe you could hack up an importer that did allow this, and I think you'd be 100% certifiable if you did.  Just look at the surprise factor.  Hacking stuff into another package is just as evil as math.pi = 42.

> Anyway, I already expressed my preference of the Perl way, over the Java
> way.  For one thing, I don't want to have to register a domain just so I
> could distribute Python code

I haven't the foggiest what the "Perl way" is; I wouldn't be surprised if it relied on un-Pythonic sociological factors.  I already said the Java mechanics are silly; uniqueness is what matters.  When Python packages start selling in the four and five figure range, then a registry mechanism will likely be necessary.

- Gordon

From Moshe Zadka Wed Mar 29 11:21:09 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 13:21:09 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> > 1. It shouldn't share implementation with UserString, otherwise your
> > algorithms won't have the correct big-O properties.
It should
> > probably use a char array (from the array module) as the internal
> > representation.
>
> Hmm.... I don't understand what you mean by 'big-O properties'.
> The internal representation of any object should be considered ...
> umm ... internal.

Yes, but

    s[0] = 'a'

should take O(1) time, not O(len(s)).

> > 2. It shouldn't share interface with UserString, since it doesn't have a
> > proper implementation of __hash__.
>
> What's wrong with my implementation of __hash__ raising a TypeError with
> the message 'unhashable object'?

A subtype shouldn't change contracts of its supertypes.  hash() was implicitly contracted as "raising no exceptions".

-- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From guido@python.org Wed Mar 29 12:26:56 2000
From: guido@python.org (Guido van Rossum)
Date: Wed, 29 Mar 2000 07:26:56 -0500
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: Your message of "Wed, 29 Mar 2000 02:21:29 EST." <20000329022129.A15539@thyrsus.com>
References: <20000329022129.A15539@thyrsus.com>
Message-ID: <200003291226.HAA18216@eric.cnri.reston.va.us>

> Moshe Zadka :
> > -- _tkinter now uses the object, rather than string, interface to Tcl.

Eric Raymond:
> Hm, does this mean that the annoying requirement to do explicit gets and
> sets to move data between the Python world and the Tcl/Tk world is gone?

Not sure what you are referring to -- this should be completely transparent to Python/Tkinter users.  If you are thinking of the way Tcl variables are created and manipulated in Python, no, this doesn't change, alas (Tcl variables aren't objects -- they are manipulated through get and set commands. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Mar 29 12:32:16 2000
From: guido@python.org (Guido van Rossum)
Date: Wed, 29 Mar 2000 07:32:16 -0500
Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: Your message of "Wed, 29 Mar 2000 11:34:21 +0200." <38E1CE1D.7899B1BC@lemburg.com>
References: <1257835425-27941123@hypernet.com> <38E1CE1D.7899B1BC@lemburg.com>
Message-ID: <200003291232.HAA18234@eric.cnri.reston.va.us>

> > Marc-Andre grabbed "mx".  If (as I rather suspect) he
> > wants to remake the entire standard lib in his image, he's
> > welcome to - *under* mx.
>
> Right, that's the way I see it too.  BTW, where can I register
> the "mx" top-level package name?  Should these be registered
> in the NIST registry?  Will the names registered there be
> honored?

I think the NIST registry is a failed experiment -- too cumbersome to maintain or consult.
We can do this the same way as common law handles trade marks: if you have used it as your brand name long enough, even if you didn't register, someone else cannot grab it away from you. > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > That's a no-no, IMHO. Unless explicitly allowed, packages > should *not* install themselves as subpackages to other > existing top-level packages. If they do, its their problem > if the hierarchy changes... Agreed. Although some people seem to *want* this. Probably because it's okay to do that in Java and (apparently?) in Perl. And C++, probably. It all probably stems back to Lisp. I admit that I didn't see this subtlety when I designed Python's package architecture. It's too late to change (e.g. because of __init__.py). Is it a problem though? Let's be open-minded about this and think about whether we want to allow this or not, and why... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Mar 29 12:35:33 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:35:33 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 13:21:09 +0200." References: Message-ID: <200003291235.HAA18249@eric.cnri.reston.va.us> > > What's wrong with my implementation of __hash__ raising a TypeError with > > the attribution 'unhashable object'. > > A subtype shouldn't change contracts of its supertypes. hash() was > implicitly contracted as "raising no exceptions". Let's not confuse subtypes and subclasses. One of the things implicit in the discussion on types-sig is that not every subclass is a subtype! Yes, this violates something we all learned from C++ -- but it's a great insight. 
No time to explain it more, but for me, Peter's subclassing UserString for MutableString to borrow implementation is fine.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pf@artcom-gmbh.de Wed Mar 29 13:49:24 2000
From: pf@artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 15:49:24 +0200 (MEST)
Subject: [Python-Dev] NIST Registry (was Great Renaming? What is the goal?)
In-Reply-To: <200003291232.HAA18234@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 29, 2000 7:32:16 am"
Message-ID: 

Hi!

Guido van Rossum:
> I think the NIST registry is a failed experiment -- too cumbersome to
> maintain or consult.

The WEB frontend of the NIST registry is not that bad --- if you are even aware of the fact that such a beast exists!  I have used Python since 1994 and discovered the NIST registry incidentally a few weeks ago, when I was really looking for something about the Win32 registry and used the search engine on www.python.org.  My first thought was: What a neat, clever idea!  I think this is an example of how the Python community suffers from poor advertising of good ideas.

> We can do this the same way as common law
> handles trade marks: if you have used it as your brand name long
> enough, even if you didn't register, someone else cannot grab it away
> from you.

Okay.  But a more formal registry wouldn't hurt.  Something like the global module index from the current docs, supplemented with all contributed modules which can currently be found at www.vex.net, would be a useful resource.

Regards, Peter

From Moshe Zadka Wed Mar 29 14:15:36 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 16:15:36 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: <200003291235.HAA18249@eric.cnri.reston.va.us>
Message-ID: 

On Wed, 29 Mar 2000, Guido van Rossum wrote:
> Let's not confuse subtypes and subclasses.  One of the things implicit
> in the discussion on types-sig is that not every subclass is a
> subtype!
Yes, this violates something we all learned from C++ -- but > it's a great insight. No time to explain it more, but for me, Peter's > subclassing UserString for MutableString to borrow implementation is > fine. Oh, I agree with this. An earlier argument which got snipped in the discussion is why it's a bad idea to borrow implementation (a totally different argument) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fdrake@acm.org Wed Mar 29 16:02:13 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 11:02:13 -0500 (EST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: References: Message-ID: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Moshe Zadka writes: > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) Weren't these in 1.5.2? I think filecmp is documented in the released docs... ah, no, I'm safe. ;) > Please e-mail me personally if you think of any other changes, and I'll > try to integrate them into a complete "changes" document. The documentation is updated. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip@mojam.com (Skip Montanaro) Wed Mar 29 16:57:51 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 29 Mar 2000 10:57:51 -0600 Subject: [Python-Dev] CVS woes... Message-ID: <200003291657.KAA22177@beluga.mojam.com> Does anyone else besides me have trouble getting their Python tree to sync with the CVS repository? I've tried all manner of flags to "cvs update", most recently "cvs update -d -A ." with no success. There are still some files I know Fred Drake has patched that show up as different and it refuses to pick up Lib/robotparser.py. I'm going to blast my current tree and start anew after saving one or two necessary files. Any thoughts you might have would be much appreciated. 
(Private emails please, unless for some reason you think this should be a python-dev topic.  I only post here because I suspect most of the readers use CVS to keep in frequent sync and may have some insight.)

Thx,
--
Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/

From Moshe Zadka Wed Mar 29 17:06:59 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 19:06:59 +0200 (IST)
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: <14562.10501.726637.335088@seahag.cnri.reston.va.us>
Message-ID: 

On Wed, 29 Mar 2000, Fred L. Drake, Jr. wrote:
> > Moshe Zadka writes:
> > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules),
> > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width)
>
> Weren't these in 1.5.2?  I think filecmp is documented in the
> released docs... ah, no, I'm safe. ;)

Tabnanny wasn't a module, and filecmp wasn't there at all.

> The documentation is updated. ;)

Yes, but it was released as a late part of 1.5.2.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From Fredrik Lundh" Message-ID: <01b701bf999d$267b6740$34aab5d4@hagrid>

Skip wrote:
> Does anyone else besides me have trouble getting their Python tree to sync
> with the CVS repository?  I've tried all manner of flags to "cvs update",
> most recently "cvs update -d -A ." with no success.  There are still some
> files I know Fred Drake has patched that show up as different and it refuses
> to pick up Lib/robotparser.py.

note that robotparser doesn't show up on cvs.python.org either.  maybe cnri's cvs admins should look into this...

From fdrake@acm.org Wed Mar 29 18:20:14 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 29 Mar 2000 13:20:14 -0500 (EST)
Subject: [Python-Dev] CVS woes...
In-Reply-To: <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <14562.18782.465814.696099@seahag.cnri.reston.va.us> Skip Montanaro writes: > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses You should be aware that many of the more recent documentation patches have been in the 1.5.2p2 branch (release-1.5.2p1-patches, I think), rather than the development head. I'm hoping to begin the merge in the next week. I also have a few patches that I haven't had time to look at yet, and I'm not inclined to make any changes until I've merged the 1.5.2p2 docs with the 1.6 tree, mostly to keep the merge from being any more painful than I already expect it to be. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw@cnri.reston.va.us Wed Mar 29 18:22:57 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 13:22:57 -0500 (EST) Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> <01b701bf999d$267b6740$34aab5d4@hagrid> Message-ID: <14562.18945.407398.812930@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> note that robotparser doesn't show up on cvs.python.org FL> either. maybe cnri's cvs admins should look into this... I've just resync'd python/dist and am doing a fresh checkout now. Looks like Lib/robotparser.py is there now. -Barry From guido@python.org Wed Mar 29 18:23:38 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 13:23:38 -0500 Subject: [Python-Dev] CVS woes... In-Reply-To: Your message of "Wed, 29 Mar 2000 10:57:51 CST." <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <200003291823.NAA20134@eric.cnri.reston.va.us> > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? 
I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. My bad. When I move or copy a file around in the CVS repository directly instead of using cvs commit, I have to manually call a script that updates the mirror. I've done that now, and robotparser.py should now be in the mirror. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Wed Mar 29 19:06:14 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Wed, 29 Mar 2000 14:06:14 -0500 Subject: [Python-Dev] Distutils now in Python CVS tree Message-ID: <20000329140613.A5850@cnri.reston.va.us> Hi all -- Distutils is now available through the Python CVS tree *in addition to its own CVS tree*. That is, if you keep on top of developments in the Python CVS tree, then you will be tracking the latest Distutils code in Lib/distutils. Or, you can keep following the Distutils through its own CVS tree. (This is all done through one itty-bitty little symlink in the CNRI CVS repository, and It Just Works. Cool.) Note that only the 'distutils' subdirectory of the distutils distribution is tracked by Python: that is, changes to the documentation, test suites, and example setup scripts are *not* reflected in the Python CVS tree. If you follow neither Python nor Distutils CVS updates, this doesn't affect you. If you've been following Distutils CVS updates, you can continue to do so as you've always done (and as is documented on the Distutils "Anonymous CVS" web page). If you've been following Python CVS updates, then you are now following most Distutils CVS updates too -- as long as you do "cvs update -d", of course. If you're interested in following updates in the Distutils documentation, tests, examples, etc. then you should follow the Distutils CVS tree directly. 
If you've been following *both* Python and Distutils CVS updates, and hacking on the Distutils, then you should pick one or the other as your working directory. If you submit patches, it doesn't really matter if they're relative to the top of the Python tree, the top of the Distutils tree, or what -- I'll probably figure it out. However, it's probably best to continue sending Distutils patches to distutils-sig@python.org, *or* direct to me (gward@python.net) for trivial patches. Unless Guido says otherwise, I don't see a compelling reason to send Distutils patches to patches@python.org. In related news, the distutils-checkins list is probably going to go away, and all Distutils checkin messages will go python-checkins instead. Let me know if you avidly follow distutils-checkins, but do *not* want to follow python-checkins -- if lots of people respond (doubtful, as distutils-checkins only had 3 subscribers last I checked!), we'll reconsider. Greg From fdrake@acm.org Wed Mar 29 19:28:19 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 14:28:19 -0500 (EST) Subject: [Python-Dev] Re: [Distutils] Distutils now in Python CVS tree In-Reply-To: <20000329140525.A5842@cnri.reston.va.us> References: <20000329140525.A5842@cnri.reston.va.us> Message-ID: <14562.22867.998809.897214@seahag.cnri.reston.va.us> Greg Ward writes: > Distutils is now available through the Python CVS tree *in addition to > its own CVS tree*. That is, if you keep on top of developments in the > Python CVS tree, then you will be tracking the latest Distutils code in > Lib/distutils. Or, you can keep following the Distutils through its own > CVS tree. (This is all done through one itty-bitty little symlink in > the CNRI CVS repository, and It Just Works. Cool.) Greg, You may want to point out the legalese requirements for patches to the Python tree. 
;( That means the patches should probably go to patches@python.org or you should ensure an archive of all the legal statements is maintained at CNRI. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping@lfw.org Wed Mar 29 21:44:31 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 29 Mar 2000 15:44:31 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: On Tue, 28 Mar 2000, Fredrik Lundh wrote: > > > IMO this subdivision could be discussed and possibly revised. > > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Wow. I don't think i hardly ever use any of the modules in your "Commonly Used Modules" category. Except traceback, from time to time, but that's really the only one! Hmm. I'd arrange things a little differently, though i do like the category for Data Representation (it should probably go next to Data Storage though). I would prefer a separate group for interpreter-and-development-related things. The "File Formats" group seems weak... to me, its contents would better belong in a "parsing" or "text processing" classification. urlparse definitely goes with urllib. These comments are kind of random, i know... maybe i'll try putting together another grouping if i have any time. -- ?!ng From adustman@comstar.net Thu Mar 30 00:57:06 2000 From: adustman@comstar.net (Andy Dustman) Date: Wed, 29 Mar 2000 19:57:06 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003290150.UAA17819@eric.cnri.reston.va.us> Message-ID: I had to make the following one-line change to socketmodule.c so that it would link properly with openssl-0.9.4. In studying the openssl include files, I found: #define SSLeay_add_ssl_algorithms() SSL_library_init() SSL_library_init() seems to be the "correct" call nowadays. I don't know why this isn't being picked up. 
I also don't know how well the module works, other than it imports, but I sure would like to try it with Zope/ZServer/Medusa...

--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!"

Index: socketmodule.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Modules/socketmodule.c,v
retrieving revision 1.98
diff -c -r1.98 socketmodule.c
*** socketmodule.c	2000/03/24 20:56:56	1.98
--- socketmodule.c	2000/03/30 00:49:09
***************
*** 2384,2390 ****
  	return;
  #ifdef USE_SSL
  	SSL_load_error_strings();
! 	SSLeay_add_ssl_algorithms();
  	SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL);
  	if (SSLErrorObject == NULL)
  		return;
--- 2384,2390 ----
  	return;
  #ifdef USE_SSL
  	SSL_load_error_strings();
! 	SSL_library_init();
  	SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL);
  	if (SSLErrorObject == NULL)
  		return;

From gstein@lyra.org Thu Mar 30 02:54:27 2000
From: gstein@lyra.org (Greg Stein)
Date: Wed, 29 Mar 2000 18:54:27 -0800 (PST)
Subject: [Python-Dev] installation points (was: Great Renaming? What is the goal?)
In-Reply-To: <1257794452-30405909@hypernet.com>
Message-ID: 

On Wed, 29 Mar 2000, Gordon McMillan wrote:
> Moshe Zadka wrote:
> > On Tue, 28 Mar 2000, Gordon McMillan wrote:
> > > What would happen if he (and everyone else) installed
> > > themselves *into* my core packages, then I decided I didn't
> > > want his stuff?  More than likely I'd have to scrub the damn
> > > installation and start all over again.
> >
> > I think Greg Stein answered that objection, by reminding us that the
> > filesystem isn't the only way to set up a package hierarchy.
> > You mean when Greg said:
> > > Assuming that you use an archive like those found in my "small" distro or
> > > Gordon's distro, then this is no problem.  The archive simply recognizes
> > > and maps "text.encoding.macbinary" to its own module.
> >
> > I don't know what this has to do with it.  When we get around
> > to the 'macbinary' part, we have already established that
> > 'text.encoding' is the parent which should supply 'macbinary'.

good point...

> > In particular, even with Python's current module system, there is no need to
> > scrub installations: Python core modules go (under UNIX) in
> > /usr/local/lib/python1.5, and 3rd party modules go in
> > /usr/local/lib/python1.5/site-packages.
>
> And if there's a /usr/local/lib/python1.5/text/encoding, there's
> no way that /usr/local/lib/python1.5/site-packages/text/encoding
> will get searched.
>
> I believe you could hack up an importer that did allow this, and
> I think you'd be 100% certifiable if you did.  Just look at the
> surprise factor.
>
> Hacking stuff into another package is just as evil as math.pi = 42.

Not if the package was designed for it.  For a "package" like "net", it would be perfectly acceptable to allow third parties to define that as their installation point.  And yes, assume there is an importer that looks into the installed archives for modules.  In the example, the harder part is determining where the "text.encoding" package is loaded from.  And yah: it may be difficult to arrange for the text.encoding importer to allow for archive searching.

Cheers,
-g
--
Greg Stein, http://www.lyra.org/

From thomas.heller@ion-tof.com Thu Mar 30 19:30:25 2000
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Thu, 30 Mar 2000 21:30:25 +0200
Subject: [Python-Dev] Metaclasses, customizing attribute access for classes
Message-ID: <021c01bf9a7e$662327c0$4500a8c0@thomasnotebook>

Dear Python-developers,

Recently I played with metaclasses from within python, also with Jim Fulton's ExtensionClass.
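[The attribute-delegation pattern this message goes on to describe can be sketched in present-day Python. `Delegate` below is an invented pure-Python stand-in for the C-implemented object; only the method names `dosomething`/`doanotherthing`/`doevenmore` come from Thomas's example.]

```python
class Delegate:
    """Hypothetical stand-in for 'anObjectImplementedInC' in the message."""
    def __init__(self):
        self._storage = {}

    def dosomething(self, key):
        try:
            return self._storage[key]
        except KeyError:
            raise AttributeError(key)

    def doanotherthing(self, key, value):
        self._storage[key] = value

    def doevenmore(self, key):
        del self._storage[key]

class X:
    def __init__(self):
        # Bypass our own __setattr__ while installing the delegate,
        # otherwise the very first assignment would recurse.
        object.__setattr__(self, 'delegate', Delegate())

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        return self.delegate.dosomething(key)

    def __setattr__(self, key, value):
        self.delegate.doanotherthing(key, value)

    def __delattr__(self, key):
        self.delegate.doevenmore(key)

x = X()
x.colour = 'red'       # routed through __setattr__ into the delegate
print(x.colour)        # prints "red"; 'colour' never enters x.__dict__
```

Every attribute read, write, and delete is trampolined through three Python-level calls, which is exactly the per-access overhead the message complains about and why replacing `__dict__` wholesale looked attractive.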
I even tried to write my own metaclass in a C-extension, using the famous Don Beaudry hook.  It seems that ExtensionClass does not do quite what I want.  Metaclasses implemented in python are somewhat slow, and writing them is a lot of work.  Writing a metaclass in C is even more work...

Well, what do I want?  Often, I use the following pattern:

class X:
    def __init__(self):
        self.delegate = anObjectImplementedInC(...)
    def __getattr__(self, key):
        return self.delegate.dosomething(key)
    def __setattr__(self, key, value):
        self.delegate.doanotherthing(key, value)
    def __delattr__(self, key):
        self.delegate.doevenmore(key)

This is too slow (for me).  So what I would like to do is:

class X:
    def __init__(self):
        self.__dict__ = aMappingObject(...)

and now aMappingObject will automatically receive all the setattr, getattr, and delattr calls.  The *only* thing which is required for this is to remove the restriction that the __dict__ attribute must be a dictionary.  This is only a small change to classobject.c (which unfortunately I have only implemented for 1.5.2, not for the CVS version).  The performance impact of this change is unnoticeable in pystone.

What do you think?  Should I prepare a patch?  Any chance that this can be included in a future python version?

Thomas Heller

From petrilli@amber.org Thu Mar 30 19:52:02 2000
From: petrilli@amber.org (Christopher Petrilli)
Date: Thu, 30 Mar 2000 14:52:02 -0500
Subject: [Python-Dev] Unicode compile
Message-ID: <20000330145202.B9078@trump.amber.org>

I don't know how much memory other people have in their machines, but on this machine (128Mb), I get the following trying to compile a CVS checkout of Python:

gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c
./unicodedatabase.c:53482: virtual memory exhausted

I hope that this is a temporary thing, or that we ship the database in some other manner, but I would argue that you should be able to compile Python on a machine with 32Mb of RAM at MOST....
for an idea of how much VM this machine has, i have 256Mb of SWAP on top of it. Chris -- | Christopher Petrilli | petrilli@amber.org From guido@python.org Thu Mar 30 20:12:22 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:12:22 -0500 Subject: [Python-Dev] Unicode compile In-Reply-To: Your message of "Thu, 30 Mar 2000 14:52:02 EST." <20000330145202.B9078@trump.amber.org> References: <20000330145202.B9078@trump.amber.org> Message-ID: <200003302012.PAA22062@eric.cnri.reston.va.us> > I don't know how much memory other people have in their machiens, but > in this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it. I'm not sure how to fix this, short of reading the main database from a file. Marc-Andre? --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Thu Mar 30 20:14:55 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 30 Mar 2000 22:14:55 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> Message-ID: <38E3B5BF.2D00F930@tismer.com> Christopher Petrilli wrote: > > I don't know how much memory other people have in their machiens, but > in this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. 
-DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it.

I had similar effects, which made me work on a compressed database (see older messages). Due to time limits, I will not get ready before 1.6.a1 is out. And then quite a lot of other changes will be needed from Marc, since the API changes quite a bit. But it will definitely be a module of less than 20 KB, proven.

ciao - chris(2) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From akuchlin@mems-exchange.org Thu Mar 30 20:14:27 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 15:14:27 -0500 (EST) Subject: [Python-Dev] Unicode compile In-Reply-To: <200003302012.PAA22062@eric.cnri.reston.va.us> References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <14563.46499.555853.413690@amarok.cnri.reston.va.us>

Guido van Rossum writes: >I'm not sure how to fix this, short of reading the main database from >a file. Marc-Andre?

Turning off optimization may help. (Or it may not -- it might be creating the data structures for a large static table that's the problem.) --amk

From akuchlin@mems-exchange.org Thu Mar 30 20:22:02 2000 From: akuchlin@mems-exchange.org (Andrew M.
Kuchling) Date: Thu, 30 Mar 2000 15:22:02 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003282000.PAA11988@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> Message-ID: <14563.46954.70800.706245@amarok.cnri.reston.va.us> Guido van Rossum writes: >I don't know enough about this, but it seems that there might be two >steps: *creating* a mmap object is necessarily platform-specific; but >*using* a mmap object could be platform-neutral. > >What is the API for mmap objects? You create them; Unix wants a file descriptor, and Windows wants a filename. Then they behave like buffer objects, like mutable strings. I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of interface. If someone can suggest a way to handle the extra flags such as MAP_SHARED and the Windows tag argument, I'll happily implement it. Maybe just keyword arguments that differ across platforms? open(filename, mode, [tag = 'foo',] [flags = mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a file descriptor on Unix through a separate openfd() function. I'm also strongly tempted to rename the module from mmapfile to just 'mmap'. I'd suggest waiting until the interface is finalized before adding the module to the CVS tree -- which means after 1.6a1 -- but I can add the module as it stands if you like. Guido, let me know if you want me to do that. -- A.M. Kuchling http://starship.python.net/crew/amk/ A Puck is harder by far to hurt than some little lord of malice from the lands of ice and snow. We Pucks are old and hard and wild... -- Robin Goodfellow, in SANDMAN #66: "The Kindly Ones:10" From guido@python.org Thu Mar 30 20:23:42 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:23:42 -0500 Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: Your message of "Wed, 29 Mar 2000 19:57:06 EST." 
References: Message-ID: <200003302023.PAA22350@eric.cnri.reston.va.us> > I had to make the following one-line change to socketmodule.c so that it > would link properly with openssl-0.9.4. In studying the openssl include > files, I found: > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > why this isn't being picked up. I also don't know how well the module > works, other than it imports, but I sure would like to try it with > Zope/ZServer/Medusa... Strange -- the version of OpenSSL I have also calls itself 0.9.4 ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have SSL_library_init(). I wonder what gives... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Mar 30 20:25:58 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:25:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 15:22:02 EST." <14563.46954.70800.706245@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> Message-ID: <200003302025.PAA22367@eric.cnri.reston.va.us> > Guido van Rossum writes: > >I don't know enough about this, but it seems that there might be two > >steps: *creating* a mmap object is necessarily platform-specific; but > >*using* a mmap object could be platform-neutral. > > > >What is the API for mmap objects? [AMK] > You create them; Unix wants a file descriptor, and Windows wants a > filename. Then they behave like buffer objects, like mutable strings. > > I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of > interface. If someone can suggest a way to handle the extra flags > such as MAP_SHARED and the Windows tag argument, I'll happily > implement it. Maybe just keyword arguments that differ across > platforms? open(filename, mode, [tag = 'foo',] [flags = > mmapfile.MAP_SHARED]). 
We could preserve the ability to mmap() only a > file descriptor on Unix through a separate openfd() function. Yes, keyword args seem to be the way to go. To avoid an extra function you could add a fileno=... kwarg, in which case the filename is ignored or required to be "". > I'm > also strongly tempted to rename the module from mmapfile to just > 'mmap'. Sure. > I'd suggest waiting until the interface is finalized before adding the > module to the CVS tree -- which means after 1.6a1 -- but I can add the > module as it stands if you like. Guido, let me know if you want me to > do that. Might as well check it in -- the alpha is going to be rough and I expect another alpha to come out shortly to correct the biggest problems. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Mar 30 20:22:08 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 30 Mar 2000 22:22:08 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <38E3B770.6CD61C37@lemburg.com> Guido van Rossum wrote: > > > I don't know how much memory other people have in their machiens, but > > in this machine (128Mb), I get the following trying to compile a CVS > > checkout of Python: > > > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > > ./unicodedatabase.c:53482: virtual memory exhausted > > > > I hope that this is a temporary thing, or we ship the database some > > other manner, but I would argue that you should be able to compile > > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > > much VM this machine has, i have 256Mb of SWAP on top of it. > > I'm not sure how to fix this, short of reading the main database from > a file. Marc-Andre? Hmm, the file compiles fine on my 64MB Linux machine with about 100MB of swap. What gcc version do you use ? 
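The compile failure in this thread comes from a single enormous static initializer in unicodedatabase.c. A generic way around a compiler choking on one huge table is to split the data into several smaller tables reached through a small redirection function; the index arithmetic is the same whether the tables live in C or in generated Python. A purely illustrative sketch with invented names and sizes (not the actual unicodedatabase layout):

```python
# Hypothetical chunked table: instead of one flat 64K-entry sequence,
# keep several small chunks and dispatch through a lookup function.
CHUNK = 4  # invented chunk size for the sketch

_chunks = [
    [10, 11, 12, 13],  # entries 0..3
    [14, 15, 16, 17],  # entries 4..7
]

def lookup(i):
    # Redirection function: map a flat index to (chunk, offset).
    return _chunks[i // CHUNK][i % CHUNK]

assert lookup(2) == 12
assert lookup(5) == 15
```

Many small initializers cost a compiler far less peak memory than one giant one, at the price of an extra divide/modulo per lookup.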
Anyway, once Christian is ready with his compact replacement I think we no longer have to worry about that chunk of static data :-) Reading in the data from a file is not a very good solution, because it would override the OS optimizations for static data in object files (like e.g. swapping in only those pages which are really needed, etc.). An alternative solution would be breaking the large table into several smaller ones and accessing it via a redirection function. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From adustman@comstar.net Thu Mar 30 21:12:51 2000 From: adustman@comstar.net (Andy Dustman) Date: Thu, 30 Mar 2000 16:12:51 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003302023.PAA22350@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > I had to make the following one-line change to socketmodule.c so that it > > would link properly with openssl-0.9.4. In studying the openssl include > > files, I found: > > > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > > why this isn't being picked up. I also don't know how well the module > > works, other than it imports, but I sure would like to try it with > > Zope/ZServer/Medusa... > > Strange -- the version of OpenSSL I have also calls itself 0.9.4 > ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have > SSL_library_init(). > > I wonder what gives... I don't know. Right after I made the patch, I found that 0.9.5 is available, and I was able to successfully compile against that version (with the patch). -- andy dustman | programmer/analyst | comstar.net, inc. 
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" From akuchlin@mems-exchange.org Thu Mar 30 21:19:45 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 16:19:45 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302025.PAA22367@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> Message-ID: <14563.50417.909045.81868@amarok.cnri.reston.va.us> Guido van Rossum writes: >Might as well check it in -- the alpha is going to be rough and I >expect another alpha to come out shortly to correct the biggest >problems. Done -- just doing my bit to ensure the first alpha is rough! :) My next task is to add the Expat module. My understanding is that it's OK to add Expat itself, too; where should I put all that code? Modules/expat/* ? -- A.M. Kuchling http://starship.python.net/crew/amk/ I'll bring the Kindly Ones down on his blasted head. -- Desire, in SANDMAN #31: "Three Septembers and a January" From fdrake@acm.org Thu Mar 30 21:29:58 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:29:58 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <14563.51030.24773.587972@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Done -- just doing my bit to ensure the first alpha is rough! :) > > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? 
> Modules/expat/* ? Do you have documentation for this? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin@mems-exchange.org Thu Mar 30 21:30:35 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 16:30:35 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51030.24773.587972@seahag.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> Message-ID: <14563.51067.560938.367690@amarok.cnri.reston.va.us> Fred L. Drake, Jr. writes: > Do you have documentation for this? Somewhere at home, I think, but not here at work. I'll try to get it checked in before 1.6alpha1, but don't hold me to that. --amk From guido@python.org Thu Mar 30 21:31:58 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:31:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:19:45 EST." <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <200003302131.QAA22897@eric.cnri.reston.va.us> > Done -- just doing my bit to ensure the first alpha is rough! :) When the going gets rough, the rough get going :-) > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Whoa... Not sure. This will give issues with Patrice, at least (even if it is pure Open Source -- given the size). I'd prefer to add instructions to Setup.in about where to get it. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Thu Mar 30 21:34:55 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:34:55 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51067.560938.367690@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> <14563.51067.560938.367690@amarok.cnri.reston.va.us> Message-ID: <14563.51327.190466.477566@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Somewhere at home, I think, but not here at work. I'll try to get it > checked in before 1.6alpha1, but don't hold me to that. The date isn't important; I'm not planning to match alpha/beta releases with Doc releases. I just want to be sure it gets in soon so that the debugging process can kick in for that as well. ;) Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Thu Mar 30 21:34:02 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:34:02 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:31:58 EST." <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <200003302134.QAA22939@eric.cnri.reston.va.us> > Whoa... Not sure. This will give issues with Patrice, at least (even > if it is pure Open Source -- given the size). For those outside CNRI -- Patrice is CNRI's tough IP lawyer. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Mar 30 21:48:13 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 16:48:13 -0500 (EST) Subject: [Python-Dev] Expat module In-Reply-To: <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <14563.52125.401817.986919@amarok.cnri.reston.va.us> Guido van Rossum writes: >> My next task is to add the Expat module. My understanding is that >> it's OK to add Expat itself, too; where should I put all that code? >> Modules/expat/* ? > >Whoa... Not sure. This will give issues with Patrice, at least (even >if it is pure Open Source -- given the size). I'd prefer to add >instructions to Setup.in about where to get it. Fair enough; I'll just add the module itself, then, and we can always change it later. Should we consider replacing the makesetup/Setup.in mechanism with a setup.py script that uses the Distutils? You'd have to compile a minipython with just enough critical modules -- strop and posixmodule are probably the most important ones -- in order to run setup.py. It's something I'd like to look at for 1.6, because then you could be much smarter in automatically enabling modules. -- A.M. Kuchling http://starship.python.net/crew/amk/ This is the way of Haskell or Design by Contract of Eiffel. This one is like wearing a XV century armor, you walk very safely but in a very tiring way. -- Manuel Gutierrez Algaba, 26 Jan 2000 From guido@python.org Thu Mar 30 22:41:45 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 17:41:45 -0500 Subject: [Python-Dev] Expat module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:48:13 EST." 
<14563.52125.401817.986919@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <200003302241.RAA23050@eric.cnri.reston.va.us> > Fair enough; I'll just add the module itself, then, and we can always > change it later. OK. > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. If you can come up with something that works well enough, that would be great. (Although I'm not sure where the distutils come in.) We still need to use configure/autoconf though. Hardcoding a small complement of modules is no problem. (Why do you think you need strop though? Remember we have string methods!) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Thu Mar 30 23:03:39 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 09:03:39 +1000 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/PC python_nt.rc,1.8,1.9 In-Reply-To: <200003302259.RAA23266@eric.cnri.reston.va.us> Message-ID: This is the version number as displayed by Windows Explorer in the "properties" dialog. Mark. > Modified Files: > python_nt.rc > Log Message: > Seems there was a version string here that still looked > like 1.5.2. 
> > > Index: python_nt.rc
> ===================================================================
> RCS file: /projects/cvsroot/python/dist/src/PC/python_nt.rc,v
> retrieving revision 1.8
> retrieving revision 1.9
> diff -C2 -r1.8 -r1.9
> *** python_nt.rc 2000/03/29 01:50:50 1.8
> --- python_nt.rc 2000/03/30 22:59:09 1.9
> ***************
> *** 29,34 ****
>
> VS_VERSION_INFO VERSIONINFO
> ! FILEVERSION 1,5,2,3
> ! PRODUCTVERSION 1,5,2,3
> FILEFLAGSMASK 0x3fL
> #ifdef _DEBUG
> --- 29,34 ----
>
> VS_VERSION_INFO VERSIONINFO
> ! FILEVERSION 1,6,0,0
> ! PRODUCTVERSION 1,6,0,0
> FILEFLAGSMASK 0x3fL
> #ifdef _DEBUG
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://www.python.org/mailman/listinfo/python-checkins
>

From Fredrik Lundh Date: Fri, 31 Mar 2000 00:40:51 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? Message-ID: <00b701bf9a99$022339c0$34aab5d4@hagrid>

at this time, SRE uses types instead of classes for compiled patterns and matches. these classes provide a documented interface, and a bunch of internal attributes, for example:

RegexObjects:

    code -- a PCRE code object
    pattern -- the source pattern
    groupindex -- maps group names to group indices

MatchObjects:

    regs -- same as match.span()?
    groupindex -- as above
    re -- the pattern object used for this match
    string -- the target string used for this match

the problem is that some other modules use these attributes directly. for example, xmllib.py uses the pattern attribute, and other code I've seen uses regs to speed things up.

in SRE, I would like to get rid of all these (except possibly for the match.string attribute). opinions?

From guido@python.org Thu Mar 30 23:31:43 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 18:31:43 -0500 Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: Your message of "Fri, 31 Mar 2000 00:40:51 +0200."
<00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <200003302331.SAA24895@eric.cnri.reston.va.us> > at this time, SRE uses types instead of classes for compiled > patterns and matches. these classes provide a documented > interface, and a bunch of internal attributes, for example: > > RegexObjects: > > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices > > MatchObjects: > > regs -- same as match.span()? > groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match > > the problem is that some other modules use these attributes > directly. for example, xmllib.py uses the pattern attribute, and > other code I've seen uses regs to speed things up. > > in SRE, I would like to get rid of all these (except possibly for > the match.string attribute). > > opinions? Sounds reasonable. All std lib modules that violate this will need to be fixed once sre.py replaces re.py. (Checkin of sre is next.) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Mar 30 23:40:16 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 18:40:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Fredrik Lundh writes: >RegexObjects: > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices pattern and groupindex are documented in the Library Reference, and they're part of the public interface. .code is not, so you can drop it. >MatchObjects: > regs -- same as match.span()? 
> groupindex -- as above
> re -- the pattern object used for this match
> string -- the target string used for this match

.re and .string are documented. I don't see a reference to MatchObject.groupindex anywhere, and .regs isn't documented, so those two can be ignored; xmllib or whatever external modules use them are being very naughty, so go ahead and break them.

-- A.M. Kuchling http://starship.python.net/crew/amk/ Imagine a thousand thousand fireflies of every shape and color; Oh, that was Baghdad at night in those days. -- From SANDMAN #50: "Ramadan"

From Fredrik Lundh In-Reply-To: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Message-ID: <00e901bf9a9c$6c036240$34aab5d4@hagrid>

Andrew wrote:
> >RegexObjects:
> > code -- a PCRE code object
> > pattern -- the source pattern
> > groupindex -- maps group names to group indices
>
> pattern and groupindex are documented in the Library Reference, and
> they're part of the public interface.

hmm. I could have sworn... guess I didn't look carefully enough (or someone's used his time machine again :-). oh well, more bloat...

btw, "pattern" doesn't make much sense in SRE -- who says the pattern object was created by re.compile? guess I'll just set it to None in other cases (e.g. sregex, sreverb, sgema...)

From bwarsaw@cnri.reston.va.us Fri Mar 31 00:35:16 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 19:35:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14563.62148.860971.360871@anthem.cnri.reston.va.us>

>>>>> "FL" == Fredrik Lundh writes:

    FL> hmm. I could have sworn... guess I didn't look carefully
    FL> enough (or someone's used his time machine again :-).

Yep, sorry. If it's documented as in the public interface, it should be kept.
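The documented attributes being debated here survived: in the modern `re` module, `pattern`, `groupindex`, `re`, and `string` are all still public, and the documented `span()` method covers what the undocumented `regs` was used for. A quick illustrative sketch (modern CPython behavior, not the 1.6-era modules):

```python
import re

# Documented pattern-object attributes from the thread above:
pat = re.compile(r"(?P<word>\w+)")
assert pat.pattern == r"(?P<word>\w+)"   # the source pattern
assert pat.groupindex == {"word": 1}     # maps group names to indices

m = pat.match("hello world")

# Documented match-object attributes:
assert m.re is pat                # the pattern object used for this match
assert m.string == "hello world"  # the target string used for this match

# The undocumented `regs` attribute is covered by the documented span():
assert m.span() == (0, 5)
```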
Anything else can go (he says without yet grep'ing through his various code bases). -Barry From bwarsaw@cnri.reston.va.us Fri Mar 31 04:34:15 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 23:34:15 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> Message-ID: <14564.10951.90258.729547@anthem.cnri.reston.va.us> >>>>> "Guido" == Guido van Rossum writes: Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted Guido> into 80-char lines by GvR. Can we change the 8-space-tab rule for all new C code that goes in? I know that we can't practically change existing code right now, but for new C code, I propose we use no tab characters, and we use a 4-space block indentation. -Barry From DavidA@ActiveState.com Fri Mar 31 05:07:02 2000 From: DavidA@ActiveState.com (David Ascher) Date: Thu, 30 Mar 2000 21:07:02 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Heretic! +1, FWIW =) From bwarsaw@cnri.reston.va.us Fri Mar 31 05:16:48 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:16:48 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <14564.13504.310866.835201@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: DA> Heretic! 
DA> +1, FWIW =) I hereby offer to so untabify and reformat any C code in the standard distribution that Guido will approve of. -Barry

From mhammond@skippinet.com.au Fri Mar 31 05:16:26 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 15:16:26 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Message-ID:

+1 for me too. It also brings all source files under the same guidelines (rather than separate ones for .py and .c) Mark.

From bwarsaw@cnri.reston.va.us Fri Mar 31 05:40:16 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:40:16 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: Message-ID: <14564.14912.629414.970309@anthem.cnri.reston.va.us>

>>>>> "MH" == Mark Hammond writes: MH> +1 for me too. It also brings all source files under the same MH> guidelines (rather than separate ones for .py and .c)

BTW, I further propose that if Guido lets me reformat the C code, that we freeze other checkins for the duration and I temporarily turn off the python-checkins email. That is, unless you guys /want/ to be bombarded with boatloads of useless diffs. :) -Barry

From pf@artcom-gmbh.de Fri Mar 31 06:45:45 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 08:45:45 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: <14564.14912.629414.970309@anthem.cnri.reston.va.us> from "bwarsaw@cnri.reston.va.us" at "Mar 31, 2000 0:40:16 am" Message-ID:

Hi! sigh :-( > >>>>> "MH" == Mark Hammond writes: > > MH> +1 for me too.
It also brings all source files under the same > MH> guidelines (rather than separate ones for .py and .c) bwarsaw@cnri.reston.va.us: > BTW, I further propose that if Guido lets me reformat the C code, that > we freeze other checkins for the duration and I temporarily turn off > the python-checkins email. That is, unless you guys /want/ to be > bombarded with boatloads of useless diffs. :)

-1 for C reformatting. The 4-space indentation seems reasonable for Python sources, but I disagree for C code. C is not Python. Let me cite a very prominent member of the open source community (pasted from /usr/src/linux/Documentation/CodingStyle):

    Chapter 1: Indentation

    Tabs are 8 characters, and thus indentations are also 8 characters. There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3.

    Rationale: The whole idea behind indentation is to clearly define where a block of control starts and ends. Especially when you've been looking at your screen for 20 straight hours, you'll find it a lot easier to see how the indentation works if you have large indentations.

    Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

    In short, 8-char indents make things easier to read, and have the added benefit of warning you when you're nesting your functions too deep. Heed that warning.

Although the Python interpreter has no strong relationship with the Linux kernel, I agree with Linus on this topic. Python source code is another thing: Python identifiers are usually longer due to qualifying, and Python operands are often lists, tuples or the like, so lines contain more stuff.
disliking-yet-another-white-space-discussion-ly y'rs - peter From mhammond@skippinet.com.au Fri Mar 31 07:11:50 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 17:11:50 +1000 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: > Rationale: The whole idea behind indentation is to > clearly define where > a block of control starts and ends. Especially when Ironically, this statement is a strong argument for insisting on Python using real tab characters! "Clearly define" is upgraded to "used to define". > 80-character terminal screen. The answer to that is > that if you need > more than 3 levels of indentation, you're screwed > anyway, and should fix > your program. Yeah, right! int foo() { // one level for the privilege of being here. switch (bar) { // uh oh - running out of room... case WTF: // Oh no - if I use an "if" statement, // my code is "screwed"?? } } > disliking-yet-another-white-space-discussion-ly y'rs - peter Like-death-and-taxes-ly y'rs - Mark. From Moshe Zadka Fri Mar 31 08:04:32 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 10:04:32 +0200 (IST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302134.QAA22939@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > Whoa... Not sure. This will give issues with Patrice, at least (even > > if it is pure Open Source -- given the size). > > For those outside CNRI -- Patrice is CNRI's tough IP lawyer. It was understandable from the context... Personally, I'd rather if it was folded in by value, and not by reference: one reason is versioning problems, and another is pure laziness on my part. what-do-you-have-when-you-got-a-lawyer-up-to-his-neck-in-the-sand-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Fri Mar 31 07:42:04 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 31 Mar 2000 09:42:04 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <38E456CC.1A49334A@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "Guido" == Guido van Rossum writes: > > Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 > Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted > Guido> into 80-char lines by GvR. > > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Why not just leave new code formatted as it is (except maybe to bring the used TAB width to the standard 8 spaces used throughout the Python C source code)? BTW, most of the new unicode stuff uses 4-space indents. Unfortunately, it mixes whitespace and tabs since Emacs c-mode doesn't do the python-mode magic yet (is there a way to turn it on?). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh Message-ID: <01ae01bf9af1$927b1940$34aab5d4@hagrid> Peter Funk wrote: > Also the Python interpreter has no strong relationship with the Linux kernel, > but I agree with Linus on this topic. Python source code is another thing: > Python identifiers are usually longer due to qualifying and Python > operands are often lists, tuples or the like, so lines contain more stuff. you're just guessing, right? (if you check, you'll find that the actual difference is very small. iirc, that's true for c, c++, java, python, tcl, and probably a few more languages. dunno about perl, though...
:-) From Fredrik Lundh <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <01b501bf9af1$f9b44500$34aab5d4@hagrid> M.-A. Lemburg wrote: > Why not just leave new code formatted as it is (except maybe > to bring the used TAB width to the standard 8 spaces used throughout > the Python C source code) ? > > BTW, most of the new unicode stuff uses 4-space indents. > Unfortunately, it mixes whitespace and tabs since Emacs > c-mode doesn't do the python-mode magic yet (is there a > way to turn it on ?). http://www.jwz.org/doc/tabs-vs-spaces.html contains some hints. From Moshe Zadka Fri Mar 31 11:24:05 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 13:24:05 +0200 (IST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes Message-ID: Here is a new list of things that will change in the next release. Thanks to all the people who gave me hints and information! If you have anything you think I missed, or mistreated, please e-mail me personally -- I'll post an updated version soon. Obligatory ========== A lot of bug-fixes, some optimizations, many improvements in the documentation Core changes ============ Deleting objects is safe even for deeply nested data structures. Long/int unifications: long integers can be used in seek() calls, as slice indexes.
str(1L) --> '1', not '1L' (repr() is still the same) Builds on NT Alpha UnboundLocalError is raised when a local variable is undefined long, int take optional "base" parameter string objects now have methods (though they are still immutable) unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders "in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__ New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw) Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect New modules =========== winreg - Windows registry interface. Distutils - tools for distributing Python modules robotparser - parse a robots.txt file (for writing web spiders) linuxaudio - audio for Linux mmap - treat a file as a memory buffer sre - regular expressions (fast, supports unicode) filecmp - supersedes the old cmp.py and dircmp.py modules tabnanny - check Python sources for tab-width dependence unicode - support for unicode codecs - support for Unicode encoders/decoders Module changes ============== re - changed to be a frontend to sre readline, ConfigParser, cgi, calendar, posix, xmllib, aifc, chunk, wave, random, shelve, nntplib - minor enhancements socket, httplib, urllib - optional OpenSSL support _tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0) Tool changes ============ IDLE -- complete overhaul (Andrew, I'm still waiting for the expat support and integration to add to this list -- other than that, please contact me if you want something less telegraphic ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Fri Mar 31 12:01:21 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 31 Mar 2000 04:01:21 -0800 (PST) Subject: [Python-Dev] Roundup et al.
Message-ID: Hi -- there was some talk on this list earlier about nosy lists, managing patches, and such things, so i just wanted to mention, for anybody interested, that i threw together Roundup very quickly for you to try out. http://www.lfw.org/python/ There's a tar file there -- it's very messy code, and i apologize (it was hastily hacked out of the running prototype implementation), but it should be workable enough to play with. There's a test installation to play with at http://www.lfw.org/ping/roundup/roundup.cgi Dummy user:password pairs are test:test, spam:spam, eggs:eggs. A fancier design, still in the last stages of coming together (which will be my submission to the Software Carpentry contest) is up at http://crit.org/http://www.lfw.org/ping/sctrack.html and i welcome your thoughts and comments on that if you have the spare time (ha!) and generous inclination to contribute them. Thank you and apologies for the interruption. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido@python.org Fri Mar 31 12:10:45 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 07:10:45 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Your message of "Thu, 30 Mar 2000 23:34:15 EST." <14564.10951.90258.729547@anthem.cnri.reston.va.us> References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <200003311210.HAA29010@eric.cnri.reston.va.us> > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Actually, this one was formatted for 8-space indents but using 4-space tabs, so in my editor it looked like 16-space indents! 
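The mismatch Guido describes is easy to reproduce: the same two leading tab characters span 8 columns in an editor with 4-column tab stops, but 16 columns with the standard 8-column stops. A small illustration (the C line used here is hypothetical):

```python
# Two tab characters of indentation, as written in an editor configured
# with 4-column tab stops (where they rendered as 8 columns).
line = "\t\treturn NULL;"
code = "return NULL;"

# Columns of indentation in the author's editor (tab stop = 4):
print(len(line.expandtabs(4)) - len(code))   # 8
# Columns in an editor using the standard 8-column tab stops:
print(len(line.expandtabs(8)) - len(code))   # 16
```

The same bytes, two very different-looking files -- which is exactly why mixing tab widths in one code base causes trouble.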
Given that we don't want to change existing code, I'd prefer to stick with 1-tab 8-space indents. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Fri Mar 31 13:10:06 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 15:10:06 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 In-Reply-To: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Guido van Rossum wrote: > + Christian Tismer > + Christian Tismer Ummmmm....I smell something fishy here. Are there two Christian Tismers? That would explain how Christian has so much time to work on Stackless. Well, between the both of them, Guido will have no choice but to put Stackless in the standard distribution. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fredrik@pythonware.com Fri Mar 31 13:16:16 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 15:16:16 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: <000d01bf9b13$4be1db00$0500a8c0@secret.pythonware.com> > Tracy Tims > + Christian Tismer > + Christian Tismer > R Lindsay Todd two christians? From bwarsaw@cnri.reston.va.us Fri Mar 31 13:55:13 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 31 Mar 2000 08:55:13 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <14564.44609.221250.471147@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> BTW, most of the new unicode stuff uses 4-space indents.
M> Unfortunately, it mixes whitespace and tabs since Emacs M> c-mode doesn't do the python-mode magic yet (is there a M> way to turn it on ?). (setq indent-tabs-mode nil) I could add that to the "python" style. And to zap all your existing tab characters: C-M-h M-x untabify RET -Barry From skip@mojam.com (Skip Montanaro) Fri Mar 31 14:04:46 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 31 Mar 2000 08:04:46 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: Message-ID: <14564.45182.460160.589244@beluga.mojam.com> Moshe, I would highlight those bits that are likely to warrant a little closer scrutiny. The list.{append,insert,...} and socket.connect change certainly qualify. Perhaps split the Core Changes section into two subsections, one set of changes likely to require some adaptation and one set that should be backwards-compatible. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From guido@python.org Fri Mar 31 14:47:31 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 09:47:31 -0500 Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: Your message of "Fri, 31 Mar 2000 08:04:46 CST." 
<14564.45182.460160.589244@beluga.mojam.com> References: <14564.45182.460160.589244@beluga.mojam.com> Message-ID: <200003311447.JAA29633@eric.cnri.reston.va.us> See what I've done to Moshe's list: http://www.python.org/1.6/ --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Fri Mar 31 15:28:56 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 31 Mar 2000 09:28:56 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> References: <14564.45182.460160.589244@beluga.mojam.com> <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14564.50232.734778.152933@beluga.mojam.com> Guido> See what I've done to Moshe's list: http://www.python.org/1.6/ Looks good. Attached are a couple nitpicky diffs. Skip [base64-encoded attachment "1.6.diff" omitted] From guido@python.org Fri Mar 31 15:47:56 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 10:47:56 -0500 Subject: [Python-Dev] Windows installer pre-prelease Message-ID: <200003311547.KAA15538@eric.cnri.reston.va.us> The Windows installer is always hard to get just right. If you have a moment, go to http://www.python.org/1.6/ and download the Windows Installer prerelease. Let me know what works, what doesn't! I've successfully installed it on Windows NT 4.0 and on Windows 98, both with default install target and with a modified install target. I'd love to hear that it also installs cleanly on Windows 95. Please test IDLE from the start menu! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Fri Mar 31 16:18:43 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:18:43 -0500 Subject: [Python-Dev] Distutils for the std.
library (was: Expat module) In-Reply-To: <14563.52125.401817.986919@amarok.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Mar 30, 2000 at 04:48:13PM -0500 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <20000331111842.A8060@cnri.reston.va.us> On 30 March 2000, Andrew M. Kuchling said: > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Obviously, I'd love to see the Distutils used to build parts of the Python library. Some possible problems: * Distutils relies heavily on the sys, os, string, and re modules, so those would have to be built and included in the mythical mini-python (as would everything they rely on -- strop, pcre, ... ?) * Distutils currently assumes that it's working with an installed Python -- it doesn't know anything about working in the Python source tree. I think this could be fixed just by tweaking the distutils.sysconfig module, but there might be subtle assumptions elsewhere in the code. * I haven't written the mythical Autoconf-in-Python yet, so we'd still have to rely on either the configure script or user intervention to find out whether library X is installed, and where its header and library files live (for X in zlib, tcl, tk, ...).
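The first bullet -- that a mini-python must bundle not just sys, os, string, and re but everything they pull in -- can be estimated mechanically. A rough sketch using the standard library's modulefinder; the module list is only an assumption taken from the bullet above, and a modern standard library's closure will of course differ from 1.6's:

```python
import os
import tempfile
from modulefinder import ModuleFinder

# A stand-in script that imports only what Distutils itself is said to need.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("import os, re, string\n")
    script = f.name

finder = ModuleFinder()
finder.run_script(script)   # follow the imports transitively
os.unlink(script)

# finder.modules maps module name -> Module for the whole closure;
# this is (roughly) the set a mini-python would have to include.
print(sorted(finder.modules))
```

The point of the sketch is simply that the closure is much larger than the four names you start from.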
Of course, the configure script would still be needed to build the mini-python, so it's not going away any time soon. Greg From skip@mojam.com (Skip Montanaro) Fri Mar 31 16:26:55 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 31 Mar 2000 10:26:55 -0600 (CST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> Message-ID: <14564.53711.803509.962248@beluga.mojam.com> Greg> * Distutils relies heavily on the sys, os, string, and re Greg> modules, so those would have to be built and included in the Greg> mythical mini-python (as would everything they rely on -- Greg> strop, pcre, ... ?) With string methods in 1.6, reliance on the string and strop modules should be lessened or eliminated, right? re and os may need a tweak or two to use string methods themselves. The sys module is always available. Perhaps it would make sense to put sre(module)?.c into the Python directory where sysmodule.c lives. That way, a Distutils-capable mini-python could be built without messing around in the Modules directory at all... -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From Moshe Zadka Fri Mar 31 16:25:11 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 18:25:11 +0200 (IST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Greg Ward wrote: > Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. 
Well, it's not like it's not a lot of work, but it could be done, with liberal interpretation of "mini": include in "mini" Python *all* modules which do not rely on libraries not distributed with the Python core -- zlib, expat and Tkinter go right out the window, but most everything else can stay. That way, Distutils can use all modules it currently uses . The other problem, file-location, is a problem I have talked about earlier: it *cannot* be assumed that the default place for putting new libraries is the same place the Python interpreter resides, for many reasons. Why not ask the user explicitly? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward@cnri.reston.va.us Fri Mar 31 16:29:33 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:29:33 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14564.53711.803509.962248@beluga.mojam.com>; from skip@mojam.com on Fri, Mar 31, 2000 at 10:26:55AM -0600 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> <14564.53711.803509.962248@beluga.mojam.com> Message-ID: <20000331112933.B8060@cnri.reston.va.us> On 31 March 2000, Skip Montanaro said: > With string methods in 1.6, reliance on the string and strop modules should > be lessened or eliminated, right? re and os may need a tweak or two to use > string methods themselves. The sys module is always available. Perhaps it > would make sense to put sre(module)?.c into the Python directory where > sysmodule.c lives. That way, a Distutils-capable mini-python could be built > without messing around in the Modules directory at all... 
But I'm striving to maintain compatibility with (at least) Python 1.5.2 in Distutils. That need will fade with time, but it's not going to disappear the moment Python 1.6 is released. (Guess I'll have to find somewhere else to play with string methods and extended call syntax). Greg From thomas.heller@ion-tof.com Fri Mar 31 17:09:41 2000 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 31 Mar 2000 19:09:41 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils msvccompiler.py References: <200003311653.LAA08175@thrak.cnri.reston.va.us> Message-ID: <038701bf9b33$e7c49240$4500a8c0@thomasnotebook> > Simplified Thomas Heller's registry patch: just assign all those > HKEY_* and Reg* names once, rather than having near-duplicate code > in the two import attempts. Your change won't work, the function names in win32api and winreg are not the same: Example: win32api.RegEnumValue <-> winreg.EnumValue > > Also dropped the leading underscore on all the imported symbols, > as it's not appropriate (they're not local to this module). Are they used anywhere else? Or do you think they *could* be used somewhere else? Thomas Heller From mal@lemburg.com Fri Mar 31 10:19:58 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 12:19:58 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> <01b501bf9af1$f9b44500$34aab5d4@hagrid> Message-ID: <38E47BCE.94E4E012@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Why not just leave new code formatted as it is (except maybe > > to bring the used TAB width to the standard 8 spaces used throughout > > the Python C source code) ? > > > > BTW, most of the new unicode stuff uses 4-space indents.
> > Unfortunately, it mixes whitespace and tabs since Emacs > > c-mode doesn't do the python-mode magic yet (is there a > > way to turn it on ?). > > http://www.jwz.org/doc/tabs-vs-spaces.html > contains some hints. Ah, cool. Thanks :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Fri Mar 31 18:56:40 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 20:56:40 +0200 (MEST) Subject: [Python-Dev] 'make install' should create lib/site-packages IMO In-Reply-To: <200003311513.KAA00790@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 10:13:20 am" Message-ID: Hi! Guido van Rossum: [...] > Modified Files: > Makefile.in > Log Message: > Added distutils and distutils/command to LIBSUBDIRS. Noted by Andrew > Kuchling. [...] > ! LIBSUBDIRS= lib-old lib-tk test test/output encodings \ > ! distutils distutils/command $(MACHDEPS) [...] What about 'site-packages'? SuSE added this to their Python packaging and I think it is a good idea to have an empty 'site-packages' directory installed by default. Regards, Peter From akuchlin@mems-exchange.org Fri Mar 31 20:16:53 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 31 Mar 2000 15:16:53 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00e901bf9a9c$6c036240$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14565.1973.361549.291817@amarok.cnri.reston.va.us> Fredrik Lundh writes: >btw, "pattern" doesn't make much sense in SRE -- who says >the pattern object was created by re.compile? guess I'll just >set it to None in other cases (e.g. sregex, sreverb, sgema...) 
Good point; I can imagine fabulously complex patterns assembled programmatically, for which no summary could be made. I guess there could be another attribute that also gives the class (module? function?) used to compile the pattern, but more likely, the pattern attribute should be deprecated and eventually dropped. -- A.M. Kuchling http://starship.python.net/crew/amk/ You know how she is when she gets an idea into her head. I mean, when one finally penetrates. -- Desire describes Delirium, in SANDMAN #41: "Brief Lives:1" From pf@artcom-gmbh.de Fri Mar 31 20:14:41 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 22:14:41 +0200 (MEST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 9:47:31 am" Message-ID: Hi! Guido van Rossum : > See what I've done to Moshe's list: http://www.python.org/1.6/ Very fine, but I have a few small annotations: 1. 'linuxaudio' has been renamed to 'linuxaudiodev' 2. The following text: "_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)." looks a bit misleading, since it is not explicit about Version 8.0.x. I suggest the following wording: "_tkinter - supports Tcl/Tk from version 8.0 up to the current 8.3. Support for versions older than 8.0 has been dropped." 3. 'src/Tools/i18n/pygettext.py' by Barry should be mentioned. This is a very useful utility. I suggest to append the following text: "New utility pygettext.py -- Python equivalent of xgettext(1). A message text extraction tool used for internationalizing applications written in Python" Regards, Peter From fdrake@acm.org Fri Mar 31 20:30:00 2000 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 31 Mar 2000 15:30:00 -0500 (EST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14565.2760.665022.206361@seahag.cnri.reston.va.us> Peter Funk writes: > I suggest the following wording: ... > a very useful utility. I suggest to append the following text: Peter, I'm beginning to figure this out -- you really just want to get published! ;) You forgot the legalese. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Fri Mar 31 21:30:42 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 16:30:42 -0500 Subject: [Python-Dev] Python 1.6 alpha 1 released Message-ID: <200003312130.QAA04361@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 1 to the Python website: http://www.python.org/1.6/ Probably the biggest news (if you hadn't heard the rumors) is Unicode support. More news on the above webpage. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in a 1.6 final release around June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!!
--Guido van Rossum (home page: http://www.python.org/~guido/) From bjorn@roguewave.com Fri Mar 31 22:02:07 2000 From: bjorn@roguewave.com (Bjorn Pettersen) Date: Fri, 31 Mar 2000 15:02:07 -0700 Subject: [Python-Dev] Re: Python 1.6 alpha 1 released References: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: <38E5205F.DE811F61@roguewave.com> Guido van Rossum wrote: > > I've just released a source tarball and a Windows installer for Python > 1.6 alpha 1 to the Python website: > > http://www.python.org/1.6/ > > Probably the biggest news (if you hadn't heard the rumors) is Unicode > support. More news on the above webpage. > > Note: this is an alpha release. Some of the code is very rough! > Please give it a try with your favorite Python application, but don't > trust it for production use yet. I plan to release several more alpha > and beta releases over the next two months, culminating in a 1.6 > final release around June first. > > We need your help to make the final 1.6 release as robust as possible > -- please test this alpha release!!! > > --Guido van Rossum (home page: http://www.python.org/~guido/) Just read the announcement page, and found that socket.connect() no longer takes two arguments as was previously documented. If this change is staying I'm assuming the examples in the manual that use a two-argument socket.connect() will be changed? A quick look shows that this breaks all the network scripts I have installed (at least the ones that I found, undoubtedly there are many more). Because of this I will put any upgrade plans on hold.
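For code that must straddle the change Bjorn describes, one option is a small compatibility shim that always passes the 1.6-style address tuple (a tuple was accepted by older releases as well). The fake socket class below is only a stand-in to show the call shape:

```python
def connect_compat(sock, host, port):
    # 1.5.2 allowed sock.connect(host, port); 1.6 requires a single
    # (host, port) address tuple.
    sock.connect((host, port))

class FakeSocket:
    """Records the address passed to connect(); a real program
    would pass a socket.socket instance instead."""
    def __init__(self):
        self.address = None
    def connect(self, address):
        self.address = address

s = FakeSocket()
connect_compat(s, "www.python.org", 80)
print(s.address)   # ('www.python.org', 80)
```

Routing all connects through one helper like this confines the signature change to a single place.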
-- bjorn From gandalf@starship.python.net Fri Mar 31 21:56:16 2000 From: gandalf@starship.python.net (Vladimir Ulogov) Date: Fri, 31 Mar 2000 16:56:16 -0500 (EST) Subject: [Python-Dev] Re: Python 1.6 alpha 1 released In-Reply-To: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: Guido, """where you used to write sock.connect(host, port) you must now write sock.connect((host, port))""" Is it possible to keep the old notation? I understand (according to your past mail about the parameters of connect) this may not be what you have in mind, but we do use this notation "a lot", and for us it will mean creating a workaround for the socket.connect function. It's inconvenient. In general, I think socket.connect(Host, Port) looks prettier :)) than socket.connect((Host, Port)) Vladimir From gstein at lyra.org Wed Mar 1 00:47:55 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 15:47:55 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BC2375.5C832488@tismer.com> Message-ID: On Tue, 29 Feb 2000, Christian Tismer wrote: > Greg Stein wrote: > > +1 on breaking it now, rather than deferring it Yet Again. > > > > IMO, there has been plenty of warning, and there is plenty of time to > > correct the software. > > > > I'm +0 on adding a warning architecture to Python to support issuing a > > warning/error when .append is called with multiple arguments. > > Well, the (bad) effect of this patch is that you cannot run > PythonWin any longer unless Mark either supplies an updated > distribution, or one corrects the two barfing Scintilla > support scripts by hand. Yes, but there is no reason to assume this won't happen. Why don't we simply move forward with the assumption that PythonWin and Scintilla will be updated? If we stand around pointing at all the uses of append that are incorrect and claim that is why we can't move forward, then we won't get anywhere. Instead, let's just *MOVE* and see that software authors update accordingly.
It isn't like it is a difficult change to make. Heck, PythonWin and Scintilla could be updated within the week and re-released. *WAY* ahead of the 1.6 release. > Bad for me, since I'm building Stackless Python against 1.5.2+, > and that means the users will see PythonWin barf when installing SLP. If you're building a system using an interim release of Python, then I think you need to take responsibility for that. If you don't want those people to have problems, then you can back out the list.append change. Or you can release patches to PythonWin. I don't think the Python world at large should be hampered because somebody is using an unstable/interim version of Python. Again: we couldn't move forward. > Adding a warning instead of raising an exception would be nice IMHO, > since the warning could probably contain the file name and line > number to change, and I would leave my users with this easy task. Yes, this would be nice. But somebody has to take the time to code it up. The warning won't appear out of nowhere... Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Wed Mar 1 00:57:38 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 1 Mar 2000 10:57:38 +1100 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: > Why don't we simply move forward with the assumption that PythonWin and > Scintilla will be updated? Done :-) However, I think dropping it now _is_ a little heavy-handed. I decided to do a wider search and found a few in, e.g., Sam Rushing's calldll-based ODBC package. Personally, I would much prefer a warning now, and drop it later. _Then_ we can say we have made enough noise about it. It was only 2 years ago that I became aware that this "feature" of append was not a feature at all - up until then I used it purposely, and habits are sometimes hard to change :-)
From gstein at lyra.org Wed Mar 1 01:12:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:12:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) hehe... > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) What's the difference between a warning and an error? If you're running a program and it suddenly spits out a warning about a misuse of list.append, I'd certainly see that as "the program did something unexpected; that is an error." But this is all moot. Guido has already said that we would be amenable to a warning/error infrastructure which list.append could use. His description used some awkward sentences, so I'm not sure (without spending some brain cycles to parse the email) exactly what his desired defaults and behavior are. But hey... the possibility is there, and is just waiting for somebody to code it. IMO, Guido has left an out for people that are upset with the current hard-line approach. One of those people just needs to spend a bit of time coming up with a patch :-) And yes, Guido is also the Benevolent Dictator and can certainly have his mind changed, so people can definitely continue pestering him to back away from the hard-line approach... 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Wed Mar 1 01:20:07 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 29 Feb 2000 18:20:07 -0600 (CST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > > What's the difference between a warning and an error? If you're running a > program and it suddenly spits out a warning about a misuse of list.append, > I'd certainly see that as "the program did something unexpected; that is > an error." A big, big difference. Perhaps to one of us, it's the minor inconvenience of reading the error message and inserting a couple of parentheses in the appropriate file -- but to the end user, it's the difference between the program working (albeit noisily) and *not* working. When the program throws an exception and stops, it is safe to say most users will declare it broken and give up. We can't assume that they're going to be able to figure out what to edit (or be brave enough to try) just by reading the error message... or even what interpreter flag to give, if errors (rather than warnings) are the default behaviour. -- ?!ng From klm at digicool.com Wed Mar 1 01:37:09 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 19:37:09 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) > > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. 
It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) I agree with Mark. Why the sudden rush?? It seems to me to be unfair to make such a change - one that will break people's code - without advance warning, which typically is handled by a deprecation period. There *are* going to be people who won't be informed of the change in the short span of less than a single release. Just because it won't cause you pain isn't a good reason to disregard the pain of those that will suffer, particularly when you can do something relatively low-cost to avoid it. Ken klm at digicool.com From gstein at lyra.org Wed Mar 1 01:57:56 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:57:56 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > make such a change - one that will break people's code - without advance > warning, which typically is handled by a deprecation period. There *are* > going to be people who won't be informed of the change in the short span > of less than a single release. Just because it won't cause you pain isn't > a good reason to disregard the pain of those that will suffer, > particularly when you can do something relatively low-cost to avoid it. Sudden rush?!? Mark said he knew about it for a couple years. Same here. It was a long while ago that .append()'s semantics were specified to "no longer" accept multiple arguments. I see in the HISTORY file that changes were made to Python 1.4 (October, 1996) to avoid calling append() with multiple arguments. So, that is over three years that append() has had multiple-args deprecated. There was probably discussion even before that, but I can't seem to find something to quote.
Seems like plenty of time -- far from rushed. Cheers, -g -- Greg Stein, http://www.lyra.org/ From klm at digicool.com Wed Mar 1 02:02:02 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 20:02:02 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > > make such a change - one that will break people's code - without advance > > warning, which typically is handled by a deprecation period. There *are* > > going to be people who won't be informed of the change in the short span > > of less than a single release. Just because it won't cause you pain isn't > > a good reason to disregard the pain of those that will suffer, > > particularly when you can do something relatively low-cost to avoid it. > > Sudden rush?!? > > Mark said he knew about it for a couple years. Same here. It was a long > while ago that .append()'s semantics were specified to "no longer" accept > multiple arguments. > > I see in the HISTORY file that changes were made to Python 1.4 (October, > 1996) to avoid calling append() with multiple arguments. > > So, that is over three years that append() has had multiple-args > deprecated. There was probably discussion even before that, but I can't > seem to find something to quote. Nonetheless, for those practicing it, the incorrectness of it will be fresh news. I would be less sympathetic with them if there had been recent warning, e.g., if the schedule for changing it in the next release had been part of the current release. But if you tell somebody you're going to change something, and then don't for a few years, you probably need to renew the warning before you make the change. Don't you think so? Why not?
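The warn-now, break-later behaviour Mark and Ken are asking for can be sketched with today's warnings machinery. The shim name append_compat is invented for illustration; the warning infrastructure Guido alluded to did not exist yet.

```python
import warnings

def append_compat(lst, *args):
    # Hypothetical shim: keep the old multi-arg meaning for one more
    # release, but complain loudly so authors can fix their code.
    if len(args) > 1:
        warnings.warn(
            "list.append() with more than one argument is deprecated; "
            "pass a tuple instead",
            DeprecationWarning,
            stacklevel=2,
        )
        lst.append(args)       # old semantics: append the args as a tuple
    else:
        lst.append(args[0])

items = []
append_compat(items, 1, 2)     # warns, but still works
append_compat(items, 3)
print(items)                   # [(1, 2), 3]
```

This is the middle ground between an immediate TypeError and silently keeping the old behaviour: the program keeps running, but every offending call site is reported with a file name and line number.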
Ken klm at digicool.com From paul at prescod.net Wed Mar 1 03:56:33 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 29 Feb 2000 18:56:33 -0800 Subject: [Python-Dev] breaking list.append() References: Message-ID: <38BC86E1.53F69776@prescod.net> Software configuration management is HARD. Every sudden backwards incompatible change (warranted or not) makes it harder. Multi-arg append is not hurting anyone as much as a sudden change to it would. It would be better to leave append() alone and publicize its near-term removal rather than cause random, part-time supported modules to stop working because their programmers may be too busy to update them right now. So no, I'm not stepping up to do it. But I'm also saying that the better "lazy" option is to put something in a prominent place in the documentation and otherwise leave it alone. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From guido at python.org Wed Mar 1 05:11:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Feb 2000 23:11:02 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Your message of "Tue, 29 Feb 2000 18:56:33 PST." <38BC86E1.53F69776@prescod.net> References: <38BC86E1.53F69776@prescod.net> Message-ID: <200003010411.XAA12988@eric.cnri.reston.va.us> > Software configuration management is HARD. Every sudden backwards > incompatible change (warranted or not) makes it harder. Multi-arg append > is not hurting anyone as much as a sudden change to it would. It would > be better to leave append() alone and publicize its near-term removal > rather than cause random, part-time supported modules to stop working > because their programmers may be too busy to update them right now.
I'm tired of this rhetoric. It's not like I'm changing existing Python installations retroactively. I'm planning to release a new version of Python which no longer supports certain long-obsolete and undocumented behavior. If you maintain a non-core Python module, you should test it against the new release and fix anything that comes up. This is why we have an alpha and beta test cycle and even before that the CVS version. If you are a Python user who depends on a 3rd party module, you need to find out whether the new version is compatible with the 3rd party code you are using, or whether there's a newer version available that solves the incompatibility. There are people who still run Python 1.4 (really!) because they haven't upgraded. I don't have a problem with that -- they don't get much support, but it's their choice, and they may not need the new features introduced since then. I expect that lots of people won't upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the other modules/packages they need are compatible with 1.6. Multi-arg append probably won't be the only reason why e.g. Digital Creations may need to release an update to Zope for Python 1.6. Zope comes with its own version of Python anyway, so they have control over when they make the switch. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:04:35 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:04:35 -0500 Subject: [Python-Dev] Size of int across machines (was RE: Blowfish in Python?) In-Reply-To: Message-ID: <000201bf833b$a3b01bc0$412d153f@tim> [Markus Stenberg] > ... > speed was horrendous. > > I think the main reason was the fact that I had to use _long ints_ for > calculations, as the normal ints are signed, and apparently the bitwise > operators do not work as advertised when bit32 is set (=number is > negative). 
[Tim, takes "bitwise operators" to mean & | ^ ~, and expresses surprise] [Markus, takes umbrage, and expresses umbrage ] > Hmm.. As far as I'm concerned, shifts for example do screw up. Do you mean "for example" as in "there are so many let's just pick one at random", or as in "this is the only one I've stumbled into" <0.9 wink>? > i.e. > > 0xffffffff >> 30 > > [64bit Python: 3] > [32bit Python: -1] > > As far as I'm concerned, that should _not_ happen. Or maybe it's just me. I could not have guessed that your complaint was about 64-bit Python from your "when bit32 is set (=number is negative)" description . The behavior shown in a Python compiled under a C in which sizeof(long)==4 matches the Reference Manual (see the "Integer and long integer literals" and "shifting operations" sections). So that can't be considered broken (you may not *like* it, but it's functioning as designed & as documented). The behavior under a sizeof(long)==8 C seems more of an ill-documented (and debatable to me too) feature. The possibility is mentioned in the "The standard type hierarchy" section (under Numbers -> Integers -> Plain integers) but really not fleshed out, and the "Integer and long integer literals" section plainly contradicts it. Python's going to have to clean up its act here -- 64-bit machines are getting more common. There's a move afoot to erase the distinction between Python ints and longs (in the sense of auto-converting from one to the other under the covers, as needed). In that world, your example would work like the "64bit Python" one. There are certainly compatibility issues, though, in that int left shifts are end-off now, and on a 32-bit machine any int for which i & 0x80000000 is true "is negative" (and so sign-extends on a right shift; note that Python guarantees sign-extending right shifts *regardless* of what the platform C does (C doesn't define what happens here -- Python does)).
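The 32-bit versus 64-bit behaviour Tim and Markus are arguing about can be reproduced in today's Python, where ints are arbitrary precision (the world the "erase the distinction" move produced). The helper names below are made up for illustration; they emulate the old sizeof(long)==4 builds.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a C-style signed 32-bit long."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x

# On a sizeof(long)==8 build, 0xffffffff is the positive value 2**32-1:
print(0xFFFFFFFF >> 30)                  # 3

# On the old 32-bit builds, the same literal was the signed value -1,
# and Python's right shift sign-extends:
print(to_signed32(0xFFFFFFFF) >> 30)     # -1

# Emulating C's "mod 2**32" addition, the part Markus found painful:
def add32(a, b):
    return (a + b) & MASK32
```

Masking with `& 0xFFFFFFFF` after every operation is the standard trick for C-style unsigned arithmetic in arbitrary-precision Python.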
[description of pain getting a fast C-like "mod 2**32 int +" to work too] Python really wasn't designed for high-performance bit-fiddling, so you're (as you've discovered ) swimming upstream with every stroke. Given that you can't write a C module here, there's nothing better than to do the ^ & | ~ parts with ints, and fake the rest slowly & painfully. Note that you can at least determine the size of a Python int via inspecting sys.maxint. sympathetically-unhelpfully y'rs - tim From guido at python.org Wed Mar 1 06:44:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 00:44:10 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: Your message of "Tue, 29 Feb 2000 15:34:21 MST." <20000229153421.A16502@acs.ucalgary.ca> References: <20000229153421.A16502@acs.ucalgary.ca> Message-ID: <200003010544.AAA13155@eric.cnri.reston.va.us> [I don't like to cross-post to patches and python-dev, but I think this belongs in patches because it's a followup to Neil's post there and also in -dev because of its longer-term importance.] Thanks for the new patches, Neil! We had a visitor here at CNRI today, Eric Tiedemann , who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes. Eric had commented earlier on Neil's previous code, and I had used the morning to make myself familiar with Neil's code. This was relatively easy because Neil's code is very clear. Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried. Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. 
I have a feeling that function objects should also be considered container types, because of the cycle involving globals. Eric's algorithm, then, consists of the following parts. Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".) We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.) All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.) When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object. Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.) Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine. We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list.
Objects placed on the second list in this phase are considered colored gray (roots). Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray). Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction. (How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)? We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when we move the object to the second list. During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.) Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them similarly to the way Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object. We discussed more things, but not as thoroughly.
Eric & Eric stressed the importance of making excellent statistics available about the rate of garbage collection -- probably as data structures that Python code can read rather than debugging print statements. Eric T also sketched an incremental version of the algorithm, usable for real-time applications. This involved keeping the gc_refs field ("external" reference counts) up-to-date at all times, which would require two different versions of the INCREF/DECREF macros: one for adding/deleting a reference from a container, and another for adding/deleting a root reference. Also, a 4th color (red) was added, to distinguish between scanned roots and scanned non-roots. We decided not to work this out in more detail because the overhead cost appeared to be much higher than for the previous algorithm; instead, we recommend that for real-time requirements the whole GC is disabled (there should be run-time controls for this, not just compile-time). We also briefly discussed possibilities for generational schemes. The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made. I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?) The rest of the afternoon was spent discussing continuations, coroutines and generators, and the fundamental reason why continuations are so hard (the C stack getting in the way everywhere). But that's a topic for another mail, maybe.
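The two-pass, two-list scheme Guido describes can be modeled in a few lines of Python. The Node class and its fields below are stand-ins invented for illustration; the real version would live in C inside the container objects themselves, with gc_refs stored next to ob_refcnt.

```python
class Node:
    """Toy container: refcount stands in for the real ob_refcnt."""
    def __init__(self, name, refcount):
        self.name = name
        self.refcount = refcount
        self.children = []        # container-to-container references

def collect(nodes):
    # Pass 1: copy each refcount into gc_refs.
    for n in nodes:
        n.gc_refs = n.refcount
    # Pass 2: subtract one for every internal reference.
    for n in nodes:
        for c in n.children:
            c.gc_refs -= 1
    # Anything with gc_refs > 0 is referenced from outside: a gray root.
    white = set(nodes)
    gray = [n for n in nodes if n.gc_refs > 0]
    white.difference_update(gray)
    # Scan the gray list; appending replaces the recursion of Neil's
    # version.  The list grows while we walk it, and the walk ends when
    # no referenced object is still white.
    for n in gray:
        for c in n.children:
            if c in white:
                white.discard(c)
                gray.append(c)
    return white                  # still white == unreachable garbage

# A cycle a <-> b with no outside references is detected as garbage...
a, b = Node("a", 1), Node("b", 1)
a.children.append(b); b.children.append(a)
print(sorted(n.name for n in collect([a, b])))   # ['a', 'b']

# ...but not when a root r (held from outside) still reaches it.
r = Node("r", 1); r.children.append(a)
a.refcount = 2                                    # r's reference to a
print(collect([r, a, b]))                         # set()
```

The key property the sketch shows: after pass 2, gc_refs counts only the references coming from *outside* the tracked set, so a pure cycle drops to zero everywhere while anything externally held stays positive.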
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:57:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:57:49 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: <200002291302.IAA04581@eric.cnri.reston.va.us> Message-ID: <000601bf8343$13575040$412d153f@tim> [Tim, runs checkappend.py over the entire CVS tree, comes up with surprisingly many remaining problems, and surprisingly few false hits] [Guido fixes mailerdaemon.py, and argues for nuking Demo\tkinter\www\ (the whole directory) Demo\sgi\video\VcrIndex.py (unclear whether the dir or just the file) Demo\sgi\gl\glstdwin\glstdwin.py (stdwin-related) Demo\ibrowse\ibrowse.py (stdwin-related) > All these are stdwin-related. Stdwin will also go out of service per > 1.6. ] Then the sooner someone nukes them from the CVS tree, the sooner my automated hourly checkappend complaint generator will stop pestering Python-Dev about them . > (Conclusion: most multi-arg append() calls are *very* old, But part of that is because we went thru this exercise a couple years ago too, and you repaired all the ones in the less obscure parts of the distribution then. > or contributed by others. Sigh. I must've given bad examples long > ago...) Na, I doubt that. Most people will not read a language defn, at least not until "something doesn't work". If the compiler accepts a thing, they simply *assume* it's correct. It's pretty easy (at least for me!) to make this particular mistake as a careless typo, so I assume that's the "source origin" for many of these too. As soon as you *notice* you've done it, and that nothing bad happened, the natural tendencies are to (a) believe it's OK, and (b) save 4 keystrokes (incl. the SHIFTs) over & over again in the glorious indefinite future .
Reminds me of a c.l.py thread a while back, wherein someone did stuff like None, x, y, None = function_returning_a_4_tuple to mean that they didn't care what the 1st & 4th values were. It happened to work, so they did it more & more. Eventually a function containing this mistake needed to reference None after that line, and "suddenly for no reason at all Python stopped working". To the extent that you're serious about CP4E, you're begging for more of this, not less . newbies-even-keep-on-doing-things-that-*don't*-work!-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 07:50:44 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 01:50:44 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BBD1A2.CD29AADD@lemburg.com> Message-ID: <000701bf834a$77acdfe0$412d153f@tim> [M.-A. Lemburg] > ... > Currently, mapping tables map characters to Unicode characters > and vice-versa. Now the .translate method will use a different > kind of table: mapping integer ordinals to integer ordinals. You mean that if I want to map u"a" to u"A", I have to set up some sort of dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. > Question: What is more efficient: having lots of integers > in a dictionary or lots of characters ? My bet is "lots of integers", to reduce both space use and comparison time. > ... > Something else that changed is the way .capitalize() works. The > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > on the www.unicode.org site). #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case Mappings"). > Here's the new doc string: > > S.capitalize() -> unicode > > Return a capitalized version of S, i.e. words start with title case > characters, all remaining cased characters have lower case. > > Note that *all* characters are touched, not just the first one. > The change was needed to get it in sync with the .iscapitalized() > method which is based on the Unicode algorithm too.
> > Should this change be propagated to the string implementation ? Unicode makes distinctions among "upper case", "lower case" and "title case", and you're trying to get away with a single "capitalize" function. Java has separate toLowerCase, toUpperCase and toTitleCase methods, and that's the way to do it. Whatever you do, leave .capitalize alone for 8-bit strings -- there's no reason to break code that currently works. "capitalize" seems a terrible choice of name for a titlecase method anyway, because of its baggage connotations from 8-bit strings. Since this stuff is complicated, I say it would be much better to use the same names for these things as the Unicode and Java folk do: there's excellent documentation elsewhere for all this stuff, and it's Bad to make users mentally translate unique Python terminology to make sense of the official docs. So my vote is: leave capitalize the hell alone . Do not implement capitalize for Unicode strings. Introduce a new titlecase method for Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode strings should also have methods to get at uppercase and lowercase (as Unicode defines those). From tim_one at email.msn.com Wed Mar 1 08:36:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 02:36:03 -0500 Subject: [Python-Dev] Re: Python / Haskell (fwd) In-Reply-To: Message-ID: <000801bf8350$cc4ec580$412d153f@tim> [Greg Wilson, quoting Philip Wadler] > Well, what I most want is typing. But you already know that. So invite him to contribute to the Types-SIG <0.5 wink>. > Next after typing? Full lexical scoping for closures.
I want to write: > > fun x: fun y: x+y > > Not: > > fun x: fun y, x=x: x+y > > Lexically scoped closures would be a big help for the embedding technique > I described [GVW: in a posting to the Software Carpentry discussion list, > archived at > > http://software-carpentry.codesourcery.com/lists/sc-discuss/msg00068.html > > which discussed how to build a flexible 'make' alternative in Python]. So long as we're not deathly concerned over saving a few lines of easy boilerplate code, Python already supports this approach wonderfully well -- but via using classes with __call__ methods instead of lexical closures. I can't make time to debate this now, but suffice it to say dozens on c.l.py would be delighted to . Philip is understandably attached to the "functional way of spelling things", but Python's way is at least as usable for this (and many-- including me --would say more so). > Next after closures? Disjoint sums. E.g., > > fun area(shape) : > switch shape: > case Circle(r): > return pi*r*r > case Rectangle(h,w): > return h*w > > (I'm making up a Python-like syntax.) This is an alternative to the OO > approach. With the OO approach, it is hard to add area, unless you modify > the Circle and Rectangle class definitions. Python allows adding new methods to classes dynamically "from the outside" -- the original definitions don't need to be touched (although it's certainly preferable to add new methods directly!). Take this complaint to the extreme, and I expect you end up reinventing multimethods (suppose you need to add an intersection(shape1, shape2) method: N**2 nesting of "disjoint sums" starts to appear ludicrous ). 
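The class-with-__call__ spelling of Wadler's closure that Tim mentions above, as a short sketch (the names Adder and make_adder are illustrative only):

```python
# Wadler's "fun x: fun y: x+y", spelled with a callable object.
class Adder:
    def __init__(self, x):
        self.x = x                 # the captured "environment"
    def __call__(self, y):
        return self.x + y

add3 = Adder(3)
print(add3(4))                     # 7

# The default-argument workaround Wadler objects to, for comparison:
def make_adder(x):
    return lambda y, x=x: x + y

print(make_adder(3)(4))            # 7
```

The object version carries its captured state explicitly in instance attributes, which is exactly the trade Tim is defending: a little boilerplate in exchange for the environment being visible and inspectable.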
In any case, the Types-SIG already seems to have decided that some form of "typecase" stmt will be needed; see the archives for that; I expect the use above would be considered abuse, though; Python has no "switch" stmt of any kind today, and the use above can already be spelled via if isinstance(shape, Circle): etc elif isinstance(shape, Rectangle): etc else: raise TypeError(etc) From gstein at lyra.org Wed Mar 1 08:51:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 23:51:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > None the less, for those practicing it, the incorrectness of it will be > fresh news. I would be less sympathetic with them if there was recent > warning, eg, the schedule for changing it in the next release was part of > the current release. But if you tell somebody you're going to change > something, and then don't for a few years, you probably need to renew the > warning before you make the change. Don't you think so? Why not? I agree. Note that Guido posted a note to c.l.py on Monday. I believe that meets your notification criteria. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 09:10:28 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:10:28 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Guido van Rossum wrote: > I'm tired of this rhetoric. It's not like I'm changing existing > Python installations retroactively. I'm planning to release a new > version of Python which no longer supports certain long-obsolete and > undocumented behavior. If you maintain a non-core Python module, you > should test it against the new release and fix anything that comes up. > This is why we have an alpha and beta test cycle and even before that > the CVS version.
If you are a Python user who depends on a 3rd party > module, you need to find out whether the new version is compatible > with the 3rd party code you are using, or whether there's a newer > version available that solves the incompatibility. > > There are people who still run Python 1.4 (really!) because they > haven't upgraded. I don't have a problem with that -- they don't get > much support, but it's their choice, and they may not need the new > features introduced since then. I expect that lots of people won't > upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the > other modules/packages they need are compatible with 1.6. Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. I wholeheartedly support his approach. Just ask Mark Hammond :-) how many times I've said "let's change the code to make it Right; people aren't required to upgrade [and break their code]." Of course, his counter is that people need to upgrade to fix other, unrelated problems. So I relax and try again later :-). But I still maintain that they can independently grab the specific fixes and leave the other changes we make. Maybe it is grey, but I think this change is quite fine. Especially given Tim's tool. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Wed Mar 1 09:22:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 03:22:06 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: <000b01bf8357$3af08d60$412d153f@tim> [Greg Stein] > ... > Maybe it is grey, but I think this change is quite fine. Especially given > Tim's tool. What the heck does Tim's one-eyed trouser snake have to do with this? 
I know *it* likes to think it's the measure of all things, but, frankly, my tool barely affects the world at all a mere two feet beyond its base . tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- the-best-thing-ly y'rs - tim From effbot at telia.com Wed Mar 1 09:40:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 09:40:01 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Greg Stein wrote: > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. ahem. do you seriously believe that everyone in the Python universe reads comp.lang.python? afaik, most Python programmers don't. ... so as far as I'm concerned, this was officially deprecated with Guido's post. afaik, no official python documentation has explicitly mentioned this (and the fact that it doesn't explicitly allow it doesn't really matter, since the docs don't explicitly allow the x[a, b, c] syntax either. both work in 1.5.2). has anyone checked the recent crop of Python books, btw? the eff-bot guide uses old syntax in two examples out of 320. how about the others? ... sigh. running checkappend over a 50k LOC application, I just realized that it doesn't catch a very common append pydiom. how fun. even though 99% of all append calls are "legal", this "minor" change will break every single application and library we have :-( oh, wait. xmlrpclib isn't affected. always something! From gstein at lyra.org Wed Mar 1 09:43:02 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:43:02 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: On Wed, 1 Mar 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > your notification criteria. > > ahem. 
do you seriously believe that everyone in the > Python universe reads comp.lang.python? > > afaik, most Python programmers don't. Now you're simply taking my comments out of context. Not a proper thing to do. Ken said that he wanted notification along certain guidelines. I said that I believed Guido's post did just that. Period. Personally, I think it is fine. I also think that a CHANGES file that arrives with 1.6 that points out the incompatibility is also fine. >... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. And which is that? Care to help out? Maybe just a little bit? Or do you just want to talk about how bad this change is? :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:01:52 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:01:52 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <000b01bf8357$3af08d60$412d153f@tim> Message-ID: On Wed, 1 Mar 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Maybe it is grey, but I think this change is quite fine. Especially given > > Tim's tool. > > What the heck does Tim's one-eyed trouser snake have to do with this? I > know *it* likes to think it's the measure of all things, but, frankly, my > tool barely affects the world at all a mere two feet beyond its base . > > tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- > the-best-thing-ly y'rs - tim Heh. Now how is one supposed to respond to *that* ??! All right. Fine. +3 cool points go to Tim. :-) -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:03:32 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:03:32 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Fred L. Drake, Jr. 
wrote: > Guido van Rossum writes: > > You can already extract this from the updated documentation on the > > website (which has a list of obsolete modules). > > > > But you're right, it would be good to be open about this. I'll think > > about it. > > Note that the updated documentation isn't yet "published"; there are > no links to it and it hasn't been checked as much as I need it to be > before announcing it. Isn't the documentation better than what has been released? In other words, if you release now, how could you make things worse? If something does turn up during a check, you can always release again... Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Wed Mar 1 10:13:13 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 10:13:13 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <011001bf835e$600d1da0$34aab5d4@hagrid> Greg Stein wrote: > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > Greg Stein wrote: > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > your notification criteria. > > > > ahem. do you seriously believe that everyone in the > > Python universe reads comp.lang.python? > > > > afaik, most Python programmers don't. > > Now you're simply taking my comments out of context. Not a proper thing to > do. Ken said that he wanted notification along certain guidelines. I said > that I believed Guido's post did just that. Period. my point was that most Python programmers won't see that notification. when these people download 1.6 final and find that all their apps just broke, they probably won't be happy with a pointer to dejanews. > And which is that? Care to help out? Maybe just a little bit? this rather common pydiom: append = list.append for x in something: append(...) it's used a lot where performance matters. > Or do you just want to talk about how bad this change is? :-( yes, I think it's bad.
I've been using Python since 1.2, and no other change has had the same consequences (wrt. time/money required to fix it). call me a crappy programmer if you want, but I'm sure there are others out there who are nearly as bad. and lots of them won't be aware of this change until someone upgrades the python interpreter on their server. From mal at lemburg.com Wed Mar 1 09:38:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 09:38:52 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> Message-ID: <38BCD71C.3592E6A@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Currently, mapping tables map characters to Unicode characters > > and vice-versa. Now the .translate method will use a different > > kind of table: mapping integer ordinals to integer ordinals. > > You mean that if I want to map u"a" to u"A", I have to set up some sort of > dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. I meant: 'a': u'A' vs. ord('a'): ord(u'A') The latter wins ;-) Reasoning for the first was that it allows character sequences to be handled by the same mapping algorithm. I decided to leave those techniques to some future implementation, since mapping integers has the nice side-effect of also allowing sequences to be used as mapping tables... resulting in some speedup at the cost of memory consumption. BTW, there are now three different ways to do char translations: 1. char -> unicode (char mapping codec's decode) 2. unicode -> char (char mapping codec's encode) 3. unicode -> unicode (unicode's .translate() method) > > Question: What is more efficient: having lots of integers > > in a dictionary or lots of characters ? > > My bet is "lots of integers", to reduce both space use and comparison time. Right. That's what I found too... it's "lots of integers" now :-) > > ... > > Something else that changed is the way .capitalize() works.
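The ordinal-to-ordinal table MAL settled on is the design that survived: in today's Python, `str.translate` takes a mapping from integer code points to replacement ordinals, strings, or None. A small illustration (modern syntax, so the `u''` literals are dropped):

```python
# Keys are code points (integers), not one-character strings.
table = {ord("a"): ord("A"), ord("b"): None}   # None deletes the character
assert "abcabc".translate(table) == "AcAc"

# Because the keys are integers, any sequence indexed by code point also
# works as a table; str.maketrans builds exactly this kind of int-keyed dict.
table2 = str.maketrans("abc", "xyz")
assert "cab".translate(table2) == "zxy"
```

The "sequences as mapping tables" speedup MAL mentions falls out of this for free: indexing a sequence by an integer ordinal is the same operation as looking the ordinal up in a dict.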
The > > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > > on the www.unicode.org site). > > #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case > Mappings"). Dang. You're right. Here's the URL in case someone wants to join in: http://www.unicode.org/unicode/reports/tr21/tr21-2.html > > Here's the new doc string: > > > > S.capitalize() -> unicode > > > > Return a capitalized version of S, i.e. words start with title case > > characters, all remaining cased characters have lower case. > > > > Note that *all* characters are touched, not just the first one. > > The change was needed to get it in sync with the .iscapitalized() > > method which is based on the Unicode algorithm too. > > > > Should this change be propagated to the string implementation ? > > Unicode makes distinctions among "upper case", "lower case" and "title > case", and you're trying to get away with a single "capitalize" function. > Java has separate toLowerCase, toUpperCase and toTitleCase methods, and > that's the way to do it. The Unicode implementation has the corresponding: .upper(), .lower() and .capitalize() They work just like .toUpperCase, .toLowerCase, .toTitleCase resp. (well at least they should ;). > Whatever you do, leave .capitalize alone for 8-bit > strings -- there's no reason to break code that currently works. > "capitalize" seems a terrible choice of name for a titlecase method anyway, > because of its baggage connotations from 8-bit strings. Since this stuff is > complicated, I say it would be much better to use the same names for these > things as the Unicode and Java folk do: there's excellent documentation > elsewhere for all this stuff, and it's Bad to make users mentally translate > unique Python terminology to make sense of the official docs. Hmm, that's an argument but it breaks the current method naming scheme of all lowercase letters. Perhaps I should simply provide a new method for .toTitleCase(), e.g.
.title(), and leave the previous definition of .capitalize() intact... > So my vote is: leave capitalize the hell alone . Do not implement > capitalize for Unicode strings. Introduce a new titlecase method for > Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode > strings should also have methods to get at uppercase and lowercase (as > Unicode defines those). ...looks like you're more or less on the same wavelength here ;-) Here's what I'll do: * implement .capitalize() in the traditional way for Unicode objects (simply convert the first char to uppercase) * implement u.title() to mean the same as Java's toTitleCase() * don't implement s.title(): the reasoning here is that it would confuse the user when she gets different return values for the same string (titlecase chars usually live in higher Unicode code ranges not reachable in Latin-1) Thanks for the feedback, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one at email.msn.com Wed Mar 1 11:06:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:06:58 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: <000e01bf8365$e1e0b9c0$412d153f@tim> [/F] > ... > so as far as I'm concerned, this was officially deprecated > with Guido's post. afaik, no official python documentation > has explicitly mentioned this (and the fact that it doesn't > explicitly allow it doesn't really matter, since the docs don't > explicitly allow the x[a, b, c] syntax either. both work in > 1.5.2).
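The division of labor sketched in the thread above is how the methods ended up behaving in Python: `.capitalize()` uppercases the first character and lowercases the rest, `.title()` title-cases each word, and title case really is a third case distinct from upper case for some characters. A quick check in modern Python:

```python
s = "hello wOrld"
assert s.capitalize() == "Hello world"   # first char up, *all* the rest lowered
assert s.title() == "Hello World"        # each word title-cased
assert s.upper() == "HELLO WORLD"
assert s.lower() == "hello world"

# Title case can differ from upper case: the 'dz' digraph U+01F3
# uppercases to 'DZ' (U+01F1) but title-cases to 'Dz' (U+01F2).
assert "\u01f3".upper() == "\u01f1"
assert "\u01f3".title() == "\u01f2"
```

The digraph example is the concrete reason Tim's "three distinct cases" argument holds: a single `capitalize` cannot express both mappings.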
The "Subscriptions" section of the Reference Manual explicitly allows for dict[a, b, c] and explicitly does not allow for sequence[a, b, c] The "Mapping Types" section of the Library Ref does not explicitly allow for it, though, and if you read it as implicitly allowing for it (based on the Reference Manual's clarification of "key" syntax), you would also have to read the Library Ref as allowing for dict.has_key(a, b, c) Which 1.5.2 does allow, but which Guido very recently patched to treat as a syntax error. > ... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. [And, later, after prodding by GregS] > this rather common pydiom: > > append = list.append > for x in something: > append(...) This limitation was pointed out in checkappend's module docstring. Doesn't make it any easier for you to swallow, but I needed to point out that you didn't *have* to stumble into this the hard way . > how fun. even though 99% of all append calls are "legal", > this "minor" change will break every single application and > library we have :-( > > oh, wait. xmlrpclib isn't affected. always something! What would you like to do, then? The code will be at least as broken a year from now, and probably more so -- unless you fix it. So this sounds like an indirect argument for never changing Python's behavior here. Frankly, I expect you could fix the 50K LOC in less time than it took me to write this naggy response <0.50K wink>. embrace-change-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 11:31:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:31:12 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <001001bf8369$453e9fc0$412d153f@tim> [Tim. needing sleep] > dict.has_key(a, b, c) > > Which 1.5.2 does allow, but which Guido very recently patched to > treat as a syntax error. No, a runtime error. haskeynanny.py, anyone? 
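Tim's Reference Manual point is that `d[a, b, c]` was never a three-argument subscript to begin with: the comma expression builds one tuple, and the dict is indexed by that single tuple key -- which is exactly why `has_key(a, b, c)` had no analogous reading. A sketch:

```python
d = {}
d[1, 2, 3] = "x"            # sugar for d[(1, 2, 3)] = "x": a single tuple key
assert d[(1, 2, 3)] == "x"
assert list(d) == [(1, 2, 3)]

# Plain sequences have no such reading; a tuple subscript is simply an error.
try:
    [10, 20, 30][1, 2]
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")
```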
not-me-ly y'rs - tim From fredrik at pythonware.com Wed Mar 1 12:14:18 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 12:14:18 +0100 Subject: [Python-Dev] breaking list.append() References: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <002101bf836f$4a012220$f29b12c2@secret.pythonware.com> Tim Peters wrote: > The "Subscriptions" section of the Reference Manual explicitly allows for > > dict[a, b, c] > > and explicitly does not allow for > > sequence[a, b, c] I'd thought we'd agreed that nobody reads the reference manual ;-) > What would you like to do, then? more time to fix it, perhaps? it's surely a minor code change, but fixing it can be harder than you think (just witness Gerrit's bogus patches). after all, python might be free, but more and more people are investing lots of money in using it [1]. > The code will be at least as broken a year > from now, and probably more so -- unless you fix it. sure. we've already started. but it's a lot of work, and it's quite likely that it will take a while until we can be 100% confident that all the changes are properly done. (not all software has a 100% complete test suite that simply says "yes, this works" or "no, it doesn't") 1) fwiw, some poor soul over here posted a short note to the pythonworks mailing list, mentioning that we've now fixed the price. a major flamewar erupted, and my mailbox is now full of mail from unknowns telling me that I must be a complete moron that doesn't understand that Python is just a toy system, which everyone uses just because they cannot afford anything better...
> Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. Eric expects that this will run faster, although this > obviously needs to be tried. No, it doesn't : it will run faster. > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I > have a feeling that function objects should also be considered > container types, because of the cycle involving globals. Note that the list-migrating steps you sketch later are basically the same as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few years ago, right down to using appending to effect a breadth-first traversal without requiring recursion -- except M&S doesn't have to bother accounting for sources of refcounts. Since *this* scheme does more work per item per scan, to be as fast in the end it has to touch less stuff than M&S. But the more kinds of types you track, the more stuff this scheme will have to chase. The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable meta-point : you balked at M&S the last time around because of the apparent need for two link fields + a bit or two per object of a "chaseable type". If that's no longer perceived as being a showstopper, M&S should be reconsidered too. I happen to be a fan of both approaches . The worst part of M&S-on-RC (== the one I never had a good answer for) is that a non-cooperating extension type E can't be chased, hence objects reachable only from objects of type E never get marked, so are vulnerable to bogus collection. In the Neil/Toby scheme, objects of type E merely act as sources of "external" references, so the scheme fails safe (in the sense of never doing a bogus collection due to non-cooperating types). Hmm ... 
if both approaches converge on keeping a list of all chaseable objects, and being careful of uncooperating types, maybe the only real difference in the end is whether the root set is given explicitly (as in traditional M&S) or inferred indirectly (but where "root set" has a different meaning in the scheme you sketched). > ... > In our case, we may need a type-specific "clear" function for containers > in the type object. I think definitely, yes. full-speed-sideways-ly y'rs - tim From mal at lemburg.com Wed Mar 1 11:40:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 11:40:36 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <38BCF3A4.1CCADFCE@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > > Greg Stein wrote: > > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > > your notification criteria. > > > > > > ahem. do you seriously believe that everyone in the > > > Python universe reads comp.lang.python? > > > > > > afaik, most Python programmers don't. > > > > Now you're simply taking my comments out of context. Not a proper thing to > > do. Ken said that he wanted notification along certain guidelines. I said > > that I believed Guido's post did just that. Period. > > my point was that most Python programmers won't > see that notification. when these people download > 1.6 final and find that all their apps just broke, they > probably won't be happy with a pointer to dejanews. Ditto. Anyone remember the str(2L) == '2' change, BTW ? That one will cost lots of money in case someone implemented an eShop using the common str(2L)[:-1] idiom... There will need to be a big warning sign somewhere that people see *before* finding the download link. (IMHO, anyways.) > > And which is that? Care to help out? Maybe just a little bit?
> > this rather common pydiom: > > append = list.append > for x in something: > append(...) > > it's used a lot where performance matters. Same here. checkappend.py doesn't find these (a great tool BTW, thanks Tim; I noticed that it leaks memory badly though). > > Or do you just want to talk about how bad this change is? :-( > > yes, I think it's bad. I've been using Python since 1.2, > and no other change has had the same consequences > (wrt. time/money required to fix it) > > call me a crappy programmer if you want, but I'm sure > there are others out there who are nearly as bad. and > lots of them won't be aware of this change until some- > one upgrades the python interpreter on their server. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 1 13:07:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:07:42 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: Your message of "Wed, 01 Mar 2000 00:57:49 EST." <000601bf8343$13575040$412d153f@tim> References: <000601bf8343$13575040$412d153f@tim> Message-ID: <200003011207.HAA13342@eric.cnri.reston.va.us> > To the extent that you're serious about CP4E, you're begging for more of > this, not less . Which is exactly why I am breaking multi-arg append now -- this is my last chance. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 1 13:27:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:27:10 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Wed, 01 Mar 2000 09:38:52 +0100." 
<38BCD71C.3592E6A@lemburg.com> References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> Message-ID: <200003011227.HAA13396@eric.cnri.reston.va.us> > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) > * implement u.title() to mean the same as Java's toTitleCase() > * don't implement s.title(): the reasoning here is that it would > confuse the user when she gets different return values for > the same string (titlecase chars usually live in higher Unicode > code ranges not reachable in Latin-1) Huh? For ASCII at least, titlecase seems to map to ASCII; in your current implementation, only two Latin-1 characters (u'\265' and u'\377', I have no easy way to show them in Latin-1) map outside the Latin-1 range. Anyway, I would suggest to add a title() call to 8-bit strings as well; then we can do away with string.capwords(), which does something similar but different, mostly by accident. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Wed Mar 1 13:34:42 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 01 Mar 2000 13:34:42 +0100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Message by Guido van Rossum , Mon, 28 Feb 2000 12:35:12 -0500 , <200002281735.MAA27771@eric.cnri.reston.va.us> Message-ID: <20000301123442.7DEF8371868@snelboot.oratrix.nl> > > What about adding a command-line switch for enabling warnings, as has > > been suggested long ago? The .append() change could then print a > > warning in 1.6alphas (and betas?), but still run, and be turned into > > an error later. > > That's better. I propose that the warnings are normally on, and that > there are flags to turn them off or turn them into errors. Can we then please have an interface to the "give warning" call (instead of a simple fprintf)?
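An interface like the one Jack asks for here is essentially what later became Python's `warnings` module: warnings are routed through a replaceable `warnings.showwarning` hook, so a GUI port can substitute a dialog for the default write to stderr. A sketch using the module as it exists today (the dialog is faked by recording the warning):

```python
import warnings

captured = []

def gui_showwarning(message, category, filename, lineno,
                    file=None, line=None):
    # A Mac or PythonWin port could pop up a dialog here;
    # for the sketch we just record what would have been shown.
    captured.append((category.__name__, str(message)))

warnings.simplefilter("always")       # make sure the warning isn't filtered out
warnings.showwarning = gui_showwarning
warnings.warn("list.append with more than one argument is deprecated",
              DeprecationWarning)

assert captured == [("DeprecationWarning",
                     "list.append with more than one argument is deprecated")]
```

The filter machinery (`simplefilter`, the `-W` command-line switch) is also where Guido's "normally on, with flags to turn them off or turn them into errors" proposal ended up.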
On the mac (and possibly also in PythonWin) it's probably better to pop up a dialog (possibly with a "don't show again" button) than do a printf which may get lost. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Wed Mar 1 13:55:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:55:42 -0500 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Your message of "Wed, 01 Mar 2000 13:34:42 +0100." <20000301123442.7DEF8371868@snelboot.oratrix.nl> References: <20000301123442.7DEF8371868@snelboot.oratrix.nl> Message-ID: <200003011255.HAA13489@eric.cnri.reston.va.us> > Can we then please have an interface to the "give warning" call (instead > of a simple fprintf)? On the mac (and possibly also in > PythonWin) it's probably better to pop up a dialog (possibly with a > "don't show again" button) than do a printf which may get lost. Sure. All you have to do is code it (or get someone else to code it). <0.9 wink> --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Mar 1 14:32:02 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 01 Mar 2000 14:32:02 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> <200003011227.HAA13396@eric.cnri.reston.va.us> Message-ID: <38BD1BD2.792E9B73@lemburg.com> Guido van Rossum wrote: > > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > * implement u.title() to mean the same as Java's toTitleCase() > > * don't implement s.title(): the reasoning here is that it would > > confuse the user when she gets different return values for > > the same string (titlecase chars usually live in higher Unicode > > code ranges not reachable in Latin-1) > > Huh? For ASCII at least, titlecase seems to map to ASCII; in your > current implementation, only two Latin-1 characters (u'\265' and > u'\377', I have no easy way to show them in Latin-1) map outside the > Latin-1 range. You're right, sorry for the confusion. I was thinking of other encodings like e.g. cp437 which have corresponding characters in the higher Unicode ranges. > Anyway, I would suggest to add a title() call to 8-bit strings as > well; then we can do away with string.capwords(), which does something > similar but different, mostly by accident. Ok, I'll do it this way then: s.title() will use C's toupper() and tolower() for case mapping and u.title() the Unicode routines. This will be in sync with the rest of the 8-bit string world (which is locale aware on many platforms AFAIK), even though it might not return the same string as the corresponding u.title() call. u.capwords() will be disabled in the Unicode implementation...
it wasn't even implemented for the string implementation, so there's no breakage ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin at mems-exchange.org Wed Mar 1 15:59:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Wed, 1 Mar 2000 09:59:07 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <011001bf835e$600d1da0$34aab5d4@hagrid> References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <14525.12347.120543.804804@amarok.cnri.reston.va.us> Fredrik Lundh writes: >yes, I think it's bad. I've been using Python since 1.2, >and no other change has had the same consequences >(wrt. time/money required to fix it) There are more things in 1.6 that might require fixing existing code: str(2L) returning '2', the int/long changes, the Unicode changes, and if it gets added, garbage collection -- and bugs caused by those changes might not be catchable by a nanny. IMHO it's too early to point at the .append() change as breaking too much existing code; there may be changes that break a lot more. I'd wait and see what happens once the 1.6 alphas become available; if c.l.p is filled with shrieks and groans, GvR might decide to back the offending change out. (Or he might not...) -- A.M. Kuchling http://starship.python.net/crew/amk/ I have no skills with machines. I fear them, and because I cannot help attributing human qualities to them, I suspect that they hate me and will kill me if they can. -- Robertson Davies, "Reading" From klm at digicool.com Wed Mar 1 16:37:49 2000 From: klm at digicool.com (Ken Manheimer) Date: Wed, 1 Mar 2000 10:37:49 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >...
> > None the less, for those practicing it, the incorrectness of it will be > > fresh news. I would be less sympathetic with them if there was recent > > warning, eg, the schedule for changing it in the next release was part of > > the current release. But if you tell somebody you're going to change > > something, and then don't for a few years, you probably need to renew the > > warning before you make the change. Don't you think so? Why not? > > I agree. > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. Actually, by "part of the current release", i meant having the deprecation/impending-deletion warning in the release notes for the release before the one where the deletion happens - saying it's being deprecated now, will be deleted next time around. Ken klm at digicool.com I mean, you tell one guy it's blue. He tells his guy it's brown, and it lands on the page sorta purple. Wavy Gravy/Hugh Romney From marangoz at python.inrialpes.fr Wed Mar 1 18:07:07 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 1 Mar 2000 18:07:07 +0100 (CET) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> from "Guido van Rossum" at Mar 01, 2000 12:44:10 AM Message-ID: <200003011707.SAA01310@python.inrialpes.fr> Guido van Rossum wrote: > > Thanks for the new patches, Neil! Thanks from me too! I notice, however, that hash_resize() still uses a malloc call instead of PyMem_NEW. Neil, please correct this in your version immediately ;-) > > We had a visitor here at CNRI today, Eric Tiedemann > , who had a look at your patches before. Eric > knows his way around the Scheme, Lisp and GC literature, and presented > a variant on your approach which takes the bite out of the recursive > passes. Avoiding the recursion is valuable, as long as we're optimizing the implementation of one particular scheme.
It doesn't bother me that Neil's scheme is recursive, because I still perceive his code as a proof of concept. You're presenting here another scheme based on refcount arithmetic, generalized for all container types. The linked list implementation of this generalized scheme is not directly related to the logic. I have some suspicions about the logic, so you'll probably want to elaborate a bit more on it, and convince me that this scheme would actually work. > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. I cannot agree so easily with this statement, but you should have expected this from me :-) If we're about to optimize storage, I have good reasons to believe that we don't need 3 additional slots per container (but 1 for gc_refs, yes). We could certainly envision allocating the containers within memory pools of 4K (just as it is done in pymalloc, and close to what we have for ints & floats). These pools would be labeled as "container's memory", they would obviously be under our control, and we'd have additional slots per pool, not per object. As long as we isolate the containers from the rest, we can enumerate them easily by walking through the pools. But I'm willing to defer this question for now, as it involves the object allocators (the builtin allocators + PyObject_NEW for extension types E -- user objects of type E would be automatically taken into account for GC if there's a flag in the type struct which identifies them as containers). > Eric expects that this will run faster, although this obviously needs > to be tried. Definitely, although I trust Eric & Tim :-) > > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets.
I have a > feeling that function objects should also be considered container > types, because of the cycle involving globals. + other extension container types. And I insist. Don't forget that we're planning to merge types and classes... > > Eric's algorithm, then, consists of the following parts. > > Each container object has three new fields: gc_next, gc_prev, and > gc_refs. (Eric calls the gc_refs "refcount-zero".) > > We color objects white (initial), gray (root), black (scanned root). > (The terms are explained later; we believe we don't actually need bits > in the objects to store the color; see later.) > > All container objects are chained together in a doubly-linked list -- > this is the same as Neil's code except Neil does it only for dicts. > (Eric postulates that you need a list header.) > > When GC is activated, all objects are colored white; we make a pass > over the entire list and set gc_refs equal to the refcount for each > object. Step 1: for all containers, c->gc_refs = c->ob_refcnt > > Next, we make another pass over the list to collect the internal > references. Internal references are (just like in Neil's version) > references from other container types. In Neil's version, this was > recursive; in Eric's version, we don't need recursion, since the list > already contains all containers. So we simply visit the containers in > the list in turn, and for each one we go over all the objects it > references and subtract one from *its* gc_refs field. (Eric left out > the little detail that we need to be able to distinguish between > container and non-container objects amongst those references; this can > be a flag bit in the type field.) Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c I guess that you realize that after this step, gc_refs can be zero or negative. I'm not sure that you collect "internal" references here (references from other container types).
A list referencing 20 containers, being itself referenced by one container + one static variable + two times from the runtime stack, has an initial refcount == 4, so we'll end up with gc_refs == -16. A tuple referencing 1 list, referenced once by the stack, will end up with gc_refs == 0. Neil's scheme doesn't seem to have this "property". > > Now, similar to Neil's version, all objects for which gc_refs == 0 > have only internal references, and are potential garbage; all objects > for which gc_refs > 0 are "roots". These have references to them from > other places, e.g. from globals or stack frames in the Python virtual > machine. > Agreed, some roots have gc_refs > 0 I'm not sure that all of them have it, though... Do they? > We now start a second list, to which we will move all roots. The way > to do this is to go over the first list again and to move each object > that has gc_refs > 0 to the second list. Objects placed on the second > list in this phase are considered colored gray (roots). > Step 3: Roots with gc_refs > 0 go to the 2nd list. All c->gc_refs <= 0 stay in the 1st list. > Of course, some roots will reference some non-roots, which keeps those > non-roots alive. We now make a pass over the second list, where for > each object on the second list, we look at every object it references. > If a referenced object is a container and is still in the first list > (colored white) we *append* it to the second list (colored gray). > Because we append, objects thus added to the second list will > eventually be considered by this same pass; when we stop finding > objects that are still white, we stop appending to the second list, > and we will eventually terminate this pass. Conceptually, objects on > the second list that have been scanned in this pass are colored black > (scanned root); but there is no need to actually make the > distinction. > Step 4: Closure on reachable containers which are all moved to the 2nd list.
(Assuming that the objects are checked only via their type, without
involving gc_refs)

> (How do we know whether an object pointed to is white (in the first
> list) or gray or black (in the second)? Good question? :-)
> We could use an extra bitfield, but that's a waste of space.
> Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when
> we move the object to the second list.

I doubt that this would work for the reasons mentioned above.

> During the meeting, I proposed to set the back pointer to NULL; that
> might work too but I think the gc_refs field is more elegant. We could
> even just test for a non-zero gc_refs field; the roots moved to the
> second list initially all have a non-zero gc_refs field already, and
> for the objects with a zero gc_refs field we could indeed set it to
> something arbitrary.)

Not sure that "arbitrary" is a good choice if the differentiation is
based solely on gc_refs.

> Once we reach the end of the second list, all objects still left in
> the first list are garbage. We can destroy them in a way similar to
> the way Neil does this in his code. Neil calls PyDict_Clear on the
> dictionaries, and ignores the rest. Under Neil's assumption that all
> cycles (that he detects) involve dictionaries, that is sufficient. In
> our case, we may need a type-specific "clear" function for containers
> in the type object.

Couldn't this be done in the object's dealloc function?

Note that both Neil's and this scheme assume that garbage _detection_
and garbage _collection_ is an atomic operation. I must say that I
don't mind having some living garbage if it doesn't hurt my work. IOW,
the criterion used for triggering the detection phase _may_ eventually
differ from the one used for the collection phase. But this is where we
reach the incremental approaches, implying different reasoning as a
whole.
My point is that the introduction of a "clear" function depends on the
adopted scheme, whose logic depends on pertinent statistics on memory
consumption of the cyclic garbage. To make it simple, we first need
stats on memory consumption, then we can discuss objectively how to
implement some particular GC scheme. I second Eric on the need for
excellent statistics.

> The general opinion was that we should first implement and test the
> algorithm as sketched above, and then changes or extensions could be
> made.

I'd like to see it discussed first in conjunction with (1) the
possibility of having a proprietary malloc, (2) the envisioned
type/class unification. Perhaps I'm getting too deep, but once
something gets in, it's difficult to take it out, even when a better
solution is found subsequently. Although I'm enthusiastic about this
work on GC, I'm not in a position to evaluate the true benefits of the
proposed schemes, as I still don't have a basis for evaluating how much
garbage my program generates and whether it hurts the interpreter
compared to its overall memory consumption.

> I was pleasantly surprised to find Neil's code in my inbox when we
> came out of the meeting; I think it would be worthwhile to compare and
> contrast the two approaches. (Hm, maybe there's a paper in it?)

I'm all for it!
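[As a present-day aside: the work discussed in this thread eventually
shipped as CPython's gc module, where detection and collection are
indeed one atomic call. On any modern interpreter the behaviour under
discussion can be observed directly -- this demo assumes a current
CPython with the gc module, which did not exist at the time of the
thread:]

```python
import gc

gc.collect()                 # start from a clean slate

# Build a reference cycle of two dicts -- exactly the kind of garbage
# that pure reference counting can never reclaim.
a, b = {}, {}
a["other"] = b
b["other"] = a
del a, b                     # refcounts stay at 1: the cycle keeps itself alive

found = gc.collect()         # detection + collection in one atomic call
print(found >= 2)            # the two dicts were found unreachable -> True
```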
--
Vladimir MARANGOZOV          | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jeremy at cnri.reston.va.us  Wed Mar  1 18:53:13 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Wed, 1 Mar 2000 12:53:13 -0500 (EST)
Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python
In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>
References: <200003010544.AAA13155@eric.cnri.reston.va.us>
	<200003011707.SAA01310@python.inrialpes.fr>
Message-ID: <14525.22793.963077.707198@goon.cnri.reston.va.us>

>>>>> "VM" == Vladimir Marangozov writes:

[">>" == Guido explaining Eric Tiedemann's GC design]

>> Next, we make another pass over the list to collect the internal
>> references. Internal references are (just like in Neil's
>> version) references from other container types. In Neil's
>> version, this was recursive; in Eric's version, we don't need
>> recursion, since the list already contains all containers. So we
>> simply visit the containers in the list in turn, and for each one
>> we go over all the objects it references and subtract one from
>> *its* gc_refs field. (Eric left out the little detail that we
>> need to be able to distinguish between container and
>> non-container objects amongst those references; this can be a
>> flag bit in the type field.)

VM> Step 2: c->gc_refs = c->gc_refs -
VM> Nb_referenced_containers_from_c

VM> I guess that you realize that after this step, gc_refs can be
VM> zero or negative.

I think Guido's explanation is slightly ambiguous. When he says,
"subtract one from *its* gc_refs field" he means subtract one from the
_contained_ object's gc_refs field.

VM> I'm not sure that you collect "internal" references here
VM> (references from other container types).
VM> A list referencing 20 containers, being itself referenced by one
VM> container + one static variable + two times from the runtime
VM> stack, has an initial refcount == 4, so we'll end up with
VM> gc_refs == -16.

The strategy is not that the container's gc_refs is decremented once
for each object it contains. Rather, the container decrements each
contained object's gc_refs by one. So you should never end up with
gc_refs < 0.

>> During the meeting, I proposed to set the back pointer to NULL;
>> that might work too but I think the gc_refs field is more
>> elegant. We could even just test for a non-zero gc_refs field;
>> the roots moved to the second list initially all have a non-zero
>> gc_refs field already, and for the objects with a zero gc_refs
>> field we could indeed set it to something arbitrary.)

I believe we discussed this further and concluded that setting the
back pointer to NULL would not work. If we make the second list
doubly-linked (like the first one), it is trivial to end GC by
swapping the first and second lists. If we've zapped the NULL pointer,
then we have to go back and re-set them all.

Jeremy

From mal at lemburg.com  Wed Mar  1 19:44:58 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 01 Mar 2000 19:44:58 +0100
Subject: [Python-Dev] Unicode Snapshot 2000-03-01
Message-ID: <38BD652A.EA2EB0A3@lemburg.com>

There is a new Unicode implementation snapshot available at the secret
URL. It contains quite a few small changes to the internal APIs, doc
strings for all methods and some new methods (e.g. .title()) on the
Unicode and the string objects. The code page mappings are now
integer->integer which should make them more performant. Some of the C
codec APIs have changed, so you may need to adapt code that already
uses these (Fredrik ?!).

Still missing is a MSVC project file... haven't gotten around yet to
build one. The code does compile on WinXX though, as Finn Bock told me
in private mail.

Please try out the new stuff...
Most interesting should be the code in Lib/codecs.py as it provides a
very high level interface to all those builtin codecs.

BTW: I would like to implement a .readline() method using only the
.read() method as basis. Does anyone have a good idea on how this
could be done without buffering? (Unicode has a slightly larger choice
of line break chars than C; the .splitlines() method will deal with
these)

Gotta run...

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From effbot at telia.com  Wed Mar  1 20:20:12 2000
From: effbot at telia.com (Fredrik Lundh)
Date: Wed, 1 Mar 2000 20:20:12 +0100
Subject: [Python-Dev] breaking list.append()
References: <011001bf835e$600d1da0$34aab5d4@hagrid>
	<14525.12347.120543.804804@amarok.cnri.reston.va.us>
Message-ID: <034a01bf83b3$e97c8620$34aab5d4@hagrid>

Andrew M. Kuchling wrote:
> There are more things in 1.6 that might require fixing existing code:
> str(2L) returning '2', the int/long changes, the Unicode changes, and
> if it gets added, garbage collection -- and bugs caused by those
> changes might not be catchable by a nanny.

hey, you make it sound like "1.6" should really be "2.0" ;-)

From nascheme at enme.ucalgary.ca  Wed Mar  1 20:29:02 2000
From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca)
Date: Wed, 1 Mar 2000 12:29:02 -0700
Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python
In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>; from
	marangoz@python.inrialpes.fr on Wed, Mar 01, 2000 at 06:07:07PM +0100
References: <200003010544.AAA13155@eric.cnri.reston.va.us>
	<200003011707.SAA01310@python.inrialpes.fr>
Message-ID: <20000301122902.B7773@acs.ucalgary.ca>

On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote:
> Guido van Rossum wrote:
> > Once we reach the end of the second list, all objects still left in
> > the first list are garbage.
> > We can destroy them in a way similar to the way Neil does this in
> > his code. Neil calls PyDict_Clear on the dictionaries, and ignores
> > the rest. Under Neil's assumption that all cycles (that he detects)
> > involve dictionaries, that is sufficient. In our case, we may need
> > a type-specific "clear" function for containers in the type object.
>
> Couldn't this be done in the object's dealloc function?

No, I don't think so. The object still has references to it. You have
to be careful about how you break cycles so that memory is not
accessed after it is freed.

Neil

--
"If elected mayor, my first act will be to kill the whole lot of you,
and burn your town to cinders!" -- Groundskeeper Willie

From gvwilson at nevex.com  Wed Mar  1 21:19:30 2000
From: gvwilson at nevex.com (gvwilson at nevex.com)
Date: Wed, 1 Mar 2000 15:19:30 -0500 (EST)
Subject: [Python-Dev] DDJ article on Python GC
Message-ID: 

Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like
an article on what's involved in adding garbage collection to Python.
Please email me if you're interested in tackling it...

Thanks,
Greg

From fdrake at acm.org  Wed Mar  1 21:37:49 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 1 Mar 2000 15:37:49 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src
	Makefile.in,1.82,1.83
In-Reply-To: 
References: <14523.56638.286603.340358@weyr.cnri.reston.va.us>
Message-ID: <14525.32669.909212.716484@weyr.cnri.reston.va.us>

Greg Stein writes:
> Isn't the documentation better than what has been released? In other
> words, if you release now, how could you make things worse? If
> something does turn up during a check, you can always release
> again...

Releasing is still somewhat tedious, and I don't want to ask people to
do several substantial downloads & installs. So far, a major
navigation bug has been found in the test version I posted (just now
fixed online); *that's* why I don't like to release too hastily!
I don't think waiting two more weeks is a problem.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From guido at python.org  Wed Mar  1 23:53:26 2000
From: guido at python.org (Guido van Rossum)
Date: Wed, 01 Mar 2000 17:53:26 -0500
Subject: [Python-Dev] DDJ article on Python GC
In-Reply-To: Your message of "Wed, 01 Mar 2000 15:19:30 EST."
References: 
Message-ID: <200003012253.RAA16056@eric.cnri.reston.va.us>

> Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like
> an article on what's involved in adding garbage collection to
> Python. Please email me if you're interested in tackling it...

I might -- although I should get Neil, Eric and Tim as co-authors. I'm
halfway implementing the scheme that Eric showed yesterday. It's very
elegant, but I don't have an idea about its impact on performance yet.

Say hi to Jon -- we've met a few times. I liked his March editorial,
having just read the same book and had the same feeling of "wow, an
open source project in the 19th century!"

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mhammond at skippinet.com.au  Thu Mar  2 00:09:23 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu, 2 Mar 2000 10:09:23 +1100
Subject: [Python-Dev] Re: A warning switch?
In-Reply-To: <200003011255.HAA13489@eric.cnri.reston.va.us>
Message-ID: 

> > Can we then please have an interface to the "give warning" call (in
> > stead of a simple fprintf)? On the mac (and possibly also in
> > PythonWin) it's probably better to pop up a dialog (possibly with a
> > "don't show again" button) than do a printf which may get lost.
>
> Sure. All you have to do is code it (or get someone else to code it).

How about just having either a "sys.warning" function, or maybe even a
sys.stdwarn stream?
Then a simple C API to call this, and we are done :-)

sys.stdwarn sounds OK - it just defaults to sys.stdout, so the Mac and
Pythonwin etc should "just work" by sending the output wherever
sys.stdout goes today...

Mark.

From tim_one at email.msn.com  Thu Mar  2 06:08:39 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 2 Mar 2000 00:08:39 -0500
Subject: [Python-Dev] breaking list.append()
In-Reply-To: <38BCF3A4.1CCADFCE@lemburg.com>
Message-ID: <001001bf8405$5f9582c0$732d153f@tim>

[/F]
> append = list.append
> for x in something:
>     append(...)

[M.-A. Lemburg]
> Same here. checkappend.py doesn't find these

As detailed in a c.l.py posting, I have yet to find a single instance
of this actually called with multiple arguments. Pointing out that
it's *possible* isn't the same as demonstrating it's an actual
problem. I'm quite willing to believe that it is, but haven't yet seen
evidence of it. For whatever reason, people seem much (and, in my
experience so far, infinitely) more prone to make the

    list.append(1, 2, 3)

error than the

    maybethisisanappend(1, 2, 3)

error.

> (a great tool BTW, thanks Tim; I noticed that it leaks memory badly
> though).

Which Python? Which OS? How do you know? What were you running it
over?

Using 1.5.2 under Win95, according to wintop, & over the whole CVS
tree, the total (code + data) virtual memory allocated to it peaked at
about 2Mb a few seconds into the run, and actually decreased as time
went on. So, akin to the bound method multi-argument append problem,
the "checkappend leak problem" is something I simply have no reason to
believe. Check your claim again? checkappend.py itself obviously
creates no cycles or holds on to any state across files, so if you're
seeing a leak it must be a bug in some other part of the version of
Python + std libraries you're using. Maybe a new 1.6 bug? Something
you did while adding Unicode? Etc. Tell us what you were running.

Has anyone else seen a leak?
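[As a present-day footnote: the bound-method idiom Tim and /F are
discussing is easy to reproduce, and the hard-line behaviour being
debated here is what shipped -- on any interpreter since, the
multi-argument form raises TypeError, and the fix is an explicit
tuple:]

```python
items = []
append = items.append        # the speed idiom: hoist the attribute lookup

append(1)                    # fine: one argument
try:
    append(1, 2, 3)          # the old "feature": silently appended (1, 2, 3)
except TypeError:            # ...now an error, as discussed in this thread
    append((1, 2, 3))        # the surviving spelling: pass the tuple

print(items)                 # -> [1, (1, 2, 3)]
```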
From tim_one at email.msn.com  Thu Mar  2 06:50:19 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 2 Mar 2000 00:50:19 -0500
Subject: [Python-Dev] str vs repr at prompt again (FW: String printing
	behavior?)
Message-ID: <001401bf840b$3177ba60$732d153f@tim>

Another unsolicited testimonial that countless users are oppressed by
auto-repr (as opposed to auto-str) at the interpreter prompt. Just
trying to keep a once-hot topic from going stone cold forever.

-----Original Message-----
From: python-list-admin at python.org
[mailto:python-list-admin at python.org] On Behalf Of Ted Drain
Sent: Wednesday, March 01, 2000 5:42 PM
To: python-list at python.org
Subject: String printing behavior?

Hi all,

I've got a question about the string printing behavior. If I define a
function as:

>>> def foo():
...     return "line1\nline2"
...
>>> foo()
'line1\012line2'
>>> print foo()
line1
line2
>>>

It seems to me that the default printing behavior for strings should
match the behavior of the print routine. I realize that some people
may want to see embedded control codes, but I would advocate a
separate method for printing raw byte sequences. We are using the
python interactive prompt as a pseudo-matlab like user interface and
the current printing behavior is very confusing to users. It also
means that functions that return text (like help routines) must print
the string rather than returning it. Returning the string is much more
flexible because it allows the string to be captured easily and
redirected.

Any thoughts?

Ted

--
Ted Drain
Jet Propulsion Laboratory
Ted.Drain at jpl.nasa.gov
--
http://www.python.org/mailman/listinfo/python-list

From mal at lemburg.com  Thu Mar  2 08:42:33 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 02 Mar 2000 08:42:33 +0100
Subject: [Python-Dev] breaking list.append()
References: <001001bf8405$5f9582c0$732d153f@tim>
Message-ID: <38BE1B69.E0B88B41@lemburg.com>

Tim Peters wrote:
>
> [/F]
> > append = list.append
> > for x in something:
> >     append(...)
>
> [M.-A. Lemburg]
> > Same here. checkappend.py doesn't find these
>
> As detailed in a c.l.py posting, I have yet to find a single instance
> of this actually called with multiple arguments. Pointing out that
> it's *possible* isn't the same as demonstrating it's an actual
> problem. I'm quite willing to believe that it is, but haven't yet
> seen evidence of it.

Haven't had time to check this yet, but I'm pretty sure there are some
instances of this idiom in my code. Note that I did in fact code like
this on purpose: it saves a tuple construction for every append, which
can make a difference in tight loops...

> For whatever reason, people seem much (and, in my experience so far,
> infinitely) more prone to make the
>
>     list.append(1, 2, 3)
>
> error than the
>
>     maybethisisanappend(1, 2, 3)
>
> error.

Of course... still there are hidden instances of the problem which are
yet to be revealed. For my own code the situation is even worse, since
I sometimes did:

    add = list.append
    for x in y:
        add(x,1,2)

> > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly
> > though).
>
> Which Python? Which OS? How do you know? What were you running it
> over?

That's Python 1.5 on Linux2. I let the script run over a large lib
directory and my projects directory. In the projects directory the
script consumed as much as 240MB of process size.

> Using 1.5.2 under Win95, according to wintop, & over the whole CVS
> tree, the total (code + data) virtual memory allocated to it peaked
> at about 2Mb a few seconds into the run, and actually decreased as
> time went on.
So, akin to > the bound method multi-argument append problem, the "checkappend leak > problem" is something I simply have no reason to believe . Check your > claim again? checkappend.py itself obviously creates no cycles or holds on > to any state across files, so if you're seeing a leak it must be a bug in > some other part of the version of Python + std libraries you're using. > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > what you were running. I'll try the same thing again using Python1.5.2 and the CVS version. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Mar 2 08:46:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 02 Mar 2000 08:46:49 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> <38BE1B69.E0B88B41@lemburg.com> Message-ID: <38BE1C69.C8A9E6B0@lemburg.com> "M.-A. Lemburg" wrote: > > > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > > though). > > > > Which Python? Which OS? How do you know? What were you running it over? > > That's Python 1.5 on Linux2. I let the script run over > a large lib directory and my projects directory. In the > projects directory the script consumed as much as 240MB > of process size. > > > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > > seconds into the run, and actually decreased as time went on. So, akin to > > the bound method multi-argument append problem, the "checkappend leak > > problem" is something I simply have no reason to believe . Check your > > claim again? 
> > checkappend.py itself obviously creates no cycles or holds on to
> > any state across files, so if you're seeing a leak it must be a bug
> > in some other part of the version of Python + std libraries you're
> > using. Maybe a new 1.6 bug? Something you did while adding Unicode?
> > Etc. Tell us what you were running.
>
> I'll try the same thing again using Python1.5.2 and the CVS version.

Using the Unicode patched CVS version there's no leak anymore.
Couldn't find a 1.5.2 version on my machine... I'll build one later.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From guido at python.org  Thu Mar  2 16:32:32 2000
From: guido at python.org (Guido van Rossum)
Date: Thu, 02 Mar 2000 10:32:32 -0500
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
Message-ID: <200003021532.KAA17088@eric.cnri.reston.va.us>

I was looking at the code that invokes __del__, with the intent to
implement a feature from Java: in Java, a finalizer is only called
once per object, even if calling it makes the object live longer.

To implement this, we need a flag in each instance that means "__del__
was called". I opened the creation code for instances, looking for the
right place to set the flag. I then realized that it might be smart,
now that we have this flag anyway, to set it to "true" during
initialization. There are a number of exits from the initialization
where the object is created but not fully initialized, where the new
object is DECREF'ed and NULL is returned. When such an exit is taken,
__del__ is called on an incompletely initialized object! Example:

>>> class C:
...     def __del__(self): print "deleting", self
...
>>> x = C(1)
!--> deleting <__main__.C instance at 1686d8>
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: this constructor takes no arguments
>>>

Now I have a choice to make.
If the class has an __init__, should I clear the flag only after __init__ succeeds? This means that if __init__ raises an exception, __del__ is never called. This is an incompatibility. It's possible that someone has written code that relies on __del__ being called even when __init__ fails halfway, and then their code would break. But it is just as likely that calling __del__ on a partially uninitialized object is a bad mistake, and I am doing all these cases a favor by not calling __del__ when __init__ failed! Any opinions? If nobody speaks up, I'll make the change. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Thu Mar 2 17:44:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 2 Mar 2000 11:44:00 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <14526.39504.36065.657527@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Now I have a choice to make. If the class has an __init__, GvR> should I clear the flag only after __init__ succeeds? This GvR> means that if __init__ raises an exception, __del__ is never GvR> called. This is an incompatibility. It's possible that GvR> someone has written code that relies on __del__ being called GvR> even when __init__ fails halfway, and then their code would GvR> break. It reminds me of the separation between object allocation and initialization in ObjC. GvR> But it is just as likely that calling __del__ on a partially GvR> uninitialized object is a bad mistake, and I am doing all GvR> these cases a favor by not calling __del__ when __init__ GvR> failed! GvR> Any opinions? If nobody speaks up, I'll make the change. I think you should set the flag right before you call __init__(), i.e. after (nearly all) the C level initialization has occurred. 
Here's why: your "favor" can easily be accomplished by Python
constructs in the __init__():

class MyBogo:
    def __init__(self):
        self.get_delified = 0
        do_sumtin_exceptional()
        self.get_delified = 1

    def __del__(self):
        if self.get_delified:
            ah_sweet_release()

-Barry

From gstein at lyra.org  Thu Mar  2 18:14:35 2000
From: gstein at lyra.org (Greg Stein)
Date: Thu, 2 Mar 2000 09:14:35 -0800 (PST)
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us>
Message-ID: 

On Thu, 2 Mar 2000, Guido van Rossum wrote:
>...
> But it is just as likely that calling __del__ on a partially
> uninitialized object is a bad mistake, and I am doing all these cases
> a favor by not calling __del__ when __init__ failed!
>
> Any opinions? If nobody speaks up, I'll make the change.

+1 on calling __del__ IFF __init__ completes successfully.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From jeremy at cnri.reston.va.us  Thu Mar  2 18:15:14 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Thu, 2 Mar 2000 12:15:14 -0500 (EST)
Subject: [Python-Dev] str vs repr at prompt again (FW: String printing
	behavior?)
In-Reply-To: <001401bf840b$3177ba60$732d153f@tim>
References: <001401bf840b$3177ba60$732d153f@tim>
Message-ID: <14526.41378.374653.497993@goon.cnri.reston.va.us>

>>>>> "TP" == Tim Peters writes:

TP> Another unsolicited testimonial that countless users are
TP> oppressed by auto-repr (as opposed to auto-str) at the
TP> interpreter prompt. Just trying to keep a once-hot topic from
TP> going stone cold forever.

[Signature from the included message:]
>> --
>> Ted Drain
>> Jet Propulsion Laboratory
>> Ted.Drain at jpl.nasa.gov

This guy is probably a rocket scientist. We want the language to be
useful for everybody, not just rocket scientists.
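[A present-day footnote: what Ted is asking for can be had by
replacing the hook the interactive prompt calls to display an
expression's value. sys.displayhook is a real mechanism, but it
postdates this thread -- the sketch below assumes a modern
interpreter, and the hook name and behaviour details are illustrative,
not what was available in 1.5.2:]

```python
import builtins
import sys

# The default display hook shows repr(value); this one prints strings
# raw (like 'print foo()') and falls back to repr() for everything else.

def str_displayhook(value):
    if value is None:
        return                    # the prompt stays quiet for None
    if isinstance(value, str):
        print(value)              # raw: embedded \n become real newlines
    else:
        print(repr(value))        # non-strings keep the repr behaviour
    builtins._ = value            # preserve the prompt's "_" convention

sys.displayhook = str_displayhook
```

With this installed, typing `foo()` at the prompt shows the two lines
rather than 'line1\nline2', which is exactly the matlab-like behaviour
Ted's users expect.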
Jeremy

From guido at python.org  Thu Mar  2 23:45:37 2000
From: guido at python.org (Guido van Rossum)
Date: Thu, 02 Mar 2000 17:45:37 -0500
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
In-Reply-To: Your message of "Thu, 02 Mar 2000 11:44:00 EST."
	<14526.39504.36065.657527@anthem.cnri.reston.va.us>
References: <200003021532.KAA17088@eric.cnri.reston.va.us>
	<14526.39504.36065.657527@anthem.cnri.reston.va.us>
Message-ID: <200003022245.RAA20265@eric.cnri.reston.va.us>

> >>>>> "GvR" == Guido van Rossum writes:
>
> GvR> Now I have a choice to make. If the class has an __init__,
> GvR> should I clear the flag only after __init__ succeeds? This
> GvR> means that if __init__ raises an exception, __del__ is never
> GvR> called. This is an incompatibility. It's possible that
> GvR> someone has written code that relies on __del__ being called
> GvR> even when __init__ fails halfway, and then their code would
> GvR> break.

[Barry]
> It reminds me of the separation between object allocation and
> initialization in ObjC.

Is that good or bad?

> GvR> But it is just as likely that calling __del__ on a partially
> GvR> uninitialized object is a bad mistake, and I am doing all
> GvR> these cases a favor by not calling __del__ when __init__
> GvR> failed!
>
> GvR> Any opinions? If nobody speaks up, I'll make the change.
>
> I think you should set the flag right before you call __init__(),
> i.e. after (nearly all) the C level initialization has occurred.
> Here's why: your "favor" can easily be accomplished by Python
> constructs in the __init__():
>
> class MyBogo:
>     def __init__(self):
>         self.get_delified = 0
>         do_sumtin_exceptional()
>         self.get_delified = 1
>
>     def __del__(self):
>         if self.get_delified:
>             ah_sweet_release()

But the other behavior (call __del__ even when __init__ fails) can
also easily be accomplished in Python:

class C:
    def __init__(self):
        try:
            ...stuff that may fail...
        except:
            self.__del__()
            raise

    def __del__(self):
        ...cleanup...

I believe that in almost all cases the programmer would be happier if
__del__ wasn't called when their __init__ fails. This makes it easier
to write a __del__ that can assume that all the object's fields have
been properly initialized. In my code, typically when __init__ fails,
this is a symptom of a really bad bug (e.g. I just renamed one of
__init__'s arguments and forgot to fix all references), and I don't
care much about cleanup behavior.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From bwarsaw at cnri.reston.va.us  Thu Mar  2 23:52:31 2000
From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us)
Date: Thu, 2 Mar 2000 17:52:31 -0500 (EST)
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
References: <200003021532.KAA17088@eric.cnri.reston.va.us>
	<14526.39504.36065.657527@anthem.cnri.reston.va.us>
	<200003022245.RAA20265@eric.cnri.reston.va.us>
Message-ID: <14526.61615.362973.624022@anthem.cnri.reston.va.us>

>>>>> "GvR" == Guido van Rossum writes:

GvR> But the other behavior (call __del__ even when __init__
GvR> fails) can also easily be accomplished in Python:

It's a fair cop.

GvR> I believe that in almost all cases the programmer would be
GvR> happier if __del__ wasn't called when their __init__ fails.
GvR> This makes it easier to write a __del__ that can assume that
GvR> all the object's fields have been properly initialized.

That's probably fine; I don't have strong feelings either way.

-Barry

P.S. Interesting what X-Oblique-Strategy was randomly inserted in this
message (but I'm not sure which approach is more "explicit" :).

-Barry

From tim_one at email.msn.com  Fri Mar  3 06:38:59 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Fri, 3 Mar 2000 00:38:59 -0500
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <000001bf84d2$c711e2e0$092d153f@tim> [Guido] > I was looking at the code that invokes __del__, with the intent to > implement a feature from Java: in Java, a finalizer is only called > once per object, even if calling it makes the object live longer. Why? That is, in what way is this an improvement over current behavior? Note that Java is a bit subtle: a finalizer is only called once by magic; explicit calls "don't count". The Java rules add up to quite a confusing mish-mash. Python's rules are *currently* clearer. I deal with possible exceptions in Python constructors the same way I do in C++ and Java: if there's a destructor, don't put anything in __init__ that may raise an uncaught exception. Anything dangerous is moved into a separate .reset() (or .clear() or ...) method. This works well in practice. > To implement this, we need a flag in each instance that means "__del__ > was called". At least . > I opened the creation code for instances, looking for the right place > to set the flag. I then realized that it might be smart, now that we > have this flag anyway, to set it to "true" during initialization. There > are a number of exits from the initialization where the object is created > but not fully initialized, where the new object is DECREF'ed and NULL is > returned. When such an exit is taken, __del__ is called on an > incompletely initialized object! I agree *that* isn't good. Taken on its own, though, it argues for adding an "instance construction completed" flag that __del__ later checks, as if its body were: if self.__instance_construction_completed: body That is, the problem you've identified here could be addressed directly. > Now I have a choice to make. If the class has an __init__, should I > clear the flag only after __init__ succeeds? This means that if > __init__ raises an exception, __del__ is never called. This is an > incompatibility. 
It's possible that someone has written code that > relies on __del__ being called even when __init__ fails halfway, and > then their code would break. > > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change. I'd be in favor of fixing the actual problem; I don't understand the point to the rest of it, especially as it has the potential to break existing code and I don't see a compensating advantage (surely not compatibility w/ JPython -- JPython doesn't invoke __del__ methods at all by magic, right? or is that changing, and that's what's driving this?). too-much-magic-is-dizzying-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 3 06:50:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 3 Mar 2000 00:50:16 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <14527.21144.9421.958311@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> (surely not compatibility w/ JPython -- JPython doesn't invoke TP> __del__ methods at all by magic, right? or is that changing, TP> and that's what's driving this?). No, JPython doesn't invoke __del__ methods by magic, and I don't have any plans to change that. -Barry From ping at lfw.org Fri Mar 3 10:00:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 3 Mar 2000 01:00:21 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Message-ID: On Thu, 2 Mar 2000, Greg Stein wrote: > On Thu, 2 Mar 2000, Guido van Rossum wrote: > >... 
> > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > +1 on calling __del__ IFF __init__ completes successfully. That would be my vote as well. What convinced me of this is the following: If it's up to the implementation of __del__ to deal with a problem that happened during initialization, you only know about the problem with very coarse granularity. It's a pain (or even impossible) to then rediscover the information you need to recover adequately. If on the other hand you deal with the problem in __init__, then you have much better control over what is happening, because you can position try/except blocks precisely where you need them to deal with specific potential problems. Each block can take care of its case appropriately, and re-raise if necessary. In general, it seems to me that what you want to do when __init__ runs afoul is going to be different from what you want to do to take care of object cleanup in __del__. So it doesn't belong there -- it belongs in an except: clause in __init__. Even though it's an incompatibility, i really think this is the right behaviour. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 3 17:13:16 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:13:16 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 00:38:59 EST." 
<000001bf84d2$c711e2e0$092d153f@tim> References: <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <200003031613.LAA21571@eric.cnri.reston.va.us> > [Guido] > > I was looking at the code that invokes __del__, with the intent to > > implement a feature from Java: in Java, a finalizer is only called > > once per object, even if calling it makes the object live longer. [Tim] > Why? That is, in what way is this an improvement over current behavior? > > Note that Java is a bit subtle: a finalizer is only called once by magic; > explicit calls "don't count". Of course. Same in my proposal. But I wouldn't call it "by magic" -- just "on behalf of the garbage collector". > The Java rules add up to quite a confusing mish-mash. Python's rules are > *currently* clearer. I don't find the Java rules confusing. It seems quite useful that the GC promises to call the finalizer at most once -- this can simplify the finalizer logic. (Otherwise it may have to ask itself, "did I clean this already?" and leave notes for itself.) Explicit finalizer calls are always a mistake and thus "don't count" -- the response to that should in general be "don't do that" (unless you have particularly stupid callers -- or very fearful lawyers :-). > I deal with possible exceptions in Python constructors the same way I do in > C++ and Java: if there's a destructor, don't put anything in __init__ that > may raise an uncaught exception. Anything dangerous is moved into a > separate .reset() (or .clear() or ...) method. This works well in practice. Sure, but the rule "if __init__ fails, __del__ won't be called" means that we don't have to program our __init__ or __del__ quite so defensively. Most people who design a __del__ probably assume that __init__ has run to completion. The typical scenario (which has happened to me! And I *implemented* the damn thing!) is this: __init__ opens a file and assigns it to an instance variable; __del__ closes the file. This is tested a few times and it works great. 
Now in production the file somehow unexpectedly fails to be openable. Sure, the programmer should've expected that, but she didn't. Now, at best, the failed __del__ creates an additional confusing error message on top of the traceback generated by IOError. At worst, the failed __del__ could wreck the original traceback. Note that I'm not proposing to change the C level behavior; when a Py_New() function is halfway through its initialization and decides to bail out, it does a DECREF(self) and you bet that at this point the _dealloc() function gets called (via self->ob_type->tp_dealloc). Occasionally I need to initialize certain fields to NULL so that the dealloc() function doesn't try to free memory that wasn't allocated. Often it's as simple as using XDECREF instead of DECREF in the dealloc() function (XDECREF is safe when the argument is NULL, DECREF dumps core, saving a load-and-test if you are sure its arg is a valid object). > > To implement this, we need a flag in each instance that means "__del__ > > was called". > > At least <wink>. > > > I opened the creation code for instances, looking for the right place > > to set the flag. I then realized that it might be smart, now that we > > have this flag anyway, to set it to "true" during initialization. There > > are a number of exits from the initialization where the object is created > > but not fully initialized, where the new object is DECREF'ed and NULL is > > returned. When such an exit is taken, __del__ is called on an > > incompletely initialized object! > > I agree *that* isn't good. Taken on its own, though, it argues for adding > an "instance construction completed" flag that __del__ later checks, as if
> its body were:
>
>     if self.__instance_construction_completed:
>         body
>
> That is, the problem you've identified here could be addressed directly.
Sure -- but I would argue that when __del__ returns, __instance_construction_completed should be reset to false, because the destruction (conceptually, at least) cancels out the construction! > > Now I have a choice to make. If the class has an __init__, should I > > clear the flag only after __init__ succeeds? This means that if > > __init__ raises an exception, __del__ is never called. This is an > > incompatibility. It's possible that someone has written code that > > relies on __del__ being called even when __init__ fails halfway, and > > then their code would break. > > > > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > I'd be in favor of fixing the actual problem; I don't understand the point > to the rest of it, especially as it has the potential to break existing code > and I don't see a compensating advantage (surely not compatibility w/ > JPython -- JPython doesn't invoke __del__ methods at all by magic, right? > or is that changing, and that's what's driving this?). JPython's a red herring here. I think that the proposed change probably *fixes* much more code that is subtly wrong than it breaks code that is relying on __del__ being called after a partial __init__. All the rules relating to __del__ are confusing (e.g. what __del__ can expect to survive in its globals). Also note Ping's observation: | If it's up to the implementation of __del__ to deal with a problem | that happened during initialization, you only know about the problem | with very coarse granularity. It's a pain (or even impossible) to | then rediscover the information you need to recover adequately.
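Ping's point can be restated as code. In this hedged sketch (all names invented for illustration), the constructor knows exactly which resources were already acquired when a later step fails, so it can undo precisely that much and re-raise:

```python
class FakeSocket:
    """Stand-in for a resource acquired in an early __init__ step."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class Connection:
    def __init__(self, make_socket, make_logfile):
        self.sock = make_socket()          # step 1: acquired
        try:
            self.log = make_logfile()      # step 2: may fail
        except OSError:
            # We know precisely what exists at this point: just the
            # socket.  Undo exactly that and propagate the error.
            self.sock.close()
            raise
```

Had this cleanup been deferred to __del__, the destructor would have to rediscover how far construction got (does self.log exist? does self.sock?) -- exactly the coarse-granularity problem described above.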
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Fri Mar 3 17:49:52 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 11:49:52 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031613.LAA21571@eric.cnri.reston.va.us> Message-ID: <000501bf8530$7f8c78a0$b0a0143f@tim> [Tim] >> Note that Java is a bit subtle: a finalizer is only called >> once by magic; explicit calls "don't count". [Guido] > Of course. Same in my proposal. OK -- that wasn't clear. > But I wouldn't call it "by magic" -- just "on behalf of the garbage > collector". Yup, magically called <wink>. >> The Java rules add up to quite a confusing mish-mash. Python's >> rules are *currently* clearer. > I don't find the Java rules confusing. "add up" == "taken as a whole"; include the Java spec's complex state machine for cleanup semantics, and the later complications added by three (four?) distinct flavors of weak reference, and I doubt 1 Java programmer in 1,000 actually understands the rules. This is why I'm wary of moving in the Java *direction* here. Note that Java programmers in past c.l.py threads have generally claimed Java's finalizers are so confusing & unpredictable they don't use them at all! Which, in the end, is probably a good idea in Python too <0.5 wink>. > It seems quite useful that the GC promises to call the finalizer at > most once -- this can simplify the finalizer logic. Granting that explicit calls are "use at your own risk", the only user-visible effect of "called only once" is in the presence of resurrection. Now in my Python experience, on the few occasions I've resurrected an object in __del__, *of course* I expected __del__ to get called again if the object is about to die again!
Typical:

    def __del__(self):
        if oops_i_still_need_to_stay_alive:
            resurrect(self)
        else:
            # really going away
            release(self.critical_resource)

Call __del__ only once, and code like this is busted bigtime. OTOH, had I written __del__ logic that relied on being called only once, switching the implementation to call it more than once would break *that* bigtime. Neither behavior is an obvious all-cases win to me, or even a plausibly most-cases win. But Python already took a stand on this & so I think you need a *good* reason to change semantics now. > ... > Sure, but the rule "if __init__ fails, __del__ won't be called" means > that we don't have to program our __init__ or __del__ quite so > defensively. Most people who design a __del__ probably assume that > __init__ has run to completion. ... This is (or can easily be made) a separate issue, & I agreed the first time this seems worth fixing (although if nobody has griped about it in a decade of use, it's hard to call it a major bug <wink>). > ... > Sure -- but I would argue that when __del__ returns, > __instance_construction_completed should be reset to false, because > the destruction (conceptually, at least) cancels out the construction! In the __del__ above (which is typical of the cases of resurrection I've seen), there is no such implication. Perhaps this is philosophical abuse of Python's intent, but if so it relied only on trusting its advertised semantics. > I think that the proposed change probably *fixes* much more code that > is subtly wrong than it breaks code that is relying on __del__ being > called after a partial __init__. Yes, again, I have no argument against refusing to call __del__ unless __init__ succeeded. Going beyond that to a new "called at most once" rule is indeed going beyond that, *will* break reasonable old code, and holds no particular attraction that I can see (it trades making one kind of resurrection scenario easier at the cost of making other kinds harder).
If there needs to be incompatible change here, curiously enough I'd be more in favor of making resurrection illegal period (which could *really* simplify gc's headaches). > All the rules relating to __del__ are confusing (e.g. what __del__ can > expect to survive in its globals). Problems unique to final shutdown don't seem relevant here. > Also note Ping's observation: ... I can't agree with that yet another time without being quadruply redundant <wink>. From guido at python.org Fri Mar 3 17:50:08 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:50:08 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Wed, 01 Mar 2000 00:44:10 EST." <200003010544.AAA13155@eric.cnri.reston.va.us> References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <200003031650.LAA21647@eric.cnri.reston.va.us> We now have two implementations of Eric Tiedemann's idea: Neil and I both implemented it. It's too soon to post the patch sets (both are pretty rough) but I've got another design question. Once we've identified a bunch of objects that are only referring to each other (i.e., one or more cycles) we have to dispose of them. The question is, how? We can't just call free on each of the objects; some may not be allocated with malloc, and some may contain pointers to other malloc'ed memory that also needs to be freed. So we have to get their destructors involved. But how? Calling ob->ob_type->tp_dealloc(ob) for an object whose reference count is nonzero is unsafe -- this will destroy the object while there are still references to it! Those references are all coming from other objects that are part of the same cycle; those objects will also be deallocated and they will reference the deallocated objects (if only to DECREF them). Neil uses the same solution that I use when finalizing the Python interpreter -- find the dictionaries and call PyDict_Clear() on them.
(In his unpublished patch, he also clears the lists using PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized this so that *every* object can define a tp_clear function in its type object.) As long as every cycle contains at least one dictionary or list object, this will break cycles reliably and get rid of all the garbage. (If you wonder why: clearing the dict DECREFs the next object(s) in the cycle; if the last dict referencing a particular object is cleared, the last DECREF will deallocate that object, which will in turn DECREF the objects it references, and so forth. Since none of the objects in the cycle has incoming references from outside the cycle, we can prove that this will delete all objects as long as there's a dict or list in each cycle.) However, there's a snag. It's the same snag that finalizing the Python interpreter runs into -- it has to do with __del__ methods and the undefined order in which the dictionaries are cleared. For example, it's quite possible that the first dictionary we clear is the __dict__ of an instance, so this zaps all its instance variables. Suppose this breaks the cycle, so then the instance itself gets DECREFed to zero. Its deallocator will be called. If it's got a __del__, this __del__ will be called -- but all the instance variables have already been zapped, so it will fail miserably! It's also possible that the __dict__ of a class involved in a cycle gets cleared first, in which case the __del__ no longer "exists", and again the cleanup is skipped. So the question is: What to *do*? My solution is to make an extra pass over all the garbage objects *before* we clear dicts and lists, and for those that are instances and have __del__ methods, call their __del__ ("by magic", as Tim calls it in another post). The code in instance_dealloc() already does the right thing here: it calls __del__, then discovers that the reference count is > 0 ("I'm not dead yet" :-), and returns without freeing the object.
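The "fail miserably" case can be imitated in pure Python by clearing an instance's __dict__ by hand, which is what the dict-clearing pass looks like from __del__'s point of view (a toy illustration, not the collector itself):

```python
log = []

class Holder:
    def __init__(self):
        self.resource = "temp-file-handle"

    def __del__(self):
        # If the instance dict was zapped before we got here, the
        # attribute lookup fails -- the "fail miserably" case.
        try:
            log.append(("released", self.resource))
        except AttributeError:
            log.append(("lost", None))

h = Holder()
h.__dict__.clear()   # simulate the collector clearing the dict first
del h                # __del__ now runs on a stripped instance
```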
(This is also why I want to introduce a flag ensuring that __del__ gets called by instance_dealloc at most once: later when the instance gets DECREFed to 0, instance_dealloc is called again and will correctly free the object; but we don't want __del__ called again.) [Note for Neil: somehow I forgot to add this logic to the code; in_del_called isn't used! The change is obvious though.] This still leaves a problem for the user: if two class instances reference each other and both have a __del__, we can't predict whose __del__ is called first when they are called as part of cycle collection. The solution is to write each __del__ so that it doesn't depend on the other __del__. Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever. This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle! --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Mar 3 17:57:54 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 03 Mar 2000 17:57:54 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message by Guido van Rossum , Fri, 03 Mar 2000 11:50:08 -0500 , <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <20000303165755.490EA371868@snelboot.oratrix.nl> The __init__ rule for calling __del__ has me confused. Is this per-class or per-object? I.e. what will happen in the following case:

    class Purse:
        def __init__(self):
            self.balance = WithdrawCashFromBank(1000)

        def __del__(self):
            PutCashBackOnBank(self.balance)
            self.balance = 0

    class LossyPurse(Purse):
        def __init__(self):
            Purse.__init__(self)
            raise 'kaboo!
kaboo!'

If the new scheme means that the __del__ method of Purse isn't called I think I don't like it. In the current scheme I can always program defensively:

    def __del__(self):
        try:
            b = self.balance
            self.balance = 0
        except AttributeError:
            pass
        else:
            PutCashBackOnBank(b)

but in a new scheme with a per-object "__del__ must be called" flag I can't... -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Fri Mar 3 18:05:00 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 12:05:00 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 11:49:52 EST." <000501bf8530$7f8c78a0$b0a0143f@tim> References: <000501bf8530$7f8c78a0$b0a0143f@tim> Message-ID: <200003031705.MAA21700@eric.cnri.reston.va.us> OK, so we're down to this one point: if __del__ resurrects the object, should __del__ be called again later? Additionally, should resurrection be made illegal? I can easily see how __del__ could *accidentally* resurrect the object as part of its normal cleanup -- e.g. you make a call to some other routine that helps with the cleanup, passing self as an argument, and this other routine keeps a helpful cache of the last argument for some reason. I don't see how we could forbid this type of resurrection. (What are you going to do? You can't raise an exception from instance_dealloc, since it is called from DECREF. You can't track down the reference and replace it with a None easily.) In this example, the helper routine will eventually delete the object from its cache, at which point it is truly deleted. It would be harmful, not helpful, if __del__ was called again at this point. Now, it is true that the current docs for __del__ imply that resurrection is possible.
The intention of that note was to warn __del__ writers that in the case of accidental resurrection __del__ might be called again. The intention certainly wasn't to allow or encourage intentional resurrection. Would there really be someone out there who uses *intentional* resurrection? I severely doubt it. I've never heard of this. [Jack just finds a snag] > The __init__ rule for calling __del__ has me confused. Is this per-class or > per-object? > > I.e. what will happen in the following case:
>
>     class Purse:
>         def __init__(self):
>             self.balance = WithdrawCashFromBank(1000)
>
>         def __del__(self):
>             PutCashBackOnBank(self.balance)
>             self.balance = 0
>
>     class LossyPurse(Purse):
>         def __init__(self):
>             Purse.__init__(self)
>             raise 'kaboo! kaboo!'
>
> If the new scheme means that the __del__ method of Purse isn't called I think
> I don't like it. In the current scheme I can always program defensively:
>
>     def __del__(self):
>         try:
>             b = self.balance
>             self.balance = 0
>         except AttributeError:
>             pass
>         else:
>             PutCashBackOnBank(b)
>
> but in a new scheme with a per-object "__del__ must be called" flag I can't...

Yes, that's a problem. But there are other ways for the subclass to break the base class's invariant (e.g. it could override __del__ without calling the base class' __del__). So I think it's a red herring. In Python 3000, typechecked classes may declare invariants that are enforced by the inheritance mechanism; then we may need to keep track of which base class constructors succeeded and only call corresponding destructors. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Mar 3 19:17:11 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:17:11 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C001A7.6CF8F365@lemburg.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? Yes and no :-) One example comes to mind: implementations of weak references, which manage weak object references themselves (as soon as __del__ is called the weak reference implementation takes over the object). Another example is that of free-list-like implementations which reduce object creation times by implementing smart object recycling, e.g. objects could keep allocated dictionaries alive or connections to databases open, etc. As for the second point: Calling __del__ again is certainly needed to keep application logic sane... after all, __del__ should be called whenever the refcount reaches 0 -- and that can happen more than once in the object's life-time if reanimation occurs. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup -- e.g. you make a call to some other > routine that helps with the cleanup, passing self as an argument, and > this other routine keeps a helpful cache of the last argument for some > reason. I don't see how we could forbid this type of resurrection. > (What are you going to do? You can't raise an exception from > instance_dealloc, since it is called from DECREF. You can't track > down the reference and replace it with a None easily.) > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. I'd say this is an application logic error -- nothing that the mechanism itself can help with automagically. OTOH, turning multiple calls to __del__ off would make certain techniques impossible.
> Now, it is true that the current docs for __del__ imply that > resurrection is possible. The intention of that note was to warn > __del__ writers that in the case of accidental resurrection __del__ > might be called again. The intention certainly wasn't to allow or > encourage intentional resurrection. I don't think that docs are the right argument here ;-) It is simply the reference counting logic that plays its role: __del__ is called when refcount reaches 0, which usually means that the object is about to be garbage collected... unless the object is rereferenced by some other object and thus gets reanimated. > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. BTW, I can't see what the original question has to do with this discussion ... calling __del__ only after successful __init__ is ok, IMHO, but what does this have to do with the way __del__ itself is implemented? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 3 19:30:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:30:36 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <38C004CC.1FE0A501@lemburg.com> [Guido about ways to clean up cyclic garbage] FYI, I'm using a special protocol for disposing of cyclic garbage: the __cleanup__ protocol. The purpose of this call is probably similar to Neil's tp_clear: it is intended to let objects break possible cycles in their own storage scope, e.g. instances can delete instance variables which they know can cause cyclic garbage.
The idea is simple: give all power to the objects rather than try to solve everything with one magical master plan. The mxProxy package has details on the protocol. The __cleanup__ method is called by the Proxy when the Proxy is about to be deleted. If all references to an object go through the Proxy, the __cleanup__ method call can easily break cycles to have the refcount reach zero in which case __del__ is called. Since the object knows about this scheme it can take precautions to make sure that __del__ still works after __cleanup__ was called. Anyway, just a thought... there are probably many ways to do all this. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 3 19:51:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 03 Mar 2000 19:51:55 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C009CB.72BD49CA@tismer.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? [much stuff] Just a random note: What if we had a __del__ with zombie behavior? Assume an instance that is about to be destructed. Then __del__ is called via normal method lookup. What we want is to let this happen only once. Here the Zombie: After method lookup, place a dummy __del__ into the to-be-deleted instance dict, and we are sure that this does not harm. Kinda "yes, it's there, but a broken link". The zombie always works by doing nothing. Makes some sense? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein at lyra.org Sat Mar 4 00:09:48 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 15:09:48 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> Message-ID: You may as well remove the entire "vi" concept from ConfigParser. Since "vi" can be *only* a '=' or ':', you aren't truly checking anything in the "if" statement. Further, "vi" is used nowhere else, so that variable and the corresponding regex group can be nuked altogether. IMO, I'm not sure why the ";" comment form was initially restricted to just one option format in the first place. Cheers, -g On Fri, 3 Mar 2000, Jeremy Hylton wrote: > Update of /projects/cvsroot/python/dist/src/Lib > In directory bitdiddle:/home/jhylton/python/src/Lib > > Modified Files: > ConfigParser.py > Log Message: > allow comments beginning with ; in key: value as well as key = value > > > Index: ConfigParser.py > =================================================================== > RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v > retrieving revision 1.16 > retrieving revision 1.17 > diff -C2 -r1.16 -r1.17 > *** ConfigParser.py 2000/02/28 23:23:55 1.16 > --- ConfigParser.py 2000/03/03 20:43:57 1.17 > *************** > *** 359,363 **** > optname, vi, optval = mo.group('option', 'vi', 'value') > optname = string.lower(optname) > ! if vi == '=' and ';' in optval: > # ';' is a comment delimiter only if it follows > # a spacing character > --- 359,363 ---- > optname, vi, optval = mo.group('option', 'vi', 'value') > optname = string.lower(optname) > !
if vi in ('=', ':') and ';' in optval: > # ';' is a comment delimiter only if it follows > # a spacing character > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Sat Mar 4 00:15:32 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Fri, 3 Mar 2000 18:15:32 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> Message-ID: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Thanks for catching that. I didn't look at the context. I'm going to wait, though, until I talk to Fred to mess with the code any more. General question for python-dev readers: What are your experiences with ConfigParser? I just used it to build a simple config parser for IDLE and found it hard to use for several reasons. The biggest problem was that the file format is undocumented. I also found it clumsy to have to specify section and option arguments. I ended up writing a proxy that specializes on section so that get takes only an option argument. It sounds like ConfigParser code and docs could use a general cleanup. Are there any other issues to take care of as part of that cleanup? Jeremy From gstein at lyra.org Sat Mar 4 00:35:09 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST) Subject: [Python-Dev] ConfigParser stuff (was: CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17) In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. Not a problem. I'm glad that diffs are now posted to -checkins. 
:-) > General question for python-dev readers: What are your experiences > with ConfigParser? Love it! > I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. In my most complex use of ConfigParser, I had to override SECTCRE to allow periods in the section name. Of course, that was quite interesting since the variable is __SECTRE in 1.5.2 (i.e. I had to compensate for the munging). I also changed OPTCRE to allow a few more characters ("@" in particular, which even the update doesn't do). Not a problem nowadays since those are public. My subclass also defines a set() method and a delsection() method. These are used because I write the resulting changes back out to a file. It might be nice to have a method which writes out a config file (with an "AUTOGENERATED BY ConfigParser.py -- DO NOT EDIT BY HAND"; or maybe "... BY ..."). > I also found it > clumsy to have to specify section and option arguments. I found these were critical in my application. I also take advantage of the sections in my "edna" application for logical organization. > I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? A set() method and a writefile() type of method would be nice. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 4 02:38:43 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 20:38:43 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <000001bf857a$60b45ac0$c6a0143f@tim> [Guido] > ... > Someone (Tim?)
in the past suggested a different solution (probably > found in another language): for objects that are collected as part of > a cycle, the destructor isn't called at all. The memory is freed > (since it's no longer reachable), but the destructor is not called -- > it is as if the object lives on forever. Stroustrup has written in favor of this for C++. It's exactly the kind of overly slick "good argument" he would never accept from anyone else <0.1 wink>. > This is theoretically superior, but not practical: when I have an > object that creates a temp file, I want to be able to reliably delete > the temp file in my destructor, even when I'm part of a cycle! A member of the C++ committee assured me Stroustrup is overwhelmingly opposed on this. I don't even agree it's theoretically superior: it relies on the fiction that gc "may never occur", and that's just silly in practice. You're moving down the Java path. I can't possibly do a better job of explaining the Java rules than the Java Language Spec. does for itself. So pick that up and study section 12.6 (Finalization of Class Instances). The end result makes little sense to users, but is sufficient to guarantee that Java itself never blows up. Note, though, that there is NO good answer to finalizers in cycles! The implementation cannot be made smart enough to both avoid trouble and "do the right thing" from the programmer's POV, because the latter is unknowable. Somebody has to lose, one way or another. Rather than risk doing a wrong thing, the BDW collector lets cycles with finalizers leak. But it also has optional hacks to support exceptions for use with C++ (which sometimes creates self-cycles) and Java. See http://reality.sgi.com/boehm_mti/finalization.html for Boehm's best concentrated thoughts on the subject. The only principled approach I know of comes out of the Scheme world. Scheme has no finalizers, of course. 
But it does have gc, and the concept of "guardians" was invented to address all gc finalization problems in one stroke. It's extremely Scheme-like in providing a perfectly general mechanism with no policy whatsoever. You (the Scheme programmer) can create guardian objects, and "register" other objects with a guardian. At any time, you can ask a guardian whether some object registered with it is "ready to die" (i.e., the only thing keeping it alive is its registration with the guardian). If so, you can ask it to give you one. Everything else is up to you: if you want to run a finalizer, your problem. If there are cycles, also your problem. Even if there are simple non-cyclic dependencies, your problem. Etc. So those are the extremes: BDW avoids blame by refusing to do anything. Java avoids blame by exposing an impossibly baroque implementation-driven finalization model. Scheme avoids blame by refusing to do anything "by magic", but helps you to shoot yourself with the weapon of your choice. The bad news is that I don't know of a scheme *not* at an extreme! It's extremely un-Pythonic to let things leak (despite that it has let things leak for a decade ), but also extremely un-Pythonic to make some wild-ass guess. So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs.
IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility. when-a-problem-is-impossible-to-solve-favor-sanity-ly y'rs - tim From gstein at lyra.org Sat Mar 4 03:59:26 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 18:59:26 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > Note, though, that there is NO good answer to finalizers in cycles! The "Note" ?? Not just a note, but I'd say an axiom :-) By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best? >... > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. If a trash cycle > contains a finalizer (my, but that has to be rare. in practice, in > well-designed code!), don't guess, but make it available to the user. A > gc.guardian() call could expose such beasts, or perhaps a callback could be > registered, invoked when gc finds one of these things. Anyone crazy enough > to create cyclic trash with finalizers then has to take responsibility for > breaking the cycle themself. This puts the burden on the person creating > the problem, and they can solve it in the way most appropriate to *their* > specific needs. IOW, the only people who lose under this scheme are the > ones begging to lose, and their "loss" consists of taking responsibility. I'm not sure if Tim is saying the same thing, but I'll write down a concrete idea for cleaning garbage cycles. First, a couple observations: * Some objects can always be reliably "cleaned": lists, dicts, tuples.
They just drop their contents, with no invocations against any of them. Note that an instance without a __del__ has no opinion on how it is cleaned. (this is related to Tim's point about whether a cycle has a finalizer) * The other objects may need to *use* their referenced objects in some way to clean out cycles. Since the second set of objects (possibly) need more care during their cleanup, we must concentrate on how to solve their problem. Back up a step: to determine where an object falls, let's define a tp_clean type slot. It returns an integer and takes one parameter: an operation integer. Py_TPCLEAN_CARE_CHECK /* check whether care is needed */ Py_TPCLEAN_CARE_EXEC /* perform the careful cleaning */ Py_TPCLEAN_EXEC /* perform a non-careful cleaning */ Given a set of objects that require special cleaning mechanisms, there is no way to tell where to start first. So... just pick the first one. Call its tp_clean type slot with CARE_EXEC. For instances, this maps to __clean__. If the instance does not have a __clean__, then tp_clean returns FALSE meaning that it could not clean this object. The algorithm moves on to the next object in the set. If tp_clean returns TRUE, then the object has been "cleaned" and is moved to the "no special care needed" list of objects, awaiting its reference count to hit zero. Note that objects in the "care" and "no care" lists may disappear during the careful-cleaning process. If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError. The objects in this set each said they could not be cleaned carefully AND they were not dealloc'd during other objects' cleaning. [ it could be possible to define a *dynamic* CARE_EXEC that will succeed if you call it during a second pass; I'm not sure this is a Good Thing to allow, however. 
] This also implies that a developer should almost *always* consider writing a __clean__ method whenever they write a __del__ method. That method MAY be called when cycles need to be broken; the object should delete any non-essential variables in such a way that integrity is retained (e.g. it fails gracefully when methods are called and __del__ won't raise an error). For example, __clean__ could call a self.close() to shut down its operation. Whatever... you get the idea. At the end of the iteration of the "care" set, then you may have objects remaining in the "no care" set. By definition, these objects don't care about their internal references to other objects (they don't need them during deallocation). We iterate over this set, calling tp_clean(EXEC). For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out the references to other objects (but does not dealloc the object!). Again: objects in the "no care" set will go away during this process. By the end of the iteration over the "no care" set, it should be empty. [ note: the iterations over these sets should probably INCREF/DECREF across the calls; otherwise, the object could be dealloc'd during the tp_clean call. ] [ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible references to other objects; not sure what this means. is it an error? maybe you just force a tp_dealloc on the remaining objects. ] Note that the tp_clean mechanism could probably be used during the Python finalization, where Python does a bunch of special-casing to clean up modules. Specifically: a module does not care about its contents during its deallocation, so it is a "no care" object; it responds to tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they can clear their dict (which contains a module reference which usually causes a loop) during tp_clean(EXEC). 
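Greg's care/no-care passes can be modeled in a few lines of present-day Python. Everything in this sketch — tp_clean as an ordinary method, the op constants, and the clean_cycle() driver — is an illustrative stand-in, not a real CPython interface:

```python
# A toy model of the proposed two-pass cleaning; all names here are
# illustrative stand-ins for the tp_clean type slot, not a real API.
CARE_CHECK, CARE_EXEC, EXEC = range(3)

class GCImpossibleError(MemoryError):
    """An object in a trash cycle refused careful cleaning."""

class Careful:
    # stands in for an instance whose __clean__ maps to CARE_EXEC
    def __init__(self):
        self.closed = False

    def tp_clean(self, op):
        if op == CARE_CHECK:
            return True            # yes, I need careful cleaning
        if op == CARE_EXEC:
            self.closed = True     # e.g. self.close()
            return True            # cleaned successfully
        return True                # EXEC: nothing left to drop

class Carefree:
    # stands in for a list/dict/tuple: it just drops its references
    def __init__(self):
        self.refs = [object(), object()]

    def tp_clean(self, op):
        if op == CARE_CHECK:
            return False           # no special care needed
        if op == EXEC:
            self.refs = []         # clear references, don't dealloc
            return True
        return False

def clean_cycle(objs):
    care = [o for o in objs if o.tp_clean(CARE_CHECK)]
    no_care = [o for o in objs if not o.tp_clean(CARE_CHECK)]
    # pass 1: give each "care" object a chance to clean itself
    for o in list(care):
        if o.tp_clean(CARE_EXEC):
            care.remove(o)
            no_care.append(o)
    if care:                       # they also weren't dealloc'd
        raise GCImpossibleError("cycle cannot be cleaned carefully")
    # pass 2: remaining objects just drop their internal references
    for o in no_care:
        o.tp_clean(EXEC)
```

In the real proposal the two lists shrink as reference counts hit zero during pass 1; the toy version ignores that wrinkle.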
Module cleanup is easy once objects with CARE_CHECK have been handled -- all that funny logic in there is to deal with "care" objects. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 4 04:26:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 22:26:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000401bf8589$7d1364e0$c6a0143f@tim> [Tim] > Note, though, that there is NO good answer to finalizers in cycles! The [Greg Stein] > "Note" ?? Not just a note, but I'd say an axiom :-) An axiom is accepted without proof: we have plenty of proof that there's no thoroughly good answer (i.e., every language that has ever addressed this issue -- along with every language that ever will ). > By definition, you have two objects referring to each other in some way. > How can you *definitely* know how to break the link between them? Do you > call A's finalizer or B's first? If they're instances, do you just whack > their __dict__ and hope for the best? Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this. [a nice implementation sketch, of what seems an overly elaborate scheme, if you believe cycles with finalizers are rare in intelligently designed code) ] Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>. One caution: > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. Since gc "can happen at any time", this is very severe (c.f. 
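The kind of cycle being argued over is easy to build. As a hedged aside: in today's CPython the cycle collector does run both finalizers (at most once each, per the much later PEP 442), while in the Python of this thread such a cycle was simply left uncollected:

```python
import gc

log = []

class Node:
    # two of these referring to each other form the problem cycle
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        # the collector has no way to know whether A's or B's
        # finalizer "should" run first
        log.append(self.name)

a = Node("A")
b = Node("B")
a.peer, b.peer = b, a      # the cycle
del a, b                   # unreachable, but refcounts stay nonzero
gc.collect()               # only cycle collection can reclaim them
```

The order in which the two finalizers run is unspecified, which is exactly the ambiguity under discussion.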
Guido's objection to making resurrection illegal). Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't). realistically y'rs - tim From gstein at lyra.org Sat Mar 4 09:43:45 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 00:43:45 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] Nah. Quite simple to code up, but a bit longer to explain in English :-) The hardest part is finding the cycles, but Guido already posted a long explanation about that. Once that spits out the doubly-linked list of objects, then you're set. 1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed" objects to a second list 2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is returned, then the object was cleaned and moves to the "no care" list. 3) assert len(care-needed list) == 0 4) scan the no-care list calling tp_clean(EXEC) 5) (questionable) assert len(no-care list) == 0 The background makes it longer. The short description of the algorithm is easy. Step (1) could probably be merged right into one of the scans in the GC algorithm (e.g. 
during the placement into the "these are cyclical garbage" list) > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. hehe... Agreed. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). GCImpossibleError would simply be a subclass of MemoryError. Makes sense to me, and definitely allows for its "spontaneity." > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- anything). I'd > focus on the cycles themselves, not on the types of objects involved. I'm > not pretending to address the "order of finalization at shutdown" question, > though (although I'd agree they're deeply related: how do you follow a > topological sort when there *isn't* one? well, you don't, because you > can't). I disagree. I don't think a Python-level function is going to have a very good idea of what to do. IMO, this kind of semantics belong down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves. I think that standard pattern should be a language definition. Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives? 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 10:50:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 11:50:19 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? One thing that bothered me once: I want to be able to have something like: [section] tag = 1 tag = 2 And be able to retrieve ("section", "tag") -> ["1", "2"]. Can be awfully useful for things that make sense several times. Perhaps there should be two functions, one that reads a single-tag and one that reads a multi-tag? File format: I'm sure I'm going to get yelled at, but why don't we make it XML? Hard to edit, yadda, yadda, but you can easily write a special purpose widget to edit XConfig (that's what we'll call the DTD) files. hopefull-yet-not-naive-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 11:05:15 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 02:05:15 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Fri, 3 Mar 2000, Jeremy Hylton wrote: > > It sounds like ConfigParser code and docs could use a general cleanup. > > Are there any other issues to take care of as part of that cleanup? > > One thing that bothered me once: > > I want to be able to have something like: > > [section] > tag = 1 > tag = 2 > > And be able to retrieve ("section", "tag") -> ["1", "2"]. > Can be awfully useful for things that make sense several times. > Perhaps there should be two functions, one that reads a single-tag and > one that reads a multi-tag?
Structured values would be nice. Several times, I've needed to decompose the right hand side into lists. > File format: I'm sure I'm going to get yelled at, but why don't we > make it XML? Hard to edit, yadda, yadda, but you can easily write a > special purpose widget to edit XConfig (that's what we'll call the DTD) > files. Write a whole new module. ConfigParser is for files that look like the above. There isn't a reason to NOT use XML, but it shouldn't go into ConfigParser. I find the above style much easier for *humans*, than an XML file, to specify options. XML is good for computers; not so good for humans. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 11:46:40 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 12:46:40 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: [Tim Peters] > ...If a trash cycle > contains a finalizer (my, but that has to be rare. in practice, in > well-designed code!), This shows something Tim himself has often said -- he never programmed a GUI. It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to break system-allocated resources. So, it's not as rare as we would like to believe, which is the reason I haven't given this answer. which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:16:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:16:19 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do Much better than the Python interpreter...
> Throwing it out to Python won't help > what happens when an unexpected cycle arrives? Don't delete it. It's as simple as that, since it's a bug. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:29:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:29:33 +0200 (IST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > Write a whole new module. ConfigParser is for files that look like the > above. Gotcha. One problem: two configuration modules might cause the classic "which should I use?" confusion. > > I find the above style much easier for *humans*, than an XML file, to > specify options. XML is good for computers; not so good for humans. > Of course: what human could delimit his text with and ? oh-no-another-c.l.py-bot-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 12:38:46 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:38:46 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > I disagree. I don't think a Python-level function is going to have a very > > good idea of what to do > > > Much better than the Python interpreter... If your function receives two instances (A and B), what are you going to do? How can you know what their policy is for cleaning up in the face of a cycle? I maintain that you would call the equivalent of my proposed __clean__. There isn't much else you'd be able to do, unless you had a completely closed system, you expected cycles between specific types of objects, and you knew a way to clean them up. Even then, you would still be calling something like __clean__ to let the objects do whatever they needed.
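A class following the proposed __clean__ protocol might look like the sketch below. To be clear about assumptions: __clean__ is the hypothetical, never-adopted method from this thread, not a real Python special method, and the class and path are invented:

```python
# '__clean__' here is the protocol proposed in the thread, not a
# real Python special method; the class and file name are invented.
class TempFileUser:
    def __init__(self, path):
        self.path = path          # pretend this names a temp file
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True    # a real class would remove the file

    def __clean__(self):
        # called (under the proposal) when this object sits in a
        # trash cycle; afterwards __del__ must not be able to fail
        self.close()
        return True               # "I cleaned myself successfully"

    def __del__(self):
        self.close()              # now guaranteed to be a safe no-op
```

Under the proposal, the cycle collector would try __clean__ first and only give up (GCImpossibleError) on objects that lack it.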
I'm suggesting that __clean__ should be formalized (as part of tp_clean). Throwing the handling "up to Python" isn't going to do much for you. Seriously... I'm all for coding more stuff in Python rather than C, but this just doesn't feel right. Getting the objects GC'd is a language feature, and a specific pattern/method/recommendation is best formulated as an interpreter mechanism. > > > Throwing it out to Python won't help > > > what happens when an unexpected cycle arrives? > > Don't delete it. > It's as simple as that, since it's a bug. The point behind this stuff is to get rid of it, rather than let it linger on. If the objects have finalizers (which is how we get to this step!), then it typically means there is a resource they must release. Getting the object cleaned and dealloc'd becomes quite important. Cheers, -g p.s. did you send in a patch for the instance_contains() thing yet? -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 4 12:43:12 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:43:12 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > Write a whole new module. ConfigParser is for files that look like the > > above. > > Gotcha. > > One problem: two configuration modules might cause the classic "which > should I use?" confusion. Nah. They wouldn't *both* be called ConfigParser. And besides, I see the XML format more as a persistence mechanism rather than a configuration mechanism. I'd call the module something like "XMLPersist". > > > > I find the above style much easier for *humans*, than an XML file, to > > specify options. XML is good for computers; not so good for humans. > > > > Of course: what human could delimit his text with and ? Feh. As a communication mechanism, dropping in that stuff... it's easy. ButI wouldnotwant ... bleck.
I wouldn't want to use XML for configuration stuff. It just gets ugly. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson at nevex.com Sat Mar 4 17:46:24 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 11:46:24 -0500 (EST) Subject: [Python-Dev] HTMLgen-style interface to SQL? Message-ID: [short form] I'm looking for an object-oriented toolkit that will do for SQL what Perl's CGI.pm module, or Python's HTMLgen, does for HTML. Pointers, examples, or expressions of interest would be welcome. [long form] Lincoln Stein's CGI.pm module for Perl allows me to build HTML in an object-oriented way, instead of getting caught in the Turing tarpit of string substitution and printf. DOM does the same (in a variety of languages) for XML. Right now, if I want to interact with an SQL database from Perl or Python, I have to embed SQL strings in my programs. I would like to have a DOM-like ability to build and manipulate queries as objects, then call a method that translates the query structure into SQL to send to the database. Alternatively, if there is an XML DTD for SQL (how's that for a chain of TLAs?), and some tool to convert the XML/SQL to pure SQL, so that I could build my query using DOM, that would be cool too. RSVP, Greg Wilson gvwilson at nevex.com From moshez at math.huji.ac.il Sat Mar 4 19:02:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 20:02:54 +0200 (IST) Subject: [Python-Dev] Re: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: On Sat, 4 Mar 2000, Guido van Rossum wrote: > Before we all start writing nannies and checkers, how about a standard > API design first? I thoroughly agree -- we should have a standard API.
I tried to write selfnanny so it could be callable from any API possible (e.g., it can take either a file, a string, an ast or a tuple representation) > I will want to call various nannies from a "Check" > command that I plan to add to IDLE. Very cool: what I imagine is a sort of modular PyLint. > I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. Mine definitely isn't: it's designed to run both like a script and like a module. One outstanding bug: no docos. To be supplied upon request <0.5 wink>. I just wanted to float it out and see if people think that this particular nanny is worthwhile. > Since parsing is expensive, we probably want to share the parse tree. Yes. Probably as an AST, and transform to tuples/lists inside the checkers. > Ideas? Here's a strawman API: There's a package called Nanny Every module in that package should have a function called check_ast. Its argument is an AST object, and its output should be a list of three-tuples: (line-number, error-message, None) or (line-number, error-message, (column-begin, column-end)) (each tuple can be a different form). Problems? (I'm CCing to python-dev. Please follow up to that discussion to python-dev only, as I don't believe it belongs in patches) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gvwilson at nevex.com Sat Mar 4 19:26:20 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 13:26:20 -0500 (EST) Subject: [Python-Dev] Re: selfnanny.py / nanny architecture In-Reply-To: Message-ID: > > Guido van Rossum wrote: > > Before we all start writing nannies and checkers, how about a standard > > API design first? > Moshe Zadka wrote: > Here's a strawman API: > There's a package called Nanny > Every module in that package should have a function called check_ast.
> It's argument is an AST object, and it's output should be a list > of three-tuples: (line-number, error-message, None) or > (line-number, error-message, (column-begin, column-end)) (each tuple can > be a different form). Greg Wilson wrote: The SUIF (Stanford University Intermediate Format) group has been working on an extensible compiler framework for about ten years now. The framework is based on an extensible AST spec; anyone can plug in a new analysis or optimization algorithm by writing one or more modules that read and write decorated ASTs. (See http://suif.stanford.edu for more information.) Based on their experience, I'd suggest that every nanny take an AST as an argument, and add complaints in place as decorations to the nodes. A terminal nanny could then collect these and display them to the user. I think this architecture will make it simpler to write meta-nannies. I'd further suggest that the AST be something that can be manipulated through DOM, since (a) it's designed for tree-crunching, (b) it's already documented reasonably well, (c) it'll save us re-inventing a wheel, and (d) generating human-readable output in a variety of customizable formats ought to be simple (well, simpler than the alternatives). Greg From jeremy at cnri.reston.va.us Sun Mar 5 03:10:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 4 Mar 2000 21:10:28 -0500 (EST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: Message-ID: <14529.49684.219826.466310@bitdiddle.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Sat, 4 Mar 2000, Greg Stein wrote: >> Write a whole new module. ConfigParser is for files that look >> like the above. MZ> Gotcha. MZ> One problem: two configurations modules might cause the classic MZ> "which should I use?" confusion. I don't think this is a hard decision to make. ConfigParser is good for simple config files that are going to be maintained by humans with a text editor. 
An XML-based configuration file is probably the right solution when humans aren't going to maintain the config files by hand. Perhaps XML will eventually be the right solution in both cases, but only if XML editors are widely available. >> I find the above style much easier for *humans*, than an >> XML file, to specify options. XML is good for computers; not so >> good for humans. MZ> Of course: what human could delimit his text with and MZ> ? Could? I'm sure there are more ways on Linux and Windows to mark up text than are dreamt of in your philosophy, Moshe . The question is what is easiest to read and understand? Jeremy From tim_one at email.msn.com Sun Mar 5 03:22:16 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 21:22:16 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: <000201bf8649$a17383e0$f42d153f@tim> [Guido van Rossum] > Before we all start writing nannies and checkers, how about a standard > API design first? I will want to call various nannies from a "Check" > command that I plan to add to IDLE. I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. I like Moshe's suggestion fine, except with an abstract base class named Nanny with a virtual method named check_ast. Nannies should (of course) derive from that. > Since parsing is expensive, we probably want to share the parse tree. What parse tree? Python's parser module produces an AST not nearly "A enough" for reasonably productive nanny writing. GregS & BillT have improved on that, but it's not in the std distrib. Other "problems" include the lack of original source lines in the trees, and lack of column-number info. Note that by the time Python has produced a parse tree, all evidence of the very thing tabnanny is looking for has been removed. That's why she used the tokenize module to begin with. 
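Tim's abstract-base-class variant of Moshe's strawman could be sketched like this. The Nanny/check_ast names come from the thread; the toy "tree" shape is invented for illustration, standing in for whatever parse tree ends up being shared:

```python
# Nanny and check_ast are the names from the thread; the shape of
# 'tree' below is invented, since no real shared tree exists yet.
class Nanny:
    def check_ast(self, tree):
        """Return [(line_number, error_message, extent), ...] where
        extent is None or a (column_begin, column_end) pair."""
        raise NotImplementedError

class SelfNanny(Nanny):
    # toy checker: 'tree' is just (line_number, first_arg_name)
    # pairs, one per method definition found in the source
    def check_ast(self, tree):
        return [(line, "first argument is not 'self'", None)
                for line, first_arg in tree
                if first_arg != "self"]
```

A "Check" command could then instantiate every Nanny subclass in the package, feed each the shared tree, and merge the tuple lists.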
God knows tokenize is too funky to use too when life gets harder (check out checkappend.py's tokeneater state machine for a preliminary taste of that). So the *only* solution is to adopt Christian's Stackless so I can rewrite tokenize as a coroutine like God intended . Seriously, I don't know of anything that produces a reasonably usable (for nannies) parse tree now, except via modifying a Python grammar for use with John Aycock's SPARK; the latter also comes with very pleasant & powerful tree pattern-matching abilities. But it's probably too slow for everyday "just folks" use. Grabbing the GregS/BillT enhancement is probably the most practical thing we could build on right now (but tabnanny will have to remain a special case). unsure-about-the-state-of-simpleparse-on-mxtexttools-for-this-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 04:24:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 22:24:18 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BE1B69.E0B88B41@lemburg.com> Message-ID: <000301bf8652$4aadaf00$f42d153f@tim> Just noting that two instances of this were found in Zope. [/F] > append = list.append > for x in something: > append(...) [Tim] > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. From fdrake at acm.org Sun Mar 5 04:55:27 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Sat, 4 Mar 2000 22:55:27 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: <14529.55983.263225.691427@weyr.cnri.reston.va.us> Jeremy Hylton writes: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. I did it that way since the .ini format allows comments after values (the ';' comments after a '='; '#' comments are a ConfigParser thing), but there's no equivalent concept for RFC822 parsing, other than '(...)' in addresses. The code was trying to allow what was expected from the .ini crowd without breaking the "native" use of ConfigParser. > General question for python-dev readers: What are your experiences > with ConfigParser? I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. I also found it > clumsy to have to specify section and option arguments. I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? I agree that the API to ConfigParser sucks, and I think also that the use of it as a general solution is a big mistake. It's a messy bit of code that doesn't need to be, supports a really nasty mix of syntaxes, and can easily bite users who think they're getting something .ini-like (the magic names and interpolation is a bad idea!). While it suited the original application well enough, something with .ini syntax and interpolation from a subclass would have been *much* better.
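Jeremy's "proxy that specializes on section" is a small wrapper; a sketch using the modern configparser module (SectionView is an invented name -- the 2000-era module under discussion was spelled ConfigParser):

```python
import configparser

class SectionView:
    """Hypothetical proxy: bind a section once, then get() needs only
    the option name."""

    def __init__(self, parser, section):
        self._parser = parser
        self._section = section

    def get(self, option, default=None):
        if self._parser.has_option(self._section, option):
            return self._parser.get(self._section, option)
        return default

cp = configparser.ConfigParser()
cp.read_string("[main]\neditor = idle\nindent = 4\n")
main = SectionView(cp, "main")
```

Today's configparser grew exactly this idea: `cp["main"]["editor"]` gives mapping-style access to a single section.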
I think we should create a new module, inilib, that implements exactly .ini syntax in a base class that can be intelligently extended. ConfigParser should be deprecated. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Sun Mar 5 05:11:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:11:12 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <000601bf8658$d81d34e0$f42d153f@tim> [Guido] > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? I give up on the latter, so it really is just one. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup ... > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. If this is something that happens easily, and current behavior is harmful, don't you think someone would have complained about it by now? That is, __del__ *is* "called again at this point" now, and has been for years & years. And if it happens easily, it *is* happening now, and in an unknown amount of existing code. (BTW, I doubt it happens at all -- people tend to write very simple __del__ methods, so far as I've ever seen) > Now, it is true that the current docs for __del__ imply that > resurrection is possible. "imply" is too weak. The Reference Manual's "3.3.1 Basic customization" flat-out says it's possible ("though not recommended"). The precise meaning of the word "may" in the following sentence is open to debate, though. 
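The resurrection being debated is easy to demonstrate. A sketch in today's Python -- note that since PEP 442 (Python 3.4) __del__ runs at most once per object, which is how the "called again?" question was eventually settled; Phoenix is an invented example:

```python
class Phoenix:
    graveyard = []
    deaths = 0

    def __del__(self):
        Phoenix.deaths += 1
        # Resurrect: stash a fresh reference somewhere reachable again.
        Phoenix.graveyard.append(self)

p = Phoenix()
del p       # refcount hits zero, __del__ runs, the object is resurrected
```

After this runs, the object is alive again in Phoenix.graveyard; dropping it a second time does not re-run __del__ on a modern CPython.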
> The intention of that note was to warn __del__ writers that in the case > of accidental resurrection Sorry, but I can't buy this: saying that *accidents* are "not recommended" is just too much of a stretch . > __del__ might be called again. That's a plausible reading of the following "may", but not the only one. I believe it's the one you intended, but it's not the meaning I took prior to this. > The intention certainly wasn't to allow or encourage intentional resurrection. Well, I think it plainly says it's supported ("though not recommended"). I used it intentionally at KSR, and even recommended it on c.l.py in the dim past (in one of those "dark & useless" threads ). > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. Why would anyone tell you about something that *works*?! You rarely hear the good stuff, you know. I gave the typical pattern in the preceding msg. To flesh out the motivation more, you have some external resource that's very expensive to set up (in KSR's case, it was an IPC connection to a remote machine). Rights to use that resource are handed out in the form of an object. When a client is done using the resource, they *should* explicitly use the object's .release() method, but you can't rely on that. So the object's __del__ method looks like (for example):

    def __del__(self):
        # Code not shown to figure out whether to disconnect: the downside to
        # disconnecting is that it can cost a bundle to create a new connection.
        # If the whole app is shutting down, then of course we want to disconnect.
        # Or if a timestamp trace shows that we haven't been making good use of
        # all the open connections lately, we may want to disconnect too.
        if decided_to_disconnect:
            self.external_resource.disconnect()
        else:
            # keep the connection alive for reuse
            global_available_connection_objects.append(self)

This is simple & effective, and it relies on both intentional resurrection and __del__ getting called repeatedly. I don't claim there's no other way to write it, just that there's *been* no problem doing this for a millennium . Note that MAL spontaneously sketched similar examples, although I can't say whether he's actually done stuff like this. Going back up a level, in another msg you finally admitted that you want "__del__ called only once" for the same reason Java wants it: because gc has no idea what to do when faced with finalizers in a trash cycle, and settles for an unprincipled scheme whose primary virtue is that "it doesn't blow up" -- and "__del__ called only once" happens to be convenient for that scheme. But toss such cycles back to the user to deal with at the Python level, and all those problems go away (along with the artificial need to change __del__). The user can break the cycles in an order that makes sense to the app (or they can let 'em leak! up to them).

    >>> print gc.get_cycle.__doc__
    Return a list of objects comprising a single garbage cycle; [] if none.
    At least one of the objects has a finalizer, so Python can't determine
    the intended order of destruction.  If you don't break the cycle, Python
    will neither run any finalizers for the contained objects nor reclaim
    their memory.  If you do break the cycle, and dispose of the list,
    Python will follow its normal reference-counting rules for running
    finalizers and reclaiming memory.

That this "won't blow up" either is just the least of its virtues . you-break-it-you-buy-it-ly y'rs - tim

From tim_one at email.msn.com Sun Mar 5 05:56:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Message-ID: <000001bf865f$3acb99a0$432d153f@tim> [Tim sez "toss insane cycles back on the user"] [Greg Stein] > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do. You've already assumed that Python coders know exactly what to do, else they couldn't have coded the new __clean__ method your proposal relies on. I'm taking what strikes me as the best part of Scheme's Guardian idea: don't assume *anything* about what users "should" do to clean up their trash. Leave it up to them: their problem, their solution. I think finalizers in trash cycles should be so rare in well-written code that it's just not worth adding much of anything in the implementation to cater to it. > IMO, this kind of semantics belong down in the interpreter with a > specific, documented algorithm. Throwing it out to Python won't help > -- that function will still have to use a "standard pattern" for getting > the cyclical objects to toss themselves. They can use any pattern they want, and if the pattern doesn't *need* to be coded in C as part of the implementation, it shouldn't be. > I think that standard pattern should be a language definition. I distrust our ability to foresee everything users may need over the next 10 years: how can we know today that the first std pattern you dreamed up off the top of your head is the best approach to an unbounded number of problems we haven't yet seen a one of ? > Without a standard pattern, then you're saying the application will know > what to do, but that is kind of weird -- what happens when an unexpected > cycle arrives? With the hypothetical gc.get_cycle() function I mentioned before, they should inspect objects in the list they get back, and if they find they don't know what to do with them, they can still do anything they want. 
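The hypothetical gc.get_cycle() never appeared, but CPython's collector later grew a close cousin of the hand-it-back idea: with gc.DEBUG_SAVEALL, unreachable objects land in gc.garbage instead of being freed, and the program decides how to break them. A sketch with the modern gc module (Node is invented for illustration):

```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

gc.set_debug(gc.DEBUG_SAVEALL)   # hand trash back instead of freeing it

a, b = Node("a"), Node("b")
a.partner, b.partner = b, a      # a <=> b reference cycle
del a, b
gc.collect()

# Inspect what came back, then break the cycle in an order the
# *program* chooses -- exactly the "their problem, their solution" model.
cycle = [o for o in gc.garbage if isinstance(o, Node)]
for node in cycle:
    node.partner = None
gc.garbage.clear()
gc.set_debug(0)
```

Anything left in gc.garbage that the program does not break simply stays alive, which matches the "let 'em leak! up to them" option above.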
Examples include raising an exception, dialing my home pager at 3am to insist I come in to look at it, or simply let the list go away (at which point the objects in the list will again become a trash cycle containing a finalizer). If several distinct third-party modules get into this act, I *can* see where it could become a mess. That's why Scheme "guardians" is plural: a given module could register its "problem objects" in advance with a specific guardian of its own, and query only that guardian later for things ready to die. This probably can't be implemented in Python, though, without support for weak references (or lots of brittle assumptions about specific refcount values). agreeably-disagreeing-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 05:56:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:58 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000101bf865f$3cb0d460$432d153f@tim> [Tim] > ...If a trash cycle contains a finalizer (my, but that has to be rare. > in practice, in well-designed code!), [Moshe Zadka] > This shows something Tim himself has often said -- he never programmed a > GUI. It's very hard to build a GUI (especially with Tkinter) which is > cycle-less, but the classes implementing the GUI often have __del__'s > to break system-allocated resources. > > So, it's not as rare as we would like to believe, which is the reason > I haven't given this answer. I wrote Cyclops.py when trying to track down leaks in IDLE. The extraordinary thing we discovered is that "even real gc" would not have reclaimed the cycles. They were legitimately reachable, because, indeed, "everything points to everything else". Guido fixed almost all of them by explicitly calling new "close" methods. I believe IDLE has no __del__ methods at all now. Tkinter.py currently contains two. 
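The explicit-close fix Guido applied to IDLE looks like this in miniature (Widget and its parent/child links are invented for illustration, not IDLE's actual classes):

```python
import gc
import weakref

class Widget:
    """Toy GUI node: parent and children point at each other (a cycle)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def close(self):
        # Break the parent <-> child cycles explicitly, no __del__ needed.
        for child in list(self.children):
            child.close()
        self.children = []
        self.parent = None

root = Widget()
leaf = Widget(root)
probe = weakref.ref(leaf)

root.close()          # break the cycles before dropping the references
del root, leaf
gc.collect()
```

Once close() has run, plain reference counting reclaims everything; the probe weak reference goes dead without any cycle collector heroics.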
so-they-contained-__del__-but-weren't-trash-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 07:05:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 01:05:24 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BCD71C.3592E6A@lemburg.com> Message-ID: <000601bf8668$cbbdd640$432d153f@tim> [M.-A. Lemburg] > ... > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) Given .title(), is .capitalize() of use for Unicode strings? Or is it just a temptation to do something senseless in the Unicode world? If it doesn't make sense, leave it out (this *seems* like compulsion to implement all current string methods in *some* way for Unicode, whether or not they make sense). From moshez at math.huji.ac.il Sun Mar 5 07:16:22 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 08:16:22 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000201bf8649$a17383e0$f42d153f@tim> Message-ID: On Sat, 4 Mar 2000, Tim Peters wrote: > I like Moshe's suggestion fine, except with an abstract base class named > Nanny with a virtual method named check_ast. Nannies should (of course) > derive from that. Why? The C++ you're programming damaged your common sense cycles? > > Since parsing is expensive, we probably want to share the parse tree. > > What parse tree? Python's parser module produces an AST not nearly "A > enough" for reasonably productive nanny writing. As a note, selfnanny uses the parser module AST. > GregS & BillT have > improved on that, but it's not in the std distrib. Other "problems" include > the lack of original source lines in the trees, The parser module has source lines. > and lack of column-number info. Yes, that sucks. > Note that by the time Python has produced a parse tree, all evidence of the > very thing tabnanny is looking for has been removed. 
That's why she used > the tokenize module to begin with. Well, it's one of the few nannies which would be in that position. > God knows tokenize is too funky to use too when life gets harder (check out > checkappend.py's tokeneater state machine for a preliminary taste of that). Why doesn't checkappend.py uses the parser module? > Grabbing the GregS/BillT enhancement is probably the most > practical thing we could build on right now You got some pointers? > (but tabnanny will have to remain a special case). tim-will-always-be-a-special-case-in-our-hearts-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sun Mar 5 08:01:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 02:01:12 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Message-ID: <000901bf8670$97d8f320$432d153f@tim> [Tim] >> [make Nanny a base class] [Moshe Zadka] > Why? Because it's an obvious application for OO design. A common base class formalizes the interface and can provide useful utilities for subclasses. > The C++ you're programming damaged your common sense cycles? Yes, very, but that isn't relevant here . It's good Python sense too. >> [parser module produces trees far too concrete for comfort] > As a note, selfnanny uses the parser module AST. Understood, but selfnanny has a relatively trivial task. Hassling with tuples nested dozens deep for even relatively simple stmts is both a PITA and a time sink. >> [parser doesn't give source lines] > The parser module has source lines. No, it does not (it only returns terminals, as isolated strings). The tokenize module does deliver original source lines in their entirety (as well as terminals, as isolated strings; and column numbers). >> and lack of column-number info. > Yes, that sucks. > ... > Why doesn't checkappend.py uses the parser module? 
Because it wanted to display the actual source line containing an offending "append" (which, again, the parse module does not supply). Besides, it was a trivial variation on tabnanny.py, of which I have approximately 300 copies on my disk . >> Grabbing the GregS/BillT enhancement is probably the most >> practical thing we could build on right now > You got some pointers? Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab transformer.py from the zip file. The latter supplies a very useful post-processing pass over the parse module's output, squashing it *way* down.

From moshez at math.huji.ac.il Sun Mar 5 08:08:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 09:08:41 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> Message-ID: On Sun, 5 Mar 2000, Tim Peters wrote: > [Tim] > >> [make Nanny a base class] > > [Moshe Zadka] > > Why? > > Because it's an obvious application for OO design. A common base class > formalizes the interface and can provide useful utilities for subclasses. The interface is just one function. You're welcome to have a do-nothing nanny that people *can* derive from: I see no point in making them derive from a base class. > > As a note, selfnanny uses the parser module AST. > > Understood, but selfnanny has a relatively trivial task. That it does, and it was painful. > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). Sorry, misunderstanding: it seemed obvious to me you wanted line numbers. For lines, use the linecache module... > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. I'll have a look. Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html

From effbot at telia.com Sun Mar 5 10:24:37 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sun, 5 Mar 2000 10:24:37 +0100 Subject: [Python-Dev] return statements in lambda Message-ID: <006f01bf8686$391ced80$34aab5d4@hagrid> from "Python for Lisp Programmers": http://www.norvig.com/python-lisp.html > Don't forget return. Writing def twice(x): x+x is tempting > and doesn't signal a warning or exception, but you probably > meant to have a return in there. This is particularly irksome > because in a lambda you are prohibited from writing return, > but the semantics is to do the return. maybe adding an (optional but encouraged) "return" to lambda would be an improvement? lambda x: x + 10 vs. lambda x: return x + 10 or is this just more confusing... opinions?

From guido at python.org Sun Mar 5 13:04:56 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:04:56 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Your message of "Sat, 04 Mar 2000 22:55:27 EST." <14529.55983.263225.691427@weyr.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> Message-ID: <200003051204.HAA05367@eric.cnri.reston.va.us> [Fred] > I agree that the API to ConfigParser sucks, and I think also that > the use of it as a general solution is a big mistake. It's a messy > bit of code that doesn't need to be, supports a really nasty mix of > syntaxes, and can easily bite users who think they're getting > something .ini-like (the magic names and interpolation is a bad > idea!). While it suited the original application well enough, > something with .ini syntax and interpolation from a subclass would > have been *much* better.
> I think we should create a new module, inilib, that implements > exactly .ini syntax in a base class that can be intelligently > extended. ConfigParser should be deprecated. Amen. Some thoughts:

- You could put it all in ConfigParser.py but with new classnames. (Not sure though, since the ConfigParser class, which is really a kind of weird variant, will be assumed to be the main class because its name is that of the module.)

- Variants on the syntax could be given through some kind of option system rather than through subclassing -- they should be combinable independently. Some possible options (maybe I'm going overboard here) could be:

  - comment characters: ('#', ';', both, others?)
  - comments after variables allowed? on sections?
  - variable characters: (':', '=', both, others?)
  - quoting of values with "..." allowed?
  - backslashes in "..." allowed?
  - does backslash-newline mean a continuation?
  - case sensitivity for section names (default on)
  - case sensitivity for option names (default off)
  - variables allowed before first section name?
  - first section name? (default "main")
  - character set allowed in section names
  - character set allowed in variable names
  - %(...) substitution?

(Well maybe the whole substitution thing should really be done through a subclass -- it's too weird for normal use.) --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sun Mar 5 13:17:31 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:17:31 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Sun, 05 Mar 2000 01:05:24 EST." <000601bf8668$cbbdd640$432d153f@tim> References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <200003051217.HAA05395@eric.cnri.reston.va.us> > [M.-A. Lemburg] > > ...
> > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) [Tim] > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). The intention of this is to make code that does something using strings do exactly the same thing if those strings happen to be Unicode strings with the same values. The capitalize method returns self[0].upper() + self[1:] -- that may not make sense for e.g. Japanese, but it certainly does for Russian or Greek. It also does this in JPython. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sun Mar 5 13:24:41 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:24:41 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Your message of "Sun, 05 Mar 2000 02:01:12 EST." <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <200003051224.HAA05410@eric.cnri.reston.va.us> > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). The > tokenize module does deliver original source lines in their entirety (as > well as terminals, as isolated strings; and column numbers). Moshe meant line numbers -- it has those. > > Why doesn't checkappend.py uses the parser module? > > Because it wanted to display the actual source line containing an offending > "append" (which, again, the parse module does not supply). Besides, it was > a trivial variation on tabnanny.py, of which I have approximately 300 copies > on my disk . Of course another argument for making things more OO.
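Tim's point -- tokenize keeps the whole source line and the column numbers that the parser-module trees drop -- is easy to check with the modern tokenize module:

```python
import io
import tokenize

src = "x = [1]\nx.append(2, 3)\n"

hits = []
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    if tok.type == tokenize.NAME and tok.string == "append":
        # tok.start is (row, column); tok.line is the whole source line
        hits.append((tok.start, tok.line))
```

This is exactly the information a checker needs to print an offending line with a caret under the bad spot.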
(The code used in tabnanny.py to process files and recursively directories from sys.argv is replicated a thousand times in various scripts of mine -- Tim took it from my now-defunct takpolice.py. This should be in the std library somehow...) > >> Grabbing the GregS/BillT enhancement is probably the most > >> practical thing we could build on right now > > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. The latter supplies a very useful > post-processing pass over the parse module's output, squashing it *way* > down. Those of you who have seen the compiler-sig should know that Jeremy made an improvement which will find its way into p2c. It's currently on display in the Python CVS tree in the nondist branch: see http://www.python.org/pipermail/compiler-sig/2000-February/000011.html and the ensuing thread for more details. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sun Mar 5 14:46:13 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 08:46:13 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Fri, 03 Mar 2000 22:26:54 EST." <000401bf8589$7d1364e0$c6a0143f@tim> References: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: <200003051346.IAA05539@eric.cnri.reston.va.us> I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning, and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either). There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?).
Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called). Of course, it's also possible that a trash cycle is created by code outside the responsibility of the finalizer. But still, I have a hard time understanding how Tim's version would be used. Greg or Marc-Andre's version I understand. What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.) If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( ) > Exactly. The *programmer* may know the right thing to do, but the Python > implementation can't possibly know. Facing both facts squarely constrains > the possibilities to the only ones that are all of understandable, > predictable and useful. Cycles with finalizers must be a Magic-Free Zone > else you lose at least one of those three: even Guido's kung fu isn't > strong enough to outguess this. > > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] > > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>.
My current tendency is to go with the basic __cleanup__ and nothing more, calling each instance's __cleanup__ before clobbering dictionaries and lists -- which should break all cycles safely. > One caution: > > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). Not quite. Cycle detection is presumably only called every once in a while on memory allocation, and memory *allocation* (as opposed to deallocation) is allowed to fail. Of course, this will probably run into various coding bugs where allocation failure isn't dealt with properly, because in practice this happens so rarely... > Hand a trash cycle back to the
And even if we introduced a tp_clean protocol that would clear dicts and lists and call __cleanup__ for instances, we'd still want to call it first for instances, because an instance depends on its __dict__ for its __cleanup__ to succeed (but the __dict__ doesn't depend on the instance for its cleanup). Greg's 3-phase tp_clean protocol seems indeed overly elaborate but I guess it deals with such dependencies in the most general fashion. > I'd focus on the cycles themselves, not on the types of objects > involved. I'm not pretending to address the "order of finalization > at shutdown" question, though (although I'd agree they're deeply > related: how do you follow a topological sort when there *isn't* > one? well, you don't, because you can't). In theory, you just delete the last root (a C global pointing to sys.modules) and you run the garbage collector. It might be more complicated in practiceto track down all roots. Another practical consideration is that now there are cycles of the form <=> which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call). Back to the drawing board... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Sun Mar 5 17:42:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Sun, 5 Mar 2000 10:42:30 -0600 (CST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> References: <000401bf8589$7d1364e0$c6a0143f@tim> <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <14530.36471.11654.666900@beluga.mojam.com> Guido> What keeps nagging me though is what to do when there's a Guido> finalizer but no cleanup method. I guess the trash cycle remains Guido> alive. Is this acceptable? (I guess so, because we've given the Guido> programmer a way to resolve the trash: provide a cleanup method.) That assumes the programmer even knows there's a cycle, right? I'd like to see this scheme help provide debugging assistance. If a cycle is discovered but the programmer hasn't declared a cleanup method for the object it wants to cleanup, a default cleanup method is called if it exists (e.g. sys.default_cleanup), which would serve mostly as an alert (print magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of death, ...) as opposed to actually breaking any cycles. Presumably the programmer would define sys.default_cleanup during development and leave it undefined during production. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From paul at prescod.net Sat Mar 4 02:04:43 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 03 Mar 2000 17:04:43 -0800 Subject: [Python-Dev] breaking list.append() References: <38BC86E1.53F69776@prescod.net> <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: <38C0612B.7C92F8C4@prescod.net> Guido van Rossum wrote: > > .. > Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. My concernc is when I want to build an application with a module that only works with Python 1.5.2 and another one that only works with Python 1.6. If we can avoid that situation by making 1.6 compatible with 1.5.2. we should. 
By the time 1.7 comes around I will accept that everyone has had enough time to update their modules. Remember that many module authors are just part-time volunteers. They may only use Python every few months when they get a spare weekend! I really hope that Andrew is wrong when he predicts that there may be lots of different places where Python 1.6 breaks code! I'm in favor of being a total jerk when it comes to Py3K but Python has been pretty conservative thus far. Could someone remind me in one sentence what the downside is for treating this as a warning condition as Java does with its deprecated features? Then the CP4E people don't get into bad habits and those same CP4E people trying to use older modules don't run into frustrating runtime errors. Do it for the CP4E people! (how's that for rhetoric) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "We still do not know why mathematics is true and whether it is certain. But we know what we do not know in an immeasurably richer way than we did. And learning this has been a remarkable achievement, among the greatest and least known of the modern era." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From jeremy at cnri.reston.va.us Sun Mar 5 18:46:14 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sun, 5 Mar 2000 12:46:14 -0500 (EST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <14530.40294.593407.777859@bitdiddle.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: >>> Grabbing the GregS/BillT enhancement is probably the most >>> practical thing we could build on right now >> You got some pointers? TP> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and TP> grab transformer.py from the zip file.
TP> The latter supplies a very useful post-processing pass over the TP> parser module's output, squashing it *way* down. The compiler tools in python/nondist/src/Compiler include Bill & Greg's transformer code, a class-based AST (each node is a subclass of the generic node), and a visitor framework for walking the AST. The APIs and organization are in a bit of flux; Mark Hammond suggested some reorganization that I've not finished yet. I may finish it up this evening. The transformer module does a good job of including line numbers, but I've occasionally run into a node that didn't have a lineno attribute when I expected it would. I haven't taken the time to figure out if my expectation was unreasonable or if the transformer should be fixed. The compiler-sig might be a good place to discuss this further. A warning framework was one of my original goals for the SIG. I imagine we could convince Guido to move warnings + compiler tools into the standard library if they end up being useful. Jeremy From mal at lemburg.com Sun Mar 5 20:57:32 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 20:57:32 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <38C2BC2C.FFEB72C3@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense).
.capitalize() only touches the first char of the string - not sure whether it makes sense in both worlds ;-) Anyhow, the difference is there but subtle: string.capitalize() will use C's toupper() which is locale dependent, while unicode.capitalize() uses Unicode's toTitleCase() for the first character. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Mar 5 21:15:47 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 21:15:47 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000601bf8658$d81d34e0$f42d153f@tim> Message-ID: <38C2C073.CD51688@lemburg.com> Tim Peters wrote: > > [Guido] > > Would there really be someone out there who uses *intentional* > > resurrection? I severely doubt it. I've never heard of this. > > Why would anyone tell you about something that *works*?! You rarely hear > the good stuff, you know. I gave the typical pattern in the preceding msg. > To flesh out the motivation more, you have some external resource that's > very expensive to set up (in KSR's case, it was an IPC connection to a > remote machine). Rights to use that resource are handed out in the form of > an object. When a client is done using the resource, they *should* > explicitly use the object's .release() method, but you can't rely on that. > So the object's __del__ method looks like (for example):
>
>     def __del__(self):
>         # Code not shown to figure out whether to disconnect: the downside to
>         # disconnecting is that it can cost a bundle to create a new connection.
>         # If the whole app is shutting down, then of course we want to disconnect.
>         # Or if a timestamp trace shows that we haven't been making good use of
>         # all the open connections lately, we may want to disconnect too.
>         if decided_to_disconnect:
>             self.external_resource.disconnect()
>         else:
>             # keep the connection alive for reuse
>             global_available_connection_objects.append(self)
>
> This is simple & effective, and it relies on both intentional resurrection > and __del__ getting called repeatedly. I don't claim there's no other way > to write it, just that there's *been* no problem doing this for a millennium. > > Note that MAL spontaneously sketched similar examples, although I can't say > whether he's actually done stuff like this. Not exactly this, but similar things in the weak reference implementation of mxProxy. The idea came from a different area: the C implementation of Python uses free lists a lot and these are basically implementations of the same idiom: save an allocated resource for reviving it at some later point. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nascheme at enme.ucalgary.ca Mon Mar 6 01:27:54 2000 From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca) Date: Sun, 5 Mar 2000 17:27:54 -0700 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim>; from tim_one@email.msn.com on Fri, Mar 03, 2000 at 08:38:43PM -0500 References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: <20000305172754.A14998@acs.ucalgary.ca> On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. I like Marc's suggestion. Here is my proposal: Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary.
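Neil's rule -- prefer a user-supplied hook, fall back to clearing the instance dictionary -- can be sketched at the Python level. Both the `__cleanup__` protocol and the `clear_instance` helper below are hypothetical, mirroring the proposal rather than any real interpreter hook:

```python
# Python-level sketch of the proposed tp_clear behaviour for instances.
# The __cleanup__ name and clear_instance() are hypothetical here.
def clear_instance(obj):
    cleanup = getattr(obj, "__cleanup__", None)
    if cleanup is not None:
        cleanup()             # let the object break its own cycles
    else:
        obj.__dict__.clear()  # default: drop references so the cycle dies

class Holder:
    def __init__(self, other=None):
        self.other = other
        self.cleaned = False

    def __cleanup__(self):
        self.other = None
        self.cleaned = True

a = Holder()
b = Holder(a)
a.other = b                   # a <-> b reference cycle
clear_instance(a)             # what the collector would do to one member
```

After the call, `a` no longer references `b`, so the cycle is broken while `a` itself remains a usable (if emptied-out) object.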
When collecting cycles, call tp_clear for instances first. It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods. Neil From tim_one at email.msn.com Mon Mar 6 08:13:21 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:13:21 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38C0612B.7C92F8C4@prescod.net> Message-ID: <000401bf873b$745f8320$ea2d153f@tim> [Paul Prescod] > ... > Could someone remind me in one sentence what the downside is for treating > this as a warning condition as Java does with its deprecated features? Simply the lack of anything to build on: Python has no sort of runtime warning system now, and nobody has volunteered to create one. If you do, remember that stdout & stderr may go to the bit bucket in a GUI app. The bit about dropping the "L" suffix on longs seems unwarnable-about in any case (short of warning every time anyone uses long()). remember-that-you-asked-for-the-problems-not-for-solutions-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 08:33:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:33:49 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <38C2C073.CD51688@lemburg.com> Message-ID: <000701bf873e$5032eca0$ea2d153f@tim> [M.-A. Lemburg, on the resurrection/multiple-__del__ "idiom"] > ... > The idea came from a different area: the C implementation > of Python uses free lists a lot and these are basically > implementations of the same idiom: save an allocated > resource for reviving it at some later point. Excellent analogy! Thanks. Now that you phrased it in this clarifying way, I recall that very much the same point was raised in the papers that resulted in the creation of guardians in Scheme. I don't know that anyone is actually using Python __del__ this way today (I am not), but you reminded me why I thought it was natural at one time.
generally-__del__-aversive-now-except-in-c++-where-destructors-are- guaranteed-to-be-called-when-you-expect-them-to-be-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 09:12:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 03:12:06 -0500 Subject: [Python-Dev] return statements in lambda In-Reply-To: <006f01bf8686$391ced80$34aab5d4@hagrid> Message-ID: <000901bf8743$a9f61aa0$ea2d153f@tim> [/F] > maybe adding an (optional but encouraged) "return" > to lambda would be an improvement? > > lambda x: x + 10 > > vs. > > lambda x: return x + 10 > > or is this just more confusing... opinions? It was an odd complaint to begin with, since Lisp-heads aren't used to using "return" anyway. More of a symptom of taking a shallow syntactic approach to a new (to them) language. For non-Lisp heads, I think it's more confusing in the end, blurring the distinction between stmts and expressions ("the body of a lambda must be an expression" ... "ok, I lied, unless it's a 'return' stmt"). If Guido had it to do over again, I vote he rejects the original patch. Short of that, it would have been better if the lambda arglist required parens, and if the body were required to be a single return stmt (that would sure end the "lambda x: print x" FAQ -- few would *expect* "return print x" to work!). hindsight-is-great-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 10:09:45 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 04:09:45 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <000b01bf874b$b6fe9da0$ea2d153f@tim> [Guido] > I'm beginning to believe that handing cycles with finalizers to the > user is better than calling __del__ with a different meaning, You won't be sorry: Python has the chance to be the first language that's both useful and sane here!
> and I tentatively withdraw my proposal to change the rules for when > __del__ is called (even when __init__ fails; I haven't had any complaints > about that either). Well, everyone liked the parenthetical half of that proposal, although Jack's example did point out a real surprise with it. > There seem to be two competing suggestions for solutions: (1) call > some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the > object; (2) Tim's proposal of an interface to ask the garbage > collector for a trash cycle with a finalizer (or for an object with a > finalizer in a trash cycle?). Or a maximal strongly-connected component, or *something* -- unsure. > Somehow Tim's version looks less helpful to me, because it *seems* > that whoever gets to handle the cycle (the main code of the program?) > isn't necessarily responsible for creating it (some library you didn't > even know was used under the covers of some other library you called). Yes, to me too. This is the Scheme "guardian" idea in a crippled form (Scheme supports as many distinct guardians as the programmer cares to create), and even in its full-blown form it supplies "a perfectly general mechanism with no policy whatsoever". Greg convinced me (although I haven't admitted this yet) that "no policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't be pushing the guardian idea any more (although see immediately below for an immediate backstep on that). > ... > What keeps nagging me though is what to do when there's a finalizer > but no cleanup method. I guess the trash cycle remains alive. Is > this acceptable? (I guess so, because we've given the programmer a > way to resolve the trash: provide a cleanup method.) BDW considers it better to leak than to risk doing a wrong thing, and I agree wholeheartedly with that. GC is one place you want to have a "100% language".
This is where something like a guardian can remain useful: while leaking is OK because you've given them an easy & principled alternative, leaking without giving them a clear way to *know* about it is not OK. If gc pushes the leaked stuff off to the side, the gc module should (say) supply an entry point that returns all the leaked stuff in a list. Then users can *know* they're leaking, know how badly they're leaking, and examine exactly the objects that are leaking. Then they've got the info they need to repair their program (or at least track down the 3rd-party module that's leaking). As with a guardian, they *could* also build a reclamation scheme on top of it, but that would no longer be the main (or even an encouraged) thrust. > If we detect individual cycles (the current algorithm doesn't do that > yet, though it seems easy enough to do another scan), could we > special-case cycles with only one finalizer and no cleaner-upper? > (I'm tempted to call the finalizer because it seems little harm can be > done -- but then of course there's the problem of the finalizer being > called again when the refcount really goes to zero. :-( ) "Better safe than sorry" is my immediate view on this -- you can't know that the finalizer won't resurrect the cycle, and "finalizer called iff refcount hits 0" is a wonderfully simple & predictable rule. That's worth a lot to preserve, unless & until it proves to be a disaster in practice. As to the details of cleanup, I haven't succeeded in making the time to understand all the proposals. But I've done my primary job here if I've harassed everyone into not repeating the same mistakes all previous languages have made <0.9 wink>. > ... 
> I wish we didn't have to special-case finalizers on class instances > (since each dealloc function is potentially a combination of a > finalizer and a deallocation routine), but the truth is that they > *are* special -- __del__ has no responsibility for deallocating > memory, only for deallocating external resources (such as temp files). And the problem is that __del__ can do anything whatsoever that can be expressed in Python, so there's not a chance in hell of outguessing it. > ... > Another practical consideration is that now there are cycles of the form > > <=> > > which suggests that we should make function objects traceable. Also, > modules can cross-reference, so module objects should be made > traceable. I don't think that this will grow the sets of traced > objects by too much (since the dicts involved are already traced, and > a typical program has way fewer functions and modules than it has > class instances). On the other hand, we may also have to trace > (un)bound method objects, and these may be tricky because they are > allocated and deallocated at high rates (once per typical method > call). This relates to what I was trying to get at with my response to your gc implementation sketch: mark-&-sweep needs to chase *everything*, so the set of chased types is maximal from the start. Adding chased types to the "indirectly infer what's unreachable via accounting for internal refcounts within the transitive closure" scheme can end up touching nearly as much as a full M-&-S pass per invocation. I don't know where the break-even point is, but the more stuff you chase in the latter scheme the less often you want to run it. About high rates, so long as a doubly-linked list allows efficient removal of stuff that dies via refcount exhaustion, you won't actually *chase* many bound method objects (i.e., they'll usually go away by themselves).
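As an aside for later readers: the "push the leaked stuff off to the side and let the program examine it" entry point suggested earlier in this message is roughly what the gc module came to expose as `gc.garbage`. A small illustration with today's API, using `DEBUG_SAVEALL` so that *all* unreachable objects are set aside rather than freed:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

# With DEBUG_SAVEALL, everything the collector finds unreachable is
# appended to gc.garbage instead of being freed, so a program can know
# it is leaking and examine the exact objects involved.
gc.set_debug(gc.DEBUG_SAVEALL)

a, b = Node(), Node()
a.ref, b.ref = b, a    # a reference cycle refcounting cannot reclaim
del a, b

n = gc.collect()       # number of unreachable objects found
gc.set_debug(0)        # restore normal collection
```

A program can then scan `gc.garbage` to learn that, and how badly, it is leaking, exactly as proposed above.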
Note in passing that bound method objects often showed up in cycles in IDLE, although you usually managed to break those in other ways. > Back to the drawing board... Good! That means you're making real progress. glad-someone-is-ly y'rs - tim From mal at lemburg.com Mon Mar 6 11:01:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 11:01:31 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> <20000305172754.A14998@acs.ucalgary.ca> Message-ID: <38C381FB.E222D6E4@lemburg.com> nascheme at enme.ucalgary.ca wrote: > > On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > > So here's what I'd consider doing: explicit is better than implicit, and in > > the face of ambiguity refuse the temptation to guess. > > I like Marc's suggestion. Here is my proposal: > > Allow classes to have a new method, __cleanup__ or whatever you > want to call it. When tp_clear is called for an instance, it > checks for this method. If it exists, call it, otherwise delete > the container objects from the instance's dictionary. When > collecting cycles, call tp_clear for instances first. > > It's simple and allows the programmer to cleanly break cycles if > they insist on creating them and using __del__ methods. Right :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Mar 6 12:57:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 12:57:29 +0100 Subject: [Python-Dev] Unicode character property methods Message-ID: <38C39D29.A29CE67F@lemburg.com> As you may have noticed, the Unicode objects provide new methods .islower(), .isupper() and .istitle(). Finn Bock mentioned that Java also provides .isdigit() and .isspace().
Question: should Unicode also provide these character property methods: .isdigit(), .isnumeric(), .isdecimal() and .isspace() ? Plus maybe .digit(), .numeric() and .decimal() for the corresponding decoding ? Similar APIs are already available through the unicodedata module, but could easily be moved to the Unicode object (they cause the builtin interpreter to grow a bit in size due to the new mapping tables). BTW, string.atoi et al. are currently not mapped to string methods... should they be ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Mar 6 14:29:04 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 08:29:04 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 12:57:29 +0100." <38C39D29.A29CE67F@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> Message-ID: <200003061329.IAA09529@eric.cnri.reston.va.us> > As you may have noticed, the Unicode objects provide > new methods .islower(), .isupper() and .istitle(). Finn Bock > mentioned that Java also provides .isdigit() and .isspace(). > > Question: should Unicode also provide these character > property methods: .isdigit(), .isnumeric(), .isdecimal() > and .isspace() ? Plus maybe .digit(), .numeric() and > .decimal() for the corresponding decoding ? What would be the difference between isdigit, isnumeric, isdecimal? I'd say don't do more than Java. I don't understand what the "corresponding decoding" refers to. What would "3".decimal() return? > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? They are mapped to int() c.s. 
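The distinction Guido asks about is already observable through the unicodedata module: decimal() is the strictest of the three database fields, numeric() the loosest. A quick, runnable illustration:

```python
import unicodedata

# An ordinary digit satisfies all three notions.
assert unicodedata.decimal("3") == 3

# SUPERSCRIPT TWO is a digit, but not a decimal digit...
assert unicodedata.digit("\u00b2") == 2
raised = False
try:
    unicodedata.decimal("\u00b2")
except ValueError:
    raised = True
assert raised

# ...and VULGAR FRACTION ONE FIFTH is numeric only.
assert unicodedata.numeric("\u2155") == 0.2
```

So "decimal" means usable in ordinary base-10 positional notation, "digit" admits compatibility forms such as superscripts, and "numeric" covers anything with a numeric value at all, fractions included.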
--Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 6 16:09:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 10:09:55 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <14531.51779.650532.881626@weyr.cnri.reston.va.us> Guido van Rossum writes: > - You could put it all in ConfigParser.py but with new classnames. > (Not sure though, since the ConfigParser class, which is really a > kind of weird variant, will be assumed to be the main class because > its name is that of the module.) The ConfigParser class could be clearly marked as deprecated both in the source/docstring and in the documentation. But the class itself should not be used in any way. > - Variants on the syntax could be given through some kind of option > system rather than through subclassing -- they should be combinable > independently. Some possible options (maybe I'm going overboard here) > could be: Yes, you are going overboard. It should contain exactly what's right for .ini files, and that's it. There are really three aspects to the beast: reading, using, and writing. I think there should be a class which does the right thing for using the information in the file, and reading & writing can be handled through functions or helper classes. That separates the parsing issues from the use issues, and alternate syntaxes will be easy enough to implement by subclassing the helper or writing a new function. An "editable" version that allows loading & saving without throwing away comments, ordering, etc.
would require a largely separate implementation of all three aspects (or at least the reader and writer). > (Well maybe the whole substitution thing should really be done through > a subclass -- it's too weird for normal use.) That and the ad hoc syntax are my biggest beefs with ConfigParser. But it can easily be added by a subclass as long as the method to override is clearly specified in the documentation (it should only require one!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 6 18:47:44 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 12:47:44 -0500 (EST) Subject: [Python-Dev] PyBufferProcs Message-ID: <14531.61248.941076.803617@weyr.cnri.reston.va.us> While working on the documentation, I've noticed a naming inconsistency regarding PyBufferProcs; its peers are all named Py*Methods (PySequenceMethods, PyNumberMethods, etc.). I'd like to propose that a synonym, PyBufferMethods, be made for PyBufferProcs, and use that in the core implementations and the documentation. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 6 20:28:12 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 6 Mar 2000 14:28:12 -0500 (EST) Subject: [Python-Dev] example checkers based on compiler package Message-ID: <14532.1740.90292.440395@goon.cnri.reston.va.us> There was some discussion on python-dev over the weekend about generating warnings, and Moshe Zadke posted a selfnanny that warned about methods that didn't have self as the first argument. I think these kinds of warnings are useful, and I'd like to see a more general framework for them built around the Python abstract syntax originally from P2C. Ideally, they would be available as command line tools and integrated into GUIs like IDLE in some useful way.
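For readers on current Pythons: the compiler package is long gone, but the same self-nanny is easy to write against the stdlib `ast` module. A minimal sketch (only direct methods of a class are checked; decorators such as @staticmethod are ignored):

```python
import ast

def check_self(source, filename="<string>"):
    """Warn about methods whose first argument is missing or not 'self'."""
    warnings = []
    tree = ast.parse(source, filename)
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        for func in cls.body:            # direct methods only
            if not isinstance(func, ast.FunctionDef):
                continue
            args = func.args.args
            if not args:
                warnings.append("%s:%d %s.%s: no arguments"
                                % (filename, func.lineno, cls.name, func.name))
            elif args[0].arg != "self":
                warnings.append("%s:%d %s.%s: self slot is named %s"
                                % (filename, func.lineno, cls.name,
                                   func.name, args[0].arg))
    return warnings

warnings = check_self("class Foo:\n    def bar(this): pass\n")
```

The warning texts mirror the ones in checkself.py below; the `check_self` name and its single-string interface are choices made for this sketch, not part of any real tool.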
I've included a couple of quick examples I coded up last night based on the compiler package (recently re-factored) that is resident in python/nondist/src/Compiler. The analysis on the one that checks for name errors is a bit of a mess, but the overall structure seems right. I'm hoping to collect a few more examples of checkers and generalize from them to develop a framework for checking for errors and reporting them. Jeremy

------------ checkself.py ------------
"""Check for methods that do not have self as the first argument"""

from compiler import parseFile, walk, ast, misc

class Warning:
    def __init__(self, filename, klass, method, lineno, msg):
        self.filename = filename
        self.klass = klass
        self.method = method
        self.lineno = lineno
        self.msg = msg

    _template = "%(filename)s:%(lineno)s %(klass)s.%(method)s: %(msg)s"

    def __str__(self):
        return self._template % self.__dict__

class NoArgsWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno):
        self.super_init(filename, klass, method, lineno, "no arguments")

class NotSelfWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno, argname):
        self.super_init(filename, klass, method, lineno,
                        "self slot is named %s" % argname)

class CheckSelf:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = misc.Stack()

    def inClass(self):
        if self.scope:
            return isinstance(self.scope.top(), ast.Class)
        return 0

    def visitClass(self, klass):
        self.scope.push(klass)
        self.visit(klass.code)
        self.scope.pop()
        return 1

    def visitFunction(self, func):
        if self.inClass():
            classname = self.scope.top().name
            if len(func.argnames) == 0:
                w = NoArgsWarning(self.filename, classname, func.name,
                                  func.lineno)
                self.warnings.append(w)
            elif func.argnames[0] != "self":
                w = NotSelfWarning(self.filename, classname, func.name,
                                   func.lineno, func.argnames[0])
                self.warnings.append(w)
        self.scope.push(func)
        self.visit(func.code)
        self.scope.pop()
        return 1

def check(filename):
    global p, check
    p = parseFile(filename)
    check = CheckSelf(filename)
    walk(p, check)
    for w in check.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badself.py ------------
def foo():
    return 12

class Foo:
    def __init__():
        pass

    def foo(self, foo):
        pass

    def bar(this, that):
        def baz(this=that):
            return this
        return baz

def bar():
    class Quux:
        def __init__(self):
            self.sum = 1
        def quam(x, y):
            self.sum = self.sum + (x * y)
    return Quux()

------------ checknames.py ------------
"""Check for NameErrors"""

from compiler import parseFile, walk
from compiler.misc import Stack, Set
import __builtin__
from UserDict import UserDict

class Warning:
    def __init__(self, filename, funcname, lineno):
        self.filename = filename
        self.funcname = funcname
        self.lineno = lineno

    def __str__(self):
        return self._template % self.__dict__

class UndefinedLocal(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, funcname, lineno, name):
        self.super_init(filename, funcname, lineno)
        self.name = name

    _template = "%(filename)s:%(lineno)s %(funcname)s undefined local %(name)s"

class NameError(UndefinedLocal):
    _template = "%(filename)s:%(lineno)s %(funcname)s undefined name %(name)s"

class NameSet(UserDict):
    """Track names and the line numbers where they are referenced"""
    def __init__(self):
        self.data = self.names = {}

    def add(self, name, lineno):
        l = self.names.get(name, [])
        l.append(lineno)
        self.names[name] = l

class CheckNames:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = Stack()
        self.gUse = NameSet()
        self.gDef = NameSet()
        # _locals is the stack of local namespaces
        # locals is the top of the stack
        self._locals = Stack()
        self.lUse = None
        self.lDef = None
        self.lGlobals = None  # var declared global
        # holds scope,def,use,global triples for later analysis
        self.todo = []

    def enterNamespace(self, node):
        ## print node.name
        self.scope.push(node)
        self.lUse = use = NameSet()
        self.lDef = _def = NameSet()
        self.lGlobals = gbl = NameSet()
        self._locals.push((use, _def, gbl))

    def exitNamespace(self):
        ## print
        self.todo.append((self.scope.top(), self.lDef, self.lUse,
                          self.lGlobals))
        self.scope.pop()
        self._locals.pop()
        if self._locals:
            self.lUse, self.lDef, self.lGlobals = self._locals.top()
        else:
            self.lUse = self.lDef = self.lGlobals = None

    def warn(self, warning, funcname, lineno, *args):
        args = (self.filename, funcname, lineno) + args
        self.warnings.append(apply(warning, args))

    def defName(self, name, lineno, local=1):
        ## print "defName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gDef.add(name, lineno)
        elif local == 0:
            self.gDef.add(name, lineno)
            self.lGlobals.add(name, lineno)
        else:
            self.lDef.add(name, lineno)

    def useName(self, name, lineno, local=1):
        ## print "useName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gUse.add(name, lineno)
        elif local == 0:
            self.gUse.add(name, lineno)
            self.lUse.add(name, lineno)
        else:
            self.lUse.add(name, lineno)

    def check(self):
        for s, d, u, g in self.todo:
            self._check(s, d, u, g, self.gDef)
        # XXX then check the globals

    def _check(self, scope, _def, use, gbl, globals):
        # check for NameError
        # a name is defined iff it is in def.keys()
        # a name is global iff it is in gdefs.keys()
        gdefs = UserDict()
        gdefs.update(globals)
        gdefs.update(__builtin__.__dict__)
        defs = UserDict()
        defs.update(gdefs)
        defs.update(_def)
        errors = Set()
        for name in use.keys():
            if not defs.has_key(name):
                firstuse = use[name][0]
                self.warn(NameError, scope.name, firstuse, name)
                errors.add(name)
        # check for UndefinedLocalNameError
        # order == use & def sorted by lineno
        # elements are lineno, flag, name
        # flag = 0 if use, flag = 1 if def
        order = []
        for name, lines in use.items():
            if gdefs.has_key(name) and not _def.has_key(name):
                # this is a global ref, we can skip it
                continue
            for lineno in lines:
                order.append(lineno, 0, name)
        for name, lines in _def.items():
            for lineno in lines:
                order.append(lineno, 1, name)
        order.sort()
        # ready contains names that have been defined or warned about
        ready = Set()
        for lineno, flag, name in order:
            if flag == 0:  # use
                if not ready.has_elt(name) and not errors.has_elt(name):
                    self.warn(UndefinedLocal, scope.name, lineno, name)
                    ready.add(name)  # don't warn again
            else:
                ready.add(name)

    # below are visitor methods
    def visitFunction(self, node, noname=0):
        for expr in node.defaults:
            self.visit(expr)
        if not noname:
            self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        for name in node.argnames:
            self.defName(name, node.lineno)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitLambda(self, node):
        return self.visitFunction(node, noname=1)

    def visitClass(self, node):
        for expr in node.bases:
            self.visit(expr)
        self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitName(self, node):
        self.useName(node.name, node.lineno)

    def visitGlobal(self, node):
        for name in node.names:
            self.defName(name, node.lineno, local=0)

    def visitImport(self, node):
        for name in node.names:
            self.defName(name, node.lineno)

    visitFrom = visitImport

    def visitAssName(self, node):
        self.defName(node.name, node.lineno)

def check(filename):
    global p, checker
    p = parseFile(filename)
    checker = CheckNames(filename)
    walk(p, checker)
    checker.check()
    for w in checker.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badnames.py ------------
# XXX can we detect race conditions on accesses to global variables?
# probably can (conservatively) by noting variables _created_ by
# global decls in funcs

import string
import time

def foo(x):
    return x + y

def foo2(x):
    return x + z

a = 4

def foo3(x):
    a, b = x, a

def bar(x):
    z = x
    global z

def bar2(x):
    f = string.strip
    a = f(x)
    import string
    return string.lower(a)

def baz(x, y):
    return x + y + z

def outer(x):
    def inner(y):
        return x + y
    return inner

From gstein at lyra.org Mon Mar 6 22:09:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 6 Mar 2000 13:09:33 -0800 (PST) Subject: [Python-Dev] PyBufferProcs In-Reply-To: <14531.61248.941076.803617@weyr.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Fred L. Drake, Jr. wrote: > While working on the documentation, I've noticed a naming > inconsistency regarding PyBufferProcs; its peers are all named > Py*Methods (PySequenceMethods, PyNumberMethods, etc.). > I'd like to propose that a synonym, PyBufferMethods, be made for > PyBufferProcs, and use that in the core implementations and the > documentation. +0 Although.. I might say that it should be renamed, and a synonym (#define or typedef?) be provided for the old name. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Mar 6 23:04:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 23:04:14 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> Message-ID: <38C42B5E.42801755@lemburg.com> Guido van Rossum wrote: > > > As you may have noticed, the Unicode objects provide > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > mentioned that Java also provides .isdigit() and .isspace(). > > > > Question: should Unicode also provide these character > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > and .isspace() ? Plus maybe .digit(), .numeric() and > > .decimal() for the corresponding decoding ?
> > What would be the difference between isdigit, isnumeric, isdecimal? > I'd say don't do more than Java. I don't understand what the > "corresponding decoding" refers to. What would "3".decimal() return? These originate in the Unicode database; see

    ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html

Here are the descriptions:

"""
6       Decimal digit value     normative
        This is a numeric field. If the character has the decimal digit
        property, as specified in Chapter 4 of the Unicode Standard, the
        value of that digit is represented with an integer value in this
        field

7       Digit value             normative
        This is a numeric field. If the character represents a digit, not
        necessarily a decimal digit, the value is here. This covers digits
        which do not form decimal radix forms, such as the compatibility
        superscript digits

8       Numeric value           normative
        This is a numeric field. If the character has the numeric
        property, as specified in Chapter 4 of the Unicode Standard, the
        value of that character is represented with an integer or rational
        number in this field. This includes fractions as, e.g., "1/5" for
        U+2155 VULGAR FRACTION ONE FIFTH. Also included are numerical
        values for compatibility characters such as circled numbers.
"""

u"3".decimal() would return 3. u"\u2155".

Some more examples from the unicodedata module (which makes all fields of the database available in Python):

>>> unicodedata.decimal(u"3")
3
>>> unicodedata.decimal(u"?")
2
>>> unicodedata.digit(u"?")
2
>>> unicodedata.numeric(u"?")
2.0
>>> unicodedata.numeric(u"\u2155")
0.2
>>> unicodedata.numeric(u'\u215b')
0.125

> > Similar APIs are already available through the unicodedata > > module, but could easily be moved to the Unicode object > > (they cause the builtin interpreter to grow a bit in size > > due to the new mapping tables). > > > > BTW, string.atoi et al. are currently not mapped to > > string methods... should they be ? > > They are mapped to int() c.s. Hmm, I just noticed that int() et friends don't like Unicode...
shouldn't they use the "t" parser marker instead of requiring a string or tp_int compatible type ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 00:12:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 18:12:33 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 23:04:14 +0100." <38C42B5E.42801755@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> Message-ID: <200003062312.SAA11697@eric.cnri.reston.va.us> [MAL] > > > As you may have noticed, the Unicode objects provide > > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > > mentioned that Java also provides .isdigit() and .isspace(). > > > > > > Question: should Unicode also provide these character > > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > > and .isspace() ? Plus maybe .digit(), .numeric() and > > > .decimal() for the corresponding decoding ? [Guido] > > What would be the difference between isdigit, isnumeric, isdecimal? > > I'd say don't do more than Java. I don't understand what the > > "corresponding decoding" refers to. What would "3".decimal() return? [MAL] > These originate in the Unicode database; see > > ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html > > Here are the descriptions: > > """ > 6 > Decimal digit value > normative > This is a numeric field. If the > character has the decimal digit > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that digit is represented > with an integer value in this field > 7 > Digit value > normative > This is a numeric field. If the > character represents a digit, not > necessarily a decimal digit, the > value is here. 
This covers digits > which do not form decimal radix > forms, such as the compatibility > superscript digits > 8 > Numeric value > normative > This is a numeric field. If the > character has the numeric > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that character is > represented with an integer or > rational number in this field. This > includes fractions as, e.g., "1/5" for > U+2155 VULGAR FRACTION > ONE FIFTH Also included are > numerical values for compatibility > characters such as circled > numbers. > > u"3".decimal() would return 3. u"\u2155". > > Some more examples from the unicodedata module (which makes > all fields of the database available in Python): > > >>> unicodedata.decimal(u"3") > 3 > >>> unicodedata.decimal(u"?") > 2 > >>> unicodedata.digit(u"?") > 2 > >>> unicodedata.numeric(u"?") > 2.0 > >>> unicodedata.numeric(u"\u2155") > 0.2 > >>> unicodedata.numeric(u'\u215b') > 0.125 Hm, very Unicode centric. Probably best left out of the general string methods. Isspace() seems useful, and an isdigit() that is only true for ASCII '0' - '9' also makes sense. What about "123".isdigit()? What does Java say? Or do these only apply to single chars there? I think "123".isdigit() should be true if "abc".islower() is true. > > > Similar APIs are already available through the unicodedata > > > module, but could easily be moved to the Unicode object > > > (they cause the builtin interpreter to grow a bit in size > > > due to the new mapping tables). > > > > > > BTW, string.atoi et al. are currently not mapped to > > > string methods... should they be ? > > > > They are mapped to int() c.s. > > Hmm, I just noticed that int() et friends don't like > Unicode... shouldn't they use the "t" parser marker > instead of requiring a string or tp_int compatible > type ? Good catch. Go ahead. 
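The change approved here is easy to demonstrate; a minimal sketch of the intended behaviour in today's spelling, where the numeric constructors accept any text object (the u"" prefixes are redundant in modern Python and kept only to match the discussion):

```python
# The numeric constructors accept text objects, not just 8-bit strings.
assert int(u"3") == 3
assert float(u"0.125") == 0.125
# Leading/trailing whitespace is stripped, as for plain strings.
assert int(u"  42  ") == 42
```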
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Tue Mar 7 06:25:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 7 Mar 2000 07:25:43 +0200 (IST) Subject: [Python-Dev] Re: example checkers based on compiler package In-Reply-To: <14532.1740.90292.440395@goon.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Jeremy Hylton wrote: > I think these kinds of warnings are useful, and I'd like to see a more > general framework for them built around Python abstract syntax originally > from P2C. Ideally, they would be available as command line tools and > integrated into GUIs like IDLE in some useful way. Yes! Guido already suggested we have a standard API to them. One thing I suggested was that the abstract API include not only the input (one form or another of an AST), but the output: so IDE's wouldn't have to parse strings, but get a warning class. Something like: an output of a warning can be a subclass of GeneralWarning, and should implement the following methods:

1. line-no() -- returns an integer
2. columns() -- returns either a pair of integers, or None
3. message() -- returns a string containing a message
4. __str__() -- comes for free if inheriting GeneralWarning, and formats the warning message.

> I've included a couple of quick examples I coded up last night based > on the compiler package (recently re-factored) that is resident in > python/nondist/src/Compiler. The analysis on the one that checks for > name errors is a bit of a mess, but the overall structure seems right. One thing I had trouble with is that in my implementation of selfnanny, I used Python's stack for recursion while you used an explicit stack. It's probably because of the visitor pattern, which is just another argument for co-routines and generators. > I'm hoping to collect a few more examples of checkers and generalize > from them to develop a framework for checking for errors and reporting > them. Cool!
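The proposed base class is small enough to sketch; the names below follow the message (with line-no() respelled as a valid Python identifier), and are hypothetical, not an actual Python API:

```python
# Hypothetical sketch of the GeneralWarning base class described above.
class GeneralWarning:
    def __init__(self, filename, lineno, cols, msg):
        self.filename = filename
        self.lineno = lineno
        self.cols = cols          # (start, end) pair of columns, or None
        self.msg = msg

    def line_no(self):
        return self.lineno

    def columns(self):
        return self.cols

    def message(self):
        return self.msg

    def __str__(self):
        # comes "for free" for subclasses: one uniform warning format
        return "%s:%d: %s" % (self.filename, self.line_no(), self.message())

class SelfNannyWarning(GeneralWarning):
    pass

w = SelfNannyWarning("badnames.py", 12, None, "method has no 'self' argument")
print(w)    # badnames.py:12: method has no 'self' argument
```

An IDE could then sort and display warnings from any checker without parsing strings, which is the point of standardizing the output side of the API.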
Brainstorming: what kind of warnings would people find useful? In selfnanny, I wanted to include checking for assignment to self, and checking for "possible use before definition of local variables" sounds good. Another check could be a CP4E "checking that no two identifiers differ only by case". I might code up a few if I have the time... What I'd really want (but it sounds really hard) is a framework for partial ASTs: warning people as they write code. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mwh21 at cam.ac.uk Tue Mar 7 09:31:23 2000 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 07 Mar 2000 08:31:23 +0000 Subject: [Python-Dev] Re: [Compiler-sig] Re: example checkers based on compiler package In-Reply-To: Moshe Zadka's message of "Tue, 7 Mar 2000 07:25:43 +0200 (IST)" References: Message-ID: Moshe Zadka writes: > On Mon, 6 Mar 2000, Jeremy Hylton wrote: > > > I think these kinds of warnings are useful, and I'd like to see a more > > general framework for them built around Python abstract syntax originally > > from P2C. Ideally, they would be available as command line tools and > > integrated into GUIs like IDLE in some useful way. > > Yes! Guido already suggested we have a standard API to them. One thing > I suggested was that the abstract API include not only the input (one form > or another of an AST), but the output: so IDE's wouldn't have to parse > strings, but get a warning class. That would be seriously cool. > Something like a: > > An output of a warning can be a subclass of GeneralWarning, and should > implement the following methods: > > 1. line-no() -- returns an integer > 2. columns() -- returns either a pair of integers, or None > 3. message() -- returns a string containing a message > 4. __str__() -- comes for free if inheriting GeneralWarning, > and formats the warning message. Wouldn't it make sense to include function/class name here too? A checker is likely to know, and it would save reparsing to find it out.
[little snip] > > I'm hoping to collect a few more examples of checkers and generalize > > from them to develop a framework for checking for errors and reporting > > them. > > Cool! > Brainstorming: what kind of warnings would people find useful? In > selfnanny, I wanted to include checking for assigment to self, and > checking for "possible use before definition of local variables" sounds > good. Another check could be a CP4E "checking that no two identifiers > differ only by case". I might code up a few if I have the time... Is there stuff in the current Compiler code to do control flow analysis? You'd need that to check for use before definition in meaningful cases, and also if you ever want to do any optimisation... > What I'd really want (but it sounds really hard) is a framework for > partial ASTs: warning people as they write code. I agree (on both points). Cheers, M. -- very few people approach me in real life and insist on proving they are drooling idiots. -- Erik Naggum, comp.lang.lisp From mal at lemburg.com Tue Mar 7 10:14:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:14:25 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> <200003062312.SAA11697@eric.cnri.reston.va.us> Message-ID: <38C4C871.F47E17A3@lemburg.com> Guido van Rossum wrote: > [MAL about adding .isdecimal(), .isdigit() and .isnumeric()] > > Some more examples from the unicodedata module (which makes > > all fields of the database available in Python): > > > > >>> unicodedata.decimal(u"3") > > 3 > > >>> unicodedata.decimal(u"?") > > 2 > > >>> unicodedata.digit(u"?") > > 2 > > >>> unicodedata.numeric(u"?") > > 2.0 > > >>> unicodedata.numeric(u"\u2155") > > 0.2 > > >>> unicodedata.numeric(u'\u215b') > > 0.125 > > Hm, very Unicode centric. Probably best left out of the general > string methods. 
Isspace() seems useful, and an isdigit() that is only > true for ASCII '0' - '9' also makes sense. Well, how about having all three on Unicode objects and only .isdigit() on string objects ? > What about "123".isdigit()? What does Java say? Or do these only > apply to single chars there? I think "123".isdigit() should be true > if "abc".islower() is true. In the current uPython implementation u"123".isdigit() is true; same for the other two methods. > > > > Similar APIs are already available through the unicodedata > > > > module, but could easily be moved to the Unicode object > > > > (they cause the builtin interpreter to grow a bit in size > > > > due to the new mapping tables). > > > > > > > > BTW, string.atoi et al. are currently not mapped to > > > > string methods... should they be ? > > > > > > They are mapped to int() c.s. > > > > Hmm, I just noticed that int() et friends don't like > > Unicode... shouldn't they use the "t" parser marker > > instead of requiring a string or tp_int compatible > > type ? > > Good catch. Go ahead. Done. float(), int() and long() now accept charbuf compatible objects as argument. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 10:23:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:23:35 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects Message-ID: <38C4CA97.5D0AA9D@lemburg.com> Before starting to code away, I would like to know which of the new Unicode methods should also be available on string objects. 
Here are the currently available methods:

    Unicode objects      string objects
    ------------------------------------
    capitalize           capitalize
    center
    count                count
    encode
    endswith             endswith
    expandtabs
    find                 find
    index                index
    isdecimal
    isdigit
    islower
    isnumeric
    isspace
    istitle
    isupper
    join                 join
    ljust
    lower                lower
    lstrip               lstrip
    replace              replace
    rfind                rfind
    rindex               rindex
    rjust
    rstrip               rstrip
    split                split
    splitlines
    startswith           startswith
    strip                strip
    swapcase             swapcase
    title                title
    translate            translate (*)
    upper                upper
    zfill

(*) The two have slightly different implementations, e.g. deletions are handled differently. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Mar 7 12:54:56 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 7 Mar 2000 12:54:56 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> Message-ID: <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> > Unicode objects string objects > expandtabs yes. I'm pretty sure there's "expandtabs" code in the strop module. maybe barry missed it? > center > ljust > rjust probably. the implementation is trivial, and ljust/rjust are somewhat useful, so you might as well add them all (just cut and paste from the unicode class). what about rguido and lguido, btw? > zfill no. From guido at python.org Tue Mar 7 14:52:00 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 08:52:00 -0500 Subject: [Python-Dev] finalization again Message-ID: <200003071352.IAA13571@eric.cnri.reston.va.us> Warning: long message. If you're not interested in reading all this, please skip to "Conclusion" at the end. At Tim's recommendation I had a look at what section 12.6 of the Java language spec says about finalizers. The stuff there is sure seductive for language designers...
Have a look at the diagram at http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its (seeming) complexity, it helped me understand some of the issues of finalization better. Rather than the complex 8-state state machine that it appears to be, think of it as a simple 3x3 table. The three rows represent the categories reachable, finalizer-reachable (abbreviated in the diagram as f-reachable), and unreachable. These categories correspond directly to categories of objects that the Schemenauer-Tiedemann cycle-reclamation scheme deals with: after moving all the reachable objects to the second list (first the roots and then the objects reachable from the roots), the first list is left with the unreachable and finalizer-reachable objects. If we want to distinguish between unreachable and finalizer-reachable at this point, a straightforward application of the same algorithm will work well: Create a third list (this will contain the finalizer-reachable objects). Start by filling it with all the objects from the first list (which contains the potential garbage at this point) that have a finalizer. We can look for objects that have __del__ or __clean__ or for which tp_clean(CARE_EXEC)==true, it doesn't matter here.(*) Then walk through the third list, following each object's references, and move all referenced objects that are still in the first list to the third list. Now, we have:

List 1: truly unreachable objects. These have no finalizers and can be discarded right away.

List 2: truly reachable objects. (Roots and objects reachable from roots.) Leave them alone.

List 3: finalizer-reachable objects. This contains objects that are unreachable but have a finalizer, and objects that are only reachable through those.

We now have to decide on a policy for invoking finalizers. Java suggests the following: Remember the "roots" of the third list -- the nodes that were moved there directly from the first list because they have a finalizer.
These objects are marked *finalizable* (a category corresponding to the second *column* of the Java diagram). The Java spec allows the Java garbage collector to call all of these finalizers in any order -- even simultaneously in separate threads. Java never allows an object to go back from the finalizable to the unfinalized state (there are no arrows pointing left in the diagram). The first finalizer that is called could make its object reachable again (up arrow), thereby possibly making other finalizable objects reachable too. But this does not cancel their scheduled finalization! The conclusion is that Java can sometimes call finalization on reachable objects -- but only if those objects have gone through a phase in their life where they were unreachable or at least finalizer-unreachable. I agree that this is the best that Java can do: if there are cycles containing multiple objects with finalizers, there is no way (short of asking the programmer(s)) to decide which object to finalize first. We could pick one at random, run its finalizer, and start garbage collection all over -- if the finalizer doesn't resurrect anything, this will give us the same set of unreachable objects, from which we could pick the next finalizable object, and so on. That looks very inefficient, might not terminate (the same object could repeatedly show up as the candidate for finalization), and it's still arbitrary: the programmer(s) still can't predict which finalizer in a cycle with multiple finalizers will be called first. Assuming the recommended characteristics of finalizers (brief and robust), it won't make much difference if we call all finalizers (of the now-finalizable objects) "without looking back". Sure, some objects may find themselves in a perfectly reachable position with their finalizer called -- but they did go through a "near-death experience". I don't find this objectionable, and I don't see how Java could possibly do better for cycles with multiple finalizers.
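The two-pass partition described above fits in a few lines of modern, illustrative Python; the dict-based object graph, the roots list, and the has_finalizer predicate are toy stand-ins for the collector's real data structures, not CPython internals:

```python
def partition(graph, roots, has_finalizer):
    """Split a {obj: referents} graph into the three lists described above."""
    # Pass 1: everything reachable from the roots is truly reachable (list 2).
    reachable = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in reachable:
            reachable.add(obj)
            stack.extend(graph[obj])
    # Pass 2: seed the third list with unreachable objects that carry a
    # finalizer, then pull in everything they reference (same algorithm).
    f_reachable = set()
    stack = [o for o in graph if o not in reachable and has_finalizer(o)]
    while stack:
        obj = stack.pop()
        if obj not in f_reachable and obj not in reachable:
            f_reachable.add(obj)
            stack.extend(graph[obj])
    # Whatever is left is truly unreachable (list 1) and can be discarded.
    unreachable = set(graph) - reachable - f_reachable
    return unreachable, reachable, f_reachable

# a <-> b is a dead cycle whose "a" has a finalizer, with "c" hanging off it;
# r is the root; d <-> e is a dead cycle with no finalizers at all.
graph = {"r": [], "a": ["b", "c"], "b": ["a"], "c": [], "d": ["e"], "e": ["d"]}
dead, live, f_reach = partition(graph, ["r"], lambda o: o == "a")
# d and e can be reclaimed at once; a, b and c must wait for a's finalizer.
```

Running the toy example yields dead == {"d", "e"} and f_reach == {"a", "b", "c"}, matching the three lists above.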
Now let's look again at the rule that an object's finalizer will be called at most once automatically by the garbage collector. The transitions between the columns of the Java diagram enforce this: the columns are labeled from left to right with unfinalized, finalizable, and finalized, and there are no transition arrows pointing left. (In my description above, already finalized objects are considered not to have a finalizer.) I think this rule makes a lot of sense given Java's multi-threaded garbage collection: the invocation of finalizers could run concurrently with another garbage collection, and we don't want this to find some of the same finalizable objects and call their finalizers again! We could mark them with a "finalization in progress" flag only while their finalizer is running, but in a cycle with multiple finalizers it seems we should keep this flag set until *all* finalizers for objects in the cycle have run. But we don't actually know exactly what the cycles are: all we know is "these objects are involved in trash cycles". More detailed knowledge would require yet another sweep, plus a more hairy two-dimensional data structure (a list of separate cycles). And for what? As soon as we run finalizers from two separate cycles, those cycles could be merged again (e.g. the first finalizer could resurrect its cycle, and the second one could link to it). Now we have a pool of objects that are marked "finalization in progress" until all their finalizations terminate. For an incremental concurrent garbage collector, this seems a pain, since it may continue to find new finalizable objects and add them to the pile. Java takes the logical conclusion: the "finalization in progress" flag is never cleared -- and renamed to "finalized".

Conclusion
----------

Are the Java rules complex? Yes. Are there better rules possible? I'm not so sure, given the requirement of allowing concurrent incremental garbage collection algorithms that haven't even been invented yet.
(Plus the implied requirement that finalizers in trash cycles should be invoked.) Are the Java rules difficult for the user? Only for users who think they can trick finalizers into doing things for them that they were not designed to do. I would think the following guidelines should do nicely for the rest of us:

1. Avoid finalizers if you can; use them only to release *external* (e.g. OS) resources.

2. Write your finalizer as robust as you can, with as little use of other objects as you can.

3. You only get one chance. Use it.

Unlike Scheme guardians or the proposed __cleanup__ mechanism, you don't have to know whether your object is involved in a cycle -- your finalizer will still be called. I am reconsidering using the __del__ method as the finalizer. As a compromise to those who want their __del__ to run whenever the reference count reaches zero, the finalized flag can be cleared explicitly. I am considering using the following implementation: after retrieving the __del__ method, but before calling it, self.__del__ is set to None (better, self.__dict__['__del__'] = None, to avoid confusing __setattr__ hooks). The object can remove self.__del__ to clear the finalized flag. I think I'll use the same mechanism to prevent __del__ from being called upon a failed initialization. Final note: the semantics "__del__ is called whenever the reference count reaches zero" cannot be defended in the light of a migration to different forms of garbage collection (e.g. JPython). There may not be a reference count. --Guido van Rossum (home page: http://www.python.org/~guido/) ____ (*) Footnote: there's one complication: to ask a Python class instance if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If the object's class has a __getattr__ hook, this can invoke arbitrary Python code -- even if the answer to the question is "no"! This can make the object reachable again (in the Java diagram, arrows pointing up or up and right).
We could either use instance_getattr1(), which avoids the __getattr__ hook, or mark all class instances as finalizable until proven innocent. From gward at cnri.reston.va.us Tue Mar 7 15:04:30 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 7 Mar 2000 09:04:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us>; from guido@python.org on Sun, Mar 05, 2000 at 07:04:56AM -0500 References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <20000307090430.A16948@cnri.reston.va.us> On 05 March 2000, Guido van Rossum said:

> - Variants on the syntax could be given through some kind of option
>   system rather than through subclassing -- they should be combinable
>   independently.  Some possible options (maybe I'm going overboard here)
>   could be:
>
>   - comment characters: ('#', ';', both, others?)
>   - comments after variables allowed? on sections?
>   - variable characters: (':', '=', both, others?)
>   - quoting of values with "..." allowed?
>   - backslashes in "..." allowed?
>   - does backslash-newline mean a continuation?
>   - case sensitivity for section names (default on)
>   - case sensitivity for option names (default off)
>   - variables allowed before first section name?
>   - first section name? (default "main")
>   - character set allowed in section names
>   - character set allowed in variable names
>   - %(...) substitution?

I agree with Fred that this level of flexibility is probably overkill for a config file parser; you don't want every application author who uses the module to have to explain his particular variant of the syntax. However, if you're interested in a class that *does* provide some of the above flexibility, I have written such a beast.
It's currently used to parse the Distutils MANIFEST.in file, and I've considered using it for the mythical Distutils config files. (And it also gets heavy use in my day job.) It's really a class for reading a file in preparation for "text processing the Unix way", though: it doesn't say anything about syntax, it just worries about blank lines, comments, continuations, and a few other things. Here's the class docstring:

class TextFile:

    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your
       comment character), skip blank lines, join adjacent lines by
       escaping the newline (ie. backslash at end of line), strip
       leading and/or trailing whitespace, and collapse internal
       whitespace.  All of these are optional and independently
       controllable.

       Provides a 'warn()' method so you can generate warning messages
       that report physical line number, even if the logical line in
       question spans multiple physical lines.  Also provides
       'unreadline()' for implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile (filename=None, file=None, **options)

       It bombs (RuntimeError) if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It
       is recommended that you supply at least 'filename', so that
       TextFile can include it in warning messages.  If 'file' is not
       supplied, TextFile creates its own using the 'open()' builtin.

       The options are all boolean, and affect the value returned by
       'readline()':

         strip_comments [default: true]
           strip from "#" to end-of-line, as well as any whitespace
           leading up to the "#" -- unless it is escaped by a backslash

         lstrip_ws [default: false]
           strip leading whitespace from each line before returning it

         rstrip_ws [default: true]
           strip trailing whitespace (including line terminator!) from
           each line before returning it

         skip_blanks [default: true]
           skip lines that are empty *after* stripping comments and
           whitespace.  (If both lstrip_ws and rstrip_ws are true, then
           some lines may consist of solely whitespace: these will *not*
           be skipped, even if 'skip_blanks' is true.)

         join_lines [default: false]
           if a backslash is the last non-newline character on a line
           after stripping comments and whitespace, join the following
           line to it to form one "logical line"; if N consecutive lines
           end with a backslash, then N+1 physical lines will be joined
           to form one logical line.

         collapse_ws [default: false]
           after stripping comments and whitespace and joining physical
           lines into logical lines, all internal whitespace (strings of
           whitespace surrounded by non-whitespace characters, and not
           at the beginning or end of the logical line) will be
           collapsed to a single space.

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin
       file object's 'readline()' method!  In particular, 'readline()'
       returns None for end-of-file: an empty string might just be a
       blank line (or an all-whitespace line), if 'rstrip_ws' is true
       but 'skip_blanks' is not."""

Interested in having something like this in the core? Adding more options is possible, but the code is already on the hairy side to support all of these. And I'm not a big fan of the subtle difference in semantics with file objects, but honestly couldn't think of a better way at the time. If you're interested, you can download it from http://www.mems-exchange.org/exchange/software/python/text_file/ or just use the version in the Distutils CVS tree. Greg From mal at lemburg.com Tue Mar 7 15:38:09 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 07 Mar 2000 15:38:09 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> Message-ID: <38C51451.D38B21FE@lemburg.com> Fredrik Lundh wrote: > > > Unicode objects string objects > > expandtabs > > yes. > > I'm pretty sure there's "expandtabs" code in the > strop module. maybe barry missed it? > > > center > > ljust > > rjust > > probably. > > the implementation is trivial, and ljust/rjust are > somewhat useful, so you might as well add them > all (just cut and paste from the unicode class). > > what about rguido and lguido, btw? Ooops, forgot those, thanks :-) > > zfill > > no. Why not ? Since the string implementation had all of the above marked as TBD, I added all four. What about the other new methods (.isXXX() and .splitlines()) ? .isXXX() are mostly needed due to the extended character properties in Unicode. They would be new to the string object world. .splitlines() is Unicode aware and also treats CR/LF combinations across platforms:

S.splitlines([maxsplit]) -> list of strings

Return a list of the lines in S, breaking at line boundaries. If maxsplit is given, at most maxsplit are done. Line breaks are not included in the resulting list.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 16:38:18 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 10:38:18 -0500 Subject: [Python-Dev] Adding Unicode methods to string objects In-Reply-To: Your message of "Tue, 07 Mar 2000 15:38:09 +0100." <38C51451.D38B21FE@lemburg.com> References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <200003071538.KAA13977@eric.cnri.reston.va.us> > > > zfill > > > > no. > > Why not ?
Zfill is (or ought to be) deprecated. It stems from times before we had things like "%08d" % x and no longer serves a useful purpose. I doubt anyone would miss it. (Of course, now /F will claim that PIL will break in 27 places because of this. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Tue Mar 7 18:07:40 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 7 Mar 2000 12:07:40 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003071352.IAA13571@eric.cnri.reston.va.us> Message-ID: <000701bf8857$a56ed660$a72d153f@tim> [Guido] > ... > Conclusion > ---------- > > Are the Java rules complex? Yes. Are there better rules possible? I'm > not so sure, given the requirement of allowing concurrent incremental > garbage collection algorithms that haven't even been invented > yet. Guy Steele worked his ass off on Java's rules. He had as much real-world experience with implementing GC as anyone, via his long & deep Lisp implementation background (both SW & HW), and indeed invented several key techniques in high-performance GC. But he had no background in GC with user-defined finalizers -- and it shows! > (Plus the implied requirement that finalizers in trash cycles > should be invoked.) Are the Java rules difficult for the user? Only > for users who think they can trick finalizers into doing things for > them that they were not designed to do. This is so implementation-centric it's hard to know what to say <0.5 wink>. The Java rules weren't designed to do much of anything except guarantee that Java (1) would eventually reclaim all unreachable objects, and (2) wouldn't expose dangling pointers to user finalizers, or chase any itself. Whatever *useful* finalizer semantics may remain are those that just happened to survive. > ... > Unlike Scheme guardians or the proposed __cleanup__ mechanism, you > don't have to know whether your object is involved in a cycle -- your > finalizer will still be called. 
This is like saying a user doesn't have to know whether the new drug prescribed for them by their doctor has potentially fatal side effects -- they'll be forced to take it regardless . > ... > Final note: the semantics "__del__ is called whenever the reference > count reaches zero" cannot be defended in the light of a migration to > different forms of garbage collection (e.g. JPython). There may not > be a reference count. 1. I don't know why JPython doesn't execute __del__ methods at all now, but have to suspect that the Java rules imply an implementation so grossly inefficient in the presence of __del__ that Barry simply doesn't want to endure the speed complaints. The Java spec itself urges implementations to special-case the snot out of classes that don't override the default do-nothing finalizer, for "go fast" reasons too. 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete way to get across the idea of "destruction occurs in an order consistent with a topological sort of the points-to graph". The latter is explicit in the BDW collector, which has no refcounts; the topsort concept is applicable and thoroughly natural in all languages; refcounts in CPython give an exploitable hint about *when* collection will occur, but add no purely semantic constraint beyond the topsort requirement (they neatly *imply* the topsort requirement). There is no topsort in the presence of cycles, so cycles create problems in all languages. The same "throw 'em back at the user" approach makes just as much sense from the topsort view as the RC view; it doesn't rely on RC at all. stop-the-insanity-ly y'rs - tim From guido at python.org Tue Mar 7 18:33:31 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 12:33:31 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 12:07:40 EST." 
<000701bf8857$a56ed660$a72d153f@tim> References: <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <200003071733.MAA14926@eric.cnri.reston.va.us> [Tim tells Guido again that he finds the Java rules bad, slinging some mud at Guy Steele, but without explaining what the problem with them is, and then asks:] > 1. I don't know why JPython doesn't execute __del__ methods at all now, but > have to suspect that the Java rules imply an implementation so grossly > inefficient in the presence of __del__ that Barry simply doesn't want to > endure the speed complaints. The Java spec itself urges implementations to > special-case the snot out of classes that don't override the default > do-nothing finalizer, for "go fast" reasons too. Something like that, yes, although it was Jim Hugunin. I have a feeling it has to do with the dynamic nature of __del__ -- this would imply that *all* Python class instances would appear to Java to have a finalizer -- just in most cases it would do a failing lookup of __del__ and bail out quickly. Maybe some source code or class analysis looking for a __del__ could fix this, at the cost of not allowing one to patch __del__ into an existing class after instances have already been created. I don't find that breach of dynamicism a big deal -- e.g. CPython keeps copies of __getattr__, __setattr__ and __delattr__ in the class for similar reasons. > 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete > way to get across the idea of "destruction occurs in an order consistent > with a topological sort of the points-to graph". The latter is explicit in > the BDW collector, which has no refcounts; the topsort concept is applicable > and thoroughly natural in all languages; refcounts in CPython give an > exploitable hint about *when* collection will occur, but add no purely > semantic constraint beyond the topsort requirement (they neatly *imply* the > topsort requirement).
There is no topsort in the presence of cycles, so > cycles create problems in all languages. The same "throw 'em back at the > user" approach makes just as much sense from the topsort view as the RC > view; it doesn't rely on RC at all. Indeed. I propose to throw it back at the user by calling __del__. The typical user defines __del__ because they want to close a file, say goodbye nicely on a socket connection, or delete a temp file. That sort of thing. This is what finalizers are *for*. As an author of this kind of finalizer, I don't see why I need to know whether I'm involved in a cycle or not. I want my finalizer called when my object goes away, and I don't want my object kept alive by unreachable cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Mar 7 18:39:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:39:15 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <38C53EC3.5292ECF@lemburg.com> I've ported most of the Unicode methods to strings now. Here's the new table:

    Unicode objects        string objects
    ------------------------------------------------------------
    capitalize             capitalize
    center                 center
    count                  count
    encode
    endswith               endswith
    expandtabs             expandtabs
    find                   find
    index                  index
    isdecimal
    isdigit                isdigit
    islower                islower
    isnumeric
    isspace                isspace
    istitle                istitle
    isupper                isupper
    join                   join
    ljust                  ljust
    lower                  lower
    lstrip                 lstrip
    replace                replace
    rfind                  rfind
    rindex                 rindex
    rjust                  rjust
    rstrip                 rstrip
    split                  split
    splitlines             splitlines
    startswith             startswith
    strip                  strip
    swapcase               swapcase
    title                  title
    translate              translate
    upper                  upper
    zfill                  zfill

I don't think that .isdecimal() and .isnumeric() are needed for strings since most of the added mappings refer to Unicode char points.
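MAL's three classification predicates did eventually land on the built-in string type; in a current Python 3 interpreter (used here purely as illustration, long after this thread) the decimal/digit/numeric distinction he describes can be checked directly:

```python
# Unicode classification, narrowest to widest: decimal < digit < numeric.
samples = ["7", "\u00b2", "\u00bd"]  # '7', SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for ch in samples:
    print(ch, ch.isdecimal(), ch.isdigit(), ch.isnumeric())
# '7' passes all three; superscript two is a digit and numeric but not
# decimal; one-half is numeric only.
```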
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 18:42:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:42:53 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> <200003071538.KAA13977@eric.cnri.reston.va.us> Message-ID: <38C53F9D.44C3A0F3@lemburg.com> Guido van Rossum wrote: > > > > > zfill > > > > > > no. > > > > Why not ? > > Zfill is (or ought to be) deprecated. It stems from times before we > had things like "%08d" % x and no longer serves a useful purpose. > I doubt anyone would miss it. > > (Of course, now /F will claim that PIL will break in 27 places because > of this. :-) Ok, I'll remove it from both implementations again... (there was some email overlap). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Tue Mar 7 20:24:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 7 Mar 2000 14:24:39 -0500 (EST) Subject: [Python-Dev] finalization again References: <200003071352.IAA13571@eric.cnri.reston.va.us> <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <14533.22391.447739.901802@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> 1. I don't know why JPython doesn't execute __del__ methods at TP> all now, but have to suspect that the Java rules imply an TP> implementation so grossly inefficient in the presence of TP> __del__ that Barry simply doesn't want to endure the speed TP> complaints. Actually, it was JimH that discovered this performance gotcha. 
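The optimization Barry goes on to describe -- look up __del__ once when the class is defined, cache the answer, and accept that a __del__ patched in later goes unnoticed -- can be sketched in plain (modern) Python; needs_finalizer is a hypothetical helper for illustration, not JPython's actual API:

```python
# Decide once per class whether instances need finalization, instead of
# paying a __del__ lookup on every instance creation.
_finalizer_cache = {}

def needs_finalizer(cls):
    try:
        return _finalizer_cache[cls]
    except KeyError:
        result = any("__del__" in c.__dict__ for c in cls.__mro__)
        _finalizer_cache[cls] = result
        return result

class Plain:
    pass

class Closing:
    def __del__(self):
        pass

# The trade-off Barry accepts: patching __del__ into Plain *after* the
# first check is never noticed, because the cached answer stands.
```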
The problem is that if you want to support __del__, you've got to take the finalize() hit for every instance (i.e. PyInstance object) and it's just not worth it. I just realized that it would be relatively trivial to add a subclass of PyInstance differing only in that it has a finalize() method which would invoke __del__(). Now when the class gets defined, the __del__() would be mined and cached and we'd look at that cache when creating an instance. If there's a function there, we create a PyFinalizableInstance, otherwise we create a PyInstance. The cache means you couldn't dynamically add a __del__ later, but I don't think that's a big deal. It wouldn't be hard to look up the __del__ every time, but that'd be a hit for every instance creation (as opposed to class creation), so again, it's probably not worth it. I just did a quick and dirty hack and it seems at first blush to work. I'm sure there's something I'm missing :). For those of you who don't care about JPython, you can skip the rest. Okay, first the Python script to exercise this, then the PyFinalizableInstance.java file, and then the diffs to PyClass.java. JPython-devers, is it worth adding this?

-------------------- snip snip -------------------- del.py

class B:
    def __del__(self):
        print 'In my __del__'

b = B()
del b

from java.lang import System
System.gc()

-------------------- snip snip -------------------- PyFinalizableInstance.java

// Copyright (C) Corporation for National Research Initiatives

// These are just like normal instances, except that their classes included
// a definition for __del__(), i.e. Python's finalizer.  These two instance
// types have to be separated due to Java performance issues.

package org.python.core;

public class PyFinalizableInstance extends PyInstance {
    public PyFinalizableInstance(PyClass iclass) {
        super(iclass);
    }

    // __del__ method is invoked upon object finalization.
    protected void finalize() {
        __class__.__del__.__call__(this);
    }
}

-------------------- snip snip --------------------

Index: PyClass.java
===================================================================
RCS file: /projects/cvsroot/jpython/dist/org/python/core/PyClass.java,v
retrieving revision 2.8
diff -c -r2.8 PyClass.java
*** PyClass.java 1999/10/04 20:44:28 2.8
--- PyClass.java 2000/03/07 19:02:29
***************
*** 21,27 ****
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__;
      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
--- 21,27 ----
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__, __del__;
      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
***************
*** 111,116 ****
--- 111,117 ----
          __setattr__ = lookup("__setattr__", false);
          __delattr__ = lookup("__delattr__", false);
          __tojava__ = lookup("__tojava__", false);
+         __del__ = lookup("__del__", false);
      }
      protected void findModule(PyObject dict) {
***************
*** 182,188 ****
      }
      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst = new PyInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }
--- 183,194 ----
      }
      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst;
!         if (__del__ == null)
!             inst = new PyInstance(this);
!         else
!             // the class defined an __del__ method
!             inst = new PyFinalizableInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }

From bwarsaw at cnri.reston.va.us Tue Mar 7 20:35:44 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Tue, 7 Mar 2000 14:35:44 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf8857$a56ed660$a72d153f@tim> <200003071733.MAA14926@eric.cnri.reston.va.us> Message-ID: <14533.23056.517661.633574@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Maybe some source code or class analysis looking for a GvR> __del__ could fix this, at the cost of not allowing one to GvR> patch __del__ into an existing class after instances have GvR> already been created. I don't find that breach of dynamicism GvR> a big deal -- e.g. CPython keeps copies of __getattr__, GvR> __setattr__ and __delattr__ in the class for similar reasons. For those of you who enter the "Being Guido van Rossum" door like I just did, please keep in mind that it dumps you out not on the NJ Turnpike, but in the little ditch back behind CNRI. Stop by and say hi after you brush yourself off. -Barry From Tim_Peters at Dragonsys.com Tue Mar 7 23:30:16 2000 From: Tim_Peters at Dragonsys.com (Tim_Peters at Dragonsys.com) Date: Tue, 7 Mar 2000 17:30:16 -0500 Subject: [Python-Dev] finalization again Message-ID: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> [Guido] > Tim tells Guido again that he finds the Java rules bad, slinging some > mud at Guy Steele, but without explaining what the problem with them > is ... Slinging mud? Let's back off here. You've read the Java spec and were impressed. That's fine -- it is impressive . But go on from there and see where it leads in practice. That Java's GC model did a masterful job but includes a finalization model users dislike is really just conventional wisdom in the Java world. My sketch of Guy Steele's involvement was an attempt to explain why both halves of that are valid. I didn't think "explaining the problem" was necessary, as it's been covered in depth multiple times in c.l.py threads, by Java programmers as well as by me. 
Searching the web for articles about this turns up many; the first one I hit is typical: http://www.quoininc.com/quoininc/Design_Java0197.html eventually concludes Consequently we recommend that [Java] programmers support but do not rely on finalization. That is, place all finalization semantics in finalize() methods, but call those methods explicitly and in the order required. The points below provide more detail. That's par for the Java course: advice to write finalizers to survive being called multiple times, call them explicitly, and do all you can to ensure that the "by magic" call is a nop. The lack of ordering rules in the language forces people to "do it by hand" (as the Java spec acknowledges: "It is straightforward to implement a Java class that will cause a set of finalizer-like methods to be invoked in a specified order for a set of objects when all the objects become unreachable. Defining such a class is left as an exercise for the reader." But from what I've seen, that exercise is beyond the imagination of most Java programmers! The perceived need for ordering is not.). It's fine that you want to restrict finalizers to "simple" cases; it's not so fine if the language can't ensure that simple cases are the only ones the user can write, & can neither detect & complain at runtime about cases it didn't intend to support. The Java spec is unhelpful here too: Therefore, we recommend that the design of finalize methods be kept simple and that they be programmed defensively, so that they will work in all cases. Mom and apple pie, but what does it mean, exactly? The spec realizes that you're going to be tempted to try things that won't work, but can't really explain what those are in terms simpler than the full set of implementation consequences. As a result, users hate it -- but don't take my word for that! 
If you look & don't find that Java's finalization rules are widely viewed as "a problem to be wormed around" by serious Java programmers, fine -- then you've got a much better search engine than mine . As for why I claim following topsort rules is very likely to work out better, they follow from the nature of the problem, and can be explained as such, independent of implementation details. See the Boehm reference for more about topsort. will-personally-use-python-regardless-ly y'rs - tim From guido at python.org Wed Mar 8 01:50:38 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 19:50:38 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 17:30:16 EST." <8525689B.007AB2BA.00@notes-mta.dragonsys.com> References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> Message-ID: <200003080050.TAA19264@eric.cnri.reston.va.us> > [Guido] > > Tim tells Guido again that he finds the Java rules bad, slinging some > > mud at Guy Steele, but without explaining what the problem with them > > is ... > > Slinging mud? Let's back off here. You've read the Java spec and were > impressed. That's fine -- it is impressive . But go on from > there and see where it leads in practice. That Java's GC model did a > masterful job but includes a finalization model users dislike is really > just conventional wisdom in the Java world. My sketch of Guy Steele's > involvement was an attempt to explain why both halves of that are valid. Granted. I can read Java code and sometimes I write some, but I'm not a Java programmer by any measure, and I wasn't aware that finalize() has a general bad rep. > I didn't think "explaining the problem" was necessary, as it's been > covered in depth multiple times in c.l.py threads, by Java programmers > as well as by me. 
Searching the web for articles about this turns up > many; the first one I hit is typical: > > http://www.quoininc.com/quoininc/Design_Java0197.html > > eventually concludes > > Consequently we recommend that [Java] programmers support but do > not rely on finalization. That is, place all finalization semantics > in finalize() methods, but call those methods explicitly and in the > order required. The points below provide more detail. > > That's par for the Java course: advice to write finalizers to survive > being called multiple times, call them explicitly, and do all you can > to ensure that the "by magic" call is a nop. It seems the authors make one big mistake: they recommend calling finalize() explicitly. This may be par for the Java course: the quality of the materials is often poor, and that has to be taken into account when certain features have gotten a bad rep. (These authors also go on at length about the problems of GC in a real-time situation -- attempts to use Java in situations for which it is inappropriate are also par for the course, inspired by all the hype.) Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that you should never call finalize() explicitly (except that you should always call super.finalize() in your finalize() method). (Bruce goes on at length explaining that there aren't a lot of things you should use finalize() for -- except to observe the garbage collector. :-) > The lack of ordering > rules in the language forces people to "do it by hand" (as the Java > spec acknowledges: "It is straightforward to implement a Java class > that will cause a set of finalizer-like methods to be invoked in a > specified order for a set of objects when all the objects become > unreachable. Defining such a class is left as an exercise for the > reader." But from what I've seen, that exercise is beyond the > imagination of most Java programmers! The perceived need for ordering > is not.).
True, but note that Python won't have the ordering problem, at least not as long as we stick to reference counting as the primary means of GC. The ordering problem in Python will only happen when there are cycles, and there you really can't blame the poor GC design! > It's fine that you want to restrict finalizers to "simple" cases; it's > not so fine if the language can't ensure that simple cases are the only > ones the user can write, & can neither detect & complain at runtime > about cases it didn't intend to support. The Java spec is unhelpful > here too: > > Therefore, we recommend that the design of finalize methods be kept > simple and that they be programmed defensively, so that they will > work in all cases. > > Mom and apple pie, but what does it mean, exactly? The spec realizes > that you're going to be tempted to try things that won't work, but > can't really explain what those are in terms simpler than the full set > of implementation consequences. As a result, users hate it -- but > don't take my word for that! If you look & don't find that Java's > finalization rules are widely viewed as "a problem to be wormed around" > by serious Java programmers, fine -- then you've got a much better > search engine than mine . Hm. Of course programmers hate finalizers. They hate GC as well. But they hate even more not to have it (witness the relentless complaints about Python's "lack of GC" -- and Java's GC is often touted as one of the reasons for its superiority over C++). I think this stuff is just hard! (Otherwise why would we be here having this argument?) > As for why I claim following topsort rules is very likely to work out > better, they follow from the nature of the problem, and can be > explained as such, independent of implementation details. See the > Boehm reference for more about topsort. Maybe we have a disconnect? We *are* using topsort -- for non-cyclical data structures. Reference counting ensures that. Nothing in my design changes that.
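Guido's "typical user" finalizer -- close a file, say goodbye on a socket, delete a temp file -- looks like this in practice (a minimal modern-Python sketch; ScratchFile is an invented name):

```python
import os
import tempfile

class ScratchFile:
    """Typical __del__ duty: remove a temp file when the object dies."""
    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def __del__(self):
        # The finalizer neither knows nor cares whether the object was
        # ever part of a cycle -- exactly Guido's point.
        try:
            os.unlink(self.path)
        except OSError:
            pass
```

Under reference counting the unlink happens as soon as the last reference is dropped; a tracing collector only promises it eventually.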
The issue at hand is what to do with *cyclical* data structures, where topsort doesn't help. Boehm, on http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, says: "Cycles involving one or more finalizable objects are never finalized." The question remains, what to do with trash cycles? I find having a separate __cleanup__ protocol cumbersome. I think that the "finalizer only called once by magic" rule is reasonable. I believe that the ordering problems will be much less than in Java, because we use topsort whenever we can. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 8 07:25:56 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 01:25:56 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <001401bf88c7$29f2a320$452d153f@tim> [Guido] > Granted. I can read Java code and sometimes I write some, but I'm not > a Java programmer by any measure, and I wasn't aware that finalize() > has a general bad rep. It does, albeit often for bad reasons. 1. C++ programmers seeking to emulate techniques based on C++'s rigid specification of the order and timing of destruction of autos. 2. People pushing the limits (as in the URL I happened to post). 3. People trying to do anything . Java's finalization semantics are very weak, and s-l-o-w too (under most current implementations). Now I haven't used Java for real in about two years, and avoided finalizers completely when I did use it. I can't recall any essential use of __del__ I make in Python code, either. So what Python does here makes no personal difference to me. However, I frequently respond to complaints & questions on c.l.py, and don't want to get stuck trying to justify Java's uniquely baroque rules outside of comp.lang.java <0.9 wink>. 
>> [Tim, passes on the first relevant URL he finds: >> http://www.quoininc.com/quoininc/Design_Java0197.html] > It seems the authors make one big mistake: they recommend to call > finalize() explicitly. This may be par for the Java course: the > quality of the materials is often poor, and that has to be taken into > account when certain features have gotten a bad rep. Well, in the "The Java Programming Language", Gosling recommends to: a) Add a method called close(), that tolerates being called multiple times. b) Write a finalize() method whose body calls close(). People tended to do that at first, but used a bunch of names other than "close" too. I guess people eventually got weary of having two methods that did the same thing, so decided to just use the single name Java guaranteed would make sense. > (These authors also go on at length about the problems of GC in a real- > time situation -- attempts to use Java in sutations for which it is > inappropriate are also par for the course, inspired by all the hype.) I could have picked any number of other URLs, but don't regret picking this one: you can't judge a ship in smooth waters, and people will push *all* features beyond their original intents. Doing so exposes weaknesses. Besides, Sun won't come out & say Java is unsuitable for real-time, no matter how obvious it is . > Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that > you should never call finalize() explicitly (except that you should > always call super.fuinalize() in your finalize() method). You'll find lots of conflicting advice here, be it about Java or C++. Java may be unique, though, in the universality of the conclusion Bruce draws here: > (Bruce goes on at length explaining that there aren't a lot of things > you should use finalize() for -- except to observe the garbage collector. :-) Frankly, I think Java would be better off without finalizers. 
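Gosling's a)-plus-b) recipe -- an idempotent close() with the finalizer as a mere backstop -- transliterates directly into Python (a sketch; Connection is an invented name):

```python
class Connection:
    """close() tolerates being called multiple times, explicitly or
    from the finalizer; only the first call does real work."""
    def __init__(self):
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            # ... release the underlying resource here ...

    def __del__(self):
        # the "by magic" call becomes a nop if close() already ran
        self.close()
```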
Python could do fine without __del__ too -- if you and I were the only users <0.6 wink>. [on Java's lack of ordering promises] > True, but note that Python won't have the ordering problem, at least > not as long as we stick to reference counting as the primary means of > GC. The ordering problem in Python will only happen when there are > cycles, and there you really can't blame the poor GC design! I cannot. Nor do I intend to. The cyclic ordering problem isn't GC's fault, it's the program's; but GC's *response* to it is entirely GC's responsibility. >> ... The Java spec is unhelpful here too: >> >> Therefore, we recommend that the design of finalize methods be kept >> simple and that they be programmed defensively, so that they will >> work in all cases. >> >> Mom and apple pie, but what does it mean, exactly? The spec realizes >> that you're going to be tempted to try things that won't work, but >> can't really explain what those are in terms simpler than the full set >> of implementation consequences. As a result, users hate it -- but >> don't take my word for that! If you look & don't find that Java's >> finalization rules are widely viewed as "a problem to be wormed around" >> by serious Java programmers, fine -- then you've got a much better >> search engine than mine . > Hm. Of course programmers hate finalizers. Oh no! C++ programmers *love* destructors! I mean it, they're absolutely gaga over them. I haven't detected signs that CPython programmers hate __del__ either, except at shutdown time. Regardless of language, they love them when they're predictable and work as expected, they hate them when they're unpredictable and confusing. C++ auto destructors are extremely predictable (e.g., after "{SomeClass a, b; ...}", b is destructed before a, and both destructions are guaranteed before leaving the block they're declared in, regardless of whether via return, exception, goto or falling off the end). 
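The LIFO guarantee Tim credits to C++ autos has a close Python analogue in nested context managers; a small sketch (modern Python, with an invented Resource class):

```python
from contextlib import ExitStack

log = []

class Resource:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        log.append("acquire " + self.name)
        return self

    def __exit__(self, *exc):
        log.append("release " + self.name)
        return False

with ExitStack() as stack:
    a = stack.enter_context(Resource("a"))
    b = stack.enter_context(Resource("b"))
# Like C++ block exit: cleanup runs in reverse order of construction
# (b before a), regardless of how the block is left.
```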
CPython's __del__ is largely predictable (modulo shutdown, cycles, and sometimes exceptions). The unhappiness in the Java world comes from Java finalizers' unpredictability and consequent all-around uselessness in messy real life. > They hate GC as well. Yes, when it's unpredictable and confusing . > But they hate even more not to have it (witness the relentless > complaints about Python's "lack of GC" -- and Java's GC is often > touted as one of the reasons for its superiority over C++). Back when JimF & I were looking at gc, we may have talked each other into really believing that paying careful attention to RC issues leads to cleaner and more robust designs. In fact, I still believe that, and have never clamored for "real gc" in Python. Jim now may even be opposed to "real gc". But Jim and I and you all think a lot about the art of programming, and most users just don't have time or inclination for that -- the slowly changing nature of c.l.py is also clear evidence of this. I'm afraid this makes growing "real GC" a genuine necessity for Python's continued growth. It's not a *bad* thing in any case. Think of it as a marketing requirement <0.7 wink>. > I think this stuff is just hard! (Otherwise why would we be here > having this argument?) Honest to Guido, I think it's because you're sorely tempted to go down an un-Pythonic path here, and I'm fighting that. I said early on there are no thoroughly good answers (yes, it's hard), but that's nothing new for Python! We're having this argument solely because you're confusing Python with some other language . [a 2nd or 3rd plug for taking topsort seriously] > Maybe we have a disconnect? Not in the technical analysis, but in what conclusions to take from it. > We *are* using topsort -- for non-cyclical data structures. Reference > counting ensure that. Nothing in my design changes that. And it's great! 
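The predictability both sides credit to reference counting is easy to demonstrate: in CPython (and only by virtue of refcounting -- other implementations defer this), dropping the last reference runs the finalizer on the spot:

```python
class Tracked:
    deleted = False

    def __del__(self):
        Tracked.deleted = True

t = Tracked()
assert not Tracked.deleted
del t  # refcount hits zero: __del__ runs immediately, in CPython
assert Tracked.deleted
```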
Everyone understands the RC rules pretty quickly, lots of people like them a whole lot, and if it weren't for cyclic trash everything would be peachy. > The issue at hand is what to do with *cyclical* data structures, where > topsort doesn't help. Boehm, on > http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, > says: "Cycles involving one or more finalizable objects are never > finalized." This is like some weird echo chamber, where the third time I shout something the first one comes back without any distortion at all . Yes, Boehm's first rule is "Do No Harm". It's a great rule. Python follows the same rule all over the place; e.g., when you see x = "4" + 2 you can't possibly know what was intended, so you refuse to guess: you would rather *kill* the program than make a blind guess! I see cycles with finalizers as much the same: it's plain wrong to guess when you can't possibly know what was intended. Because topsort is the only principled way to decide order of finalization, and they've *created* a situation where a topsort doesn't exist, what they're handing you is no less amibiguous than in trying to add a string to an int. This isn't the time to abandon topsort as inconvenient, it's the time to defend it as inviolate principle! The only throughly rational response is "you know, this doesn't make sense -- since I can't know what you want here, I refuse to pretend that I can". Since that's "the right" response everywhere else in Python, what the heck is so special about this case? It's like you decided Python *had* to allow adding strings to ints, and now we're going to argue about whether Perl, Awk or Tcl makes the best unprincipled guess . > The question remains, what to do with trash cycles? A trash cycle without a finalizer isn't a problem, right? In that case, topsort rules have no visible consquence so it doesn't matter in what order you merely reclaim the memory. 
If it has an object with a finalizer, though, at the very worst you can let it leak, and make the collection of leaked objects available for inspection. Even that much is a *huge* "improvement" over what they have today: most cycles won't have a finalizer and so will get reclaimed, and for the rest they'll finally have a simple way to identify exactly where the problem is, and a simple criterion for predicting when it will happen. If that's not "good enough", then without abandoning principle the user needs to have some way to reduce such a cycle *to* a topsort case themself. > I find having a separate __cleanup__ protocol cumbersome. Same here, but if you're not comfortable leaking, and you agree Python is not in the business of guessing in inherently ambiguous situations, maybe that's what it takes! MAL and GregS both gravitated to this kind of thing at once, and that's at least suggestive; and MAL has actually been using his approach. It's explicit, and that's Pythonic on the face of it. > I think that the "finalizer only called once by magic" rule is reasonable. If it weren't for its specific use in emulating Java's scheme, would you still be in favor of that? It's a little suspicious that it never came up before. > I believe that the ordering problems will be much less than in Java, because > we use topsort whenever we can. No argument here, except that I believe there's never sufficient reason to abandon topsort ordering. Note that BDW's adamant refusal to yield on this hasn't stopped "why doesn't Python use BDW?" from becoming a FAQ. a-case-where-i-expect-adhering-to-principle-is-more-pragmatic-in-the-end-ly y'rs - tim From tim_one at email.msn.com Wed Mar 8 08:48:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 02:48:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Message-ID: <001801bf88d2$af0037c0$452d153f@tim> Mike has a darned good point here.
Anyone have a darned good answer ? -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org] On Behalf Of Mike Fletcher Sent: Tuesday, March 07, 2000 2:08 PM To: Python Listserv (E-mail) Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage all over the hard-disk, traffic rerouted through the bit-bucket, you aren't getting to work anytime soon Mrs. Programmer) and wondering why we have a FAQ instead of having the win32pipe stuff rolled into the os module to fix it. Is there some incompatibility? Is there a licensing problem? Ideas? Mike __________________________________ Mike C. Fletcher Designer, VR Plumber http://members.home.com/mcfletch -- http://www.python.org/mailman/listinfo/python-list From mal at lemburg.com Wed Mar 8 09:36:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:36:57 +0100 Subject: [Python-Dev] finalization again References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <38C61129.2F8C9E95@lemburg.com> > [Guido] > The question remains, what to do with trash cycles? I find having a > separate __cleanup__ protocol cumbersome. I think that the "finalizer > only called once by magic" rule is reasonable. I believe that the > ordering problems will be much less than in Java, because we use > topsort whenever we can. Note that the __cleanup__ protocol is intended to break cycles *before* calling the garbage collector. After those cycles are broken, ordering is not a problem anymore and because __cleanup__ can do its task on a per-object basis all magic is left in the hands of the programmer. The __cleanup__ protocol as I use it is designed to be called in situations where the system knows that all references into a cycle are about to be dropped (I typically use small cyclish object systems in my application, e.g. 
ones that create and reference namespaces which include a reference to the hosting object itself). In my application that is done by using mxProxies at places where I know these cyclic object subsystems are being referenced. In Python the same could be done whenever the interpreter knows that a certain object is about to be deleted, e.g. during shutdown (important for embedding Python in other applications such as Apache) or some other major subsystem finalization, e.g. unload of a module or killing of a thread (yes, I know these are no-nos, but they could be useful, esp. the thread kill operation in multi-threaded servers). After __cleanup__ has done its thing, the finalizer can either choose to leave all remaining cycles in memory (and leak) or apply its own magic to complete the task. In any case, __del__ should be called when the refcount reaches 0. (I find it somewhat strange that people are arguing to keep external resources alive even though there is a chance of freeing them.) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 8 09:46:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:46:14 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> Message-ID: <38C61356.E0598DBF@lemburg.com> Tim Peters wrote: > > Mike has a darned good point here. Anyone have a darned good answer ? > > -----Original Message----- > From: python-list-admin at python.org [mailto:python-list-admin at python.org] > On Behalf Of Mike Fletcher > Sent: Tuesday, March 07, 2000 2:08 PM > To: Python Listserv (E-mail) > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > adopted?
> > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > getting to work anytime soon Mrs. Programmer) and wondering why we have a > FAQ instead of having the win32pipe stuff rolled into the os module to fix > it. Is there some incompatibility? Is there a licensing problem? > > Ideas? I'd suggest moving the popen from the C modules into os.py as Python API and then applying all necessary magic to either use the win32pipe implementation (if available) or the native C one from the posix module in os.py. Unless, of course, the win32 stuff (or some of it) makes it into the core. I'm mostly interested in this for my platform.py module... BTW, is there any interest of moving it into the core ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 13:10:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 07:10:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 09:46:14 +0100." <38C61356.E0598DBF@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> Message-ID: <200003081210.HAA19931@eric.cnri.reston.va.us> > Tim Peters wrote: > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > adopted? > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > it. Is there some incompatibility? 
Is there a licensing problem? MAL: > I'd suggest moving the popen from the C modules into os.py > as Python API and then applying all necessary magic to either > use the win32pipe implementation (if available) or the native > C one from the posix module in os.py. > > Unless, of course, the win32 stuff (or some of it) makes it into > the core. No concrete plans -- except that I think the registry access is supposed to go in. Haven't seen the code on patches at python.org yet though. > I'm mostly interested in this for my platform.py module... > BTW, is there any interest of moving it into the core ? "it" == platform.py? Little interest from me personally; I suppose it could go in Tools/scripts/... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 8 15:06:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:06:53 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Wed, 08 Mar 2000 01:25:56 EST." <001401bf88c7$29f2a320$452d153f@tim> References: <001401bf88c7$29f2a320$452d153f@tim> Message-ID: <200003081406.JAA20033@eric.cnri.reston.va.us> > A trash cycle without a finalizer isn't a problem, right? In that case, > topsort rules have no visible consquence so it doesn't matter in what order > you merely reclaim the memory. When we have a pile of garbage, we don't know whether it's all connected or whether it's lots of little cycles. So if we find [objects with -- I'm going to omit this] finalizers, we have to put those on a third list and put everything reachable from them on that list as well (the algorithm I described before). What's left on the first list then consists of finalizer-free garbage. We dispose of this garbage by clearing dicts and lists. Hopefully this makes the refcount of some of the finalizers go to zero -- those are finalized in the normal way. And now we have to deal with the inevitable: finalizers that are part of cycles. 
It makes sense to reduce the graph of objects to a graph of finalizers only. Example: A <=> b -> C <=> d A and C have finalizers. C is part of a cycle (C-d) that contains no other finalizers, but C is also reachable from A. A is part of a cycle (A-b) that keeps it alive. The interesting thing here is that if we only look at the finalizers, there are no cycles! If we reduce the graph to only finalizers (setting aside for now the problem of how to do that -- we may need to allocate more memory to hold the reduced graph), we get: A -> C We can now finalize A (even though its refcount is nonzero!). And that's really all we can do! A could break its own cycle, thereby disposing of itself and b. It could also break C's cycle, disposing of C and d. It could do nothing. Or it could resurrect A, thereby resurrecting all of A, b, C, and d. This leads to (there's that weird echo again :-) Boehm's solution: Call A's finalizer and leave the rest to the next time the garbage collection runs. Note that we're now calling finalizers on objects with a non-zero refcount. At some point (probably as a result of finalizing A) its refcount will go to zero. We should not finalize it again -- this would serve no purpose. Possible solution:

    INCREF(A);
    A->__del__();
    if (A->ob_refcnt == 1)
        A->__class__ = NULL;  /* Make A finalizer-less */
    DECREF(A);

This avoids finalizing twice if the first finalization broke all cycles in which A is involved. But if it doesn't, A is still cyclical garbage with a finalizer! Even if it didn't resurrect itself. Instead of the code fragment above, we could mark A as "just finalized" and when it shows up at the head of the tree (of finalizers in cyclical trash) again on the next garbage collection, discard it without calling the finalizer again (because this clearly means that it didn't resurrect itself -- at least not for a very long time).
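[Editorial aside: the reduction step Guido describes can be sketched in a few lines. The helper names and graph encoding below are invented for illustration; this is not the collector's actual code:]

```python
def reduced_finalizer_graph(refs, finalizers):
    """Collapse a full reference graph to edges between finalizer-bearing
    objects: F1 -> F2 iff F2 is reachable from F1 through any chain of
    references. refs maps name -> set of referenced names."""
    def reachable(start):
        seen, stack = set(), list(refs[start])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(refs[n])
        return seen
    return {f: {g for g in reachable(f) if g in finalizers and g != f}
            for f in finalizers}

def finalize_first(reduced):
    """The 'roots' to finalize first: finalizers that no other
    finalizer can reach in the reduced graph."""
    has_incoming = {t for targets in reduced.values() for t in targets}
    return sorted(f for f in reduced if f not in has_incoming)
```

On Guido's example (A <=> b -> C <=> d, with finalizers on A and C) this yields the reduced graph A -> C, and A as the one object to finalize first.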
I would be happier if we could still have a rule that says that a finalizer is called only once by magic -- even if we have two forms of magic: refcount zero or root of the tree. Tim: I don't know if you object to this rule as a matter of principle (for the sake of finalizers that resurrect the object) or if your objection is really against the unordered calling of finalizers legitimized by Java's rules. I hope the latter, since I think that this rule (__del__ called only once by magic) by itself is easy to understand and easy to deal with, and I believe it may be necessary to guarantee progress for the garbage collector. The problem is that the collector can't easily tell whether A has resurrected itself. Sure, if the refcount is 1 after the finalizer run, I know it didn't resurrect itself. But even if it's higher than before, that doesn't mean it's resurrected: it could have linked to itself. Without doing a full collection I can't tell the difference. If I wait until a full collection happens again naturally, and look at the "just finalized flag", I can't tell the difference between the case whereby the object resurrected itself but died again before the next collection, and the case where it was dead already. So I don't know how many times it was expecting the "last rites" to be performed, and the object can't know whether to expect them again or not. This seems worse than the only-once rule to me. Even if someone once found a good use for resurrecting inside __del__, against all recommendations, I don't mind breaking their code, if it's for a good cause. The Java rules aren't a good cause. But top-sorted finalizer calls seem a worthy cause. So now we get to discuss what to do with multi-finalizer cycles, like: A <=> b <=> C Here the reduced graph is: A <=> C About this case you say: > If it has an object with a finalizer, though, at the very worst you can let > it leak, and make the collection of leaked objects available for > inspection.
Even that much is a *huge* "improvement" over what they have > today: most cycles won't have a finalizer and so will get reclaimed, and > for the rest they'll finally have a simple way to identify exactly where the > problem is, and a simple criterion for predicting when it will happen. If > that's not "good enough", then without abandoning principle the user needs > to have some way to reduce such a cycle *to* a topsort case themself. > > > I find having a separate __cleanup__ protocol cumbersome. > > Same here, but if you're not comfortable leaking, and you agree Python is > not in the business of guessing in inherently ambiguous situations, maybe > that's what it takes! MAL and GregS both gravitated to this kind of thing > at once, and that's at least suggestive; and MAL has actually been using his > approach. It's explicit, and that's Pythonic on the face of it. > > > I think that the "finalizer only called once by magic" rule is reasonable. > > If it weren't for its specific use in emulating Java's scheme, would you > still be in favor of that? It's a little suspicious that it never came up > before. Suspicious or not, it still comes up. I still like it. I still think that playing games with resurrection is evil. (Maybe my spiritual beliefs shine through here -- I'm a convinced atheist. :-) Anyway, once-only rule aside, we still need a protocol to deal with cyclical dependencies between finalizers. The __cleanup__ approach is one solution, but it also has a problem: we have a set of finalizers. Whose __cleanup__ do we call? Any? All? Suggestions? Note that I'd like some implementation freedom: I may not want to bother with the graph reduction algorithm at first (which seems very hairy) so I'd like to have the right to use the __cleanup__ API as soon as I see finalizers in cyclical trash.
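[Editorial aside: the explicit __cleanup__ protocol under discussion might look roughly like this from the user's side. The method name is the one proposed in the thread, but the collector glue and class below are invented for illustration:]

```python
import warnings

class Node:
    """A user object that knows how to break its own cycles explicitly."""
    def __init__(self, name):
        self.name = name
        self.other = None          # will point into a cycle

    def __cleanup__(self):
        self.other = None          # break the cycle; no ordering needed

    def __del__(self):
        pass                       # ordinary finalizer, runs at refcount zero

def run_cleanup(cycle_objects):
    """What a collector might do with a trash cycle containing finalizers:
    call __cleanup__ in arbitrary order, warning when no object in the
    cycle defines it."""
    found = False
    for obj in cycle_objects:
        cleanup = getattr(obj, "__cleanup__", None)
        if cleanup is not None:
            cleanup()
            found = True
    if not found:
        warnings.warn("cycle with finalizers but no __cleanup__")
    return found
```

Because each object only breaks its *own* references, the arbitrary calling order is harmless -- which is the whole appeal of the approach.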
I don't mind disposing of finalizer-free cycles first, but once I have more than one finalizer left in the remaining cycles, I'd like the right not to reduce the graph for topsort reasons -- that algorithm seems hard. So we're back to the __cleanup__ design. Strawman proposal: for all finalizers in a trash cycle, call their __cleanup__ method, in arbitrary order. After all __cleanup__ calls are done, if the objects haven't all disposed of themselves, they are all garbage-collected without calling __del__. (This seems to require another garbage collection cycle -- so perhaps there should also be a once-only rule for __cleanup__?) Separate question: what if there is no __cleanup__? This should probably be reported: "You have cycles with finalizers, buddy! What do you want to do about them?" This same warning could be given when there is a __cleanup__ but it doesn't break all cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Mar 8 14:34:06 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 14:34:06 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <38C656CE.B0ACFF35@lemburg.com> Guido van Rossum wrote: > > > Tim Peters wrote: > > > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > > adopted? > > > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > > it. Is there some incompatibility? Is there a licensing problem?
> > MAL: > > I'd suggest moving the popen from the C modules into os.py > > as Python API and then applying all necessary magic to either > > use the win32pipe implementation (if available) or the native > > C one from the posix module in os.py. > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > the core. > > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches at python.org yet > though. Ok, what about the optional "use win32pipe if available" idea then ? > > I'm mostly interested in this for my platform.py module... > > BTW, is there any interest of moving it into the core ? > > "it" == platform.py? Right. > Little interest from me personally; I suppose it > could go in Tools/scripts/... Hmm, it wouldn't help much in there I guess... after all, it defines APIs which are to be queried by other scripts. The default action to print the platform information to stdout is just a useful addition. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 15:33:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:33:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 14:34:06 +0100." <38C656CE.B0ACFF35@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> Message-ID: <200003081433.JAA20177@eric.cnri.reston.va.us> > > MAL: > > > I'd suggest moving the popen from the C modules into os.py > > > as Python API and then applying all necessary magic to either > > > use the win32pipe implementation (if available) or the native > > > C one from the posix module in os.py. 
> > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > the core. [Guido] > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches at python.org yet > > though. > > Ok, what about the optional "use win32pipe if available" idea then ? Sorry, I meant please send me the patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Mar 8 15:59:46 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 8 Mar 2000 09:59:46 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <14534.27362.139106.701784@weyr.cnri.reston.va.us> Guido van Rossum writes: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... I think platform.py is pretty nifty, but I'm not entirely sure how it's expected to be used. Perhaps Marc-Andre could explain further the motivation behind the module? My biggest requirement is that it be accompanied by documentation. The coolness factor and shared use of hackerly knowledge would probably get *me* to put it in, but there are a lot of things about which I'll disagree with Guido just to hear his (well-considered) thoughts on the matter. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Wed Mar 8 18:37:43 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:37:43 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 ... code for thought. 
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> <200003081433.JAA20177@eric.cnri.reston.va.us> Message-ID: <38C68FE7.63943C5C@lemburg.com> Guido van Rossum wrote: > > > > MAL: > > > > I'd suggest moving the popen from the C modules into os.py > > > > as Python API and then applying all necessary magic to either > > > > use the win32pipe implementation (if available) or the native > > > > C one from the posix module in os.py. > > > > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > > the core. > [Guido] > > > No concrete plans -- except that I think the registry access is > > > supposed to go in. Haven't seen the code on patches at python.org yet > > > though. > > > > Ok, what about the optional "use win32pipe if available" idea then ? > > Sorry, I meant please send me the patch! Here's the popen() interface I use in platform.py. It should serve well as a basis for an os.popen patch... (don't have time to do it myself right now):

class _popen:

    """ Fairly portable (alternative) popen implementation.

        This is mostly needed in case os.popen() is not available, or
        doesn't work as advertised, e.g. in Win9X GUI programs like
        PythonWin or IDLE.

        XXX Writing to the pipe is currently not supported.
    """
    tmpfile = ''
    pipe = None
    bufsize = None
    mode = 'r'

    def __init__(self, cmd, mode='r', bufsize=None):
        if mode != 'r':
            raise ValueError, 'popen()-emulation only supports read mode'
        import tempfile
        self.tmpfile = tmpfile = tempfile.mktemp()
        os.system(cmd + ' > %s' % tmpfile)
        self.pipe = open(tmpfile, 'rb')
        self.bufsize = bufsize
        self.mode = mode

    def read(self):
        return self.pipe.read()

    def readlines(self):
        # Note: bufsize is ignored by this emulation; the original
        # posting only returned lines when bufsize was given, which
        # made readlines() return None in the default case.
        return self.pipe.readlines()

    def close(self, remove=os.unlink, error=os.error):
        if self.pipe:
            rc = self.pipe.close()
        else:
            rc = 255
        if self.tmpfile:
            try:
                remove(self.tmpfile)
            except error:
                pass
        return rc

    # Alias
    __del__ = close

def popen(cmd, mode='r', bufsize=None):

    """ Portable popen() interface.
    """
    # Find a working popen implementation preferring win32pipe.popen
    # over os.popen over _popen
    popen = None
    if os.environ.get('OS', '') == 'Windows_NT':
        # On NT win32pipe should work; on Win9x it hangs due to bugs
        # in the MS C lib (see MS KnowledgeBase article Q150956)
        try:
            import win32pipe
        except ImportError:
            pass
        else:
            popen = win32pipe.popen
    if popen is None:
        if hasattr(os, 'popen'):
            popen = os.popen
            # Check whether it works... it doesn't in GUI programs
            # on Windows platforms
            if sys.platform == 'win32':  # XXX Others too ?
                try:
                    popen('')
                except os.error:
                    popen = _popen
        else:
            popen = _popen
    if bufsize is None:
        return popen(cmd, mode)
    else:
        return popen(cmd, mode, bufsize)

if __name__ == '__main__':
    print """ I confirm that, to the best of my knowledge and belief, this contribution is free of any claims of third parties under copyright, patent or other rights or interests ("claims").
To the extent that I have any such claims, I hereby grant to CNRI a nonexclusive, irrevocable, royalty-free, worldwide license to reproduce, distribute, perform and/or display publicly, prepare derivative versions, and otherwise use this contribution as part of the Python software and its related documentation, or any derivative versions thereof, at no cost to CNRI or its licensed users, and to authorize others to do so. I acknowledge that CNRI may, at its sole discretion, decide whether or not to incorporate this contribution in the Python software and its related documentation. I further grant CNRI permission to use my name and other identifying information provided to CNRI by me for use in connection with the Python software and its related documentation. """ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 8 18:44:59 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:44:59 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <14534.27362.139106.701784@weyr.cnri.reston.va.us> Message-ID: <38C6919B.EA3EE2E7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Guido van Rossum writes: > > "it" == platform.py? Little interest from me personally; I suppose it > > could go in Tools/scripts/... > > I think platform.py is pretty nifty, but I'm not entirely sure how > it's expected to be used. Perhaps Marc-Andre could explain further > the motivation behind the module? It was first intended to provide a way to format a platform identifying file name for the mxCGIPython project and then quickly moved on to provide many different APIs to query platform specific information. 
architecture(executable='/usr/local/bin/python', bits='', linkage='') :

    Queries the given executable (defaults to the Python interpreter binary) for various architecture information. Returns a tuple (bits, linkage) which contains information about the bit architecture and the linkage format used for the executable. Both values are returned as strings. Values that cannot be determined are returned as given by the parameter presets. If bits is given as '', the sizeof(long) is used as indicator for the supported pointer size. The function relies on the system's "file" command to do the actual work. This is available on most if not all Unix platforms. On some non-Unix platforms, and then only if the executable points to the Python interpreter, defaults from _default_architecture are used.

dist(distname='', version='', id='') :

    Tries to determine the name of the OS distribution. The function first looks for a distribution release file in /etc and then reverts to _dist_try_harder() in case no suitable files are found. Returns a tuple (distname, version, id) which defaults to the args given as parameters.

java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')) :

    Version interface for JPython. Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being a tuple (vm_name, vm_release, vm_vendor) and osinfo being a tuple (os_name, os_version, os_arch). Values which cannot be determined are set to the defaults given as parameters (which all default to '').

libc_ver(executable='/usr/local/bin/python', lib='', version='') :

    Tries to determine the libc version against which the file executable (defaults to the Python interpreter) is linked. Returns a tuple of strings (lib, version) which default to the given parameters in case the lookup fails. Note that the function has intimate knowledge of how different libc versions add symbols to the executable and is probably only usable for executables compiled using gcc. The file is read and scanned in chunks of chunksize bytes.
mac_ver(release='', versioninfo=('', '', ''), machine='') :

    Get MacOS version information and return it as tuple (release, versioninfo, machine) with versioninfo being a tuple (version, dev_stage, non_release_version). Entries which cannot be determined are set to ''. All tuple entries are strings. Thanks to Mark R. Levinson for mailing documentation links and code examples for this function. Documentation for the gestalt() API is available online at: http://www.rgaros.nl/gestalt/

machine() :

    Returns the machine type, e.g. 'i386'. An empty string is returned if the value cannot be determined.

node() :

    Returns the computer's network name (may not be fully qualified !). An empty string is returned if the value cannot be determined.

platform(aliased=0, terse=0) :

    Returns a single string identifying the underlying platform with as much useful information as possible (but no more :). The output is intended to be human readable rather than machine parseable. It may look different on different platforms and this is intended. If "aliased" is true, the function will use aliases for various platforms that report system names which differ from their common names, e.g. SunOS will be reported as Solaris. The system_alias() function is used to implement this. Setting terse to true causes the function to return only the absolute minimum information needed to identify the platform.

processor() :

    Returns the (true) processor name, e.g. 'amdk6'. An empty string is returned if the value cannot be determined. Note that many platforms do not provide this information or simply return the same value as for machine(), e.g. NetBSD does this.

release() :

    Returns the system's release, e.g. '2.2.0' or 'NT'. An empty string is returned if the value cannot be determined.

system() :

    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'. An empty string is returned if the value cannot be determined.
system_alias(system, release, version) :

    Returns (system, release, version) aliased to common marketing names used for some systems. It also does some reordering of the information in some cases where it would otherwise cause confusion.

uname() :

    Fairly portable uname interface. Returns a tuple of strings (system, node, release, version, machine, processor) identifying the underlying platform. Note that unlike the os.uname function this also returns possible processor information as an additional tuple entry. Entries which cannot be determined are set to ''.

version() :

    Returns the system's release version, e.g. '#3 on degas'. An empty string is returned if the value cannot be determined.

win32_ver(release='', version='', csd='', ptype='') :

    Get additional version information from the Windows Registry and return a tuple (version, csd, ptype) referring to version number, CSD level and OS type (multi/single processor). As a hint: ptype returns 'Uniprocessor Free' on single processor NT machines and 'Multiprocessor Free' on multi processor machines. The 'Free' refers to the OS version being free of debugging code. It could also state 'Checked' which means the OS version uses debugging code, i.e. code that checks arguments, ranges, etc. (Thomas Heller). Note: this function only works if Mark Hammond's win32 package is installed and obviously only runs on Win32 compatible platforms. XXX Is there any way to find out the processor type on WinXX ? XXX Is win32 available on Windows CE ? Adapted from code posted by Karl Putland to comp.lang.python.

> My biggest requirement is that it be accompanied by documentation. > The coolness factor and shared use of hackerly knowledge would > probably get *me* to put it in, but there are a lot of things about > which I'll disagree with Guido just to hear his (well-considered) > thoughts on the matter. ;) The module is doc-string documented (see above). This should serve well as a basis for the latex docs.
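[Editorial aside: typical use of the query APIs documented above looks like this, assuming MAL's platform.py is importable as `platform`; the example calls only the documented entry points:]

```python
# Query the platform module for identification strings. Each function
# degrades to '' (or the given default) when the value can't be determined.
import platform

info = platform.platform(aliased=1)      # human-readable platform string
sysname = platform.system()              # e.g. 'Linux', 'Windows' or 'Java'
bits, linkage = platform.architecture()  # e.g. ('32bit', 'ELF')
```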
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From DavidA at ActiveState.com Wed Mar 8 19:36:01 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 8 Mar 2000 10:36:01 -0800 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... FWIW, I think it belongs in the standard path. It allows one to do the equivalent of if sys.platform == '...' but in a much more useful way. --david From mhammond at skippinet.com.au Wed Mar 8 22:36:12 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 08:36:12 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches at python.org yet > though. FYI, that is off with Trent who is supposed to be testing it on the Alpha. Re win32pipe - I responded to that post suggesting that we do with os.pipe and win32pipe what was done with os.path.abspath/win32api - optionally try to import the win32 specific module and use it. My only "concern" is that this then becomes more code for Guido to maintain in the core, even though Guido has expressed a desire to get out of the installers business. Assuming the longer term plan is for other people to put together installation packages, and that these people are free to redistribute win32api/win32pipe, Im wondering if it is worth bothering with? Mark.
From trentm at ActiveState.com  Wed Mar  8 15:42:06 2000
From: trentm at ActiveState.com (Trent Mick)
Date: Wed, 8 Mar 2000 14:42:06 -0000
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C6919B.EA3EE2E7@lemburg.com>
Message-ID: 

MAL:
> architecture(executable='/usr/local/bin/python', bits='',
>              linkage='') :
>
>     Values that cannot be determined are returned as given by the
>     parameter presets. If bits is given as '', the sizeof(long) is
>     used as indicator for the supported pointer size.

Just a heads up, using sizeof(long) will not work on the forthcoming WIN64 (LLP64 data model) to determine the supported pointer size.  You would want to use the 'P' struct format specifier instead, I think (I am speaking in relative ignorance).  However, the docs say that a PyInt is used to store the 'P' specified value, which, as a C long, will not hold a pointer on LLP64.  Hmmmm.  The keyword perhaps is "forthcoming".

This is the code in question in platform.py:

    # Use the sizeof(long) as default number of bits if nothing
    # else is given as default.
    if not bits:
        import struct
        bits = str(struct.calcsize('l')*8) + 'bit'

Guido:
> > No concrete plans -- except that I think the registry access is
> > supposed to go in.  Haven't seen the code on patches at python.org yet
> > though.

Mark Hammond:
> FYI, that is off with Trent who is supposed to be testing it on the Alpha.

My Alpha is in pieces right now! I will get to it soon.  I will try it on Win64 as well, if I can.

Trent

Trent Mick
trentm at activestate.com

From guido at python.org  Thu Mar  9 03:59:51 2000
From: guido at python.org (Guido van Rossum)
Date: Wed, 08 Mar 2000 21:59:51 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: Your message of "Thu, 09 Mar 2000 08:36:12 +1100."
References: Message-ID: <200003090259.VAA20928@eric.cnri.reston.va.us> > My only "concern" is that this then becomes more code for Guido to maintain > in the core, even though Guido has expressed a desire to get out of the > installers business. Theoretically, it shouldn't need much maintenance. I'm more concerned that it will have different semantics than on Unix so that in practice you'd need to know about the platform anyway (apart from the fact that the installed commands are different, of course). > Assuming the longer term plan is for other people to put together > installation packages, and that these people are free to redistribute > win32api/win32pipe, Im wondering if it is worth bothering with? So that everybody could use os.popen() regardless of whether they're on Windows or Unix. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Mar 9 04:31:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 14:31:21 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003090259.VAA20928@eric.cnri.reston.va.us> Message-ID: [Me] > > Assuming the longer term plan is for other people to put together > > installation packages, and that these people are free to redistribute > > win32api/win32pipe, Im wondering if it is worth bothering with? [Guido] > So that everybody could use os.popen() regardless of whether they're > on Windows or Unix. Sure. But what I meant was "should win32pipe code move into the core, or should os.pipe() just auto-detect and redirect to win32pipe if installed?" 
I was suggesting that over the longer term, it may be reasonable to assume that win32pipe _will_ be installed, as everyone who releases installers for Python should include it :-)  It could also be written in such a way that it prints a warning message when win32pipe doesn't exist, so in 99% of cases, it will answer the FAQ before they have had a chance to ask it :-)

It also should be noted that the win32pipe support for popen on Windows 95/98 includes a small, dedicated .exe - this just adds to the maintenance burden.

But it doesn't worry me at all what happens - I was just trying to save you work.  Anyone is free to take win32pipe and move the relevant code into the core anytime they like, with my and Bill's blessing.  It quite suits me that people have to download win32all to get this working, so I doubt I will get around to it any time soon :-)

Mark.

From tim_one at email.msn.com  Thu Mar  9 04:52:58 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Wed, 8 Mar 2000 22:52:58 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: 
Message-ID: <000401bf897a$f5a7e620$0d2d153f@tim>

I had another take on all this, which I'll now share since nobody seems inclined to fold in the Win32 popen: perhaps os.popen should not be supported at all under Windows!

The current function is a mystery wrapped in an enigma -- sometimes it works, sometimes it doesn't, and I've never been able to outguess which one will obtain (there's more to it than just whether a console window is attached).  If it's not reliable (it's not), and we can't document the conditions under which it can be used safely (I can't), Python shouldn't expose it.

Failing that, the os.popen docs should caution it's "use at your own risk" under Windows, and that this is directly inherited from MS's popen implementation.
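[Archive note: the optional-import arrangement Mark describes above -- mirroring what os.path.abspath does with win32api -- might look like this. The wrapper function itself is hypothetical; `win32pipe.popen` is the win32all entry point under discussion, and the warning text is illustrative.]

```python
import os

def popen(cmd, mode='r'):
    # Hypothetical wrapper: prefer the win32pipe implementation when it
    # is installed, warn and fall back to the C runtime popen otherwise.
    if os.name == 'nt':
        try:
            import win32pipe
        except ImportError:
            print('Warning: win32pipe not installed; '
                  'falling back to the unreliable MS popen')
        else:
            return win32pipe.popen(cmd, mode)
    return os.popen(cmd, mode)
```

On non-Windows platforms this is just os.popen, so portable code can call it unconditionally.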
From tim_one at email.msn.com  Thu Mar  9 10:40:26 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 9 Mar 2000 04:40:26 -0500
Subject: [Python-Dev] finalization again
In-Reply-To: <200003081406.JAA20033@eric.cnri.reston.va.us>
Message-ID: <000701bf89ab$80cb8e20$0d2d153f@tim>

[Guido, with some implementation details and nice examples]

Normally I'd eat this up -- today I'm gasping for air trying to stay afloat.  I'll have to settle for sketching the high-level approach I've had in the back of my mind.  I start with the pile of incestuous stuff Toby/Neil discovered to have no external references.  It consists of dead cycles, and perhaps also non-cycles reachable only from dead cycles.

1. The "points to" relation on this pile defines a graph G.

2. From any graph G, we can derive a related graph G' consisting of the maximal strongly connected components (SCCs) of G.  Each (super)node of G' is an SCC of G, where (super)node A' of G' points to (super)node B' of G' iff there exists a node A in A' that points to (wrt G) some node B in B'.  It's not obvious, but the SCCs can be found in linear time (via Tarjan's algorithm, which is simple but subtle; Cyclops.py uses a much dumber brute-force approach, which is nevertheless perfectly adequate in the absence of massively large cycles -- premature optimization is the root etc <0.5 wink>).

3. G' is necessarily a DAG.  For if distinct A' and B' are both reachable from each other in G', then every pair of A in A' and B in B' are reachable from each other in G, contradicting that A' and B' are distinct maximal SCCs (that is, the union of A' and B' is also an SCC).

4. The point to all this:  Every DAG can be topsorted.  Start with the nodes of G' without predecessors.  There must be at least one, because G' is a DAG.

5. For every node A' in G' without predecessors (wrt G'), it either does or does not contain an object with a potentially dangerous finalizer.  If it does not, let's call it a safe node.
If there are no safe nodes without predecessors, GC is stuck, and for good reason: every object in the whole pile is reachable from an object with a finalizer, which could change the topology in near-arbitrary ways.  The unsafe nodes without predecessors (and again, by #4, there must be at least one) are the heart of the problem, and this scheme identifies them precisely.

6. Else there is a safe node A'.  For each A in A', reclaim it, following the normal refcount rules (or in an implementation w/o RC, by following a topsort of "points to" in the original G).  This *may* cause reclamation of an object X with a finalizer outside of A'.  But doing so cannot cause resurrection of anything in A' (X is reachable from A' else cleaning up A' couldn't have affected X, and if anything in A' were also reachable from X, X would have been in A' to begin with (SCC!), contradicting that A' is safe).  So the objects in A' can get reclaimed without difficulty.

7. The simplest thing to do now is just stop: rebuild it from scratch the next time the scheme is invoked.  If it was *possible* to make progress without guessing, we did; and if it was impossible, we identified the precise SCC(s) that stopped us.  Anything beyond that is optimization <0.6 wink>.

Seems the most valuable optimization would be to keep track of whether an object with a finalizer gets reclaimed in step 6 (so long as that doesn't happen, the mutations that can occur to the structure of G' seem nicely behaved enough that it should be possible to loop back to step #5 without crushing pain).

On to Guido's msg:

[Guido]
> When we have a pile of garbage, we don't know whether it's all
> connected or whether it's lots of little cycles.  So if we find
> [objects with -- I'm going to omit this] finalizers, we have to put
> those on a third list and put everything reachable from them on that
> list as well (the algorithm I described before).

SCC determination gives precise answers to all that.
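[Archive note: the linear-time SCC computation Tim mentions in step 2 can be sketched compactly. This is an illustrative recursive Tarjan, not the Cyclops.py code; `graph` maps each node to the nodes it points to. A convenient property: Tarjan emits the SCCs of G' in reverse topological order, sinks first.]

```python
def tarjan_scc(graph):
    """Return the SCCs of graph, in reverse topological order of G'."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:           # tree edge: recurse
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:          # edge back into the current stack
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# The A <=> b -> C <=> d example from Guido's message:
g = {'A': ['b'], 'b': ['A', 'C'], 'C': ['d'], 'd': ['C']}
print(tarjan_scc(g))   # two SCCs: {C, d} (a sink) emitted before {A, b}
```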
> What's left on the first list then consists of finalizer-free garbage. > We dispose of this garbage by clearing dicts and lists. Hopefully > this makes the refcount of some of the finalizers go to zero -- those > are finalized in the normal way. In Python it's even possible for a finalizer to *install* a __del__ method that didn't previously exist, into the class of one of the objects on your "first list". The scheme above is meant to be bulletproof in the face of abuses even I can't conceive of . More mundanely, clearing an item on your first list can cause a chain of events that runs a finalizer, which in turn can resurrect one of the objects on your first list (and so it should *not* get reclaimed). Without doing the SCC bit, I don't think you can out-think that (the reasoning above showed that the finalizer can't resurrect something in the *same* SCC as the object that started it all, but that argument cannot be extended to objects in other safe SCCs: they're vulnerable). > And now we have to deal with the inevitable: finalizers that are part > of cycles. It makes sense to reduce the graph of objects to a graph > of finalizers only. Example: > > A <=> b -> C <=> d > > A and C have finalizers. C is part of a cycle (C-d) that contains no > other finalizers, but C is also reachable from A. A is part of a > cycle (A-b) that keeps it alive. The interesting thing here is that > if we only look at the finalizers, there are no cycles! The scheme above derives G': A' -> C' where A' consists of the A<=>b cycle and C' the C<=>d cycle. That there are no cycles in G' isn't surprising, it's just the natural consequence of doing the natural analysis . The scheme above refuses to do anything here, because the only node in G' without a predecessor (namely A') isn't "safe". 
> If we reduce the graph to only finalizers (setting aside for now the
> problem of how to do that -- we may need to allocate more memory to
> hold the reduced graph), we get:
>
>     A -> C

You should really have self-loops on both A and C, right?  (because A is reachable from itself via chasing pointers; ditto for C)

> We can now finalize A (even though its refcount is nonzero!).  And
> that's really all we can do!  A could break its own cycle, thereby
> disposing of itself and b.  It could also break C's cycle, disposing
> of C and d.  It could do nothing.  Or it could resurrect A, thereby
> resurrecting all of A, b, C, and d.
>
> This leads to (there's that weird echo again :-) Boehm's solution:
> Call A's finalizer and leave the rest to the next time the garbage
> collection runs.

This time the echo came back distorted:

[Boehm]
    Cycles involving one or more finalizable objects are never finalized.

A<=>b is "a cycle involving one or more finalizable objects", so he won't touch it.  The scheme at the top doesn't either.  If you handed him your *derived* graph (but also without the self-loops), he would; me too.  KISS!

> Note that we're now calling finalizers on objects with a non-zero
> refcount.

I don't know why you want to do this.  As the next several paragraphs confirm, it creates real headaches for the implementation, and I'm unclear on what it buys in return.  Is "we'll do something by magic for cycles with no more than one finalizer" a major gain for the user over "we'll do something by magic for cycles with no finalizer"?  0, 1 and infinity *are* the only interesting numbers, but the difference between 0 and 1 *here* doesn't seem to me worth signing up for any pain at all.

> At some point (probably as a result of finalizing A) its
> refcount will go to zero.  We should not finalize it again -- this
> would serve no purpose.
I don't believe BDW (or the scheme at the top) has this problem (simply because the only way to run a finalizer in a cycle under them is for the user to break the cycle explicitly -- so if an object's finalizer gets run, the user caused it directly, and so can never claim surprise).

> Possible solution:
>
>     INCREF(A);
>     A->__del__();
>     if (A->ob_refcnt == 1)
>         A->__class__ = NULL; /* Make A finalizer-less */
>     DECREF(A);

> This avoids finalizing twice if the first finalization broke all
> cycles in which A is involved.  But if it doesn't, A is still cyclical
> garbage with a finalizer!  Even if it didn't resurrect itself.
>
> Instead of the code fragment above, we could mark A as "just
> finalized" and when it shows up at the head of the tree (of finalizers
> in cyclical trash) again on the next garbage collection, to discard it
> without calling the finalizer again (because this clearly means that
> it didn't resurrect itself -- at least not for a very long time).

I don't think you need to do any of this -- unless you think you need to do the thing that created the need for this, which I didn't think you needed to do either.

> I would be happier if we could still have a rule that says that a
> finalizer is called only once by magic -- even if we have two forms of
> magic: refcount zero or root of the tree.  Tim: I don't know if you
> object against this rule as a matter of principle (for the sake of
> finalizers that resurrect the object) or if your objection is really
> against the unordered calling of finalizers legitimized by Java's
> rules.  I hope the latter, since I think that this rule (__del__
> called only once by magic) by itself is easy to understand and easy to
> deal with, and I believe it may be necessary to guarantee progress for
> the garbage collector.

My objections to Java's rules have been repeated enough.  I would have no objection to "__del__ called only once" if it weren't that Python currently does something different.
I don't know whether people rely on that now; if they do, it's a much more dangerous thing to change than adding a new keyword (the compiler gives automatic 100% coverage of the latter; but nothing mechanical can help people track down reliance -- whether deliberate or accidental -- on the former).

My best *guess* is that __del__ is used rarely; e.g., there are no more than 40 instances of it in the whole CVS tree, including demo directories; and they all look benign (at least three have bodies consisting of "pass"!).  The most complicated one I found in my own code is:

    def __del__(self):
        self.break_cycles()

    def break_cycles(self):
        for rule in self.rules:
            if rule is not None:
                rule.cleanse()

But none of this self-sampling is going to comfort some guy in France who has a megaline of code relying on it.  Good *bet*, though.

> [and another cogent explanation of why breaking the "leave cycles with
> finalizers alone" injunction creates headaches]
> ...
> Even if someone once found a good use for resurrecting inside __del__,
> against all recommendations, I don't mind breaking their code, if it's
> for a good cause.  The Java rules aren't a good cause.  But top-sorted
> finalizer calls seem a worthy cause.

They do to me too, except that I say even a cycle involving but a single object (w/ finalizer) looping on itself is the user's problem.

> So now we get to discuss what to do with multi-finalizer cycles, like:
>
>     A <=> b <=> C
>
> Here the reduced graph is:
>
>     A <=> C

The SCC reduction is simply to A and, right, the scheme at the top punts.

> [more on the once-only rule chopped]
> ...
> Anyway, once-only rule aside, we still need a protocol to deal with
> cyclical dependencies between finalizers.  The __cleanup__ approach is
> one solution, but it also has a problem: we have a set of finalizers.
> Whose __cleanup__ do we call?  Any?  All?  Suggestions?
This is why a variant of guardians were more appealing to me at first: I could ask a guardian for the entire SCC, so I get the *context* of the problem as well as the final microscopic symptom. I see Marc-Andre already declined to get sucked into the magical part of this . Greg should speak for his scheme, and I haven't made time to understand it fully; my best guess is to call x.__cleanup__ for every object in the SCC (but there's no clear way to decide which order to call them in, and unless they're more restricted than __del__ methods they can create all the same problems __del__ methods can!). > Note that I'd like some implementation freedom: I may not want to > bother with the graph reduction algorithm at first (which seems very > hairy) so I'd like to have the right to use the __cleanup__ API > as soon as I see finalizers in cyclical trash. I don't mind disposing > of finalizer-free cycles first, but once I have more than one > finalizer left in the remaining cycles, I'd like the right not to > reduce the graph for topsort reasons -- that algorithm seems hard. I hate to be realistic , but modern GC algorithms are among the hardest you'll ever see in any field; even the outer limits of what we've talked about here is baby stuff. Sun's Java group (the one in Chelmsford, MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. Steele) working full-time for over a year on the last iteration of Java's GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- state of the art GC can be crushingly hard. So I've got nothing against taking shortcuts at first -- there's actually no realistic alternative. I think we're overlooking the obvious one, though: if any finalizer appears in any trash cycle, tough luck. Python 3000 -- which may be a spelling of 1.7 , but doesn't *need* to be a spelling of 1.6. > So we're back to the __cleanup__ design. 
> Strawman proposal: for all
> finalizers in a trash cycle, call their __cleanup__ method, in
> arbitrary order.  After all __cleanup__ calls are done, if the objects
> haven't all disposed of themselves, they are all garbage-collected
> without calling __del__.  (This seems to require another garbage
> collection cycle -- so perhaps there should also be a once-only rule
> for __cleanup__?)
>
> Separate question: what if there is no __cleanup__?  This should
> probably be reported: "You have cycles with finalizers, buddy!  What
> do you want to do about them?"  This same warning could be given when
> there is a __cleanup__ but it doesn't break all cycles.

If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly 1" isn't special to me), I will consider it to be a bug.  So I want a way to get it back from gc, so I can see what the heck it is, so I can fix my code (or harass whoever did it to me).  __cleanup__ suffices for that, so the very act of calling it is all I'm really after ("Python invoked __cleanup__ == Tim has a bug").

But after I outgrow that, I'll certainly want the option to get another kind of complaint if __cleanup__ doesn't break the cycles, and after *that* I couldn't care less.  I've given you many gracious invitations to say that you don't mind leaking in the face of a buggy program, but as you've declined so far, I take it that never hearing another gripe about leaking is a Primary Life Goal.  So collection without calling __del__ is fine -- but so is collection with calling it!  If we're going to (at least implicitly) approve of this stuff, it's probably better *to* call __del__, if for no other reason than to catch your case of some poor innocent object caught in a cycle not of its making that expects its __del__ to abort starting World War III if it becomes unreachable.

whatever-we-don't-call-a-mistake-is-a-feature-ly y'rs  - tim

From fdrake at acm.org  Thu Mar  9 15:25:35 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 9 Mar 2000 09:25:35 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf897a$f5a7e620$0d2d153f@tim> References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Tim Peters writes: > Failing that, the os.popen docs should caution it's "use at your own risk" > under Windows, and that this is directly inherited from MS's popen > implementation. Tim (& others), Would this additional text be sufficient for the os.popen() documentation? \strong{Note:} This function behaves unreliably under Windows due to the native implementation of \cfunction{popen()}. If someone cares to explain what's weird about it, that might be appropriate as well, but I've never used this under Windows. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Thu Mar 9 15:42:37 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:42:37 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C7B85D.E6090670@lemburg.com> Trent Mick wrote: > > MAL: > > architecture(executable='/usr/local/bin/python', bits='', > > linkage='') : > > > > Values that cannot be determined are returned as given by the > > parameter presets. If bits is given as '', the sizeof(long) is > > used as indicator for the supported pointer size. > > Just a heads up, using sizeof(long) will not work on forthcoming WIN64 > (LLP64 data model) to determine the supported pointer size. You would want > to use the 'P' struct format specifier instead, I think (I am speaking in > relative ignorance). However, the docs say that a PyInt is used to store 'P' > specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. > The keyword perhaps is "forthcoming". 
>
> This is the code in question in platform.py:
>
>     # Use the sizeof(long) as default number of bits if nothing
>     # else is given as default.
>     if not bits:
>         import struct
>         bits = str(struct.calcsize('l')*8) + 'bit'

Python < 1.5.2 doesn't support 'P', but anyway, I'll change those lines according to your suggestion.

Does struct.calcsize('P')*8 return 64 on 64bit-platforms as it should (probably ;) ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From jim at interet.com  Thu Mar  9 16:45:54 2000
From: jim at interet.com (James C. Ahlstrom)
Date: Thu, 09 Mar 2000 10:45:54 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim>
Message-ID: <38C7C732.D9086C34@interet.com>

Tim Peters wrote:
>
> I had another take on all this, which I'll now share since nobody
> seems inclined to fold in the Win32 popen: perhaps os.popen should not be
> supported at all under Windows!
>
> The current function is a mystery wrapped in an enigma -- sometimes it
> works, sometimes it doesn't, and I've never been able to outguess which one
> will obtain (there's more to it than just whether a console window is
> attached).  If it's not reliable (it's not), and we can't document the
> conditions under which it can be used safely (I can't), Python shouldn't
> expose it.

OK, I admit I don't understand this either, but here goes...

It looks like Python popen() uses the Windows _popen() function.  The _popen() docs say that it creates a spawned copy of the command processor (shell) with the given string argument.  It further states that it does NOT work in a Windows program and ONLY works when called from a Windows Console program.
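[Archive note: MAL's question above -- whether struct.calcsize('P')*8 gives 64 on 64-bit platforms -- and Trent's LLP64 point can both be checked directly. A sketch; on LP64 Unix both sizes come out 64-bit, while on LLP64 Win64 'l' stays 32-bit and only 'P' reports the true pointer width.]

```python
import struct

long_bits = struct.calcsize('l') * 8     # sizeof(long)
pointer_bits = struct.calcsize('P') * 8  # sizeof(void *)

# LP64 Unix: 64/64.  LLP64 Win64: 32/64 -- so 'P' is the right
# probe for pointer width, as Trent suggests.
print('long: %dbit, pointer: %dbit' % (long_bits, pointer_bits))
```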
From tim_one at email.msn.com  Thu Mar  9 18:14:17 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 9 Mar 2000 12:14:17 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C7C732.D9086C34@interet.com>
Message-ID: <000401bf89ea$e6e54180$79a0143f@tim>

[James C. Ahlstrom]
> OK, I admit I don't understand this either, but here goes...
>
> It looks like Python popen() uses the Windows _popen() function.
> The _popen() docs say ...

Screw the docs.  Pretend you're a newbie and *try* it.  Here:

    import os
    p = os.popen("dir")
    while 1:
        line = p.readline()
        if not line:
            break
        print line

Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program).  Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work.  You have to bring up the task manager and kill it that way.  I once traced this under the debugger -- it's hung inside an MS DLL.  "dir" is not entirely arbitrary here: for *some* cmds it works fine, for others not.  The set of which work appears to vary across Windows flavors.  Sometimes you can worm around it by wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but sometimes not.  After hours of poke-&-hope (in the past), as I said, I've never been able to predict which cases will work.

> ...
> It further states that it does NOT work in a Windows program and ONLY
> works when called from a Windows Console program.

The latter is a necessary condition but not sufficient; don't know what *is* sufficient, and AFAIK nobody else does either.

> From this I assume that popen() works from python.exe (it is a Console
> app) if the command can be directly executed by the shell (like "dir"),

See above for a counterexample to both.  I actually have much better luck with cmds command.com *doesn't* know anything about.  So this appears to vary by shell too.

> ...
> If there is something wrong with _popen() then the way to fix it is
> to avoid using it and create the pipes directly.

libc pipes are as flaky as libc popen under Windows, Jim!  MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

> ...
> Of course, the strength of Python is portable code.  popen() should be
> fixed the right way.

pipes too, but users get baffled by popen much more often simply because they try popen much more often.

there's-no-question-about-whether-it-works-right-it-doesn't-ly y'rs  - tim

From gstein at lyra.org  Thu Mar  9 18:47:23 2000
From: gstein at lyra.org (Greg Stein)
Date: Thu, 9 Mar 2000 09:47:23 -0800 (PST)
Subject: [Python-Dev] platform.py (was: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?)
In-Reply-To: <38C7B85D.E6090670@lemburg.com>
Message-ID: 

On Thu, 9 Mar 2000, M.-A. Lemburg wrote:
>...
> Python < 1.5.2 doesn't support 'P', but anyway, I'll change
> those lines according to your suggestion.
>
> Does struct.calcsize('P')*8 return 64 on 64bit-platforms as
> it should (probably ;) ?

Yes.  It returns sizeof(void *).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From mal at lemburg.com  Thu Mar  9 15:55:36 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 09 Mar 2000 15:55:36 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us>
Message-ID: <38C7BB68.9FAE3BE9@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Tim Peters writes:
> > Failing that, the os.popen docs should caution it's "use at your own risk"
> > under Windows, and that this is directly inherited from MS's popen
> > implementation.
>
>   Tim (& others),
>   Would this additional text be sufficient for the os.popen()
> documentation?
>
>     \strong{Note:} This function behaves unreliably under Windows
>     due to the native implementation of \cfunction{popen()}.
>
> If someone cares to explain what's weird about it, that might be
> appropriate as well, but I've never used this under Windows.

Ehm, hasn't anyone looked at the code I posted yesterday ?

It goes a long way to deal with these inconsistencies... even though it's not perfect (yet ;).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From fdrake at acm.org  Thu Mar  9 19:52:40 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 9 Mar 2000 13:52:40 -0500 (EST)
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C7BB68.9FAE3BE9@lemburg.com>
References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> <38C7BB68.9FAE3BE9@lemburg.com>
Message-ID: <14535.62200.158087.102380@weyr.cnri.reston.va.us>

M.-A. Lemburg writes:
> Ehm, hasn't anyone looked at the code I posted yesterday ?
> It goes a long way to deal with these inconsistencies... even
> though it's not perfect (yet ;).

I probably sent that before I'd read everything, and I'm not the one to change the popen() implementation.  At this point, I'm waiting for someone who understands the details to decide what happens (if anything) to the implementation before I check in any changes to the docs.  My inclination is to fix popen() on Windows to do the right thing, but I don't know enough about pipes & process management on Windows to get into that fray.

-Fred

-- 
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From nascheme at enme.ucalgary.ca  Thu Mar  9 20:37:31 2000
From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca)
Date: Thu, 9 Mar 2000 12:37:31 -0700
Subject: [Python-Dev] finalization again
Message-ID: <20000309123731.A3664@acs.ucalgary.ca>

[Tim, explaining something I was thinking about more clearly than I ever could]
> It's not obvious, but the SCCs can be found in linear time (via Tarjan's
> algorithm, which is simple but subtle;

Wow, it seems like it should be more expensive than that.  What are the space requirements?  Also, does the simple algorithm you used in Cyclops have a name?

> If there are no safe nodes without predecessors, GC is stuck,
> and for good reason: every object in the whole pile is reachable
> from an object with a finalizer, which could change the topology
> in near-arbitrary ways.  The unsafe nodes without predecessors
> (and again, by #4, there must be at least one) are the heart of
> the problem, and this scheme identifies them precisely.

Exactly.  What is our policy on these unsafe nodes?  Guido seems to feel that it is okay for the programmer to create them and Python should have a way of collecting them.  Tim seems to feel that the programmer should not create them in the first place.  I agree with Tim.

If topological finalization is used, it is possible for the programmer to design their classes so that this problem does not happen.  This is explained on Hans Boehm's finalization web page.  If the programmer cannot or does not redesign their classes I don't think it is unreasonable to leak memory.  We can link these cycles to a global list of garbage or print a debugging message.  This is a large improvement over the current situation (i.e. leaking memory with no debugging even for cycles without finalizers).

Neil

-- 
"If you're a great programmer, you make all the routines depend on each other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985.
From gstein at lyra.org Thu Mar 9 20:50:29 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:50:29 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: On Thu, 9 Mar 2000 nascheme at enme.ucalgary.ca wrote: >... > If the programmer can or does not redesign their classes I don't > think it is unreasonable to leak memory. We can link these > cycles to a global list of garbage or print a debugging message. > This is a large improvement over the current situation (ie. > leaking memory with no debugging even for cycles without > finalizers). I think we throw an error (as a subclass of MemoryError). As an alternative, is it possible to move those cycles to the garbage list and then never look at them again? That would speed up future collection processing. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Thu Mar 9 20:51:46 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:51:46 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 11:50:29 PST." References: Message-ID: <200003091951.OAA26184@eric.cnri.reston.va.us> > As an alternative, is it possible to move those cycles to the garbage list > and then never look at them again? That would speed up future collection > processing. With the current approach, that's almost automatic :-) I'd rather reclaim the memory too. --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm at hypernet.com Thu Mar 9 20:54:16 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 9 Mar 2000 14:54:16 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf89ea$e6e54180$79a0143f@tim> References: <38C7C732.D9086C34@interet.com> Message-ID: <1259490837-400325@hypernet.com> [Tim re popen on Windows] ... > the debugger -- it's hung inside an MS DLL. 
"dir" is not entirely arbitrary > here: for *some* cmds it works fine, for others not. The set of which work > appears to vary across Windows flavors. Sometimes you can worm around it by > wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but > sometimes not. It doesn't work for commands builtin to whatever "shell" you're using. That's different between cmd and command, and the various flavors, versions and extensions thereof. FWIW, I gave up a long time ago. I use redirection and a tempfile. The few times I've wanted "interactive" control, I've used Win32Process, dup'ed, inherited handles... the whole 9 yards. Why? Look at all the questions about popen and child processes in general, on platforms where it *works*, (if it weren't for Donn Cave, nobody'd get it to work anywhere ). To reiterate Tim's point: *none* of the c runtime routines for process control on Windows are adequate (beyond os.system and living with a DOS box popping up). The raw Win32 CreateProcess does everything you could possibly want, but takes a week or more to understand, (if this arg is a that, then that arg is a whatsit, and the next is limited to the values X and Z unless...). your-brain-on-Windows-ly y'rs - Gordon From guido at python.org Thu Mar 9 20:55:23 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:55:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 04:40:26 EST." <000701bf89ab$80cb8e20$0d2d153f@tim> References: <000701bf89ab$80cb8e20$0d2d153f@tim> Message-ID: <200003091955.OAA26217@eric.cnri.reston.va.us> [Tim describes a more formal approach based on maximal strongly connected components (SCCs).] I like the SCC approach -- it's what I was struggling to invent but came short of discovering. However: [me] > > What's left on the first list then consists of finalizer-free garbage. > > We dispose of this garbage by clearing dicts and lists. 
Hopefully > > this makes the refcount of some of the finalizers go to zero -- those > > are finalized in the normal way. [Tim] > In Python it's even possible for a finalizer to *install* a __del__ method > that didn't previously exist, into the class of one of the objects on your > "first list". The scheme above is meant to be bulletproof in the face of > abuses even I can't conceive of . Are you *sure* your scheme deals with this? Let's look at an example. (Again, lowercase nodes have no finalizers.) Take G: a <=> b -> C This is G' (a and b are strongly connected): a' -> C' C is not reachable from any root node. We decide to clear a and b. Let's suppose we happen to clear b first. This removes the last reference to C, C's finalizer runs, and it installs a finalizer on a.__class__. So now a' has turned into A', and we're halfway committing a crime we said we would never commit (touching cyclical trash with finalizers). I propose to disregard this absurd possibility, except to the extent that Python shouldn't crash -- but we make no guarantees to the user. > More mundanely, clearing an item on your first list can cause a chain of > events that runs a finalizer, which in turn can resurrect one of the objects > on your first list (and so it should *not* get reclaimed). Without doing > the SCC bit, I don't think you can out-think that (the reasoning above > showed that the finalizer can't resurrect something in the *same* SCC as the > object that started it all, but that argument cannot be extended to objects > in other safe SCCs: they're vulnerable). I don't think so. While my poor wording ("finalizer-free garbage") didn't make this clear, my references to earlier algorithms were intended to imply that this is garbage that consists of truly unreachable objects. I have three lists: let's call them T(rash), R(oot-reachable), and F(inalizer-reachable). The Schemenauer c.s. algorithm moves all reachable nodes to R. 
I then propose to move all finalizers to F, and to run another pass of Schemenauer c.s. to also move all finalizer-reachable (but not root-reachable) nodes to F. I truly believe that (barring the absurdity of installing a new __del__) the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers: by virtue of Schemenauer c.s. (which computes a reachability closure given some roots) anything reachable from a finalizer is on F by now (if it isn't on R -- again, nothing on T is reachable from R, because R is calculated as a closure). So, unless there's still a bug in my thinking here, I think that as long as we only want to clear SCCs with 0 finalizers, T is exactly the set of nodes we're looking for. > This time the echo came back distorted : > > [Boehm] > Cycles involving one or more finalizable objects are never finalized. > > A<=>b is "a cycle involving one or more finalizable objects", so he won't > touch it. The scheme at the top doesn't either. If you handed him your > *derived* graph (but also without the self-loops), he would; me too. KISS! > > > Note that we're now calling finalizers on objects with a non-zero > > refcount. > > I don't know why you want to do this. As the next several paragraphs > confirm, it creates real headaches for the implementation, and I'm unclear > on what it buys in return. Is "we'll do something by magic for cycles with > no more than one finalizer" a major gain for the user over "we'll do > something by magic for cycles with no finalizer"? 0, 1 and infinity *are* > the only interesting numbers , but the difference between 0 and 1 > *here* doesn't seem to me worth signing up for any pain at all. I do have a reason: if a maximal SCC has only one finalizer, there can be no question about the ordering between finalizer calls. And isn't the whole point of this discussion to have predictable ordering of finalizer calls in the light of trash recycling? 
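Guido's three-list partition is easy to mock up on a toy graph. A hedged sketch (the data model and names here are invented for illustration, not the proposed C implementation):

```python
def reachable(starts, edges):
    """Closure of `starts` under `edges` (dict: node -> set of successors)."""
    seen, todo = set(), list(starts)
    while todo:
        v = todo.pop()
        if v not in seen:
            seen.add(v)
            todo.extend(edges.get(v, ()))
    return seen

def partition(nodes, edges, roots, finalizers):
    R = reachable(roots, edges)             # root-reachable
    F = reachable(finalizers, edges) - R    # finalizer-reachable, not rooted
    T = set(nodes) - R - F                  # truly unreachable: safe to clear
    return T, R, F

# The a <=> b -> C example, with a finalizer only on C and no roots:
edges = {'a': {'b'}, 'b': {'a', 'C'}, 'C': set()}
T, R, F = partition(edges, edges, roots=(), finalizers={'C'})
print(sorted(T), sorted(R), sorted(F))  # -> ['a', 'b'] [] ['C']
```

Clearing T may still drop the last reference to something on F (here, clearing b finalizes C), but no finalizer can reach, and hence resurrect, anything on T -- which is the invariant the argument above rests on.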
> I would have no objection to "__del__ called only once" if it weren't for > that Python currently does something different. I don't know whether people > rely on that now; if they do, it's a much more dangerous thing to change > than adding a new keyword (the compiler gives automatic 100% coverage of the > latter; but nothing mechanical can help people track down reliance-- whether > deliberate or accidental --on the former). [...] > But none of this self-sampling is going to comfort some guy in France who > has a megaline of code relying on it. Good *bet*, though . OK -- so your objection is purely about backwards compatibility. Apart from that, I strongly feel that the only-once rule is a good one. And I don't think that the compatibility issue weighs very strongly here (given all the other problems that typically exist with __del__). > I see Marc-Andre already declined to get sucked into the magical part of > this . Greg should speak for his scheme, and I haven't made time to > understand it fully; my best guess is to call x.__cleanup__ for every object > in the SCC (but there's no clear way to decide which order to call them in, > and unless they're more restricted than __del__ methods they can create all > the same problems __del__ methods can!). Yes, but at least since we're defining a new API (in a reserved portion of the method namespace) there are no previous assumptions to battle. > > Note that I'd like some implementation freedom: I may not want to > > bother with the graph reduction algorithm at first (which seems very > > hairy) so I'd like to have the right to use the __cleanup__ API > > as soon as I see finalizers in cyclical trash. I don't mind disposing > > of finalizer-free cycles first, but once I have more than one > > finalizer left in the remaining cycles, I'd like the right not to > > reduce the graph for topsort reasons -- that algorithm seems hard. 
> > I hate to be realistic , but modern GC algorithms are among the > hardest you'll ever see in any field; even the outer limits of what we've > talked about here is baby stuff. Sun's Java group (the one in Chelmsford, > MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. > Steele) working full-time for over a year on the last iteration of Java's > GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- > state of the art GC can be crushingly hard. > > So I've got nothing against taking shortcuts at first -- there's actually no > realistic alternative. I think we're overlooking the obvious one, though: > if any finalizer appears in any trash cycle, tough luck. Python 3000 -- > which may be a spelling of 1.7 , but doesn't *need* to be a spelling > of 1.6. Kind of sad though -- finally knowing about cycles and then not being able to do anything about them. > > So we're back to the __cleanup__ design. Strawman proposal: for all > > finalizers in a trash cycle, call their __cleanup__ method, in > > arbitrary order. After all __cleanup__ calls are done, if the objects > > haven't all disposed of themselves, they are all garbage-collected > > without calling __del__. (This seems to require another garbage > > collection cycle -- so perhaps there should also be a once-only rule > > for __cleanup__?) > > > > Separate question: what if there is no __cleanup__? This should > > probably be reported: "You have cycles with finalizers, buddy! What > > do you want to do about them?" This same warning could be given when > > there is a __cleanup__ but it doesn't break all cycles. > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > 1" isn't special to me), I will consider it to be a bug. So I want a way to > get it back from gc, so I can see what the heck it is, so I can fix my code > (or harass whoever did it to me). 
__cleanup__ suffices for that, so the > very act of calling it is all I'm really after ("Python invoked __cleanup__ > == Tim has a bug"). > > But after I outgrow that , I'll certainly want the option to get > another kind of complaint if __cleanup__ doesn't break the cycles, and after > *that* I couldn't care less. I've given you many gracious invitations to > say that you don't mind leaking in the face of a buggy program , but > as you've declined so far, I take it that never hearing another gripe about > leaking is a Primary Life Goal. So collection without calling __del__ is > fine -- but so is collection with calling it! If we're going to (at least > implicitly) approve of this stuff, it's probably better *to* call __del__, > if for no other reason than to catch your case of some poor innocent object > caught in a cycle not of its making that expects its __del__ to abort > starting World War III if it becomes unreachable . I suppose we can print some obnoxious message to stderr like """Your program has created cyclical trash involving one or more objects with a __del__ method; calling their __cleanup__ method didn't resolve the cycle(s). I'm going to call the __del__ method(s) but I can't guarantee that they will be called in a meaningful order, because of the cyclical dependencies.""" But I'd still like to reclaim the memory. If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. (Because of this, I will also need to trace functions, methods and modules -- these create massive cycles that currently require painful cleanup. Of course I also need to track down all the roots then... 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Thu Mar 9 20:59:48 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:59:48 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091951.OAA26184@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: > > As an alternative, is it possible to move those cycles to the garbage list > > and then never look at them again? That would speed up future collection > > processing. > > With the current approach, that's almost automatic :-) > > I'd rather reclaim the memory too. Well, yah. I would too :-) I'm at ApacheCon right now, so haven't read the thread in detail, but it seems that people saw my algorithm as a bit too complex. Bah. IMO, it's a pretty straightforward way for the interpreter to get cycles cleaned up. (whether the objects in the cycles are lists/dicts, class instances, or extension types!) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 9 21:18:06 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:18:06 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: >... > I don't think so. While my poor wording ("finalizer-free garbage") > didn't make this clear, my references to earlier algorithms were > intended to imply that this is garbage that consists of truly > unreachable objects. I have three lists: let's call them T(rash), > R(oot-reachable), and F(inalizer-reachable). The Schemenauer > c.s. algorithm moves all reachable nodes to R. I then propose to move > all finalizers to F, and to run another pass of Schemenauer c.s. to > also move all finalizer-reachable (but not root-reachable) nodes to F. >... > [Tim Peters] > > I see Marc-Andre already declined to get sucked into the magical part of > > this . 
Greg should speak for his scheme, and I haven't made time to > > understand it fully; my best guess is to call x.__cleanup__ for every object > > in the SCC (but there's no clear way to decide which order to call them in, > > and unless they're more restricted than __del__ methods they can create all > > the same problems __del__ methods can!). My scheme was to identify objects in F, but only those with a finalizer (not the closure). Then call __cleanup__ on each of them, in arbitrary order. If any are left after the sequence of __cleanup__ calls, then I call it an error. [ note that my proposal defined checking for a finalizer by calling tp_clean(TPCLEAN_CARE_CHECK); this accounts for class instances and for extension types with "heavy" processing in tp_dealloc ] The third step was to use tp_clean to try and clean all other objects in a safe fashion. Specifically: the objects have no finalizers, so there is no special care needed in finalizing, so this third step should nuke references that are stored in the object. This means object pointers are still valid (we haven't dealloc'd), but the insides have been emptied. If the third step does not remove all cycles, then one of the PyType objects did not remove all references during the tp_clean call. >... > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > > 1" isn't special to me), I will consider it to be a bug. So I want a way to > > get it back from gc, so I can see what the heck it is, so I can fix my code > > (or harass whoever did it to me). __cleanup__ suffices for that, so the > > very act of calling it is all I'm really after ("Python invoked __cleanup__ > > == Tim has a bug"). Agreed. >... > I suppose we can print some obnoxious message to stderr like A valid alternative to raising an exception, but it falls into the whole trap of "where does stderr go?" >... > But I'd still like to reclaim the memory. 
If this is some > long-running server process that is executing arbitrary Python > commands sent to it by clients, it's not nice to leak, period. If an exception is raised, the top-level server loop can catch it, log the error, and keep going. But yes: it will leak. > (Because of this, I will also need to trace functions, methods and > modules -- these create massive cycles that currently require painful > cleanup. Of course I also need to track down all the roots > then... :-) Yes. It would be nice to have these participate in the "cleanup protocol" that I've described. It should help a lot at Python finalization time, effectively moving some special casing from import.c to the objects themselves. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 21:20:23 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 15:20:23 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf89ea$e6e54180$79a0143f@tim> Message-ID: <38C80787.7791A1A6@interet.com> Tim Peters wrote:

> Screw the docs. Pretend you're a newbie and *try* it.

I did try it.

> import os
> p = os.popen("dir")
> while 1:
>     line = p.readline()
>     if not line:
>         break
>     print line

> Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program). Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work. You have to bring up the task manager and kill it that way. I once traced this under the debugger -- it's hung inside an MS DLL.

Point on the curve: This program works perfectly on my machine running NT.

> libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

I believe you when you say popen() is flakey. 
It is a little harder to believe it is not possible to write a _popen() replacement using pipes which works. Of course I wanted you to do it instead of me! Well, if I get any time before 1.6 comes out... JimA From gstein at lyra.org Thu Mar 9 21:31:38 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:31:38 -0800 (PST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: On Thu, 9 Mar 2000, James C. Ahlstrom wrote: >... > > libc pipes ares as flaky as libc popen under Windows, Jim! MarkH has the > > only versions of these things that come close to working under Windows (he > > wraps the native Win32 spellings of these things; MS's libc entry points > > (which Python uses now) are much worse). > > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. > > Of course I wanted you to do it instead of me! Well, if > I get any time before 1.6 comes out... It *has* been done. Bill Tutt did it a long time ago. That's what win32pipe is all about. -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 22:04:59 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 16:04:59 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipestuff going to be adopted? References: Message-ID: <38C811FB.B6096FA4@interet.com> Greg Stein wrote: > > On Thu, 9 Mar 2000, James C. Ahlstrom wrote: > > Of course I wanted you to do it instead of me! Well, if > > I get any time before 1.6 comes out... > > It *has* been done. Bill Tutt did it a long time ago. That's what > win32pipe is all about. Thanks for the heads up! Unfortunately, win32pipe is not in the core, and probably covers more ground than just popen() and so might be a maintenance problem. And popen() is not written in it anyway. So we are Not There Yet (TM). 
Which I guess was Tim's original point. JimA From mhammond at skippinet.com.au Thu Mar 9 22:36:14 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 10 Mar 2000 08:36:14 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: > Point on the curve: This program works perfectly on my > machine running NT. And running from Python.exe. I bet you didn't try it from a GUI. The situation is worse WRT Windows 95. MS has a knowledge base article describing the bug, and telling you how to work around it by using a dedicated .EXE. So, out of the box, popen works only on NT from a console - pretty sorry state of affairs :-( > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. Which is what I believe win32pipe.popen* are. Mark. From guido at python.org Fri Mar 10 02:13:51 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 20:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe Message-ID: <200003100113.UAA27337@eric.cnri.reston.va.us> Christian Tismer just did an exhaustive search for thread unsafe use of Python operations, and found two weaknesses. One is posix.listdir(), which I had already found; the other is file.writelines(). Here's a program that demonstrates the bug; basically, while writelines is walking down the list, another thread could truncate the list, causing PyList_GetItem() to fail or a string object to be deallocated while writelines is using it. On my Solaris 7 system it typically crashes in the first or second iteration. It's easy to fix: just don't release the interpreter lock (get rid of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other threads from doing any work while this thread may be blocked for I/O. 
An alternative solution is to put Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but would require a lot of lock operations and would probably slow things down too much. Ideas? --Guido van Rossum (home page: http://www.python.org/~guido/)

import os
import sys
import thread
import random
import time
import tempfile

def good_guy(fp, list):
    t0 = time.time()
    fp.seek(0)
    fp.writelines(list)
    t1 = time.time()
    print fp.tell(), "bytes written"
    return t1-t0

def bad_guy(dt, list):
    time.sleep(random.random() * dt)
    del list[:]

def main():
    infn = "/usr/dict/words"
    if sys.argv[1:]:
        infn = sys.argv[1]
    print "reading %s..." % infn
    fp = open(infn)
    list = fp.readlines()
    fp.close()
    print "read %d lines" % len(list)
    tfn = tempfile.mktemp()
    fp = None
    try:
        fp = open(tfn, "w")
        print "calibrating..."
        dt = 0.0
        n = 3
        for i in range(n):
            dt = dt + good_guy(fp, list)
        dt = dt / n  # average time it took to write the list to disk
        print "dt =", round(dt, 3)
        i = 0
        while 1:
            i = i+1
            print "test", i
            copy = map(lambda x: x[1:], list)
            thread.start_new_thread(bad_guy, (dt, copy))
            good_guy(fp, copy)
    finally:
        if fp:
            fp.close()
        try:
            os.unlink(tfn)
        except os.error:
            pass

main()

From tim_one at email.msn.com Fri Mar 10 03:13:51 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: <200003100113.UAA27337@eric.cnri.reston.va.us> Message-ID: <000601bf8a36$46ebf880$58a2143f@tim> [Guido van Rossum] > Christian Tismer just did an exhaustive search for thread unsafe use > of Python operations, and found two weaknesses. One is > posix.listdir(), which I had already found; the other is > file.writelines(). Here's a program that demonstrates the bug; > basically, while writelines is walking down the list, another thread > could truncate the list, causing PyList_GetItem() to fail or a string > object to be deallocated while writelines is using it. 
On my Solaris > 7 system it typically crashes in the first or second iteration. > > It's easy to fix: just don't release the interpreter lock (get rid > of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other > threads from doing any work while this thread may be blocked for I/O. > > An alternative solution is to put Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but > would require a lot of lock operations and would probably slow things > down too much. > > Ideas?

2.5:

1: Before releasing the lock, make a shallow copy of the list.

1.5: As in #1, but iteratively peeling off "the next N" values, for some N balancing the number of lock operations against the memory burden (I don't care about the speed of a shallow copy here ...).

2. Pull the same trick list.sort() uses: make the list object immutable for the duration (I know you think that's a hack, and it is , but it costs virtually nothing and would raise an appropriate error when they attempted the insane mutation).

I actually like #2 best now, but won't in the future, because file_writelines() should really accept an argument of any sequence type. This makes 1.5 a better long-term hack. although-adding-1.5-to-1.6-is-confusing-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:26 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <1259490837-400325@hypernet.com> Message-ID: <000901bf8a3b$ab314660$58a2143f@tim> [Gordon McM, aspires to make sense of the mess] > It doesn't work for commands builtin to whatever "shell" you're > using. That's different between cmd and command, and the > various flavors, versions and extensions thereof. It's not that simple, either; e.g., old apps invoking the 16-bit subsystem can screw up too. 
Look at Tcl's man page for "exec" and just *try* to wrap your brain around all the caveats they were left with after throwing a few thousand lines of C at this under their Windows port . > FWIW, I gave up a long time ago. I use redirection and a > tempfile. The few times I've wanted "interactive" control, I've > used Win32Process, dup'ed, inherited handles... the whole 9 > yards. Why? Look at all the questions about popen and child > processes in general, on platforms where it *works*, (if it > weren't for Donn Cave, nobody'd get it to work anywhere ). Donn is downright scary that way. I stopped using 'em too, of course. > To reiterate Tim's point: *none* of the c runtime routines for > process control on Windows are adequate (beyond os.system > and living with a DOS box popping up). No, os.system is a problem under command.com flavors of Windows too, as system spawns a new shell and command.com's exit code is *always* 0. So Python's os.system returns 0 no matter what app the user *thinks* they were running, and whether it worked or set the baby on fire. > The raw Win32 CreateProcess does everything you could possibly want, but > takes a week or more to understand, (if this arg is a that, then that arg > is a whatsit, and the next is limited to the values X and Z unless...). Except that CreateProcess doesn't handle shell metacharacters, right? Tcl is the only language I've seen that really works hard at making cmdline-style process control portable. so-all-we-need-to-do-is-a-single-createprocess-to-invoke-tcl-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <000801bf8a3b$aa0c4e60$58a2143f@tim> [Fred L. Drake, Jr.] 
> Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. Yes, that's good! If Mark/Bill's alternatives don't make it in, would also be good to point to the PythonWin extensions (although MarkH will have to give us the Official Name for that). > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. As the rest of this thread should have made abundantly clear by now <0.9 wink>, it's such a mess across various Windows flavors that nobody can explain it. From tim_one at email.msn.com Fri Mar 10 04:15:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 22:15:18 -0500 Subject: [Python-Dev] RE: finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: <000a01bf8a3e$dc8878c0$58a2143f@tim> Quickie: [Tim] >> It's not obvious, but the SCCs can be found in linear time (via Tarjan's >> algorithm, which is simple but subtle; [NeilS] > Wow, it seems like it should be more expensive than that. Oh yes! Many bright people failed to discover the trick; Tarjan didn't discover it until (IIRC) the early 70's, and it was a surprise. It's just a few lines of simple code added to an ordinary depth-first search. However, while the code is simple, a correctness proof is not. BTW, if it wasn't clear, when talking about graph algorithms "linear" is usually taken to mean "in the sum of the number of nodes and edges". Cyclops.py finds all the cycles in linear time in that sense, too (but does not find the SCCs in linear time, at least not in theory -- in practice you can't tell the difference ). > What are the space requirements? Same as depth-first search, plus a way to associate an SCC id with each node, plus a single global "id" vrbl. So it's worst-case linear (in the number of nodes) space. 
See, e.g., any of the books in Sedgewick's "Algorithms in [Language du Jour]" series for working code. > Also, does the simple algorithm you used in Cyclops have a name? Not officially, but it answers to "hey, dumb-ass!" . then-again-so-do-i-so-make-eye-contact-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 10 05:21:46 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 9 Mar 2000 23:21:46 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Okay, I had a flash of inspiration on the way home from my gig tonight. Of course, I'm also really tired so I'm sure Tim will shoot this down in his usual witty but humbling way. I just had to get this out or I wouldn't sleep tonight. What if you timestamp instances when you create them? Then when you have trash cycles with finalizers, you sort them and finalize in chronological order. The nice thing here is that the user can have complete control over finalization order by controlling object creation order. Some random thoughts: - Finalization order of cyclic finalizable trash is completely deterministic. - Given sufficient resolution of your system clock, you should never have two objects with the same timestamp. - You could reduce the memory footprint by only including a timestamp for objects whose classes have __del__'s at instance creation time. Sticking an __del__ into your class dynamically would have no effect on objects that are already created (and I wouldn't poke you with a pointy stick if even post-twiddle instances didn't get timestamped). Thus, such objects would never be finalized -- tough luck. - FIFO order /seems/ more natural to me than FILO, but then I rarely create cyclic objects, and almost never use __del__, so this whole argument has been somewhat academic to me :). 
- The rule seems easy enough to implement, describe, and understand. I think I came up with a few more points on the drive home, but my post jam, post lightbulb endorphodrenalin rush is quickly subsiding, so I leave the rest until tomorrow. its-simply-a-matter-of-time-ly y'rs, -Barry From moshez at math.huji.ac.il Fri Mar 10 06:32:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 07:32:41 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: On Thu, 9 Mar 2000, Greg Stein wrote: > > But I'd still like to reclaim the memory. If this is some > > long-running server process that is executing arbitrary Python > > commands sent to it by clients, it's not nice to leak, period. > > If an exception is raised, the top-level server loop can catch it, log the > error, and keep going. But yes: it will leak. And Tim's version stops the leaking if the server is smart enough: occasionally, it will call gc.get_dangerous_cycles(), and nuke everything it finds there. (E.g., clean up dicts and lists). Some destructor raises an exception? Ignore it (or whatever). And no willy-nilly "but I'm using a silly OS which has hardly any concept of stderr" problems! If the server wants, it can just send a message to the log. rooting-for-tim-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Fri Mar 10 09:18:29 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 03:18:29 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <000001bf8a69$37d57b40$812d153f@tim> This is getting to be fun, but afraid I can only make time for the first easy one tonight: [Tim, conjures a horrid vision of finalizers installing new __del__ methods, then sez ... ] > The scheme above is meant to be bulletproof in the face of abuses even > I can't conceive of . [Guido] > Are you *sure* your scheme deals with this? 
Never said it did -- only that it *meant* to. Ya, you got me. The things I thought I had *proved* I put in the numbered list, and in a rush put the speculative stuff in the reply body.

One practical thing I think I can prove today: after finding SCCs, and identifying the safe nodes without predecessors, all such nodes S1, S2, ... can be cleaned up without fear of resurrection, or of cleaning something in Si causing anything in Sj (i!=j) to get reclaimed either (at the time I wrote it, I could only prove that cleaning *one* Si was non-problematic). Barring, of course, this "__del__ from hell" pathology.

Also suspect that this claim is isomorphic to your later elaboration on why the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers. That is, exactly the same is true of "the safe (SCC super)nodes without predecessors", so I expect we've just got two ways of identifying the same set here. Perhaps yours is bigger, though (I realize that isn't clear; later).

> Let's look at an example.
> (Again, lowercase nodes have no finalizers.) Take G:
>
> a <=> b -> C
>
> [and cleaning b can trigger C.__del__ which can create
> a.__class__.__del__ before a is decref'ed ...]
>
> ... and we're halfway committing a crime we said we would never commit
> (touching cyclical trash with finalizers).

Wholly agreed.

> I propose to disregard this absurd possibility,

How come you never propose to just shoot people <0.9 wink>?

> except to the extent that Python shouldn't crash -- but we make no
> guarantees to the user.

"Shouldn't crash" is essential, sure. Carry it another step: after C is finalized, we get back to the loop clearing b.__dict__, and the refcount on "a" falls to 0 next. So the new a.__del__ gets called.
Since b was visible to a, it's possible for a.__del__ to resurrect b, which latter is now in some bizarre (from the programmer's POV) cleared state (or even in the bit bucket, if we optimistically reclaim b's memory "early"!). I can't (well, don't want to ) believe it will be hard to stop this. It's just irksome to need to think about it at all. making-java's-gc-look-easy?-ly y'rs - tim From guido at python.org Fri Mar 10 14:46:43 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 08:46:43 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 23:21:46 EST." <14536.30810.720836.886023@anthem.cnri.reston.va.us> References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <200003101346.IAA27847@eric.cnri.reston.va.us> > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. The nice thing here is that the user can have > complete control over finalization order by controlling object > creation order. > > Some random thoughts: > > - Finalization order of cyclic finalizable trash is completely > deterministic. > > - Given sufficient resolution of your system clock, you should never > have two objects with the same timestamp. Forget the clock -- just use a counter that is incremented on each allocation. > - You could reduce the memory footprint by only including a timestamp > for objects whose classes have __del__'s at instance creation time. > Sticking an __del__ into your class dynamically would have no effect > on objects that are already created (and I wouldn't poke you with a > pointy stick if even post-twiddle instances didn't get > timestamped). Thus, such objects would never be finalized -- tough > luck. 
> > - FIFO order /seems/ more natural to me than FILO, but then I rarely > create cyclic objects, and almost never use __del__, so this whole > argument has been somewhat academic to me :). Ai, there's the rub. Suppose I have a tree with parent and child links. And suppose I have a rule that children need to be finalized before their parents (maybe they represent a Unix directory tree, where you must rm the files before you can rmdir the directory). This suggests that we should choose LIFO: you must create the parents first (you have to create a directory before you can create files in it). However, now we add operations to move nodes around in the tree. Suddenly you can have a child that is older than its parent! Conclusion: the creation time is useless; the application logic and actual link relationships are needed. > - The rule seems easy enough to implement, describe, and understand. > > I think I came up with a few more points on the drive home, but my > post jam, post lightbulb endorphodrenalin rush is quickly subsiding, > so I leave the rest until tomorrow. > > its-simply-a-matter-of-time-ly y'rs, > -Barry Time flies like an arrow -- fruit flies like a banana. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 10 16:06:48 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 10:06:48 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: Your message of "Thu, 09 Mar 2000 21:13:51 EST." <000601bf8a36$46ebf880$58a2143f@tim> References: <000601bf8a36$46ebf880$58a2143f@tim> Message-ID: <200003101506.KAA28358@eric.cnri.reston.va.us> OK, here's a patch for writelines() that supports arbitrary sequences and fixes the lock problem using Tim's solution #1.5 (slicing 1000 items at a time). It contains a fast path for when the argument is a list, using PyList_GetSlice; otherwise it uses PyObject_GetItem and a fixed list. Please have a good look at this; I've only tested it lightly. 
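In Python terms, the strategy in the patch's comment can be sketched as follows. This is a hypothetical pure-Python analogue for illustration only; the real code is C, uses PyList_GetSlice as a fast path for lists, and releases the interpreter lock around the fwrite loop:

```python
CHUNKSIZE = 1000  # same chunk size the patch uses

def writelines_chunked(write, seq):
    """Write items of seq using write(), CHUNKSIZE items at a time."""
    index = 0
    while True:
        # Slurp up to CHUNKSIZE items into a private list, treating
        # IndexError as end-of-sequence, as PySequence_GetItem does.
        chunk = []
        for j in range(CHUNKSIZE):
            try:
                item = seq[index + j]
            except IndexError:
                break
            if not isinstance(item, str):
                raise TypeError("writelines() requires sequence of strings")
            chunk.append(item)
        if not chunk:
            break
        for line in chunk:
            write(line)  # in the C code this loop runs without the lock
        if len(chunk) < CHUNKSIZE:
            break
        index += CHUNKSIZE
```

The point of the private list is that another thread may mutate the argument while the lock is released; the writing loop only ever touches the snapshot.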
--Guido van Rossum (home page: http://www.python.org/~guido/)

Index: fileobject.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/fileobject.c,v
retrieving revision 2.70
diff -c -r2.70 fileobject.c
*** fileobject.c	2000/02/29 13:59:28	2.70
--- fileobject.c	2000/03/10 14:55:47
***************
*** 884,923 ****
  	PyFileObject *f;
  	PyObject *args;
  {
! 	int i, n;
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PyList_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires list of strings");
  		return NULL;
  	}
! 	n = PyList_Size(args);
! 	f->f_softspace = 0;
! 	Py_BEGIN_ALLOW_THREADS
! 	errno = 0;
! 	for (i = 0; i < n; i++) {
! 		PyObject *line = PyList_GetItem(args, i);
! 		int len;
! 		int nwritten;
! 		if (!PyString_Check(line)) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetString(PyExc_TypeError,
! 				   "writelines() requires list of strings");
  			return NULL;
  		}
! 		len = PyString_Size(line);
! 		nwritten = fwrite(PyString_AsString(line), 1, len, f->f_fp);
! 		if (nwritten != len) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetFromErrno(PyExc_IOError);
! 			clearerr(f->f_fp);
! 			return NULL;
  		}
  	}
! 	Py_END_ALLOW_THREADS
  	Py_INCREF(Py_None);
! 	return Py_None;
  }
  
  static PyMethodDef file_methods[] = {
--- 884,975 ----
  	PyFileObject *f;
  	PyObject *args;
  {
! #define CHUNKSIZE 1000
! 	PyObject *list, *line;
! 	PyObject *result;
! 	int i, j, index, len, nwritten, islist;
! 
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PySequence_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires sequence of strings");
  		return NULL;
  	}
! 	islist = PyList_Check(args);
! 
! 	/* Strategy: slurp CHUNKSIZE lines into a private list,
! 	   checking that they are all strings, then write that list
! 	   without holding the interpreter lock, then come back for more. */
! 	index = 0;
! 	if (islist)
! 		list = NULL;
! 	else {
! 		list = PyList_New(CHUNKSIZE);
! 		if (list == NULL)
  			return NULL;
+ 	}
+ 	result = NULL;
+ 
+ 	for (;;) {
+ 		if (islist) {
+ 			Py_XDECREF(list);
+ 			list = PyList_GetSlice(args, index, index+CHUNKSIZE);
+ 			if (list == NULL)
+ 				return NULL;
+ 			j = PyList_GET_SIZE(list);
  		}
! 		else {
! 			for (j = 0; j < CHUNKSIZE; j++) {
! 				line = PySequence_GetItem(args, index+j);
! 				if (line == NULL) {
! 					if (PyErr_ExceptionMatches(PyExc_IndexError)) {
! 						PyErr_Clear();
! 						break;
! 					}
! 					/* Some other error occurred.
! 					   Note that we may lose some output. */
! 					goto error;
! 				}
! 				if (!PyString_Check(line)) {
! 					PyErr_SetString(PyExc_TypeError,
! 				"writelines() requires sequences of strings");
! 					goto error;
! 				}
! 				PyList_SetItem(list, j, line);
! 			}
! 		}
! 		if (j == 0)
! 			break;
! 
! 		Py_BEGIN_ALLOW_THREADS
! 		f->f_softspace = 0;
! 		errno = 0;
! 		for (i = 0; i < j; i++) {
! 			line = PyList_GET_ITEM(list, i);
! 			len = PyString_GET_SIZE(line);
! 			nwritten = fwrite(PyString_AS_STRING(line),
! 					  1, len, f->f_fp);
! 			if (nwritten != len) {
! 				Py_BLOCK_THREADS
! 				PyErr_SetFromErrno(PyExc_IOError);
! 				clearerr(f->f_fp);
! 				Py_DECREF(list);
! 				return NULL;
! 			}
  		}
+ 		Py_END_ALLOW_THREADS
+ 
+ 		if (j < CHUNKSIZE)
+ 			break;
+ 		index += CHUNKSIZE;
  	}
! 	Py_INCREF(Py_None);
! 	result = Py_None;
! error:
! 	Py_XDECREF(list);
! 	return result;
  }
  
  static PyMethodDef file_methods[] = {

From skip at mojam.com  Fri Mar 10 16:28:13 2000
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 10 Mar 2000 09:28:13 -0600
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
Message-ID: <200003101528.JAA15951@beluga.mojam.com>

Consider the following snippet of code from MySQLdb.py:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError:
        self._query(query % escape_dict(args, qc))

It's not quite right. There are at least four reasons I can think of why the % operator might raise a TypeError:

1. query has not enough format specifiers
2. query has too many format specifiers
3. argument type mismatch between individual format specifier and
   corresponding argument
4.
query expects dict-style interpolation

The except clause only handles the last case. That leaves the other three cases mishandled. The above construct pretends that all TypeErrors possible are handled by calling escape_dict() instead of escape_row(). I stumbled on case 2 yesterday and got a fairly useless error message when the code in the except clause also bombed. Took me a few minutes of head scratching to see that I had an extra %s in my format string. A note to Andy Dustman, MySQLdb's author, yielded the following modified version:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        if m.args[0] == "not enough arguments for format string":
            raise
        if m.args[0] == "not all arguments converted":
            raise
        self._query(query % escape_dict(args, qc))

This will do the trick for me for the time being. Note, however, that the only way for Andy to decide which of the cases occurred (case 3 still isn't handled above, but should occur very rarely in MySQLdb since it only uses the more accommodating %s as a format specifier) is to compare the string value of the message to see which of the four cases was raised.

This strong coupling via the error message text between the exception being raised (in C code, in this case) and the place where it's caught seems bad to me and encourages authors to either not recover from errors or to recover from them in the crudest fashion. If Guido decides to tweak the TypeError message in any fashion, perhaps to include the count of arguments in the format string and argument tuple, this code will break.

It makes me wonder if there's not a better mechanism waiting to be discovered. Would it be possible to publish an interface of some sort via the exceptions module that would allow symbolic names or dictionary references to be used to decide which case is being handled? I envision something like the following in exceptions.py:

    UNKNOWN_ERROR_CATEGORY = 0
    TYP_SHORT_FORMAT = 1
    TYP_LONG_FORMAT = 2
    ...
    IND_BAD_RANGE = 1

    message_map = { # leave
        (TypeError, ("not enough arguments for format string",)):
            TYP_SHORT_FORMAT,
        (TypeError, ("not all arguments converted",)): TYP_LONG_FORMAT,
        ...
        (IndexError, ("list index out of range",)): IND_BAD_RANGE,
        ...
    }

This would isolate the raw text of exception strings to just a single place (well, just one place on the exception handling side of things). It would be used something like

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        from exceptions import *
        exc_case = message_map.get((TypeError, m.args),
                                   UNKNOWN_ERROR_CATEGORY)
        if exc_case in [UNKNOWN_ERROR_CATEGORY, TYP_SHORT_FORMAT,
                        TYP_LONG_FORMAT]:
            raise
        self._query(query % escape_dict(args, qc))

This could be added to exceptions.py without breaking existing code. Does this (or something like it) seem like a reasonable enhancement for Py3K? If we can narrow things down to an implementable solution I'll create a patch.

Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/

From guido at python.org  Fri Mar 10 17:17:56 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Mar 2000 11:17:56 -0500
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: Your message of "Fri, 10 Mar 2000 09:28:13 CST." <200003101528.JAA15951@beluga.mojam.com>
References: <200003101528.JAA15951@beluga.mojam.com>
Message-ID: <200003101617.LAA28722@eric.cnri.reston.va.us>

> Consider the following snippet of code from MySQLdb.py:

Skip, I'm not familiar with MySQLdb.py, and I have no idea what your example is about. From the rest of the message I feel it's not about MySQLdb at all, but about string formatting, but the point escapes me because you never quite show what's in the format string and what error that gives. Could you give some examples based on first principles? A simple interactive session showing the various errors would be helpful...
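For reference, the four failure modes are easy to reproduce interactively. The snippet below was checked against a much later CPython, where some of the message texts have been reworded since the 1.5.2 era -- that drift is itself the fragility under discussion:

```python
cases = [
    ('"%s" % ("a", "b")', lambda: "%s" % ("a", "b")),  # too many args
    ('"%s %s" % "a"',     lambda: "%s %s" % "a"),      # too few args
    ('"%(a)s" % ("a",)',  lambda: "%(a)s" % ("a",)),   # needs a mapping
    ('"%d" % {"a": 1}',   lambda: "%d" % {"a": 1}),    # type mismatch
]
for src, thunk in cases:
    try:
        thunk()
    except TypeError as exc:
        print("%-20s -> TypeError: %s" % (src, exc))
```

All four raise the same exception class, so an except clause can only tell them apart by inspecting the message text.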
--Guido van Rossum (home page: http://www.python.org/~guido/)

From gward at cnri.reston.va.us  Fri Mar 10 20:05:04 2000
From: gward at cnri.reston.va.us (Greg Ward)
Date: Fri, 10 Mar 2000 14:05:04 -0500
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>; from guido@python.org on Fri, Mar 10, 2000 at 11:17:56AM -0500
References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us>
Message-ID: <20000310140503.A8619@cnri.reston.va.us>

On 10 March 2000, Guido van Rossum said:
> Skip, I'm not familiar with MySQLdb.py, and I have no idea what your
> example is about. From the rest of the message I feel it's not about
> MySQLdb at all, but about string formatting, but the point escapes me
> because you never quite show what's in the format string and what
> error that gives. Could you give some examples based on first
> principles? A simple interactive session showing the various errors
> would be helpful...

I think Skip's point was just this: "TypeError" isn't expressive enough. If you catch TypeError on a statement with multiple possible type errors, you don't know which one you caught. Same holds for any exception type, really: a given statement could blow up with ValueError for any number of reasons. Etc., etc.

One possible solution, and I think this is what Skip was getting at, is to add an "error code" to the exception object that identifies the error more reliably than examining the error message. It's just the errno/strerror dichotomy: strerror is for users, errno is for code. I think Skip is just saying that Python exception objects need an errno (although it doesn't have to be a number). It would probably only make sense to define error codes for exceptions that can be raised by Python itself, though.
Greg

From skip at mojam.com  Fri Mar 10 21:17:30 2000
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 10 Mar 2000 14:17:30 -0600 (CST)
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>
References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us>
Message-ID: <14537.22618.656740.296408@beluga.mojam.com>

Guido> Skip, I'm not familiar with MySQLdb.py, and I have no idea what
Guido> your example is about. From the rest of the message I feel it's
Guido> not about MySQLdb at all, but about string formatting,

My apologies. You're correct, it's really not about MySQLdb. It's about handling multiple cases raised by the same exception. First, a more concrete example that just uses simple string formats:

    code                 exception
    "%s" % ("a", "b")    TypeError: 'not all arguments converted'
    "%s %s" % "a"        TypeError: 'not enough arguments for format string'
    "%(a)s" % ("a",)     TypeError: 'format requires a mapping'
    "%d" % {"a": 1}      TypeError: 'illegal argument type for built-in operation'

Let's presume hypothetically that it's possible to recover from some subset of the TypeErrors that are raised, but not all of them. Now, also presume that the format strings and the tuple, string or dict literals I've given above can be stored in variables (which they can).

If we wrap the code in a try/except statement, we can catch the TypeError exception and try to do something sensible. This is precisely the trick that Andy Dustman uses in MySQLdb: first try expanding the format string using a tuple as the RH operand, then try with a dict if that fails. Unfortunately, as you can see from the above examples, there are four cases that need to be handled. To distinguish them currently, you have to compare the message you get with the exception to string literals that are generally defined in C code in the interpreter.
Here's what Andy's original code looked like stripped of the MySQLdb-ese:

    try:
        x = format % tuple_generating_function(...)
    except TypeError:
        x = format % dict_generating_function(...)

That doesn't handle the first two cases above. You have to inspect the message that raise sends out:

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        if m.args[0] == "not all arguments converted":
            raise
        if m.args[0] == "not enough arguments for format string":
            raise
        x = format % dict_generating_function(...)

This comparison of except arguments with hard-coded strings (especially ones the programmer has no direct control over) seems fragile to me. If you decide to reword the error message strings, you break someone's code.

In my previous message I suggested collecting this fragility in the exceptions module where it can be better isolated. My solution is a bit cumbersome, but could probably be cleaned up somewhat, but basically looks like

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        import exceptions
        msg_case = exceptions.message_map.get((TypeError, m.args),
                                              exceptions.UNKNOWN_ERROR_CATEGORY)
        # punt on the cases we can't recover from
        if msg_case == exceptions.TYP_SHORT_FORMAT: raise
        if msg_case == exceptions.TYP_LONG_FORMAT: raise
        if msg_case == exceptions.UNKNOWN_ERROR_CATEGORY: raise
        # handle the one we can
        x = format % dict_generating_function(...)

In private email that crossed my original message, Andy suggested defining more standard exceptions, e.g.:

    class FormatError(TypeError): pass
    class TooManyElements(FormatError): pass
    class TooFewElements(FormatError): pass

then raising the appropriate error based on the circumstance. Code that catches TypeError exceptions would still work. So there are two possible changes on the table:

1. define more standard exceptions so you can distinguish classes of errors
   on a more fine-grained basis using just the first argument of the except
   clause.
2.
provide some machinery in exceptions.py to allow programmers a measure of
   uncoupling from using hard-coded strings to distinguish cases.

Skip

From skip at mojam.com  Fri Mar 10 21:21:11 2000
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 10 Mar 2000 14:21:11 -0600 (CST)
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: <20000310140503.A8619@cnri.reston.va.us>
References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> <20000310140503.A8619@cnri.reston.va.us>
Message-ID: <14537.22839.664131.373727@beluga.mojam.com>

Greg> One possible solution, and I think this is what Skip was getting
Greg> at, is to add an "error code" to the exception object that
Greg> identifies the error more reliably than examining the error
Greg> message. It's just the errno/strerror dichotomy: strerror is for
Greg> users, errno is for code. I think Skip is just saying that
Greg> Python exception objects need an errno (although it doesn't have
Greg> to be a number). It would probably only make sense to define
Greg> error codes for exceptions that can be raised by Python itself,
Greg> though.

I'm actually allowing the string to be used as the error code. If you raise TypeError with "not all arguments converted" as the argument, then that string literal will appear in the definition of exceptions.message_map as part of a key. The programmer would only refer to the args attribute of the object being raised.
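Andy's subclassing suggestion is easy to prototype in pure Python. Everything here is illustrative rather than an agreed-upon API (and it uses present-day except/raise syntax); the message substrings are then the one place where string matching would live:

```python
class FormatError(TypeError):
    """Base class for finer-grained string-formatting type errors."""

class TooFewElements(FormatError):
    pass

class TooManyElements(FormatError):
    pass

def checked_format(format, args):
    """Apply format % args, translating the two recoverable
    TypeError messages into the subclasses above."""
    try:
        return format % args
    except TypeError as exc:
        msg = exc.args[0] if exc.args else ""
        if "not enough arguments" in msg:
            raise TooFewElements(msg) from exc
        if "not all arguments converted" in msg:
            raise TooManyElements(msg) from exc
        raise

# Old code that catches TypeError keeps working; new code can be precise:
try:
    checked_format("%s %s", ("a",))
except TooFewElements:
    pass  # recoverable: e.g. retry with a dict argument
```

Since the subclasses derive from TypeError, existing except TypeError clauses are unaffected, which is exactly the backward-compatibility property Andy's suggestion relies on.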
either-or-makes-no-real-difference-to-me-ly y'rs, Skip From bwarsaw at cnri.reston.va.us Fri Mar 10 21:56:45 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 15:56:45 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> <200003101346.IAA27847@eric.cnri.reston.va.us> Message-ID: <14537.24973.579056.533282@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: >> Given sufficient resolution of your system >> clock, you should never have two objects with the same >> timestamp. GvR> Forget the clock -- just use a counter that is incremented on GvR> each allocation. Good idea. GvR> Suppose I have a tree with parent and child links. And GvR> suppose I have a rule that children need to be finalized GvR> before their parents (maybe they represent a Unix directory GvR> tree, where you must rm the files before you can rmdir the GvR> directory). This suggests that we should choose LIFO: you GvR> must create the parents first (you have to create a directory GvR> before you can create files in it). However, now we add GvR> operations to move nodes around in the tree. Suddenly you GvR> can have a child that is older than its parent! Conclusion: GvR> the creation time is useless; the application logic and GvR> actual link relationships are needed. One potential way to solve this is to provide an interface for refreshing the counter; for discussion purposes, I'll call this sys.gcrefresh(obj). Throws a TypeError if obj isn't a finalizable instance. Otherwise, it sets the "timestamp" to the current counter value and increments the counter. Thus, in your example, when the child node is reparented, you sys.gcrefresh(child) and now the parent is automatically older. Of course, what if the child has its own children? 
You've now got an age graph like this:

    parent > child < grandchild

with the wrong age relationship between the parent and grandchild. So when you refresh, you've got to walk down the containment tree making sure your grandkids are "younger" than yourself. E.g.:

    class Node:
        ...
        def __del__(self):
            ...
        def reparent(self, node):
            self.parent = node
            self.refresh()
        def refresh(self):
            sys.gcrefresh(self)
            for c in self.children:
                c.refresh()

The point to all this is that it gives explicit control of the finalizable cycle reclamation order to the user, via a fairly easy to understand, and manipulate mechanism.

twas-only-a-flesh-wound-but-waiting-for-the-next-stroke-ly y'rs,
-Barry

From jim at interet.com  Fri Mar 10 22:14:45 2000
From: jim at interet.com (James C. Ahlstrom)
Date: Fri, 10 Mar 2000 16:14:45 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000801bf8a3b$aa0c4e60$58a2143f@tim>
Message-ID: <38C965C4.B164C2D5@interet.com>

Tim Peters wrote:
>
> [Fred L. Drake, Jr.]
> > Tim (& others),
> > Would this additional text be sufficient for the os.popen()
> > documentation?
> >
> > \strong{Note:} This function behaves unreliably under Windows
> > due to the native implementation of \cfunction{popen()}.
>
> Yes, that's good! If Mark/Bill's alternatives don't make it in, would also
> be good to point to the PythonWin extensions (although MarkH will have to
> give us the Official Name for that).

Well, it looks like this thread has fizzled out. But what did we decide? Changing the docs to say popen() "doesn't work reliably" is a little weak. Maybe removing popen() is better, and demanding that Windows users use win32pipe.

I played around with a patch to posixmodule.c which eliminates _popen() and implements os.popen() using CreatePipe(). It sort of works on NT and fails on 95. Anyway, I am stuck on how to make a Python file object from a pipe handle.
Would it be a good idea to extract the Wisdom from win32pipe and re-implement os.popen() either in C or by using win32pipe directly? Using C is simple and to the point. I feel Tim's original complaint that popen() is a Problem still hasn't been fixed. JimA From moshez at math.huji.ac.il Fri Mar 10 22:29:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 23:29:05 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: On Fri, 10 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > One potential way to solve this is to provide an interface for > refreshing the counter; for discussion purposes, I'll call this > sys.gcrefresh(obj). Barry, there are other problems with your scheme, but I won't even try to point those out: having to call a function whose purpose can only be described in terms of a concrete implementation of a garbage collection scheme is simply unacceptable. I can almost see you shouting "Come back here, I'll bite your legs off" . > The point to all this is that it gives explicit control of the > finalizable cycle reclamation order to the user, via a fairly easy to > understand, and manipulate mechanism. Oh? This sounds like the most horrendus mechanism alive.... you-probably-jammed-a-*little*-too-loud-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From bwarsaw at cnri.reston.va.us Fri Mar 10 23:15:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 17:15:27 -0500 (EST) Subject: [Python-Dev] finalization again References: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: <14537.29695.532507.197580@anthem.cnri.reston.va.us> Just throwing out ideas. 
From DavidA at ActiveState.com Fri Mar 10 23:20:45 2000 From: DavidA at ActiveState.com (David Ascher) Date: Fri, 10 Mar 2000 14:20:45 -0800 Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: Moshe, some _arguments_ backing your feelings might give them more weight... As they stand, they're just insults, and if I were Barry I'd ignore them. --david ascher Moshe Zadka: > Barry, there are other problems with your scheme, but I won't even try to > point those out: having to call a function whose purpose can only be > described in terms of a concrete implementation of a garbage collection > scheme is simply unacceptable. I can almost see you shouting "Come back > here, I'll bite your legs off" . > [...] > Oh? This sounds like the most horrendus mechanism alive.... From skip at mojam.com Fri Mar 10 23:40:02 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 16:40:02 -0600 Subject: [Python-Dev] on the suitability of ideas tossed out to python-dev Message-ID: <200003102240.QAA07881@beluga.mojam.com> Folks, let's not forget that python-dev is a place where oftentimes half-baked ideas will get advanced. I came up with an idea about decoupling error handling from exception message strings. I don't expect my idea to be adopted as is. Similarly, Barry's ideas about object timestamps were admittedly conceived late at night in the thrill following an apparently good gig. (I like the idea that every object has a modtime, but for other reasons than Barry suggested.) My feeling is that bad ideas will get winnowed out or drastically modified quickly enough anyway. Think of these early ideas as little more than brainstorms. 
A lot of times if I have an idea, I feel I need to put it down on my virtual whiteboard quickly, because a) I often don't have a lot of time to pursue stuff (do it now or it won't get done), b) because bad ideas can be the catalyst for better ideas, and c) if I don't do it immediately, I'll probably forget the idea altogether, thus missing the opportunity for reason b altogether. Try and collect a bunch of ideas before shooting any down and see what falls out. The best ideas will survive. When people start proving things and using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-)

Have a relaxing, thought provoking weekend. I'm going to go see a movie this evening with my wife and youngest son, appropriately enough titled, "My Dog Skip". Enough Pythoneering for one day...

bow-wow-ly y'rs,

Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/

From guido at python.org  Sat Mar 11 01:20:01 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Mar 2000 19:20:01 -0500
Subject: [Python-Dev] Unicode patches checked in
Message-ID: <200003110020.TAA17777@eric.cnri.reston.va.us>

I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough.

We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression engine which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in!
I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below.

Links:

http://www.python.org/download/cvs.html
    Instructions on how to get access to the CVS version.
    (David Ascher is making nightly tarballs of the CVS version
    available at http://starship.python.net/crew/da/pythondists/)

http://starship.python.net/crew/lemburg/unicode-proposal.txt
    The latest version of the specification on which Marc has based
    his implementation.

http://www.python.org/sigs/i18n-sig/
    Home page of the i18n-sig (Internationalization SIG), which has
    lots of other links about this and related issues.

http://www.python.org/search/search_bugs.html
    The Python Bugs List. Use this for all bug reports.

Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim_one at email.msn.com  Sat Mar 11 03:03:47 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Fri, 10 Mar 2000 21:03:47 -0500
Subject: [Python-Dev] Finalization in Eiffel
Message-ID: <000701bf8afe$0a0fd800$a42d153f@tim>

Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took extreme care in its design, and has written extensively and articulately about the design -- agree with him or not, he's always worth reading!

I used Eiffel briefly a few years ago, just out of curiosity. I didn't recall even bumping into a notion of destructors. Turns out it does have them, but they're appallingly (whether relative to Eiffel's usual clarity, or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified.
An Eiffel class can register a destructor by inheriting from the system MEMORY class and overriding the latter's "dispose()". This appears to be viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The Language" say much about its semantics. Within dispose, you're explicitly discouraged from invoking methods on *any* other object, and resurrection is right out the window. But the language doesn't appear to check for any of that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from people who should know suggest that all but one Eiffel implementation pay no attention at all to reachability during gc, and that none support resurrection. If you need ordering during finalization, the advice is to write that part in C/C++. Violations of the vague rules appear to lead to random system damage(!). Looking at various Eiffel pkgs on the web, the sole use of dispose was in one-line bodies that released external resources (like memory & db connections) via calling an external C/C++ function. jealous-&-appalled-at-the-same-time-ly y'rs - tim From tim_one at email.msn.com Sat Mar 11 03:03:50 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:50 -0500 Subject: [Python-Dev] Conventional wisdom on finalization Message-ID: <000801bf8afe$0b3df7c0$a42d153f@tim> David Chase maintains a well-regarded GC FAQ, at http://www.iecc.com/gclist/GC-faq.html Interested folks should look it up. A couple highlights: On cycles with finalizers: In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. 
People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common). On Java's "finalizer called only once" rule: if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.) Until today, I had no idea I was so resolutely conventional . seems-we're-trying-to-do-more-than-anyone-other-than-us-expects-ly y'rs - tim From shichang at icubed.com Fri Mar 10 23:33:11 2000 From: shichang at icubed.com (Shichang Zhao) Date: Fri, 10 Mar 2000 22:33:11 -0000 Subject: [Python-Dev] RE: Unicode patches checked in Message-ID: <01BF8AE0.9E911980.shichang@icubed.com> I would love to test the Python 1.6 (Unicode support) in Chinese language aspect, but I don't know where I can get a copy of OS that supports Chinese. Anyone can point me a direction? -----Original Message----- From: Guido van Rossum [SMTP:guido at python.org] Sent: Saturday, March 11, 2000 12:20 AM To: Python mailing list; python-announce at python.org; python-dev at python.org; i18n-sig at python.org; string-sig at python.org Cc: Marc-Andre Lemburg Subject: Unicode patches checked in I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. 
We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression engine which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose!
--Guido van Rossum (home page: http://www.python.org/~guido/) -- http://www.python.org/mailman/listinfo/python-list From moshez at math.huji.ac.il Sat Mar 11 10:10:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 11:10:12 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy Message-ID: The following "problem" is easy to fix. However, what I wanted to know is if people (Skip and Guido most importantly) think it is a problem: >>> "a" in u"bbba" 1 >>> u"a" in "bbba" Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: string member test needs char left operand Suggested fix: in stringobject.c, explicitly allow a unicode char left operand. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mal at lemburg.com Sat Mar 11 11:24:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 11:24:26 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: Message-ID: <38CA1EDA.423F8A2C@lemburg.com> Moshe Zadka wrote: > > The following "problem" is easy to fix. However, what I wanted to know is > if people (Skip and Guido most importantly) think it is a problem: > > >>> "a" in u"bbba" > 1 > >>> u"a" in "bbba" > Traceback (innermost last): > File "<stdin>", line 1, in ? > TypeError: string member test needs char left operand > > Suggested fix: in stringobject.c, explicitly allow a unicode char left > operand. Hmm, this must have been introduced by your contains code... it did work before. The normal action taken by the Unicode and the string code in these mixed type situations is to first convert everything to Unicode and then retry the operation. Strings are interpreted as UTF-8 during this conversion. To simplify this task, I added method APIs to the Unicode object which do the conversion for you (they apply all the necessary coercion business to all arguments).
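The coercion rule described here — convert both operands to Unicode, treating plain strings as UTF-8, then retry the membership test — can be sketched in a few lines of Python. This is an illustrative sketch only: `unicode_contains` is an invented name, and Python 3's bytes/str pair stands in for the 1.6-era str/unicode pair.

```python
def unicode_contains(container, element):
    # Coerce both operands to Unicode; byte strings (the modern stand-in
    # for 1.x 8-bit strings) are decoded as UTF-8, per the rule above.
    if isinstance(container, bytes):
        container = container.decode("utf-8")
    if isinstance(element, bytes):
        element = element.decode("utf-8")
    # The 2000-era 'in' operator accepted only single-character operands.
    if len(element) != 1:
        raise TypeError("string member test needs char left operand")
    return element in container

# Both mixed-type orders now behave symmetrically -- including the
# u"a" in "bbba" case that raised TypeError:
assert unicode_contains(b"bbba", "a")
assert unicode_contains("bbba", b"a")
```

With this rule in place the asymmetry Moshe reported disappears; the real fix, of course, was done in C in stringobject.c and unicodeobject.c.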
I guess adding another PyUnicode_Contains() wouldn't hurt :-) Perhaps I should also add a tp_contains slot to the Unicode object which then uses the above API as well. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at math.huji.ac.il Sat Mar 11 12:05:48 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 13:05:48 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: <38CA1EDA.423F8A2C@lemburg.com> Message-ID: On Sat, 11 Mar 2000, M.-A. Lemburg wrote: > Hmm, this must have been introduced by your contains code... > it did work before. Nope: the string "in" semantics were forever special-cased. Guido beat me soundly for trying to change the semantics... > The normal action taken by the Unicode and the string > code in these mixed type situations is to first > convert everything to Unicode and then retry the operation. > Strings are interpreted as UTF-8 during this conversion. Hmmm....PySequence_Contains doesn't do any conversion of the arguments. Should it? (Again, it didn't before). If it does, then the order of testing for seq_contains and seq_getitem and conversions > Perhaps I should also add a tp_contains slot to the > Unicode object which then uses the above API as well. But that wouldn't help at all for u"a" in "abbbb" PySequence_Contains only dispatches on the container argument :-( (BTW: I discovered it while contemplating adding a seq_contains (not tp_contains) to unicode objects to optimize the searching for a bit.) PS: MAL: thanks for a great birthday present! I'm enjoying the unicode patch a lot. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From guido at python.org Sat Mar 11 13:16:06 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Mar 2000 07:16:06 -0500 Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: Your message of "Sat, 11 Mar 2000 13:05:48 +0200." References: Message-ID: <200003111216.HAA12651@eric.cnri.reston.va.us> [Moshe discovers that u"a" in "bbba" raises TypeError] [Marc-Andre] > > Hmm, this must have been introduced by your contains code... > > it did work before. > > Nope: the string "in" semantics were forever special-cased. Guido beat me > soundly for trying to change the semantics... But I believe that Marc-Andre added a special case for Unicode in PySequence_Contains. I looked for evidence, but the last snapshot that I actually saved and built before Moshe's code was checked in is from 2/18 and it isn't in there. Yet I believe Marc-Andre. The special case needs to be added back to string_contains in stringobject.c. > > The normal action taken by the Unicode and the string > > code in these mixed type situations is to first > > convert everything to Unicode and then retry the operation. > > Strings are interpreted as UTF-8 during this conversion. > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > Should it? (Again, it didn't before). If it does, then the order of > testing for seq_contains and seq_getitem and conversions Or it could be done this way. > > Perhaps I should also add a tp_contains slot to the > > Unicode object which then uses the above API as well. Yes. > But that wouldn't help at all for > > u"a" in "abbbb" It could if PySequence_Contains would first look for a string and a unicode argument (in either order) and in that case convert the string to unicode. > PySequence_Contains only dispatches on the container argument :-( > > (BTW: I discovered it while contemplating adding a seq_contains (not > tp_contains) to unicode objects to optimize the searching for a bit.)
You may beat Marc-Andre to it, but I'll have to let him look at the code anyway -- I'm not sufficiently familiar with the Unicode stuff myself yet. BTW, I added a tag "pre-unicode" to the CVS tree to the revisions before the Unicode changes were made. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat Mar 11 14:32:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:32:57 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> Message-ID: <38CA4B08.7B13438D@lemburg.com> Guido van Rossum wrote: > > [Moshe discovers that u"a" in "bbba" raises TypeError] > > [Marc-Andre] > > > Hmm, this must have been introduced by your contains code... > > > it did work before. > > > > Nope: the string "in" semantics were forever special-cased. Guido beat me > > soundly for trying to change the semantics... > > But I believe that Marc-Andre added a special case for Unicode in > PySequence_Contains. I looked for evidence, but the last snapshot that > I actually saved and built before Moshe's code was checked in is from > 2/18 and it isn't in there. Yet I believe Marc-Andre. The special > case needs to be added back to string_contains in stringobject.c. Moshe was right: I had probably not checked the code because the obvious combinations worked out of the box... the only combination which doesn't work is "unicode in string". I'll fix it next week. BTW, there's a good chance that the string/Unicode integration is not complete yet: just keep looking for them. > > > The normal action taken by the Unicode and the string > > > code in these mixed type situations is to first > > > convert everything to Unicode and then retry the operation. > > > Strings are interpreted as UTF-8 during this conversion. > > > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > > Should it? (Again, it didn't before).
If it does, then the order of > > testing for seq_contains and seq_getitem and conversions > > Or it could be done this way. > > > > Perhaps I should also add a tp_contains slot to the > > > Unicode object which then uses the above API as well. > > Yes. > > > But that wouldn't help at all for > > > > u"a" in "abbbb" > > It could if PySequence_Contains would first look for a string and a > unicode argument (in either order) and in that case convert the string > to unicode. I think the right way to do this is to add a special case to seq_contains in the string implementation. That's how most other auto-coercions work too. Instead of raising an error, the implementation would then delegate the work to PyUnicode_Contains(). > > PySequence_Contains only dispatches on the container argument :-( > > > > (BTW: I discovered it while contemplating adding a seq_contains (not > > tp_contains) to unicode objects to optimize the searching for a bit.) > > You may beat Marc-Andre to it, but I'll have to let him look at the > code anyway -- I'm not sufficiently familiar with the Unicode stuff > myself yet. I'll add that one too. BTW, Happy Birthday, Moshe :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Mar 11 14:57:34 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:57:34 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> <38CA4B08.7B13438D@lemburg.com> Message-ID: <38CA50CE.BEEFAB5E@lemburg.com> I couldn't resist :-) Here's the patch... BTW, how should we proceed with future patches ? Should I wrap them together about once a week, or send them as soon as they are done ?
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 10 23:33:05 2000 +++ Python+Unicode/Include/unicodeobject.h Sat Mar 11 14:45:59 2000 @@ -683,6 +683,17 @@ PyObject *args /* Argument tuple or dictionary */ ); +/* Checks whether element is contained in container and return 1/0 + accordingly. + + element has to coerce to an one element Unicode string. -1 is + returned in case of an error. */ + +extern DL_IMPORT(int) PyUnicode_Contains( + PyObject *container, /* Container string */ + PyObject *element /* Element string */ + ); + /* === Characters Type APIs =============================================== */ /* These should not be used directly. 
Use the Py_UNICODE_IS* and diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Sat Mar 11 00:23:20 2000 +++ Python+Unicode/Lib/test/test_unicode.py Sat Mar 11 14:52:29 2000 @@ -219,6 +219,19 @@ test('translate', u"abababc", u'iiic', {ord('a'):None, ord('b'):ord('i')}) test('translate', u"abababc", u'iiix', {ord('a'):None, ord('b'):ord('i'), ord('c'):u'x'}) +# Contains: +print 'Testing Unicode contains method...', +assert ('a' in 'abdb') == 1 +assert ('a' in 'bdab') == 1 +assert ('a' in 'bdaba') == 1 +assert ('a' in 'bdba') == 1 +assert ('a' in u'bdba') == 1 +assert (u'a' in u'bdba') == 1 +assert (u'a' in u'bdb') == 0 +assert (u'a' in 'bdb') == 0 +assert (u'a' in 'bdba') == 1 +print 'done.' + # Formatting: print 'Testing Unicode formatting strings...', assert u"%s, %s" % (u"abc", "abc") == u'abc, abc' diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Sat Mar 11 14:53:37 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. 
File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Sat Mar 11 10:55:09 2000 +++ Python+Unicode/Objects/stringobject.c Sat Mar 11 14:47:45 2000 @@ -389,7 +389,9 @@ { register char *s, *end; register char c; - if (!PyString_Check(el) || PyString_Size(el) != 1) { + if (!PyString_Check(el)) + return PyUnicode_Contains(a, el); + if (PyString_Size(el) != 1) { PyErr_SetString(PyExc_TypeError, "string member test needs char left operand"); return -1; diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Fri Mar 10 23:53:23 2000 +++ Python+Unicode/Objects/unicodeobject.c Sat Mar 11 14:48:52 2000 @@ -2737,6 +2737,49 @@ return -1; } +int PyUnicode_Contains(PyObject *container, + PyObject *element) +{ + PyUnicodeObject *u = NULL, *v = NULL; + int result; + register const Py_UNICODE 
*p, *e; + register Py_UNICODE ch; + + /* Coerce the two arguments */ + u = (PyUnicodeObject *)PyUnicode_FromObject(container); + if (u == NULL) + goto onError; + v = (PyUnicodeObject *)PyUnicode_FromObject(element); + if (v == NULL) + goto onError; + + /* Check v in u */ + if (PyUnicode_GET_SIZE(v) != 1) { + PyErr_SetString(PyExc_TypeError, + "string member test needs char left operand"); + goto onError; + } + ch = *PyUnicode_AS_UNICODE(v); + p = PyUnicode_AS_UNICODE(u); + e = p + PyUnicode_GET_SIZE(u); + result = 0; + while (p < e) { + if (*p++ == ch) { + result = 1; + break; + } + } + + Py_DECREF(u); + Py_DECREF(v); + return result; + +onError: + Py_XDECREF(u); + Py_XDECREF(v); + return -1; +} + /* Concat to string or Unicode object giving a new Unicode object. */ PyObject *PyUnicode_Concat(PyObject *left, @@ -3817,6 +3860,7 @@ (intintargfunc) unicode_slice, /* sq_slice */ 0, /* sq_ass_item */ 0, /* sq_ass_slice */ + (objobjproc)PyUnicode_Contains, /*sq_contains*/ }; static int From tim_one at email.msn.com Sat Mar 11 21:10:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:10:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <000e01bf8b95$d52939e0$c72d153f@tim> [Barry A. Warsaw, jamming after hours] > ... > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. Well, I strongly agree that would be better than finalizing them in increasing order of storage address . > ... > - FIFO order /seems/ more natural to me than FILO, Forget cycles for a moment, and consider just programs that manipulate *immutable* containers (the simplest kind to think about): at the time you create an immutable container, everything *contained* must already be in existence, so every pointer goes from a newer object (container) to an older one (containee). 
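That new-to-old pointer direction is easy to demonstrate directly; a minimal sketch in modern Python syntax:

```python
# Immutable containers: everything contained must exist before the
# container is created, so a tuple's pointers always run newer -> older.
t1 = (1, 2)
t2 = (t1, 3)        # fine: t1 already exists when t2 is built
# t3 = (t3,)        # impossible: NameError, t3 doesn't exist yet

# Mutation breaks the invariant: an older object can be made to point
# at a newer one, which is exactly how reference cycles arise.
lst = []
lst.append(lst)     # a one-element cycle, old -> new
assert lst[0] is lst
```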
This is the "deep" reason for why, e.g., you can't build a cycle out of pure tuples in Python (if every pointer goes new->old, you can't get a loop, else each node in the loop would be (transitively) older than itself!). Then, since a finalizer can see objects pointed *to*, a finalizer can see only older objects. Since it's desirable that a finalizer see only wholly intact (unfinalized) objects, it is in fact the oldest object ("first in") that needs to be cleaned up last ("last out"). So, under the assumption of immutability, FILO is sufficient, but FIFO dangerous. So your muse inflamed you with an interesting tune, but you fingered the riff backwards . One problem is that it all goes out the window as soon as mutation is allowed. It's *still* desirable that a finalizer see only unfinalized objects, but in the presence of mutation that no longer bears any relationship to relative creation time. Another problem is in Guido's directory example, which we can twist to view as an "immutable container" problem that builds its image of the directory bottom-up, and where a finalizer on each node tries to remove the file (or delete the directory, whichever the node represents). In this case the physical remove/delete/unlink operations have to follow a *postorder* traversal of the container tree, so that "finalizer sees only unfinalized objects" is the opposite of what the app needs! The lesson to take from that is that the implementation can't possibly guess what ordering an app may need in a fancy finalizer. 
At best it can promise to follow a "natural" ordering based on the points-to relationship, and while "finalizer sees only unfinalized objects" is at least clear, it's quite possibly unhelpful (in Guido's particular case, it *can* be exploited, though, by adding a postorder remove/delete/unlink method to nodes, and explicitly calling it from __del__ -- "the rules" guarantee that the root of the tree will get finalized first, and the code can rely on that in its own explicit postorder traversal). > but then I rarely create cyclic objects, and almost never use __del__, > so this whole argument has been somewhat academic to me :). Well, not a one of us creates cycles often in CPython today, simply because we don't want to track down leaks <0.5 wink>. It seems that nobody here uses __del__ much, either; indeed, my primary use of __del__ is simply to call an explicit break_cycles() function from the header node of a graph! The need for that goes away as soon as Python reclaims cycles by itself, and I may never use __del__ at all then in the vast bulk of my code. It's because we've seen no evidence here (and also that I've seen none elsewhere either) that *anyone* is keen on mixing cycles with finalizers that I've been so persistent in saying "screw it -- let it leak, but let the user get at it if they insist on doing it". Seems we're trying to provide slick support for something nobody wants to do. If it happens by accident anyway, well, people sometimes divide by 0 by accident too <0.0 wink>: give them a way to know about it, but don't move heaven & earth trying to treat it like a normal case. 
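The explicit break_cycles() pattern described above can be sketched as follows (the class names are invented for illustration). The point of the pattern is that the header object is referenced by the application but not by any node, so plain reference counting is guaranteed to run its __del__, which then severs the cycles among the nodes:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.edges = []          # may refer to other Nodes, forming cycles

class Graph:
    """Header object: owns the nodes but is never referenced by them."""
    def __init__(self):
        self.nodes = []

    def add(self, name):
        node = Node(name)
        self.nodes.append(node)
        return node

    def break_cycles(self):
        # Sever every edge so the node structure becomes acyclic and
        # plain reference counting can reclaim it.
        for node in self.nodes:
            node.edges = []
        self.nodes = []

    def __del__(self):
        self.break_cycles()

g = Graph()
a, b = g.add("a"), g.add("b")
a.edges.append(b)
b.edges.append(a)    # a <-> b: a cycle refcounting alone can't free
del g                # header's __del__ runs, break_cycles unlinks a and b
assert a.edges == [] and b.edges == []
```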
if-it-were-easy-to-implement-i-wouldn't-care-ly y'rs - tim From moshez at math.huji.ac.il Sat Mar 11 21:35:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 22:35:43 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <000e01bf8b95$d52939e0$c72d153f@tim> Message-ID: In a continuation (yes, a dangerous word in these parts) of the timbot's looks at the way other languages handle finalization, let me add something from the Sather manual I'm now reading (when I'm done with it, you'll see me begging for iterators here, and having some weird ideas in the types-sig): =============================== Finalization will only occur once, even if new references are created to the object during finalization. Because few guarantees can be made about the environment in which finalization occurs, finalization is considered dangerous and should only be used in the rare cases that conventional coding will not suffice. =============================== (Sather is garbage-collected, BTW) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sat Mar 11 21:51:47 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:47 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <001001bf8b9b$9e09d720$c72d153f@tim> [Skip Montanaro, with an expression that may raise TypeError for any of several distinct reasons, and wants to figure out which one after the fact] The existing exception machinery is sufficiently powerful for building a solution, so nothing new is needed in the language. What you really need here is an exhaustive list of all exceptions the language can raise, and when, and why, and a formally supported "detail" field (whether numeric id or string or whatever) that you can rely on to tell them apart at runtime. 
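Lacking such a detail field, the only discrimination possible is matching on the message text. A sketch of that fragile status quo in modern Python — the strings tested are CPython-specific and have changed between releases, which is precisely the problem:

```python
def classify_type_error(exc):
    # Fragile: dispatch on the exception's message text.  A formally
    # supported detail field would make this reliable instead.
    msg = str(exc)
    if "concatenate" in msg or "must be str" in msg:
        return "bad-concat"
    if "not subscriptable" in msg:
        return "bad-subscript"
    return "unknown"

try:
    "a" + 1
except TypeError as exc:
    assert classify_type_error(exc) == "bad-concat"
```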
There are at least a thousand cases that need to be so documented and formalized. That's why not a one of them is now <0.9 wink>. If P3K is a rewrite from scratch, a rational scheme could be built in from the start. Else it would seem to require a volunteer with even less of a life than us . From tim_one at email.msn.com Sat Mar 11 21:51:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:49 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C965C4.B164C2D5@interet.com> Message-ID: <001101bf8b9b$9f37f6e0$c72d153f@tim> [James C. Ahlstrom] > Well, it looks like this thread has fizzled out. But what did we > decide? Far as I could tell, nothing specific. > ... > I feel Tim's original complaint that popen() is a Problem > still hasn't been fixed. I was passing it on from MikeF's c.l.py posting. This isn't a new problem, of course, it just drags on year after year -- which is the heart of MikeF's gripe. People have code that *does* work, but for whatever reasons it never gets moved to the core. In the meantime, the Library Ref implies the broken code that is in the core does work. One or the other has to change, and it looks most likely to me that Fred will change the docs for 1.6. While not ideal, that would be a huge improvement over the status quo. luckily-few-people-expect-windows-to-work-anyway<0.9-wink>-ly y'rs - tim From mhammond at skippinet.com.au Mon Mar 13 04:50:35 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 13 Mar 2000 14:50:35 +1100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. 
Message-ID: Hi, After applying the Unicode changes string.replace() seems to have changed its behaviour: Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") 'foobar' >>> But since the Unicode update: Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") Traceback (innermost last): File "<stdin>", line 1, in ? File "L:\src\python-cvs\lib\string.py", line 407, in replace return s.replace(old, new, maxsplit) ValueError: empty replacement string >>> The offending check is stringobject.c, line 1578: if (repl_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty replacement string"); return NULL; } Changing the check to "< 0" fixes the immediate problem, but it is unclear why the check was added at all, so I didn't bother submitting a patch... Mark. From mal at lemburg.com Mon Mar 13 10:13:50 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 13 Mar 2000 10:13:50 +0100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. References: Message-ID: <38CCB14D.C07ACC26@lemburg.com> Mark Hammond wrote: > > Hi, > After applying the Unicode changes string.replace() seems to have changed > its behaviour: > > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > 'foobar' > >>> > > But since the Unicode update: > > Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > Traceback (innermost last): > File "<stdin>", line 1, in ?
> File "L:\src\python-cvs\lib\string.py", line 407, in replace > return s.replace(old, new, maxsplit) > ValueError: empty replacement string > >>> > > The offending check is stringobject.c, line 1578: > if (repl_len <= 0) { > PyErr_SetString(PyExc_ValueError, "empty replacement string"); > return NULL; > } > > Changing the check to "< 0" fixes the immediate problem, but it is unclear > why the check was added at all, so I didn't bother submitting a patch... Dang. Must have been my mistake -- it should read: if (sub_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } Thanks for reporting this... I'll include the fix in the next patch set. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Mon Mar 13 16:43:09 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 13 Mar 2000 10:43:09 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <001101bf8b9b$9f37f6e0$c72d153f@tim> References: <38C965C4.B164C2D5@interet.com> <001101bf8b9b$9f37f6e0$c72d153f@tim> Message-ID: <14541.3213.590243.359394@weyr.cnri.reston.va.us> Tim Peters writes: > code that is in the core does work. One or the other has to change, and it > looks most likely to me that Fred will change the docs for 1.6. While not > ideal, that would be a huge improvement over the status quo. Actually, I just checked in my proposed change for the 1.5.2 doc update that I'm releasing soon. I'd like to remove it for 1.6, if the appropriate implementation is moved into the core. -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From gvwilson at nevex.com Mon Mar 13 22:10:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:10:52 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request Message-ID: Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python: i = 0 while i &lt; 10: print i &amp; 1 i = i + 1 which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on. Greg From skip at mojam.com Mon Mar 13 22:23:17 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 13 Mar 2000 15:23:17 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23621.89087.357783@beluga.mojam.com> Greg> Once 1.6 is out the door, would people be willing to consider Greg> extending Python's token set to make HTML/XML-ish spellings using Greg> entity references legal? This would make the following 100% legal Greg> Python: Greg> i = 0 Greg> while i &lt; 10: Greg> print i &amp; 1 Greg> i = i + 1 What makes it difficult to pump your Python code through cgi.escape when embedding it? There doesn't seem to be an inverse function to cgi.escape (at least not in the cgi module), but I suspect it could rather easily be written. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From akuchlin at mems-exchange.org Mon Mar 13 22:23:29 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Mon, 13 Mar 2000 16:23:29 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23633.873411.86833@amarok.cnri.reston.va.us> gvwilson at nevex.com writes: >Once 1.6 is out the door, would people be willing to consider extending >Python's token set to make HTML/XML-ish spellings using entity references >legal? This would make the following 100% legal Python: > >i = 0 >while i &lt; 10: > print i &amp; 1 > i = i + 1 I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere? What about other places when Python and XML syntax collide, as in this contrived example: <![CDATA[ # Python code starts here if a[index[1]]>b: print ... ]]> Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code. IMHO whatever tool is outputting the XML should handle escaping wacky characters in the Python code, which will be undone by the parser when the XML gets parsed. Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange. -- A.M. Kuchling http://starship.python.net/crew/amk/ Art history is the nightmare from which art is struggling to awake. -- Robert Fulford From gvwilson at nevex.com Mon Mar 13 22:58:27 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:58:27 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <14541.23633.873411.86833@amarok.cnri.reston.va.us> Message-ID: > >Greg Wilson wrote: > >...would people be willing to consider extending > >Python's token set to make HTML/XML-ish spellings using entity references > >legal? > > > >i = 0 > >while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > Skip Montanaro wrote: > What makes it difficult to pump your Python code through cgi.escape when > embedding it? Most non-programmers use WYSIWYG editors, and many of these are moving toward XML-compliant formats.
Parsing the standard character entities seemed like a good first step toward catering to this (large) audience. > Andrew Kuchling wrote: > I don't think that would be sufficient. What about user-defined > entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) > Would Python have to also parse a DTD from somewhere? Longer term, I believe that someone is going to come out with a programming language that (finally) leaves the flat-ASCII world behind, and lets people use the structuring mechanisms (e.g. XML) that we have developed for everyone else's data. I think it would be to Python's advantage to be first, and if I'm wrong, there's little harm done. User-defined entities, DTD's, and the like are probably part of that, but I don't think I know enough to know what to ask for. Escaping the standard entities seems like an easy start. > Andrew Kuchling also wrote: > What about other places when Python and XML syntax collide, as in this > contrived example: > > # Python code starts here > if a[index[1]]>b: > print ... > > Oops! The ]]> looks like the end of the CDATA section, but it's legal > Python code. Yup; that's one of the reasons I'd like to be able to write: # Python code starts here if a[index[1]]&gt;b: print ... > Users certainly won't be writing this XML by hand; writing 'if (i &lt; > 10)' is very strange. I'd expect my editor to put '&lt;' in the file when I press the '<' key, and to display '<' on the screen when viewing the file. thanks, Greg From beazley at rustler.cs.uchicago.edu Mon Mar 13 23:35:24 2000 From: beazley at rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 16:35:24 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <200003132235.QAA08031@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal?
This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. > Sure, and while we're at it, maybe we can add support for C trigraph sequences as well. Maybe I'm missing the point, but why can't you just use a filter (cgi.escape() or something comparable)? I for one, am *NOT* in favor of complicating the Python parser in this most bogus manner. Furthermore, with respect to the editor argument, I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about. Therefore, I don't think that the Python parser should be modified in any way to account for XML tags, entities, or other extraneous markup that's not part of the core language. I know that I, for one, would be extremely pissed if I fired up emacs and had to maintain someone else's code that had all of this garbage in it. Just my 0.02. -- Dave From gvwilson at nevex.com Mon Mar 13 23:48:33 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 17:48:33 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: > David M. Beazley wrote: > ...and while we're at it, maybe we can add support for C trigraph > sequences as well. I don't know of any mass-market editors that generate C trigraphs. > ...I can't think of a single reason why any sane programmer would be > writing programs in Microsoft Word or whatever it is that you're > talking about. 'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program...
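The inverse of cgi.escape that Skip says "could rather easily be written" (and the filter David mentions) really is a few lines; a sketch in present-day Python (unescape is a hypothetical helper, not a cgi-module function), with '&amp;' decoded last so already-escaped ampersands survive:

```python
def unescape(s):
    # Inverse of cgi.escape for the standard character entities.
    # '&amp;' must be decoded last, or '&amp;lt;' would wrongly
    # collapse all the way down to '<'.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"),
                         ("&quot;", '"'), ("&amp;", "&")):
        s = s.replace(entity, char)
    return s
```

For example, unescape("while i &lt; 10: print i &amp; 1") recovers the plain-text form of Greg's proposal.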
I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up. Thanks, Greg From effbot at telia.com Tue Mar 14 00:16:41 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 00:16:41 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <00ca01bf8d42$6a154500$34aab5d4@hagrid> Greg wrote: > > ...I can't think of a single reason why any sane programmer would be > > writing programs in Microsoft Word or whatever it is that you're > > talking about. > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. I don't get it. the XML specification contains a lot of stuff, and I completely fail to see how adding support for a very small part of XML would make it possible to use XML editors to write Python code. what am I missing? From DavidA at ActiveState.com Tue Mar 14 00:15:25 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:15:25 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave-the-diplomat's response. 
While storage of Python files in XML documents is a good thing, it's hard to see why XML should be viewed as the only storage format for Python files. I think a much richer XML schema could be useful in some distant future: ... What might be more useful in the short term IMO is to define a _standard_ mechanism for Python-in-XML encoding/decoding, so that all code which encodes Python in XML is done the same way, and so that XML editors can figure out once and for all how to decode Python-in-CDATA. Strawman Encoding # 1: replace < with &lt; and > with &gt; when not in strings, and vice versa on the decoding side. Strawman Encoding # 2: - do Strawman 1, AND - replace space-determined indentation with { and } tokens or other INDENT and DEDENT markers using some rare Unicode characters to work around inevitable bugs in whitespace handling of XML processors. --david From gvwilson at nevex.com Tue Mar 14 00:14:43 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 18:14:43 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > David Ascher wrote: > But the scheme you put forth causes major problems for current Python > users who *are* using glass TTYs, so I don't think it'll fly for very > basic political reasons nicely illustrated by Dave's response. Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-) Greg From beazley at rustler.cs.uchicago.edu Tue Mar 14 00:12:55 2000 From: beazley at rustler.cs.uchicago.edu (David M.
Beazley) Date: Mon, 13 Mar 2000 17:12:55 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: <200003132312.RAA08107@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... Look, I'm all for CP4E and making programming more accessible to the masses, but as a professional programmer, I frankly do not care what non-programmers think about the tools that I (and most of the programming world) use to write software. Furthermore, if all of your non-programmer friends don't want to care about the underlying details, they certainly won't care how programs are represented---including a nice and *simple* text representation without markup, entities, and other syntax that is not an essential part of the language. However, as a professional, I most certainly DO care about how programs are represented--specifically, I want to be able to move them around between machines. Edit them with essentially any editor, transform them as I see fit, and be able to easily read them and have a sense of what is going on. Markup is just going to make this a huge pain in the butt. No, I'm not for this idea one bit. Sorry. > I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. What gives you the idea that Python is behind? What is it playing catch up to? 
-- Dave From DavidA at ActiveState.com Tue Mar 14 00:36:54 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:36:54 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) > > Greg If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing. Forcing one into the other is denying the history and current practices of both domains and user populations. You cannot ignore the fact that "I can read anyone's Python" is a key selling point of Python among its current practitioners, or that its cleanliness and lack of magic characters ($ is usually invoked, but < is just as magic/ugly) are part of its appeal/success. No XML editor is going to edit all XML documents without custom editors anyway! I certainly don't expect to be drawing SVG diagrams with a keyboard! That's what schemas and custom editors are for. Define a schema for 'encoded Python' (well, first, find a schema notation that will survive), write a plugin to your favorite XML editor, and then your (theoretical? =) users can use the same 'editor' to edit PythonXML or any other XML. Most XML probably won't be edited with a keyboard but with a pointing device or a speech recognizer anyway... 
IMO, you're being seduced by the apparent closeness between XML and Python-in-ASCII. It's only superficial... Think of Python-in-ASCII as a rendering of Python-in-XML, Dave will think of Python-in-XML as a rendering of Python-in-ASCII, and everyone will be happy (as long as everyone agrees on the one-to-one transformation). --david From paul at prescod.net Tue Mar 14 00:43:48 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:43:48 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7D34.6569C1AA@prescod.net> You should use your entities in the XML files, and then whatever application actually launches Python (PMZ, your make engine, XMetaL) could decode the data and launch Python. This is already how it works in XMetaL. I've just reinstalled recently so I don't have my macro file. Therefore, please excuse the Javascript (not Python) example. This is in "journalist.mcr" in the "Macros" folder of XMetaL. This already works fine for Python. You change lang="Python" and thanks to the benevalence of Bill Gates and the hard work of Mark Hammond, you can use Python for XMetaL macros. It doesn't work perfectly: exceptions crash XMetaL, last I tried. As long as you don't make mistakes, everything works nicely. :) You can write XMetaL macros in Python and the whole thing is stored as XML. Still, XMetaL is not very friendly as a Python editor. It doesn't have nice whitespace handling! -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. 
- Immanuel Kant From paul at prescod.net Tue Mar 14 00:59:23 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:59:23 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD80DB.39150F33@prescod.net> gvwilson at nevex.com wrote: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. Your goal is worth pursuing but I agree with the others that the syntax change is not the right way. It _is_ possible to teach XMetaL to edit Python programs -- structurally -- just as it does XML. What you do is hook into the macro engine (which already supports Python) and use the Python tokenizer to build a parse tree. You copy that into a DOM using the same elements and attributes you would use if you were doing some kind of batch conversion. Then on "save" you reverse the process. Implementation time: ~3 days. The XMetaL competitor, Documentor has an API specifically designed to make this sort of thing easy. Making either of them into a friendly programmer's editor is a much larger task. I think this is where the majority of the R&D should occur, not at the syntax level. If one invents a fundamentally better way of working with the structures behind Python code, then it would be relatively easy to write code that maps that to today's Python syntax. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. 
- Immanuel Kant From moshez at math.huji.ac.il Tue Mar 14 02:14:09 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 14 Mar 2000 03:14:09 +0200 (IST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal? This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so the Python that comes out of the XML parser is quite all right. Why change Python to do an XML parser job? -- Moshe Zadka <moshez at math.huji.ac.il>. http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Tue Mar 14 02:18:45 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 14 Mar 2000 12:18:45 +1100 Subject: [Python-Dev] unicode objects and C++ Message-ID: I struck a bit of a snag with the Unicode support when trying to use the most recent update in a C++ source file. The problem turned out to be that unicodeobject.h did a #include "wchar.h", but did it while an 'extern "C"' block was open. This upset the MSVC6 wchar.h, as it has special C++ support. Attached below is a patch I made to unicodeobject.h that solved my problem and allowed my compilations to succeed.
Theoretically the same problem could exist for wctype.h, and probably lots of other headers, but this is the immediate problem :-) An alternative patch would be to #include "whcar.h" in PC\config.h outside of any 'extern "C"' blocks - wchar.h on Windows has guards that allows for multiple includes, so the unicodeobject.h include of that file will succeed, but not have the side-effect it has now. Im not sure what the preferred solution is - quite possibly the PC\config.h change, but Ive include the unicodeobject.h patch anyway :-) Mark. *** unicodeobject.h 2000/03/13 23:22:24 2.2 --- unicodeobject.h 2000/03/14 01:06:57 *************** *** 85,91 **** --- 85,101 ---- #endif #ifdef HAVE_WCHAR_H + + #ifdef __cplusplus + } /* Close the 'extern "C"' before bringing in system headers */ + #endif + # include "wchar.h" + + #ifdef __cplusplus + extern "C" { + #endif + #endif #ifdef HAVE_USABLE_WCHAR_T From mal at lemburg.com Tue Mar 14 00:31:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 00:31:30 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7A52.5709DF5F@lemburg.com> gvwilson at nevex.com wrote: > > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) This should be easy to implement provided a hook for compile() is added to e.g. the sys-module which then gets used instead of calling the byte code compiler directly... 
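A compile() hook of the kind Marc-Andre describes can at least be emulated in present-day Python by wrapping the builtin. This is a hypothetical sketch — entity_compile and its entity table are illustrative, not the proposed sys-module hook:

```python
import builtins

def entity_compile(source, filename, mode):
    # Hypothetical pre-compilation codec: decode the standard
    # entities, then hand the result to the real byte-code compiler.
    # '&amp;' is decoded last so '&amp;lt;' yields '&lt;', not '<'.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        source = source.replace(entity, char)
    return builtins.compile(source, filename, mode)

code = entity_compile("1 &lt; 2", "<entity-demo>", "eval")
assert eval(code) is True
```

The builtin compiler only ever sees the decoded output, which is exactly the layering being proposed: the codec, not the parser, knows about entities.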
Then you could redirect the compile() arguments to whatever codec you wish (e.g. a SGML entity codec) and the builtin compiler would only see the output of that codec. Well, just a thought... I don't think encoding programs would make life as a programmer easier, but instead harder. It adds one more level of confusion on top of it all. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 14 10:45:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 10:45:49 +0100 Subject: [Python-Dev] unicode objects and C++ References: Message-ID: <38CE0A4D.1209B830@lemburg.com> Mark Hammond wrote: > > I struck a bit of a snag with the Unicode support when trying to use the > most recent update in a C++ source file. > > The problem turned out to be that unicodeobject.h did a #include "wchar.h", > but did it while an 'extern "C"' block was open. This upset the MSVC6 > wchar.h, as it has special C++ support. Thanks for reporting this. > Attached below is a patch I made to unicodeobject.h that solved my problem > and allowed my compilations to succeed. Theoretically the same problem > could exist for wctype.h, and probably lots of other headers, but this is > the immediate problem :-) > > An alternative patch would be to #include "whcar.h" in PC\config.h outside > of any 'extern "C"' blocks - wchar.h on Windows has guards that allows for > multiple includes, so the unicodeobject.h include of that file will succeed, > but not have the side-effect it has now. > > Im not sure what the preferred solution is - quite possibly the PC\config.h > change, but Ive include the unicodeobject.h patch anyway :-) > > Mark. 
> > *** unicodeobject.h 2000/03/13 23:22:24 2.2 > --- unicodeobject.h 2000/03/14 01:06:57 > *************** > *** 85,91 **** > --- 85,101 ---- > #endif > > #ifdef HAVE_WCHAR_H > + > + #ifdef __cplusplus > + } /* Close the 'extern "C"' before bringing in system headers */ > + #endif > + > # include "wchar.h" > + > + #ifdef __cplusplus > + extern "C" { > + #endif > + > #endif > > #ifdef HAVE_USABLE_WCHAR_T > I've included this patch (should solve the problem for all inlcuded system header files, since it wraps only the Unicode APIs in extern "C"): --- /home/lemburg/clients/cnri/CVS-Python/Include/unicodeobject.h Fri Mar 10 23:33:05 2000 +++ unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,10 +1,7 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* Unicode implementation based on original code by Fredrik Lundh, modified by Marc-Andre Lemburg (mal at lemburg.com) according to the @@ -167,10 +165,14 @@ typedef unsigned short Py_UNICODE; #define Py_UNICODE_MATCH(string, offset, substring)\ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { PyObject_HEAD int length; /* Length of raw Unicode data in buffer */ I'll post a complete Unicode update patch by the end of the week for inclusion in CVS. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Tue Mar 14 12:19:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:19:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Tue, 14 Mar 2000, Moshe Zadka wrote: > On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > > legal? 
This would make the following 100% legal Python: > > > > i = 0 > > while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > > Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so > the Python that comes out of the XML parser is quite all right. Why change > Python to do an XML parser job? I totally agree. To me, this is the key issue: it is NOT the responsibility of the programming language to accommodate any particular encoding format. While we're at it, why don't we change Python to accept quoted-printable source code? Or base64-encoded source code? XML already defines a perfectly reasonable mechanism for escaping a plain stream of text -- adding this processing to Python adds nothing but confusion. The possible useful benefit from adding the proposed "feature" is exactly zero. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From ping at lfw.org Tue Mar 14 12:21:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:21:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000, David Ascher wrote: > > If you propose a transformation between Python Syntax and XML, then you > potentially have something which all parties can agree to as being a good > thing. Indeed. I know that i wouldn't have any use for it at the moment, but i can see the potential for usefulness of a structured representation for Python source code (like an AST in XML) which could be directly edited in an XML editor, and processed (by an XSL stylesheet?) to produce actual runnable Python. But attempting to mix the two doesn't get you anywhere. -- ?!ng "This code is better than any code that doesn't work has any right to be."
-- Roger Gregory, on Xanadu From effbot at telia.com Tue Mar 14 16:41:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 16:41:01 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <002201bf8dcb$ba9a11c0$34aab5d4@hagrid> Greg: > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) well, I would find it easier to support a more aggressive proposal: make sure Python 1.7 can deal with source code written in Unicode, using any supported encoding. with that in place, you can plug in your favourite unicode encoding via the Unicode framework. From effbot at telia.com Tue Mar 14 23:21:38 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 23:21:38 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> Message-ID: <000901bf8e03$abf88420$34aab5d4@hagrid> > I've just checked in a massive patch from Marc-Andre Lemburg which > adds Unicode support to Python. massive, indeed. didn't notice this before, but I just realized that after the latest round of patches, the python15.dll is now 700k larger than it was for 1.5.2 (more than twice the size). my original unicode DLL was 13k. hmm... From akuchlin at mems-exchange.org Tue Mar 14 23:19:44 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Tue, 14 Mar 2000 17:19:44 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <000901bf8e03$abf88420$34aab5d4@hagrid> References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> Message-ID: <14542.47872.184978.985612@amarok.cnri.reston.va.us> Fredrik Lundh writes: >didn't notice this before, but I just realized that after the >latest round of patches, the python15.dll is now 700k larger >than it was for 1.5.2 (more than twice the size). Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source code, and produces a 632168-byte .o file on my Sparc. (Will some compiler systems choke on a file that large? Could we read database info from a file instead, or mmap it into memory?) -- A.M. Kuchling http://starship.python.net/crew/amk/ "Are you OK, dressed like that? You don't seem to notice the cold." "I haven't come ten thousand miles to discuss the weather, Mr Moberly." -- Moberly and the Doctor, in "The Seeds of Doom" From mal at lemburg.com Wed Mar 15 09:32:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 09:32:29 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> Message-ID: <38CF4A9D.13A0080@lemburg.com> "Andrew M. Kuchling" wrote: > > Fredrik Lundh writes: > >didn't notice this before, but I just realized that after the > >latest round of patches, the python15.dll is now 700k larger > >than it was for 1.5.2 (more than twice the size). > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > code, and produces a 632168-byte .o file on my Sparc. (Will some > compiler systems choke on a file that large? Could we read database > info from a file instead, or mmap it into memory?) That is dues to the unicodedata module being compiled into the DLL statically. 
On Unix you can build it shared too -- there are no direct references to it in the implementation. I suppose that on Windows the same should be done... the question really is whether this is intended or not -- moving the module into a DLL is at least technically no problem (someone would have to supply a patch for the MSVC project files though). Note that unicodedata is only needed by programs which do a lot of Unicode manipulations and in the future probably by some codecs too. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 15 11:42:26 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 15 Mar 2000 11:42:26 +0100 (MET) Subject: [Python-Dev] Unicode in Python and Tcl/Tk compared (was Unicode patches checked in...) In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at "Mar 15, 2000 9:32:29 am" Message-ID: Hi! > > Fredrik Lundh writes: > > >didn't notice this before, but I just realized that after the > > >latest round of patches, the python15.dll is now 700k larger > > >than it was for 1.5.2 (more than twice the size). > > > "Andrew M. Kuchling" wrote: > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > > code, and produces a 632168-byte .o file on my Sparc. (Will some > > compiler systems choke on a file that large? Could we read database > > info from a file instead, or mmap it into memory?) > M.-A. Lemburg wrote: > That is dues to the unicodedata module being compiled > into the DLL statically. On Unix you can build it shared too > -- there are no direct references to it in the implementation. > I suppose that on Windows the same should be done... the > question really is whether this is intended or not -- moving > the module into a DLL is at least technically no problem > (someone would have to supply a patch for the MSVC project > files though). 
> > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Now as the unicode patches were checked in and as Fredrik Lundh noticed a considerable increase of the size of the python-DLL, which was obviously mostly caused by those tables, I had some fear that a Python/Tcl/Tk based application could eat up much more memory, if we update from Python1.5.2 and Tcl/Tk 8.0.5 to Python 1.6 and Tcl/Tk 8.3.0. As some of you certainly know, some kind of unicode support has also been added to Tcl/Tk since 8.1. So I did some research and would like to share what I have found out so far: Here are the compared sizes of the tcl/tk shared libs on Linux: old: | new: | bloat increase in %: -----------------------+------------------------+--------------------- libtcl8.0.so 533414 | libtcl8.3.so 610241 | 14.4 % libtk8.0.so 714908 | libtk8.3.so 811916 | 13.6 % The addition of unicode wasn't the only change to TclTk. So this seems reasonable. Unfortunately there is no python shared library, so a direct comparison of increased memory consumption is impossible. Nevertheless I've the following figures (stripped binary sizes of the Python interpreter): 1.5.2 382616 CVS_10-02-00 393668 (a month before unicode) CVS_12-03-00 507448 (just after unicode) That is an increase of "only" 111 kBytes. Not so bad but nevertheless a "bloat increase" of 32.6 %. And additionally there is now unicodedata.so 634940 _codecsmodule.so 38955 which (I guess) will also be loaded if the application starts using some of the new features. Since I didn't take care of unicode in the past, I feel unable to compare the implementations of unicode in both systems and what impact they will have on the real memory performance and even more important on the functionality of the combined use of both packages together with Tkinter. 
Tcl/Tk keeps around a sub-directory called 'encoding', which --I guess-- contains information somehow similar or related to that in 'unicodedata.so', but separated into several files? So below I included a shortened excerpts from the 200k+ tcl8.3.0/changes and the tk8.3.0/changes files about unicode. May be someone else more involved with unicode can shed some light on this topic? Do we need some changes to Tkinter.py or _tkinter or both? ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- [...] ======== Changes for 8.1 go below this line ======== 6/18/97 (new feature) Tcl now supports international character sets: - All C APIs now accept UTF-8 strings instead of iso8859-1 strings, wherever you see "char *", unless explicitly noted otherwise. - All Tcl strings represented in UTF-8, which is a convenient multi-byte encoding of Unicode. Variable names, procedure names, and all other values in Tcl may include arbitrary Unicode characters. For example, the Tcl command "string length" returns how many Unicode characters are in the argument string. - For Java compatibility, embedded null bytes in C strings are represented as \xC080 in UTF-8 strings, but the null byte at the end of a UTF-8 string remains \0. Thus Tcl strings once again do not contain null bytes, except for termination bytes. - For Java compatibility, "\uXXXX" is used in Tcl to enter a Unicode character. "\u0000" through "\uffff" are acceptable Unicode characters. - "\xXX" is used to enter a small Unicode character (between 0 and 255) in Tcl. - Tcl automatically translates between UTF-8 and the normal encoding for the platform during interactions with the system. - The fconfigure command now supports a -encoding option for specifying the encoding of an open file or socket. Tcl will automatically translate between the specified encoding and UTF-8 during I/O. 
See the directory library/encoding to find out what encodings are supported (eventually there will be an "encoding" command that makes this information more accessible). - There are several new C APIs that support UTF-8 and various encodings. See Utf.3 for procedures that translate between Unicode and UTF-8 and manipulate UTF-8 strings. See Encoding.3 for procedures that create new encodings and translate between encodings. See ToUpper.3 for procedures that perform case conversions on UTF-8 strings. [...] 1/16/98 (new feature) Tk now supports international characters sets: - Font display mechanism overhauled to display Unicode strings containing full set of international characters. You do not need Unicode fonts on your system in order to use tk or see international characters. For those familiar with the Japanese or Chinese patches, there is no "-kanjifont" option. Characters from any available fonts will automatically be used if the widget's originally selected font is not capable of displaying a given character. - Textual widgets are international aware. For instance, cursor positioning commands would now move the cursor forwards/back by 1 international character, not by 1 byte. - Input Method Editors (IMEs) work on Mac and Windows. Unix is still in progress. [...] 10/15/98 (bug fix) Changed regexp and string commands to properly handle case folding according to the Unicode character tables. (stanton) 10/21/98 (new feature) Added an "encoding" command to facilitate translations of strings between different character encodings. See the encoding.n manual entry for more details. (stanton) 11/3/98 (bug fix) The regular expression character classification syntax now includes Unicode characters in the supported classes. (stanton) [...] 11/17/98 (bug fix) "scan" now correctly handles Unicode characters. (stanton) [...] 11/19/98 (bug fix) Fixed menus and titles so they properly display Unicode characters under Windows. [Bug: 819] (stanton) [...] 
4/2/99 (new apis) Made various Unicode utility functions public. Tcl_UtfToUniCharDString, Tcl_UniCharToUtfDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsDigit, Tcl_UniCharIsLower, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar, Tcl_WinUtfToTChar, Tcl_WinTCharToUtf (stanton) [...] 4/5/99 (bug fix) Fixed handling of Unicode in text searches. The -count option was returning byte counts instead of character counts. [...] 5/18/99 (bug fix) Fixed clipboard code so it handles Unicode data properly on Windows NT and 95. [Bug: 1791] (stanton) [...] 6/3/99 (bug fix) Fixed selection code to handle Unicode data in COMPOUND_TEXT and STRING selections. [Bug: 1791] (stanton) [...] 6/7/99 (new feature) Optimized string index, length, range, and append commands. Added a new Unicode object type. (hershey) [...] 6/14/99 (new feature) Merged string and Unicode object types. Added new public Tcl API functions: Tcl_NewUnicodeObj, Tcl_SetUnicodeObj, Tcl_GetUnicode, Tcl_GetUniChar, Tcl_GetCharLength, Tcl_GetRange, Tcl_AppendUnicodeToObj. (hershey) [...] 6/23/99 (new feature) Updated Unicode character tables to reflect Unicode 2.1 data. (stanton) [...] --- Released 8.3.0, February 10, 2000 --- See ChangeLog for details --- ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- Sorry if this was boring old stuff for some of you. Best Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From marangoz at python.inrialpes.fr Wed Mar 15 12:40:21 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 12:40:21 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 09:32:29 AM Message-ID: <200003151140.MAA30301@python.inrialpes.fr> M.-A. 
Lemburg wrote: > > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Perhaps it would make sense to move the Unicode database to the Python side (write it in Python)? Or init the database dynamically in the unicodedata module on import? It's quite big, so if it's possible to avoid the static declaration (and if the unicodedata module is enabled by default), I'd vote for a dynamic initialization of the database from reference (Python ?) file(s). M-A, is something in this spirit doable? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 13:57:04 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 13:57:04 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> Message-ID: <38CF88A0.CF876A74@tismer.com> "M.-A. Lemburg" wrote: ... > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Would it be possible to make the Unicode support configurable? My problem is that patches in the CVS are of different kinds. Some are error corrections and enhancements which I would definitely like to use. Others are brand new features like the Unicode support. Absolutely great stuff! But this will most probably change a number of times again, and I think it is a bad idea when I include it into my Stackless distribution. I'd appreciate it very much if I could use the same CVS tree for testing new stuff, and to build my distribution, with new features switched off. Please :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jim at digicool.com Wed Mar 15 14:35:48 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 08:35:48 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) Message-ID: <38CF91B4.A36C8C5@digicool.com> Here's my $0.02. I agree with the sentiments that use of finalizers should be discouraged. They are extremely helpful in cases like tempfile.TemporaryFileWrapper, so I think that they should be supported. I do think that the language should not promise a high level of service. Some observations: - I spent a little bit of time on the ANSI Smalltalk committee, where I naively advocated adding finalizers to the language. I was resoundingly told no. :) - Most of the Python objects I deal with these days are persistent. Their lifetimes are a lot more complicated than those of most Python objects. They get created once, but they get loaded into and out of memory many times. In fact, they can be in memory many times simultaneously. :) A couple of years ago I realized that it only made sense to call __init__ when an object was first created, not when it is subsequently (re)loaded into memory. This led to a change in Python pickling semantics and the deprecation of the loathsome __getinitargs__ protocol. :) For me, a similar case can be made against use of __del__ for persistent objects. For persistent objects, a __del__ method should only be used for cleaning up the most volatile of resources. A persistent object __del__ should not perform any semantically meaningful operations because __del__ has no semantic meaning. - Zope has a few uses of __del__. These are all for non-persistent objects. Interestingly, in grepping for __del__, I found a lot of cases where __del__ was used and then commented out. 
Finalizers seem to be the sort of thing that people want initially and then get over. I'm inclined to essentially keep the current rules and simply not promise that __del__ will be able to run correctly. That is, Python should call __del__ and ignore exceptions raised (or provide some *optional* logging or other debugging facility). There is no reason for __del__ to fail unless it depends on cyclically related objects, which should be viewed as a design mistake. OTOH, __del__ should never fail because module globals go away. IMO, the current circular references involving module globals are unnecessary, but that's a different topic. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Wed Mar 15 16:00:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 16:00:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> Message-ID: <38CFA57E.21A3B3EF@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > ... > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Would it be possible to make the Unicode support configurable? This is currently not planned as the Unicode integration touches many different parts of the interpreter to enhance string/Unicode integration... sorry. 
Also, I'm not sure whether adding #ifdefs throughout the code would increase its elegance ;-) > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. Why not ? All you have to do is rebuild the distribution every time you push a new version -- just like I did for the Unicode version before the CVS checkin was done. > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 15:57:13 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 15:57:13 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151140.MAA30301@python.inrialpes.fr> Message-ID: <38CFA4C9.E6B8EB5D@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Perhaps it would make sense to move the Unicode database to the > Python side (write it in Python)? Or init the database dynamically > in the unicodedata module on import? It's quite big, so if it's > possible to avoid the static declaration (and if the unicodedata module > is enabled by default), I'd vote for a dynamic initialization of the > database from reference (Python ?) file(s). The unicodedatabase module contains the Unicode database as static C data - this makes it shareable among (Python) processes. 
Python modules don't provide this feature: instead a dictionary would have to be built on import which would increase the heap size considerably. Those dicts would *not* be shareable. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Wed Mar 15 16:20:06 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 16:20:06 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> Message-ID: <38CFAA26.2B2F0D01@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: ... > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > Why not ? All you have to do is rebuild the distribution > every time you push a new version -- just like I did > for the Unicode version before the CVS checkin was done. But how can I then publish my source code, when I always pull Unicode into it? I don't like to be exposed to side effects like 700kb code bloat, just by chance, since it is in the dist right now (and will vanish again). I don't say there must be #ifdefs all and everywhere, but can I build without *using* Unicode? I don't want to introduce something new to my users that they didn't ask for. And I don't want to take care of their installations. Finally I will for sure not replace a 500k DLL by a 1.2M monster, so this is definitely not what I want at the moment. How do I build a dist that doesn't need to change a lot of stuff in the user's installation? Note that Stackless Python is a drop-in replacement, not a Python distribution. Or should it be? 
ciao - chris (who really wants to get SLP 1.1 out) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From effbot at telia.com Wed Mar 15 17:04:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 17:04:54 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <014001bf8e98$35644480$34aab5d4@hagrid> CT: > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? somewhere in this thread, Guido wrote: > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > before the Unicode changes were made. maybe you could base SLP on that one? From marangoz at python.inrialpes.fr Wed Mar 15 17:27:36 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 17:27:36 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 03:57:13 PM Message-ID: <200003151627.RAA32543@python.inrialpes.fr> > [me] > > > > Perhaps it would make sense to move the Unicode database on the > > Python side (write it in Python)? Or init the database dynamically > > in the unicodedata module on import? It's quite big, so if it's > > possible to avoid the static declaration (and if the unicodata module > > is enabled by default), I'd vote for a dynamic initialization of the > > database from reference (Python ?) file(s). 
[Marc-Andre] > > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. The static data is shared if the module is a shared object (.so). If unicodedata is not a .so, then you'll have a separate copy of the database in each process. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I haven't mentioned dicts, have I? I suggested that the entries in the C version of the database be rewritten in Python (or a text file). The unicodedata module would, in its init function, allocate memory for the database and would populate it before returning "import okay" to Python -- this is one way to init the db dynamically, among others. As to sharing the database among different processes, this is a classic IPC problem, which has nothing to do with the static C declaration of the db. Or, hmmm, one of us is royally confused. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 17:22:42 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 17:22:42 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <38CFB8D2.537FCAD9@tismer.com> Fredrik Lundh wrote: > > CT: > > How do I build a dist that doesn't need to change a lot of > > stuff in the user's installation? > > somewhere in this thread, Guido wrote: > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > before the Unicode changes were made. 
> > maybe you could base SLP on that one? I have no idea how this works. Would this mean that I cannot get patches which come after unicode? Meanwhile, I've looked into the sources. It is easy for me to get rid of the problem by supplying my own unicodedata.c, where I replace all functions by some unimplemented exception. Furthermore, I wondered about the data format. Is the unicode database used in your package as well? Otherwise, I see only references from unicodedata.c, and that means the data structure can be massively enhanced. At the moment, that baby is 64k entries long, with four bytes and an optional string. This is a big waste. The strings are almost all some distinct prefixes, together with a list of hex smallwords. This is done as strings, and probably this makes up 80 percent of the space. The only function that uses the "decomposition" field (namely the string) is unicodedata_decomposition. It does nothing more than to wrap it into a PyObject. We can do a little better here. I guess I can bring it down to a third of this space without much effort, just by using - binary encoding for the tags as enumeration - binary encoding of the hexed entries - omission of the spaces Instead of 64k of structures which contain pointers anyway, I can use a 64k pointer array with offsets into one packed table. The unicodedata access functions would change *slightly*, just building some hex strings and so on. I guess this is not a time critical section? Should I try this evening? :-) cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Wed Mar 15 17:04:43 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 15 Mar 2000 17:04:43 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <38CFB49B.885B8B16@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > Christian Tismer wrote: > ... > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > Why not ? All you have to do is rebuild the distribution > > every time you push a new version -- just like I did > > for the Unicode version before the CVS checkin was done. > > But how can I then publish my source code, when I always > pull Unicode into it. I don't like to be exposed to > side effects like 700kb code bloat, just by chance, since it > is in the dist right now (and will vanish again). All you have to do is build the unicodedata module shared and not statically bound into python.dll. This one module causes most of the code bloat... > I don't say there must be #ifdefs all and everywhere, but > can I build without *using* Unicode? I don't want to > introduce something new to my users what they didn't ask for. > And I don't want to take care about their installations. > Finally I will for sure not replace a 500k DLL by a 1.2M > monster, so this is definately not what I want at the moment. > > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? I don't think that the Unicode stuff will disable the running environment... (haven't tried this though). 
The unicodedata module is not used by the interpreter and the rest is imported on-the-fly, not during init time, so at least in theory, not using Unicode will result in Python not looking for e.g. the encodings package. > Note that Stackless Python is a drop-in replacement, > not a Python distribution. Or should it be? Probably... I think it's simply easier to install and probably also easier to maintain because it doesn't cause dependencies on other "default" installations. The user will then explicitly know that she is installing something a little different from the default distribution... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:26:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:26:15 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> Message-ID: <38CFC7B7.A1ABD51C@lemburg.com> Christian Tismer wrote: > > Fredrik Lundh wrote: > > > > CT: > > > How do I build a dist that doesn't need to change a lot of > > > stuff in the user's installation? > > > > somewhere in this thread, Guido wrote: > > > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > > before the Unicode changes were made. > > > > maybe you could base SLP on that one? > > I have no idea how this works. Would this mean that I cannot > get patctes which come after unicode? > > Meanwhile, I've looked into the sources. 
It is easy for me > to get rid of the problem by supplying my own unicodedata.c, > where I replace all functions by some unimplemented exception. No need (see my other posting): simply disable the module altogether... this shouldn't hurt any part of the interpreter as the module is a user-land only module. > Furthermore, I wondered about the data format. Is the unicode > database used in your package as well? Otherwise, I see > only references from unicodedata.c, and that means the data > structure can be massively enhanced. > At the moment, that baby is 64k entries long, with four bytes > and an optional string. > This is a big waste. The strings are almost all some distinct > prefixes, together with a list of hex smallwords. This > is done as strings, probably this makes 80 percent of the space. I have made no attempt to optimize the structure... (due to lack of time mostly) the current implementation is really not much different from a rewrite of the UnicodeData.txt file available at the unicode.org site. If you want to, I can mail you the marshalled Python dict version of that database to play with. > The only function that uses the "decomposition" field (namely > the string) is unicodedata_decomposition. It does nothing > more than to wrap it into a PyObject. > We can do a little better here. I guess I can bring it down > to a third of this space without much effort, just by using > - binary encoding for the tags as enumeration > - binary encoding of the hexed entries > - omission of the spaces > Instead of 64k of structures which contain pointers anyway, > I can use a 64k pointer array with offsets into one packed > table. > > The unicodedata access functions would change *slightly*, > just building some hex strings and so on. I guess this > is not a time critical section? It may be if these functions are used in codecs, so you should pay attention to speed too... > Should I try this evening? :-) Sure :-) go ahead... 
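[Editor's sketch: the packing scheme discussed above -- tag prefixes stored as a small enumeration, code points stored in binary, and an offset table pointing into one packed buffer -- can be prototyped in a few lines of Python. Everything here is illustrative only: the three sample entries mimic the UnicodeData.txt decomposition field, and the tag list and helper names are invented, not the actual unicodedata internals.]

```python
import struct

# Invented sample entries in UnicodeData.txt decomposition-field style;
# the real table has one slot per BMP code point (~64k entries).
SAMPLES = {
    0x00C0: "0041 0300",                # A-grave -> A + combining grave
    0x00BC: "<fraction> 0031 2044 0034",
    0xFB01: "<compat> 0066 0069",       # fi ligature -> f + i
}

# Step 1 of the proposal: enumerate the tag prefixes once
# instead of storing them as strings in every entry.
TAGS = ["", "<fraction>", "<compat>"]

def pack(entries):
    """Pack all decompositions into one byte buffer plus an offset map."""
    data, offsets = b"", {}
    for cp, decomp in entries.items():
        parts = decomp.split()
        tag = parts[0] if parts[0].startswith("<") else ""
        points = [int(p, 16) for p in (parts[1:] if tag else parts)]
        offsets[cp] = len(data)
        # one byte of tag enum, one byte of length, then 16-bit code points
        data += struct.pack("<BB", TAGS.index(tag), len(points))
        data += struct.pack("<%dH" % len(points), *points)
    return data, offsets

def unpack(data, offsets, cp):
    """Rebuild the textual form, as unicodedata.decomposition() returns it."""
    off = offsets[cp]
    tag_idx, n = struct.unpack_from("<BB", data, off)
    points = struct.unpack_from("<%dH" % n, data, off + 2)
    text = " ".join("%04X" % p for p in points)
    return ("%s %s" % (TAGS[tag_idx], text)) if tag_idx else text

data, offsets = pack(SAMPLES)
for cp, original in SAMPLES.items():
    assert unpack(data, offsets, cp) == original   # lossless round-trip
```

For these three entries the packed buffer is 20 bytes against 47 bytes of raw strings, roughly the two-thirds saving estimated above; access cost is one struct decode plus hex formatting per lookup.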
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:39:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:39:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: <38CFCAC2.7690DF55@lemburg.com> Vladimir Marangozov wrote: > > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a seperate copy of the > database in each process. Uhm, comparing the two versions Python 1.5 and the current CVS Python I get these figures on Linux: Executing : ./python -i -c '1/0' Python 1.5: 1208kB / 728 kB (resident/shared) Python CVS: 1280kB / 808 kB ("/") Not much of a change if you ask me and the CVS version has the unicodedata module linked statically... so there's got to be some sharing and load-on-demand going on behind the scenes: this is what I was referring to when I mentioned static C data. The OS can much better deal with these sharing techniques and delayed loads than anything we could implement on top of it in C or Python. But perhaps this is Linux-specific... 
> > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file). > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. I'm leaving this as an exercise for the interested reader ;-) Really, if you have better ideas for the unicodedata module, please go ahead. > As to sharing the database among different processes, this is a classic > IPC problem, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused. Could you check this on other platforms ? Perhaps Linux is doing more than other OSes are in this field. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Mar 15 19:23:59 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 19:23:59 +0100 Subject: [Python-Dev] first public SRE snapshot now available! References: <200003151627.RAA32543@python.inrialpes.fr> <38CFCAC2.7690DF55@lemburg.com> Message-ID: <01f901bf8eab$a353e780$34aab5d4@hagrid> I just uploaded the first public SRE snapshot to: http://w1.132.telia.com/~u13208596/sre.htm -- this kit contains windows binaries only (make sure you have built the interpreter from a recent CVS version) -- the engine fully supports unicode target strings. (not sure about the pattern compiler, though...) -- it's probably buggy as hell. 
for things I'm working on at this very moment, see: http://w1.132.telia.com/~u13208596/sre/status.htm I hope to get around to fixing the core dump (it crashes halfway through sre_fulltest.py, for no apparent reason) and the backreferencing problem later today. stay tuned. PS. note that "public" doesn't really mean "suitable for the c.l.python crowd", or "suitable for production use". in other words, let's keep this one on this list for now. thanks! From tismer at tismer.com Wed Mar 15 19:15:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 19:15:27 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> <38CFC7B7.A1ABD51C@lemburg.com> Message-ID: <38CFD33F.3C02BF43@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: [the old data compression guy has been reanimated] > If you want to, I can mail you the marshalled Python dict version of > that database to play with. ... > > Should I try this evening? :-) > > Sure :-) go ahead... Thank you. Meanwhile I've heard that there is some well-known bot working on that under the hood, with a much better approach than mine. So I'll take your advice, and continue to write silly stackless enhancements. They say this is my destiny :-) ciao - continuous -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From DavidA at ActiveState.com Wed Mar 15 19:21:40 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 15 Mar 2000 10:21:40 -0800 Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> Message-ID: > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I know it's complicating things, but wouldn't an mmap'ed buffer allow inter-process sharing while keeping DLL size down and everything on-disk until needed? Yes, I know, mmap calls aren't uniform across platforms and mmap isn't supported on all platforms -- I still think that it's silly not to use it on those platforms where it is available, and I'd like to see mmap unification move forward, so this is as good a motivation as any to bite the bullet. Just a thought, --david From jim at digicool.com Wed Mar 15 19:24:53 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 13:24:53 -0500 Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) Message-ID: <38CFD575.A0536439@digicool.com> I find asyncore to be quite useful; however, it is currently geared to having a single main loop. It uses a global socket map that all asyncore dispatchers register with. I have an application in which I want to have multiple socket maps. I propose that we start moving toward a model in which selection of a socket map and control of the asyncore loop is a bit more explicit. If no one objects, I'll work up some initial patches. Who should I submit these to? Sam?
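[The explicit-map model being proposed can be sketched roughly as follows — a toy select() loop where each dispatcher registers with a map passed in, rather than with one module-global dict. The names here are illustrative assumptions, not asyncore's or Medusa's actual code:]

```python
# Minimal event-loop skeleton with per-instance socket maps instead of a
# single module-global one (illustrative names, not asyncore's real API).
import select
import socket

class Dispatcher:
    def __init__(self, sock, sock_map):
        self.socket = sock
        sock_map[sock.fileno()] = self      # register with *this* map only

    def handle_read(self):
        print("readable:", self.socket.fileno())

def poll(sock_map, timeout=0.0):
    """Run one select() pass over the given map only."""
    if not sock_map:
        return
    fds = list(sock_map)
    readable, _, _ = select.select(fds, [], [], timeout)
    for fd in readable:
        sock_map[fd].handle_read()

# Two independent applications, two independent maps:
map_a, map_b = {}, {}
a_sock, a_peer = socket.socketpair()
Dispatcher(a_sock, map_a)
b_sock, b_peer = socket.socketpair()
Dispatcher(b_sock, map_b)
a_peer.send(b"x")
poll(map_a, 1.0)    # only map_a's sockets get polled here
```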
Should the medusa public CVS form the basis? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jcw at equi4.com Wed Mar 15 20:39:37 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Wed, 15 Mar 2000 20:39:37 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38CFE6F9.3E8E9385@equi4.com> David Ascher wrote: [shareable unicodedatabase] > I know it's complicating things, but wouldn't an mmap'ed buffer allow > inter-process sharing while keeping DLL size down and everything > on-disk until needed? AFAIK, on platforms which support mmap, static data already gets mmap'ed in by the OS (just like all code), so this might have little effect. I'm more concerned by the distribution size increase. -jcw From bwarsaw at cnri.reston.va.us Wed Mar 15 19:41:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 15 Mar 2000 13:41:00 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <14543.55612.969101.206695@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> somewhere in this thread, Guido wrote: >> BTW, I added a tag "pre-unicode" to the CVS tree to the >> revisions before the Unicode changes were made. FL> maybe you could base SLP on that one? /F's got it exactly right. 
Check out a new directory using a stable tag (maybe you want to base your changes on the pre-unicode tag, or python 1.5.2?). Patch in that subtree and then eventually you'll have to merge your changes into the head of the branch. -Barry From rushing at nightmare.com Thu Mar 16 02:52:22 2000 From: rushing at nightmare.com (Sam Rushing) Date: Wed, 15 Mar 2000 17:52:22 -0800 (PST) Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) In-Reply-To: <38CFD575.A0536439@digicool.com> References: <38CFD575.A0536439@digicool.com> Message-ID: <14544.15958.546712.466506@seattle.nightmare.com> Jim Fulton writes: > I find asyncore to be quite useful, however, it is currently > geared to having a single main loop. It uses a global socket > map that all asyncore dispatchers register with. > > I have an application in which I want to have multiple > socket maps. But still only a single event loop, yes? Why do you need multiple maps? For a priority system of some kind? > I propose that we start moving toward a model in which selection of > a socket map and control of the asyncore loop is a bit more > explicit. > > If no one objects, I'll work up some initial patches. If it can be done in a backward-compatible fashion, that sounds fine; but it sounds tricky. Even the simple {:object...} change broke so many things that we're still using the old stuff at eGroups. > Who should I submit these to? Sam? > Should the medusa public CVS form the basis? Yup, yup. -Sam From tim_one at email.msn.com Thu Mar 16 08:06:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 16 Mar 2000 02:06:23 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <38CF91B4.A36C8C5@digicool.com> Message-ID: <000201bf8f16$237e5e80$662d153f@tim> [Jim Fulton] > ... > There is no reason for __del__ to fail unless it depends on > cyclically-related objects, which should be viewed as a design > mistake. > > OTOH, __del__ should never fail because module globals go away.
> IMO, the current circular references involving module globals are > unnecessary, but that's a different topic. ;) IOW, you view "the current circular references involving module globals" as "a design mistake" . And perhaps they are! I wouldn't call it a different topic, though: so long as people are *viewing* shutdown __del__ problems as just another instance of finalizers in cyclic trash, it makes the latter *seem* inescapably "normal", and so something that has to be catered to. If you have a way to take the shutdown problems out of the discussion, it would help clarify both topics, at the very least by deconflating them. it's-a-mailing-list-so-no-need-to-stay-on-topic-ly y'rs - tim From gstein at lyra.org Thu Mar 16 13:01:36 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:01:36 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF88A0.CF876A74@tismer.com> Message-ID: On Wed, 15 Mar 2000, Christian Tismer wrote: >... > Would it be possible to make the Unicode support configurable? This might be interesting from the standpoint of those guys who are doing the tiny Python interpreter thingy for embedded systems. > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. > > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) But! I find this reason completely off the mark. In essence, you're arguing that we should not put *any* new feature into the CVS repository because it might mess up what *you* are doing. Sorry, but that just irks me. If you want a stable Python, then don't use the CVS version.
Or base it off a specific tag in CVS. Or something. Just don't ask for development to be stopped. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 16 13:08:43 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:08:43 -0800 (PST) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: On Wed, 15 Mar 2000, Vladimir Marangozov wrote: > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Nope. A shared module means that multiple executables can share the code. Whether the const data resides in an executable or a .so, the OS will map it into readonly memory and share it across all processes. > > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file) > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others.
This would place all that data into the per-process heap. Definitely not shared, and definitely a big hit for each Python process. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused . This isn't IPC. It is sharing of some constant data. The most effective way to manage this is through const C data. The OS will properly manage it. And sorry, David, but mmap'ing a file will simply add complexity. As jcw mentioned, the OS is pretty much doing this anyhow when it deals with a const data segment in your executable. I don't believe this is Linux specific. This kind of stuff has been done for a *long* time on the platforms, too. Side note: the most effective way of exposing this const data up to Python (without shoving it onto the heap) is through buffers created via: PyBuffer_FromMemory(ptr, size) This allows the data to reside in const, shared memory while it is also exposed up to Python. Cheers, -g -- Greg Stein, http://www.lyra.org/ From marangoz at python.inrialpes.fr Thu Mar 16 13:39:42 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Thu, 16 Mar 2000 13:39:42 +0100 (CET) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: from "Greg Stein" at Mar 16, 2000 04:08:43 AM Message-ID: <200003161239.NAA01671@python.inrialpes.fr> Greg Stein wrote: > > [me] > > The static data is shared if the module is a shared object (.so). > > If unicodedata is not a .so, then you'll have a separate copy of the > > database in each process. > > Nope. A shared module means that multiple executables can share the code. > Whether the const data resides in an executable or a .so, the OS will map > it into readonly memory and share it across all processes. I must have been drunk yesterday. You're right. > I don't believe this is Linux specific.
This kind of stuff has been done for a *long* time on the platforms, too. Yes. > Side note: the most effective way of exposing this const data up to Python > (without shoving it onto the heap) is through buffers created via: > PyBuffer_FromMemory(ptr, size) > This allows the data to reside in const, shared memory while it is also > exposed up to Python. And to avoid the size increase of the Python library, perhaps unicodedata needs to be commented out by default in Setup.in (for the release, not now). As M-A pointed out, the module isn't necessary for the normal operation of the interpreter. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Thu Mar 16 13:56:21 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:56:21 -0800 (PST) Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <000201bf8f16$237e5e80$662d153f@tim> Message-ID: On Thu, 16 Mar 2000, Tim Peters wrote: >... > IOW, you view "the current circular references involving module globals" as > "a design mistake" . And perhaps they are! I wouldn't call it a > different topic, though: so long as people are *viewing* shutdown __del__ > problems as just another instance of finalizers in cyclic trash, it makes > the latter *seem* inescapably "normal", and so something that has to be > catered to. If you have a way to take the shutdown problems out of the > discussion, it would help clarify both topics, at the very least by > deconflating them. Bah. Module globals are easy. My tp_clean suggestion handles them quite easily at shutdown. No more special-code in import.c.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Thu Mar 16 13:53:46 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 13:53:46 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38D0D95A.B13EC17E@tismer.com> Greg Stein wrote: > > On Wed, 15 Mar 2000, Christian Tismer wrote: > >... > > Would it be possible to make the Unicode support configurable? > > This might be interesting from the standpoint of those guys who are doing > the tiny Python interpreter thingy for embedded systems. > > > My problem is that patches in the CVS are of different kinds. > > > Some are error corrections and enhancements which I would > > > definitely like to use. > > > Others are brand new features like the Unicode support. > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > > > I'd appreciate it very much if I could use the same CVS tree > > > for testing new stuff, and to build my distribution, with > > > new features switched off. Please :-) > > But! I find this reason completely off the mark. In essence, you're > arguing that we should not put *any* new feature into the CVS repository > because it might mess up what *you* are doing. No, this is your interpretation, and a reduction which I can't follow. There are improvements and features in the CVS version which I need. I prefer to build against it, instead of the old 1.5.2. What's wrong with that? I want to find a way that gives me the least trouble in doing so. > Sorry, but that just irks me. If you want a stable Python, then don't use > the CVS version. Or base it off a specific tag in CVS. Or something. Just > don't ask for development to be stopped. No, I ask for development to be stopped. Code freeze until Y3k :-) Why are you trying to put such nonsense into my mouth? You know that I know that you know better.
ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tismer at tismer.com Thu Mar 16 14:25:48 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 14:25:48 +0100 Subject: [Python-Dev] const data (was: Unicode patches checked in) References: <200003161239.NAA01671@python.inrialpes.fr> Message-ID: <38D0E0DC.B997F836@tismer.com> Vladimir Marangozov wrote: > > Greg Stein wrote: > > Side note: the most effective way of exposing this const data up to Python > > (without shoving it onto the heap) is through buffers created via: > > PyBuffer_FromMemory(ptr, size) > > This allows the data to reside in const, shared memory while it is also > > exposed up to Python. > > And to avoid the size increase of the Python library, perhaps unicodedata > needs to be uncommented by default in Setup.in (for the release, not now). > As M-A pointed out, the module isn't isn't necessary for the normal > operation of the interpreter. Sounds like a familiar idea. :-) BTW., yesterday evening I wrote an analysis script, to see how far this data is compactable without going into real compression, just redundancy folding and byte/short indexing was used. If I'm not wrong, this reduces the size of the database to less than 25kb. That small amount of extra data would make the uncommenting feature quite unimportant, except for the issue of building tiny Pythons. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein at lyra.org Thu Mar 16 14:06:46 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 05:06:46 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38D0D95A.B13EC17E@tismer.com> Message-ID: On Thu, 16 Mar 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Sorry, but that just irks me. If you want a stable Python, then don't use > > the CVS version. Or base it off a specific tag in CVS. Or something. Just > > don't ask for development to be stopped. > > No, I ask for development to be stopped. Code freeze until Y3k :-) > Why are you trying to put such a nonsense into my mouth? > You know that I know that you know better. Simply because that is what it sounds like on this side of my monitor :-) I'm seeing your request as asking for people to make special considerations in their patches for your custom distribution. While I don't have a problem with making Python more flexible to distro maintainers, it seemed like you were approaching it from the "wrong" angle. Like I said, making Unicode optional for the embedded space makes sense; making it optional so it doesn't bloat your distro didn't :-) Not a big deal... it is mostly a perception on my part. I also tend to dislike things that hold development back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri Mar 17 19:53:39 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 19:53:39 +0100 Subject: [Python-Dev] Unicode Update 2000-03-17 Message-ID: <38D27F33.4055A942@lemburg.com> Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. 
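[The "redundancy folding plus byte/short indexing" Christian describes a little earlier in the thread — keep one copy of each distinct property record and index code points with small integers — can be illustrated with a toy version. This is a sketch of the general technique only, assuming a made-up record format, not the actual unicodedata layout:]

```python
# Toy redundancy folding: many code points share the same property record,
# so store each distinct record once and keep a small integer index per entry.
records = {}   # distinct record -> small index
index = []     # one small index per code point

def add(record):
    index.append(records.setdefault(record, len(records)))

# Fake "database": alternating records for 1000 code points.
for cp in range(1000):
    add(("Lu", 0) if cp % 2 else ("Ll", 0))

print(len(records), len(index))   # -> 2 1000: the fold leaves 2 records
```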
The patch contains all bug fixes and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 17 15:24:30 2000 +++ Python+Unicode/Include/unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,8 +1,5 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* @@ -109,8 +106,9 @@ /* --- Internal Unicode Operations ---------------------------------------- */ /* If you want Python to use the compiler's wctype.h functions instead - of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS. - This reduces the interpreter's code size. */ + of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS or + configure Python using --with-ctype-functions. This reduces the + interpreter's code size. */ #if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS) @@ -169,6 +167,10 @@ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { @@ -647,7 +649,7 @@ int direction /* Find direction: +1 forward, -1 backward */ ); -/* Count the number of occurances of substr in str[start:end].
*/ +/* Count the number of occurrences of substr in str[start:end]. */ extern DL_IMPORT(int) PyUnicode_Count( PyObject *str, /* String */ @@ -656,7 +658,7 @@ int end /* Stop index */ ); -/* Replace at most maxcount occurances of substr in str with replstr +/* Replace at most maxcount occurrences of substr in str with replstr and return the resulting Unicode object. */ extern DL_IMPORT(PyObject *) PyUnicode_Replace( diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Sat Mar 11 00:20:43 2000 +++ Python+Unicode/Lib/codecs.py Mon Mar 13 14:33:54 2000 @@ -55,7 +55,7 @@ """ def encode(self,input,errors='strict'): - """ Encodes the object intput and returns a tuple (output + """ Encodes the object input and returns a tuple (output object, length consumed). errors defines the error handling to apply. 
It defaults to diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/encodings/__init__.py Python+Unicode/Lib/encodings/__init__.py --- CVS-Python/Lib/encodings/__init__.py Sat Mar 11 00:17:18 2000 +++ Python+Unicode/Lib/encodings/__init__.py Mon Mar 13 14:30:33 2000 @@ -30,13 +30,13 @@ import string,codecs,aliases _cache = {} -_unkown = '--unkown--' +_unknown = '--unknown--' def search_function(encoding): # Cache lookup - entry = _cache.get(encoding,_unkown) - if entry is not _unkown: + entry = _cache.get(encoding,_unknown) + if entry is not _unknown: return entry # Import the module diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_string.py Python+Unicode/Lib/test/test_string.py --- CVS-Python/Lib/test/test_string.py Sat Mar 11 10:52:43 2000 +++ Python+Unicode/Lib/test/test_string.py Mon Mar 13 10:12:46 2000 @@ -143,6 +143,7 @@ test('translate', 'xyz', 'xyz', table) test('replace', 'one!two!three!', 'one at two!three!', '!', '@', 1) +test('replace', 'one!two!three!', 'onetwothree', '!', '') test('replace', 'one!two!three!', 'one at two@three!', '!', '@', 2) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 3) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Fri Mar 17 15:24:31 2000 +++ 
Python+Unicode/Lib/test/test_unicode.py Mon Mar 13 10:13:05 2000 @@ -108,6 +108,7 @@ test('translate', u'xyz', u'xyz', table) test('replace', u'one!two!three!', u'one at two!three!', u'!', u'@', 1) +test('replace', u'one!two!three!', u'onetwothree', '!', '') test('replace', u'one!two!three!', u'one at two@three!', u'!', u'@', 2) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 3) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 17 16:55:11 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Modules/stropmodule.c Python+Unicode/Modules/stropmodule.c --- CVS-Python/Modules/stropmodule.c Wed Mar 1 10:22:53 2000 +++ Python+Unicode/Modules/stropmodule.c Mon Mar 13 14:33:23 2000 @@ -1054,7 +1054,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/stringobject.c Mon Mar 13 14:33:24 2000 @@ -1395,7 +1395,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. 
@@ -1578,7 +1578,7 @@ return NULL; if (sub_len <= 0) { - PyErr_SetString(PyExc_ValueError, "empty replacement string"); + PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } new_s = mymemreplace(str,len,sub,sub_len,repl,repl_len,count,&out_len); Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/unicodeobject.c Wed Mar 15 10:49:19 2000 @@ -83,7 +83,7 @@ all objects on the free list having a size less than this limit. This reduces malloc() overhead for small Unicode objects. - At worse this will result in MAX_UNICODE_FREELIST_SIZE * + At worst this will result in MAX_UNICODE_FREELIST_SIZE * (sizeof(PyUnicodeObject) + STAYALIVE_SIZE_LIMIT + malloc()-overhead) bytes of unused garbage. 
@@ -180,7 +180,7 @@ unicode_freelist = *(PyUnicodeObject **)unicode_freelist; unicode_freelist_size--; unicode->ob_type = &PyUnicode_Type; - _Py_NewReference(unicode); + _Py_NewReference((PyObject *)unicode); if (unicode->str) { if (unicode->length < length && _PyUnicode_Resize(unicode, length)) { @@ -199,16 +199,19 @@ unicode->str = PyMem_NEW(Py_UNICODE, length + 1); } - if (!unicode->str) { - PyMem_DEL(unicode); - PyErr_NoMemory(); - return NULL; - } + if (!unicode->str) + goto onError; unicode->str[length] = 0; unicode->length = length; unicode->hash = -1; unicode->utf8str = NULL; return unicode; + + onError: + _Py_ForgetReference((PyObject *)unicode); + PyMem_DEL(unicode); + PyErr_NoMemory(); + return NULL; } static @@ -224,7 +227,6 @@ *(PyUnicodeObject **)unicode = unicode_freelist; unicode_freelist = unicode; unicode_freelist_size++; - _Py_ForgetReference(unicode); } else { free(unicode->str); @@ -489,7 +491,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-8 decoding error; unkown error handling code: %s", + "UTF-8 decoding error; unknown error handling code: %s", errors); return -1; } @@ -611,7 +613,7 @@ else { PyErr_Format(PyExc_ValueError, "UTF-8 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -733,7 +735,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-16 decoding error; unkown error handling code: %s", + "UTF-16 decoding error; unknown error handling code: %s", errors); return -1; } @@ -921,7 +923,7 @@ else { PyErr_Format(PyExc_ValueError, "Unicode-Escape decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1051,6 +1053,10 @@ */ +static const Py_UNICODE *findchar(const Py_UNICODE *s, + int size, + Py_UNICODE ch); + static PyObject *unicodeescape_string(const Py_UNICODE *s, int size, @@ -1069,9 +1075,6 @@ p = q = PyString_AS_STRING(repr); if (quotes) { - static const Py_UNICODE *findchar(const Py_UNICODE *s, - int size, 
- Py_UNICODE ch); *p++ = 'u'; *p++ = (findchar(s, size, '\'') && !findchar(s, size, '"')) ? '"' : '\''; @@ -1298,7 +1301,7 @@ else { PyErr_Format(PyExc_ValueError, "Latin-1 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1369,7 +1372,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1431,7 +1434,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1502,7 +1505,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1618,7 +1621,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1750,7 +1753,7 @@ else { PyErr_Format(PyExc_ValueError, "translate error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/codecs.c Python+Unicode/Python/codecs.c --- CVS-Python/Python/codecs.c Fri Mar 10 23:57:27 2000 +++ Python+Unicode/Python/codecs.c Wed Mar 15 11:27:54 2000 @@ -93,9 +93,14 @@ PyObject *_PyCodec_Lookup(const char *encoding) { - PyObject *result, *args = NULL, *v; + PyObject *result, *args = NULL, *v = NULL; int i, len; + if (_PyCodec_SearchCache == NULL || _PyCodec_SearchPath == NULL) { + PyErr_SetString(PyExc_SystemError, + "codec module not properly initialized"); + goto onError; + } if (!import_encodings_called) import_encodings(); @@ -109,6 +114,7 @@ 
result = PyDict_GetItem(_PyCodec_SearchCache, v); if (result != NULL) { Py_INCREF(result); + Py_DECREF(v); return result; } @@ -121,6 +127,7 @@ if (args == NULL) goto onError; PyTuple_SET_ITEM(args,0,v); + v = NULL; for (i = 0; i < len; i++) { PyObject *func; @@ -146,7 +153,7 @@ if (i == len) { /* XXX Perhaps we should cache misses too ? */ PyErr_SetString(PyExc_LookupError, - "unkown encoding"); + "unknown encoding"); goto onError; } @@ -156,6 +163,7 @@ return result; onError: + Py_XDECREF(v); Py_XDECREF(args); return NULL; } @@ -378,5 +386,7 @@ void _PyCodecRegistry_Fini() { Py_XDECREF(_PyCodec_SearchPath); + _PyCodec_SearchPath = NULL; Py_XDECREF(_PyCodec_SearchCache); + _PyCodec_SearchCache = NULL; } From bwarsaw at cnri.reston.va.us Fri Mar 17 20:16:02 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 14:16:02 -0500 (EST) Subject: [Python-Dev] Unicode Update 2000-03-17 References: <38D27F33.4055A942@lemburg.com> Message-ID: <14546.33906.771022.916209@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> The patch is against the current CVS version. I would M> appreciate if someone with CVS checkin permissions could check M> the changes in. Hi MAL, I just tried to apply your patch against the tree, however patch complains that the Lib/codecs.py patch is reversed. I haven't looked closely at it, but do you have any ideas? Or why don't you just send me Lib/codecs.py and I'll drop it in place. Everything else patched cleanly. -Barry From ping at lfw.org Fri Mar 17 15:06:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:13 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? Message-ID: I wondered to myself today while reading through the Python tutorial whether it would be a good idea to have a separate boolean type in Py3K. Would this help catch common mistakes? 
I won't presume to truly understand the new-to-Python experience, but one might *guess* that

    >>> 5 > 3
    true

would make a little more sense to a beginner than

    >>> 5 > 3
    1

Of course this means introducing "true" and "false" as keywords (or built-in values like None -- perhaps they should be spelled True and False?) and completely changing the way a lot of code runs by introducing a bunch of type checking, so it may be too radical a change, but -- and i don't know if it's already been discussed a lot, but -- I thought it wouldn't hurt just to raise the question.

-- ?!ng

From ping at lfw.org Fri Mar 17 15:06:55 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:55 -0600 (CST) Subject: [Python-Dev] Should None be a keyword? Message-ID:

Related to my last message: should None become a keyword in Py3K?

-- ?!ng

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:49:24 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:49:24 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? References: Message-ID: <14546.39508.312796.221069@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

  KY> I wondered to myself today while reading through the Python
  KY> tutorial whether it would be a good idea to have a separate
  KY> boolean type in Py3K. Would this help catch common mistakes?

Almost a year ago, I mused about a boolean type in c.l.py, and came up with this prototype in Python.
-------------------- snip snip --------------------
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __str__(self):
        return self.__flag and 'true' or 'false'

    def __repr__(self):
        return self.__str__()

    def __nonzero__(self):
        return self.__flag == 1

    def __cmp__(self, other):
        if (self.__flag and other) or (not self.__flag and not other):
            return 0
        else:
            return 1

    def __rcmp__(self, other):
        return -self.__cmp__(other)

true = Boolean(1)
false = Boolean()
-------------------- snip snip --------------------

I think it makes sense to augment Python's current truth rules with a built-in boolean type and True and False values. But unless it's tied in more deeply (e.g. comparisons return one of these instead of integers -- and what are the implications of that?) then it's pretty much just syntactic sugar <0.75 lick>.

-Barry

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:50:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:50:00 -0500 (EST) Subject: [Python-Dev] Should None be a keyword? References: Message-ID: <14546.39544.673335.378797@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

  KY> Related to my last message: should None become a keyword in
  KY> Py3K?

Why? Just to reserve it?

-Barry

From moshez at math.huji.ac.il Fri Mar 17 21:52:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 17 Mar 2000 22:52:29 +0200 (IST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID:

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:
> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.

Cool prototype! However, I think I have a problem with the proposed semantics:

>     def __cmp__(self, other):
>         if (self.__flag and other) or (not self.__flag and not other):
>             return 0
>         else:
>             return 1

This means:

    true == 1
    true == 2

But 1 != 2. I have some difficulty with == not being an equivalence relation...
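[Editor's note: Moshe's transitivity point is easy to check concretely. Below is a sketch of Barry's prototype ported to a modern Python interpreter -- `__eq__` and `__bool__` stand in for the original `__cmp__`/`__nonzero__`, but the truth-value comparison semantics are the same ones Moshe objects to.]

```python
# Sketch of Barry's Boolean prototype in modern Python. __eq__ compares
# by truth value, reproducing the __cmp__ behaviour under discussion.
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __repr__(self):
        return 'true' if self.__flag else 'false'

    def __bool__(self):
        return self.__flag

    def __eq__(self, other):
        # equal whenever both sides have the same truth value
        return self.__flag == bool(other)

true = Boolean(1)
false = Boolean()

# == is not transitive under these semantics:
print(true == 1)   # True
print(true == 2)   # True
print(1 == 2)      # False
```

Both `true == 1` and `true == 2` hold, yet `1 == 2` does not -- exactly the broken equivalence relation described above.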
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values.

Right on! Except for the built-in... why not have it like exceptions.py, Python code necessary for the interpreter? Languages which compile themselves are not unheard of.

> But unless it's tied
> in more deeply (e.g. comparisons return one of these instead of
> integers -- and what are the implications of that?)

Breaking loads of horrible code. Unacceptable for the 1.x series, but perfectly fine in Py3K.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From effbot at telia.com Fri Mar 17 22:12:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 17 Mar 2000 22:12:15 +0100 Subject: [Python-Dev] Should None be a keyword? References: <14546.39544.673335.378797@anthem.cnri.reston.va.us> Message-ID: <004e01bf9055$79012000$34aab5d4@hagrid>

Barry A. Warsaw wrote:
> >>>>> "KY" == Ka-Ping Yee writes:
>
> KY> Related to my last message: should None become a keyword in
> KY> Py3K?
>
> Why? Just to reserve it?

to avoid errors like:

    def foo():
        result = None
        # two screenfuls of code
        None, a, b = mytuple # perlish unpacking

which gives an interesting error on the first line, instead of a syntax error on the last.

From guido at python.org Fri Mar 17 22:20:05 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:05 -0500 Subject: [Python-Dev] Should None be a keyword? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:55 CST." References: Message-ID: <200003172120.QAA09045@eric.cnri.reston.va.us>

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Mar 17 22:20:36 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:36 -0500 Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:13 CST."
References: Message-ID: <200003172120.QAA09115@eric.cnri.reston.va.us>

Yes. True and False make sense.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pf at artcom-gmbh.de Fri Mar 17 22:17:06 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 17 Mar 2000 22:17:06 +0100 (MET) Subject: [Python-Dev] Should None be a keyword? In-Reply-To: <14546.39544.673335.378797@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 17, 2000 3:50: 0 pm" Message-ID:

> >>>>> "KY" == Ka-Ping Yee writes:
>
> KY> Related to my last message: should None become a keyword in
> KY> Py3K?

Barry A. Warsaw schrieb:
> Why? Just to reserve it?

This is related to the general type checking discussion. IMO the suggested

    >>> 1 > 0
    True

wouldn't buy us much, as long as the following behaviour stays in Py3K:

    >>> a = '2' ; b = 3
    >>> a < b
    0
    >>> a > b
    1

This is irritating to newcomers (at least from my rather short experience as a member of python-help)! And this is especially irritating, since you can't do

    >>> c = a + b
    Traceback (innermost last):
      File "", line 1, in ?
    TypeError: illegal argument type for built-in operation

IMO this difference is far more difficult to catch for newcomers than the far more often discussed 5/3 == 1 behaviour.

Have a nice weekend and don't forget to hunt for remaining bugs in Fred's upcoming 1.5.2p2 docs ;-),

Peter.
-- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From ping at lfw.org Fri Mar 17 16:53:38 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 09:53:38 -0600 (CST) Subject: [Python-Dev] list.shift() Message-ID:

Has list.shift() been proposed?

    # pretend lists are implemented in Python and 'self' is a list
    def shift(self):
        item = self[0]
        del self[:1]
        return item

This would make queues read nicely... use "append" and "pop" for a stack, "append" and "shift" for a queue.
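[Editor's note: the proposal is easy to try out with `shift` written as a plain function over an ordinary list (a sketch -- no such list method exists):]

```python
def shift(lst):
    """Remove and return the first element -- the proposed list.shift()."""
    item = lst[0]
    del lst[:1]
    return item

# queue: append at the back, shift from the front
q = []
q.append('a')
q.append('b')
print(shift(q))   # a
print(q)          # ['b']

# stack: append at the back, pop from the back
s = []
s.append('a')
s.append('b')
print(s.pop())    # b
```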
(This is while on the thought-train of "making built-in types do more, rather than introducing more special types", as you'll see in my next message.)

-- ?!ng

From gvanrossum at beopen.com Fri Mar 17 23:00:18 2000 From: gvanrossum at beopen.com (Guido van Rossum) Date: Fri, 17 Mar 2000 17:00:18 -0500 Subject: [Python-Dev] list.shift() References: Message-ID: <38D2AAF2.CFBF3A2@beopen.com>

Ka-Ping Yee wrote:
>
> Has list.shift() been proposed?
>
>     # pretend lists are implemented in Python and 'self' is a list
>     def shift(self):
>         item = self[0]
>         del self[:1]
>         return item
>
> This would make queues read nicely... use "append" and "pop" for
> a stack, "append" and "shift" for a queue.
>
> (This is while on the thought-train of "making built-in types do
> more, rather than introducing more special types", as you'll see
> in my next message.)

You can do this using list.pop(0). I don't think the name "shift" is very intuitive (smells of sh and Perl :-). Do we need a new function?

--Guido

From ping at lfw.org Fri Mar 17 17:08:37 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:08:37 -0600 (CST) Subject: [Python-Dev] Using lists as sets Message-ID:

A different way to provide sets in Python, which occurred to me on Wednesday at Guido's talk in Mountain View (hi Guido!), is to just make lists work better.

Someone asked Guido a question about the ugliness of using dicts in a certain way, and it was clear that what he wanted was a real set. Guido's objection to introducing more core data types is that it makes it more difficult to choose which data type to use, and opens the possibility of using entirely the wrong one -- a very well-taken point, i thought. (That recently-mentioned study of scripting vs.
system language performance seems relevant here: a few of the C programs submitted were much *slower* than the ones in Python or Perl just because people had to choose and implement their own data structures, and so they were able to completely shoot themselves in both feet and lose a leg or two in the process.)

So...

Hypothesis: The only real reason people might want a separate set type, or have to use dicts as sets, is that linear search on a list is too slow.

Therefore: All we have to do is speed up "in" on lists, and now we have a set type that is nice to read and write, and already has nice spellings for set semantics like "in".

Implementation possibilities:

+ Whip up a hash table behind the scenes if "in" gets used a lot on a particular list and all its members are hashable. This makes "in" no longer O(n), which is most of the battle. remove() can also be cheap -- though you have to do a little more bookkeeping to take care of multiple copies of elements.

+ Or, add a couple of methods, e.g. take() appends an item to a list if it's not there already, drop() removes all copies of an item from a list. These tip us off: the first time one of these methods gets used, we make the hash table then.

I think the semantics would be pretty understandable and simple to explain, which is the main thing.

Any thoughts?

-- ?!ng

From ping at lfw.org Fri Mar 17 17:12:22 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:12:22 -0600 (CST) Subject: [Python-Dev] list.shift() In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> Message-ID:

On Fri, 17 Mar 2000, Guido van Rossum wrote:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

Oh -- sorry, that's my ignorance showing. I didn't know pop() took an argument (of course it would -- duh...). No need to add anything more, then, i think.

Sorry! Fred et al.
on doc-sig: it would be really good for the tutorial to show a queue example and a stack example in the section where list methods are introduced.

-- ?!ng

From ping at lfw.org Fri Mar 17 17:13:44 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:13:44 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID:

Guido: (re None being a keyword)
> Yes.

Guido: (re booleans)
> Yes. True and False make sense.

Astounding. I don't think i've ever seen such quick agreement on anything! And twice in one day!

I think i'm going to go lie down. :) :)

-- ?!ng

From DavidA at ActiveState.com Fri Mar 17 23:23:53 2000 From: DavidA at ActiveState.com (David Ascher) Date: Fri, 17 Mar 2000 14:23:53 -0800 Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

> I think the semantics would be pretty understandable and simple to
> explain, which is the main thing.
>
> Any thoughts?

Would

    (a,b) in Set

return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?

--david

From mal at lemburg.com Fri Mar 17 23:41:46 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 23:41:46 +0100 Subject: [Python-Dev] Boolean type for Py3K? References: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID: <38D2B4AA.2EE933BD@lemburg.com>

Guido van Rossum wrote:
>
> Yes. True and False make sense.

mx.Tools defines these as new builtins... and they correspond to the C level singletons Py_True and Py_False.

    # Truth constants
    True = (1==1)
    False = (1==0)

I'm not sure whether breaking the idiom of True == 1 and False == 0 (or in other words: truth values are integers) would be such a good idea. Nothing against adding name bindings in __builtins__ though...
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From ping at lfw.org Fri Mar 17 17:53:12 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:53:12 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID:

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:
> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.
>
> -------------------- snip snip --------------------
> class Boolean: [...]
>
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values. But unless it's tied
> in more deeply (e.g. comparisons return one of these instead of
> integers -- and what are the implications of that?) then it's pretty
> much just syntactic sugar <0.75 lick>.

Yeah, and the whole point *is* the change in semantics, not the syntactic sugar. I'm hoping we can gain some safety from the type checking... though i can't seem to think of a good example off the top of my head. It's easier to think of examples if things like 'if', 'and', 'or', etc. only accept booleans as conditional arguments -- but i can't imagine going that far, as that would just be really annoying.

Let's see. Specifically, the following would probably return booleans:

    magnitude comparisons:       <, >, <=, >=  (and __cmp__)
    value equality comparisons:  ==, !=
    identity comparisons:        is, is not
    containment tests:           in, not in    (and __contains__)

... and booleans would be different from integers in that arithmetic would be illegal... but that's about it. (?) Booleans are still storable immutable values; they could be keys to dicts but not lists; i don't know what else.

Maybe this wouldn't actually buy us anything except for the nicer spelling of "True" and "False", which might not be worth it.

... Hmm.
Can anyone think of common cases where this could help?

-- n!?g

From ping at lfw.org Fri Mar 17 17:59:17 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:59:17 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

On Fri, 17 Mar 2000, David Ascher wrote:
> > I think the semantics would be pretty understandable and simple to
> > explain, which is the main thing.
> >
> > Any thoughts?
>
> Would
>
>     (a,b) in Set
>
> return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?

This would return true if (a, b) was an element of the set -- exactly the same semantics as we currently have for lists.

Ideally it would also be kind of nice to use < > <= >= as subset/superset operators, but that requires revising the way we do comparisons, and you know, it might not really be used all that often anyway.

-, |, and & could operate on lists sensibly when we use them as sets -- just define a few simple rules for ordering and you should be fine. e.g.

    c = a - b    is equivalent to    c = a
                                     for item in b: c.drop(item)

    c = a | b    is equivalent to    c = a
                                     for item in b: c.take(item)

    c = a & b    is equivalent to    c = []
                                     for item in a:
                                         if item in b: c.take(item)

where

    c.take(item)    is equivalent to    if item not in c: c.append(item)
    c.drop(item)    is equivalent to    while item in c: c.remove(item)

The above is all just semantics, of course, to make the point that the semantics can be simple. The implementation could do different things that are much faster when there's a hash table helping out.

-- ?!ng

From gvwilson at nevex.com Sat Mar 18 00:28:05 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Fri, 17 Mar 2000 18:28:05 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Message-ID:

> Guido: (re None being a keyword)
> > Yes.
> Guido: (re booleans)
> > Yes. True and False make sense.
> Ka-Ping:
> Astounding. I don't think i've ever seen such quick agreement on
> anything! And twice in one day!
> I think i'm going to go lie down.

No, no, keep going --- you're on a roll.

Greg

From ping at lfw.org Fri Mar 17 18:49:18 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 11:49:18 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

On Fri, 17 Mar 2000, Ka-Ping Yee wrote:
>
>     c.take(item)    is equivalent to    if item not in c: c.append(item)
>
>     c.drop(item)    is equivalent to    while item in c: c.remove(item)

I think i've decided that i like the verb "include" much better than the rather vague word "take". Perhaps this also suggests "exclude" instead of "drop".

-- ?!ng

From klm at digicool.com Sat Mar 18 01:32:56 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 17 Mar 2000 19:32:56 -0500 (EST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

On Fri, 17 Mar 2000, Ka-Ping Yee wrote:
> On Fri, 17 Mar 2000, David Ascher wrote:
> > > I think the semantics would be pretty understandable and simple to
> > > explain, which is the main thing.
> > >
> > > Any thoughts?
> >
> > Would
> >
> >     (a,b) in Set
> >
> > return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?
>
> This would return true if (a, b) was an element of the set --
> exactly the same semantics as we currently have for lists.

I really like the idea of using dynamically-tuned lists to provide set functionality! I often wind up needing something like set functionality, and implementing little convenience routines (unique, difference, etc) repeatedly. I don't mind that so much, but the frequency signifies that i, at least, would benefit from built-in support for sets...

I guess the question is whether it's practical to come up with a reasonably adequate, reasonably general dynamic optimization strategy. Seems like an interesting challenge - is there prior art?

As ping says, maintaining the existing list semantics handily answers challenges like david's question.
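[Editor's note: the take()/drop() pair discussed in this thread (include()/exclude(), per the proposed rename) is easy to prototype as plain functions over an ordinary list -- a sketch only, with no behind-the-scenes hash table, so membership tests stay O(n):]

```python
def take(lst, item):
    """Add item only if absent -- set-style insert."""
    if item not in lst:
        lst.append(item)

def drop(lst, item):
    """Remove every copy of item -- set-style delete."""
    while item in lst:
        lst.remove(item)

s = []
for x in ['a', 'b', 'a', 'c']:
    take(s, x)
print(s)          # ['a', 'b', 'c'] -- duplicates collapsed

drop(s, 'b')
print(s)          # ['a', 'c']

# intersection, following the a & b equivalence given in the thread
a, b = ['a', 'c'], ['c', 'd']
c = []
for item in a:
    if item in b:
        take(c, item)
print(c)          # ['c']
```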
New methods, like [].subset('a', 'b'), could provide the desired additional functionality - and contribute to biasing the object towards set optimization, etc. Neato! Ken klm at digicool.com From ping at lfw.org Fri Mar 17 20:02:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 13:02:13 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ken Manheimer wrote: > > I really like the idea of using dynamically-tuned lists provide set > functionality! I often wind up needing something like set functionality, > and implementing little convenience routines (unique, difference, etc) > repeatedly. I don't mind that so much, but the frequency signifies that > i, at least, would benefit from built-in support for sets... Greg asked about how to ensure that a given item only appears once in each list when used as a set, and whether i would flag the list as "i'm now operating as a set". My answer is no -- i don't want there to be any visible state on the list. (It can internally decide to optimize its behaviour for a particular purpose, but in no event should this decision ever affect the semantics of its manifested behaviour.) Externally visible state puts us back right where we started -- now the user has to decide what type of thing she wants to use, and that's more decisions and loaded guns pointing at feet that we were trying to avoid in the first place. There's something very nice about there being just two mutable container types in Python. As Guido said, the first two types you learn are lists and dicts, and it's pretty obvious which one to pick for your purposes, and you can't really go wrong. I'd like to copy my reply to Greg here because it exposes some of the philosophy i'm attempting with this proposal: You'd trust the client to use take() (or should i say include()) instead of append(). But, in the end, this wouldn't make any difference to the result of "in". 
In fact, you could do multisets since lists already have count(). What i'm trying to do is to put together a few very simple pieces to get all the behaviour necessary to work with sets, if you want it. I don't want the object itself to have any state that manifests itself as "now i'm a set", or "now i'm a list". You just pick the methods you want to use. It's just like stacks and queues. There's no state on the list that says "now i'm a stack, so read from the end" or "now i'm a queue, so read from the front". You decide where you want to read items by picking the appropriate method, and this lets you get the best of both worlds -- flexibility and simplicity. Back to Ken: > I guess the question is whether it's practical to come up with a > reasonably adequate, reasonably general dynamic optimization strategy. > Seems like an interesting challenge - is there prior art? I'd be quite happy with just turning on set optimization when include() and exclude() get used (nice and predictable). Maybe you could provide a set() built-in that would construct you a list with set optimization turned on, but i'm not too sure if we really want to expose it that way. -- ?!ng From moshez at math.huji.ac.il Sat Mar 18 06:27:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 18 Mar 2000 07:27:13 +0200 (IST) Subject: [Python-Dev] list.shift() In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > > Has list.shift() been proposed? > > # pretend lists are implemented in Python and 'self' is a list > def shift(self): > item = self[0] > del self[:1] > return item > > This would make queues read nicely... use "append" and "pop" for > a stack, "append" and "shift" for a queue. Actually, I once thought about writing a Deque in Python for a couple of hours (I later wrote it, and then threw it away because I had nothing to do with it, but that isn't my point). So I did write "shift" (though I'm certain I didn't call it that). 
It's not as easy to write a maintainable yet efficient "shift": I got stuck with a pointer to the beginning of the "real list" which I incremented on a "shift", and a complex heuristic for when lists de- and re-allocate. I think the tradeoffs are shaky enough that it is better to write it in pure Python rather than having more functions in C (whether in an old builtin type or a new one). Anyone needing to treat a list as a Deque would just construct one:

    l = Deque(l)

built-in-functions:-just-say-no-ly y'rs, Z.
-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From artcom0!pf at artcom-gmbh.de Fri Mar 17 23:43:35 2000 From: artcom0!pf at artcom-gmbh.de (artcom0!pf at artcom-gmbh.de) Date: Fri, 17 Mar 2000 23:43:35 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> from Guido van Rossum at "Mar 17, 2000 5: 0:18 pm" Message-ID:

Ka-Ping Yee wrote:
[...]
> >     # pretend lists are implemented in Python and 'self' is a list
> >     def shift(self):
> >         item = self[0]
> >         del self[:1]
> >         return item
[...]

Guido van Rossum:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

I think no. But what about this one?:

    # pretend self and dict are dictionaries:
    def supplement(self, dict):
        for k, v in dict.items():
            if not self.data.has_key(k):
                self.data[k] = v

Note the similarities to {}.update(dict), but update replaces existing entries in self, which is sometimes not desired. I know that supplement can also be simulated with:

    tmp = dict.copy()
    tmp.update(self)
    self.data = d

But this is still a little ugly. IMO a builtin method to supplement (complete?) a dictionary with default values from another dictionary would sometimes be a useful tool.
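[Editor's note: Peter's supplement() can be tried as a standalone function on plain dictionaries -- a sketch in which `k not in target` replaces the 1.5-era has_key() test:]

```python
def supplement(target, defaults):
    """Copy entries from defaults into target, but never overwrite."""
    for k, v in defaults.items():
        if k not in target:
            target[k] = v

config = {'color': 'red'}
supplement(config, {'color': 'blue', 'size': 10})
print(config)   # {'color': 'red', 'size': 10} -- existing entry wins
```

Unlike update(), the existing 'color' entry survives; only the missing 'size' key is filled in from the defaults.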
Regards, Peter
-- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From ping at lfw.org Sat Mar 18 19:48:10 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 18 Mar 2000 10:48:10 -0800 (PST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID:

On Fri, 17 Mar 2000 artcom0!pf at artcom-gmbh.de wrote:
>
> I think no. But what about this one?:
>
>     # pretend self and dict are dictionaries:
>     def supplement(self, dict):
>         for k, v in dict.items():
>             if not self.data.has_key(k):
>                 self.data[k] = v

I'd go for that. It would be nice to have a non-overwriting update(). The only issue is the choice of verb; "supplement" sounds pretty reasonable to me.

-- ?!ng

"If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson

From pf at artcom-gmbh.de Sat Mar 18 20:23:37 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sat, 18 Mar 2000 20:23:37 +0100 (MET) Subject: [Python-Dev] dict.supplement() In-Reply-To: from Ka-Ping Yee at "Mar 18, 2000 10:48:10 am" Message-ID:

Hi!

> >     # pretend self and dict are dictionaries:
> >     def supplement(self, dict):
> >         for k, v in dict.items():
> >             if not self.data.has_key(k):
> >                 self.data[k] = v

Ka-Ping Yee schrieb:
> I'd go for that. It would be nice to have a non-overwriting update().
> The only issue is the choice of verb; "supplement" sounds pretty
> reasonable to me.

In German we have the verb "ergänzen" which translates either into "supplement" or "complete" (from my dictionary). "supplement" has the disadvantage of being rather long for the name of a builtin method. Nevertheless I've used this in my class derived from UserDict.UserDict.

Now let's switch topics to the recent discussion about the Set type: you all certainly know that something similar has been done before by Aaron Watters?
see:

Regards, Peter
-- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From gvwilson at nevex.com Mon Mar 20 15:52:12 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 20 Mar 2000 09:52:12 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets Message-ID:

[After discussion with Ping, and weekend thought]

I would like to vote against using lists as sets:

1. It blurs Python's categorization of containers. The rest of the world thinks of sets as unordered, associative, and binary-valued (a term I just made up to mean "containing 0 or 1 instance of X"). Lists, on the other hand, are ordered, positionally-indexed, and multi-valued. While a list is always a legal queue or stack (although lists permit state transitions that are illegal for queues or stacks), most lists are not legal sets.

2. Python has, in dictionaries, a much more logical starting point for sets. A set is exactly a dictionary whose keys matter, and whose values don't. Adding operations to dictionaries to insert keys, etc., without having to supply a value, naively appears no harder than adding operations to lists, and would probably be much easier to explain when teaching a class.

3. (Long-term speculation) Even if P3K isn't written in C++, many modules for it will be. It would therefore seem sensible to design P3K in a C++-friendly way --- in particular, to align Python's container hierarchy with that used in the Standard Template Library. Using lists as a basis for sets would give Python a very different container type hierarchy than the STL, which could make it difficult for automatic tools like SWIG to map STL-based things to Python and vice versa. Using dictionaries as a basis for sets would seem to be less problematic.
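[Editor's note: point 2 can be illustrated in a few lines -- a dictionary whose values are ignored already behaves like a set, with hashed membership tests (a sketch):]

```python
s = {}
for item in ['a', 'b', 'a', 'c']:
    s[item] = 1        # the value is irrelevant; only the key matters

print('a' in s)        # True  -- membership is a hash lookup, not a scan
print('z' in s)        # False
print(len(s))          # 3     -- duplicates collapse automatically

del s['b']             # set-style removal
print(sorted(s))       # ['a', 'c']
```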
(Note that if Wadler et al's Generic Java proposal becomes part of that language, an STL clone will almost certainly become part of that language, and require JPython interfacing.) On a semi-related note, can someone explain why programs are not allowed to iterate directly through the elements of a dictionary: for (key, value) in dict: ...body... Thanks, Greg "No XML entities were harmed in the production of this message." From moshez at math.huji.ac.il Mon Mar 20 16:03:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 17:03:47 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: Message-ID: On Mon, 20 Mar 2000 gvwilson at nevex.com wrote: > [After discussion with Ping, and weekend thought] > > I would like to vote against using lists as sets: I'd like to object too, but for slightly different reasons: 20-something lines of Python can implement a set (I just checked it) with the new __contains__. We can just supply it in the standard library (Set module?) and be over and done with. Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jcw at equi4.com Mon Mar 20 16:37:19 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 16:37:19 +0100 Subject: [Python-Dev] re: Using lists as sets References: Message-ID: <38D645AF.661CA335@equi4.com> gvwilson at nevex.com wrote: > > [After discussion with Ping, and weekend thought] [good stuff] Allow me to offer yet another perspective on this. I'll keep it short. Python has sequences (indexable collections) and maps (associative collections). C++'s STL has vectors, sets, multi-sets, maps, and multi-maps.
I find the distinction between these puzzling, and hereby offer another, somewhat relational-database minded, categorization as food for thought: - collections consist of objects, each of them with attributes - the first N attributes form the "key", the rest is the "residue" - there is also an implicit position attribute, which I'll call "#" - so an object consists of attributes: (K1,K2,...KN,#,R1,R2,...,RM) - one more bit of specification is needed: whether # is part of the key Let me mark the position between key attributes and residue with ":", so everything before the colon marks the uniquely identifying attributes. A vector (sequence) is: #:R1,R2,...,RM A set is: K1,K2,...KN: A multi-set is: K1,K2,...KN,#: A map is: K1,K2,...KN:#,R1,R2,...,RM A multi-map is: K1,K2,...KN,#:R1,R2,...,RM And a somewhat esoteric member of this classification: A singleton is: :R1,R2,...,RM I have no idea what this means for Python, but merely wanted to show how a relational, eh, "view" on all this might perhaps simplify the issues. -jcw From fdrake at acm.org Mon Mar 20 17:55:59 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 11:55:59 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: <14550.22559.550660.403909@weyr.cnri.reston.va.us> artcom0!pf at artcom-gmbh.de writes: > Note the similarities to {}.update(dict), but update replaces existing > entries in self, which is sometimes not desired. I know, that supplement > can also simulated with: Peter, I like this! > tmp = dict.copy() > tmp.update(self) > self.data = d I presume you mean "self.data = tmp"; "self.data.update(tmp)" would be just a little more robust, at the cost of an additional update. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tismer at tismer.com Mon Mar 20 18:10:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Mon, 20 Mar 2000 18:10:34 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> Message-ID: <38D65B8A.50B81D08@tismer.com> Jean-Claude Wippler wrote: [relational notation] > A vector (sequence) is: #:R1,R2,...,RM > A set is: K1,K2,...KN: > A multi-set is: K1,K2,...KN,#: > A map is: K1,K2,...KN:#,R1,R2,...,RM > A multi-map is: K1,K2,...KN,#:R1,R2,...,RM This is a nice classification! To my understanding, why not A map is: K1,K2,...KN:R1,R2,...,RM Where is a # in a map? And what do you mean by N and M? Is K1..KN one key, made up of N sub keys, or do you mean the whole set of keys, where each one is mapped somehow. I guess not, the notation looks like I should think of tuples. No, that would imply that N and M were fixed, but they are not. But you say "- collections consist of objects, each of them with attributes". Ok, N and M seem to be individual for each object, right? But when defining a map for instance, and we're talking of the objects, then the map is the set of these objects, and I have to think of K[0]..K(N(o)):R[0]..R(M(o)) where N and M are functions of the individual object o, right? Isn't it then better to think differently of these objects, saying they can produce some key object and some value object of any shape, and a position, where each of these can be missing? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jeremy at cnri.reston.va.us Mon Mar 20 18:28:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 12:28:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: Message-ID: <14550.24508.341533.908941@goon.cnri.reston.va.us> >>>>> "GVW" == gvwilson writes: GVW> On a semi-related note, can someone explain why programs are GVW> not allowed to iterate directly through the elements of a GVW> dictionary: GVW> for (key, value) in dict: ...body... Pythonic design rules #2: Explicit is better than implicit. There are at least three "natural" ways to interpret "for ... in dict:" In addition to the version that strikes you as most natural, some people also imagine that a for loop should iterate over the keys or the values. Instead of guessing, Python provides explicit methods for each possibility: items, keys, values. Yet another possibility, implemented in early versions of JPython and later removed, was to treat a dictionary exactly like a list: Call __getitem__(0), then 1, ..., until a KeyError was raised. In other words, a dictionary could behave like a list provided that it had integer keys. Jeremy From jcw at equi4.com Mon Mar 20 18:56:44 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 18:56:44 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> <38D65B8A.50B81D08@tismer.com> Message-ID: <38D6665C.ECDE09DE@equi4.com> Christian, > A map is: K1,K2,...KN:R1,R2,...,RM Yes, my list was inconsistent. > Is K1..KN one key, made up of N sub keys, or do you mean the > whole set of keys, where each one is mapped somehow. [...] > Ok, N and M seem to be individual for each object, right? [...] 
> Isn't it then better to think different of these objects, saying > they can produce some key object and some value object of any > shape, and a position, where each of these can be missing? Depends on your perspective. In the relational world, the (K1,...,KN) attributes identify the object, but they are not themselves considered an object. In OO-land, (K1,...,KN) is an object, and a map takes such an object as input and delivers (R1,...,RM) as result. This tension shows the boundary of both relational and OO models, IMO. I wish it'd be possible to unify them, but I haven't figured it out. -jcw, concept maverick / fool on the hill - pick one :) From pf at artcom-gmbh.de Mon Mar 20 19:28:17 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 20 Mar 2000 19:28:17 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <14550.22559.550660.403909@weyr.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Mar 20, 2000 11:55:59 am" Message-ID: I wrote: > > Note the similarities to {}.update(dict), but update replaces existing > > entries in self, which is sometimes not desired. I know, that supplement > > can also simulated with: > Fred L. Drake, Jr.: > Peter, > I like this! > > > tmp = dict.copy() > > tmp.update(self) > > self.data = d > > I presume you mean "self.data = tmp"; "self.data.update(tmp)" would > be just a little more robust, at the cost of an additional update. Ouppss... I should have tested this before posting. But currently I use the more explicit (and probably slower version) in my code: class ConfigDict(UserDict.UserDict): def supplement(self, defaults): for k, v in defaults.items(): if not self.data.has_key(k): self.data[k] = v Works fine so far, although it usually requires an additional copy operation. Consider another example, where arbitrary instance attributes should be specified as keyword arguments to the constructor: >>> class Example: ... _defaults = {'a': 1, 'b': 2} ... _config = _defaults ...
def __init__(self, **kw): ... if kw: ... self._config = self._defaults.copy() ... self._config.update(kw) ... >>> A = Example(a=12345) >>> A._config {'b': 2, 'a': 12345} >>> B = Example(c=3) >>> B._config {'b': 2, 'c': 3, 'a': 1} If 'supplement' were a dictionary builtin method, this would become simply: kw.supplement(self._defaults) self._config = kw Unfortunately this can't be achieved using a wrapper class like UserDict, since the **kw argument is always a builtin dictionary object. Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60 From ping at lfw.org Mon Mar 20 13:36:34 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 06:36:34 -0600 (CST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Peter Funk wrote: > Consider another example, where arbitrary instance attributes should be > specified as keyword arguments to the constructor: > > >>> class Example: > ... _defaults = {'a': 1, 'b': 2} > ... _config = _defaults > ... def __init__(self, **kw): > ... if kw: > ... self._config = self._defaults.copy() > ... self._config.update(kw) Yes! I do this all the time. I wrote a user-interface module to take care of exactly this kind of hassle when creating lots of UI components. When you're making UI, you can easily drown in keyword arguments and default values if you're not careful. -- ?!ng From fdrake at acm.org Mon Mar 20 20:02:48 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 14:02:48 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <14550.22559.550660.403909@weyr.cnri.reston.va.us> Message-ID: <14550.30168.129259.356581@weyr.cnri.reston.va.us> Peter Funk writes: > Ouppss... I should have tested this before posting. 
> But currently I use > the more explicit (and probably slower version) in my code: The performance is based entirely on the size of each; in the (probably typical) case of smallish dictionaries (<50 entries), it's probably cheaper to use a temporary dict and do the update. For large dicts (on the defaults side), it may make more sense to reduce the number of objects that need to be created: target = ... has_key = target.has_key for key in defaults.keys(): if not has_key(key): target[key] = defaults[key] This saves the construction of len(defaults) 2-tuples. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From moshez at math.huji.ac.il Mon Mar 20 20:23:01 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 21:23:01 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > Yet another possibility, implemented in early versions of JPython and > later removed, was to treat a dictionary exactly like a list: Call > __getitem__(0), then 1, ..., until a KeyError was raised. In other > words, a dictionary could behave like a list provided that it had > integer keys. Two remarks: Jeremy meant "consecutive natural keys starting with 0", (yes, I've managed to learn mind-reading from the timbot) and that (the following is considered a misfeature): import UserDict a = UserDict.UserDict() a[0]="hello" a[1]="world" for word in a: print word Will print "hello", "world", and then die with KeyError. I realize why this is happening, and realize it could only be fixed in Py3K. However, a temporary (though not 100% backwards compatible) fix is that "for" will catch LookupError, rather than IndexError. Any comments? -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Mon Mar 20 20:39:31 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 20 Mar 2000 11:39:31 -0800 Subject: [Python-Dev] Unicode and Windows Message-ID: I would like to discuss Unicode on the Windows platform, and how it relates to MBCS that Windows uses. My main goal here is to ensure that Unicode on Windows can make a round-trip to and from native Unicode stores. As an example, let's take the registry - a Windows user should be able to read a Unicode value from the registry then write it back. The value written back should be _identical_ to the value read. Ditto for the file system: If the filesystem is Unicode, then I would expect the following code: for fname in os.listdir(): f = open(fname + ".tmp", "w") To create filenames on the filesystem with the exact base name even when the basename contains non-ascii characters. However, the Unicode patches do not appear to make this possible. open() uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded string to the C runtime fopen function. The end result of all this is that we end up with UTF-8 encoded names in the registry/on the file system. It does not seem possible to get a true Unicode string onto either the file system or in the registry. Unfortunately, I'm not experienced enough to know the full ramifications, but it _appears_ that on Windows the default "unicode to string" translation should be done via the WideCharToMultiByte() API. This will then pass an MBCS encoded ascii string to Windows, and the "right thing" should magically happen. Unfortunately, MBCS encoding is dependent on the current locale (ie, one MBCS sequence will mean completely different things depending on the locale).
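The round-trip failure Mark describes can be sketched without any Win32 calls at all. In this illustration, cp1252 stands in for whichever code page the locale's MBCS happens to select, and the codec names are modern spellings used purely to show the mismatch, not the API under discussion:

```python
# If Python hands the OS UTF-8 bytes but the OS interprets them in a
# locale code page, the name the user sees is not the name written.
name = "caf\u00e9"                      # a filename with a non-ASCII character
as_utf8 = name.encode("utf-8")          # what the automatic UTF-8 conversion produces
seen_by_os = as_utf8.decode("cp1252")   # how an MBCS-minded OS reads those bytes
assert seen_by_os != name               # the round trip fails
```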
I don't see a portability issue here, as the documentation could state that "Unicode->ASCII conversions use the most appropriate conversion for the platform. If the platform is not Unicode aware, then UTF-8 will be used." This issue is the final one before I release the win32reg module. It seems _critical_ to me that if Python supports Unicode and the platform supports Unicode, then Python unicode values must be capable of being passed to the platform. For the win32reg module I could quite possibly hack around the problem, but the more general problem (characterized by the open() example above) still remains... Any thoughts? Mark. From jeremy at cnri.reston.va.us Mon Mar 20 20:51:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 14:51:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: <14550.33088.110785.78631@goon.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote: >> Yet another possibility, implemented in early versions of JPython >> and later removed, was to treat a dictionary exactly like a list: >> Call __getitem__(0), then 1, ..., until a KeyError was raised. >> In other words, a dictionary could behave like a list provided >> that it had integer keys. MZ> Two remarks: Jeremy meant "consecutive natural keys starting MZ> with 0", (yes, I've managed to learn mind-reading from the MZ> timbot) I suppose I meant that (perhaps you can read my mind as well as I can); I also meant using values of Python's integer datatype :-). MZ> and that (the following is considered a misfeature): MZ> import UserDict MZ> a = UserDict.UserDict() MZ> a[0]="hello" MZ> a[1]="world" MZ> for word in a: print word MZ> Will print "hello", "world", and then die with KeyError. I MZ> realize why this is happening, and realize it could only be MZ> fixed in Py3K.
However, a temporary (though not 100% backwards MZ> compatible) fix is that "for" will catch LookupError, rather MZ> then IndexError. I'm not sure what you mean by "fix." (Please read your mind for me .) I think by fix you mean, "allow the broken code above to execute without raising an exception." Yuck! As far as I can tell, the problem is caused by the special way that a for loop uses the __getitem__ protocol. There are two related issues that lead to confusion. In cases other than for loops, __getitem__ is invoked when the syntactic construct x[i] is used. This means either lookup in a list or in a dict depending on the type of x. If it is a list, the index must be an integer and IndexError can be raised. If it is a dict, the index can be anything (even an unhashable type; TypeError is only raised by insertion for this case) and KeyError can be raised. In a for loop, the same protocol (__getitem__) is used, but with the special convention that the object should be a sequence. Python will detect when you try to use a builtin type that is not a sequence, e.g. a dictionary. If the for loop iterates over an instance type rather than a builtin type, there is no way to check whether the __getitem__ protocol is being implemented by a sequence or a mapping. The right solution, I think, is to allow a means for stating explicitly whether a class with an __getitem__ method is a sequence or a mapping (or both?). Then UserDict can declare itself to be a mapping and using it in a for loop will raise the TypeError, "loop over non-sequence" (which has a standard meaning defined in Skip's catalog <0.8 wink>). I believe this is where types-vs.-classes meets subtyping-vs.-inheritance. I suspect that the right solution, circa Py3K, is that classes must explicitly state what types they are subtypes of or what interfaces they implement. 
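The protocol Jeremy walks through can be reproduced in a few lines. This sketch emulates the old for-loop behaviour by hand (later Pythons replaced this protocol with iterators, so the emulation is for illustration only), and also shows the effect of Moshe's catch-LookupError suggestion:

```python
# The old for loop calls __getitem__ with 0, 1, 2, ... and stops only
# on IndexError, so a mapping with keys 0 and 1 iterates by accident
# and then dies with KeyError.
def old_for(container):
    result, i = [], 0
    while True:
        try:
            result.append(container[i])
        except IndexError:          # the only "stop" signal the loop knows
            return result
        i += 1

assert old_for(["hello", "world"]) == ["hello", "world"]

d = {0: "hello", 1: "world"}
try:
    old_for(d)                      # d[2] raises KeyError, not IndexError
    died = False
except KeyError:
    died = True                     # the misfeature, reproduced
assert died

# Moshe's patch -- catch LookupError, the common base of IndexError and
# KeyError -- would make the dict loop stop cleanly instead:
def patched_for(container):
    result, i = [], 0
    while True:
        try:
            result.append(container[i])
        except LookupError:
            return result
        i += 1

assert patched_for(d) == ["hello", "world"]
```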
Jeremy From moshez at math.huji.ac.il Mon Mar 20 21:13:20 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 22:13:20 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > I'm not sure what you mean by "fix." I mean any sane behaviour -- either failing on TypeError at the beginning, like "for" does, or executing without raising an exception. Raising an exception in the middle which is imminent is definitely (for the right values of definitely) a surprising behaviour (I know it surprised me!). > I think by fix you mean, "allow the broken code above to > execute without raising an exception." Yuck! I agree it is yucky -- it is all a weird echo of the yuckiness of the type/class dichotomy. What I suggested is a temporary patch... > As far as I can tell, the problem is caused by the special > way that a for loop uses the __getitem__ protocol. Well, my take is that it is caused by the fact __getitem__ is used both for the sequence protocol and the mapping protocol (well, I'm cheating through my teeth here, but you understand what I mean ) Agreed though, that the whole iteration protocol should be revisited -- but that is a subject for another post. > The right solution, I think, is to allow a means for stating > explicitly whether a class with an __getitem__ method is a sequence or > a mapping (or both?). And this is the fix I wanted for Py3K (details to be debated, still). See? You read my mind perfectly. > I suspect that the right solution, circa > Py3K, is that classes must explicitly state what types they are > subtypes of or what interfaces they implement. Exactly. And have subclassable built-in classes in the same fell swoop. getting-all-excited-for-py3k-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Mon Mar 20 15:34:12 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 08:34:12 -0600 (CST) Subject: [Python-Dev] Set options Message-ID: I think that at this point the possibilities for doing sets come down to four options: 1. use lists visible changes: new methods l.include, l.exclude invisible changes: faster 'in' usage: s = [1, 2], s.include(3), s.exclude(3), if item in s, for item in s 2. use dicts visible changes: for/if x in dict means keys accept dicts without values (e.g. {1, 2}) new special non-printing value ": Present" new method d.insert(x) means d[x] = Present invisible changes: none usage: s = {1, 2}, s.insert(3), del s[3], if item in s, for item in s 3. new type visible changes: set() built-in new with methods .insert, .remove invisible changes: none usage: s = set(1, 2), s.insert(3), s.remove(3) if item in s, for item in s 4. do nothing visible changes: none invisible changes: none usage: s = {1: 1, 2: 1}, s[3] = 1, del s[3], if s.has_key(item), for item in s.keys() Let me say a couple of things about #1 and #2. I'm happy with both. I quite like the idea of using dicts this way (#2), in fact -- i think it was the first idea i remember chatting about. If i remember correctly, Guido's objection to #2 was that "in" on a dictionary would work on the keys, which isn't consistent with the fact that "in" on a list works on the values. However, this doesn't really bother me at all. It's a very simple rule, especially when you think of how people understand dictionaries. If you hand someone a *real* dictionary, and ask them Is the word "python" in the dictionary? they'll go look up "python" in the *keys* of the dictionary (the words), not the values (the definitions). So i'm quite all right with saying for x in dict: and having that loop over the keys, or saying if x in dict: and having that check whether x is a valid key. 
It makes perfect sense to me. My main issue with #2 was that sets would print like {"Alice": 1, "Bob": 1, "Ted": 1} and this would look weird. However, as Greg explained to me, it would be possible to introduce a default value to go with set members that just says "i'm here", such as 'Present' (read as: "Alice" is present in the set) or 'Member' or even 'None', and this value wouldn't print out -- thus s = {"Bob"} s.include("Alice") print s would produce {"Alice", "Bob"} representing a dictionary that actually contained {"Alice": Present, "Bob": Present} You'd construct set constants like this too: {2, 4, 7} Using dicts this way (rather than having a separate set type that just happened to be spelled with {}) avoids the parsing issue: no need for look-ahead; you just toss in "Present" when the text doesn't supply a colon, and move on. I'd be okay with this, though i'm not sure everyone would; and together with Guido's initial objection, that's what motivated me to propose the lists-as-sets thing: fewer changes all around, no ambiguities introduced -- just two new methods, and we're done. Hmm. I know someone who's just learning Python. I will attempt to ask some questions about what she would find natural, and see if that reveals anything interesting. -- ?!ng From bwarsaw at cnri.reston.va.us Mon Mar 20 23:01:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 20 Mar 2000 17:01:00 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets References: <14550.24508.341533.908941@goon.cnri.reston.va.us> <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: <14550.40860.72418.648591@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> As far as I can tell, the problem is caused by the special way JH> that a for loop uses the __getitem__ protocol. There are two JH> related issues that lead to confusion. 
>>>>> "MZ" == Moshe Zadka writes: MZ> Well, my look is that it is caused by the fact __getitem__ is MZ> used both for the sequence protocol and the mapping protocol Right. MZ> Agreed though, that the whole iteration protocol should be MZ> revisited -- but that is a subject for another post. Yup. JH> The right solution, I think, is to allow a means for stating JH> explicitly whether a class with an __getitem__ method is a JH> sequence or a mapping (or both?). Or should the two protocols use different method names (code breakage!). JH> I believe this is where types-vs.-classes meets JH> subtyping-vs.-inheritance. meets protocols-vs.-interfaces. From moshez at math.huji.ac.il Tue Mar 21 06:16:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:16:00 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.40860.72418.648591@anthem.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Barry A. Warsaw wrote: > MZ> Agreed though, that the whole iteration protocol should be > MZ> revisited -- but that is a subject for another post. > > Yup. (Go Stackless, go!?) > JH> I believe this is where types-vs.-classes meets > JH> subtyping-vs.-inheritance. > > meets protocols-vs.-interfaces. It took me 5 minutes of intensive thinking just to understand what Barry meant. Just wait until we introduce Sather-like "supertypes" (which are pretty Pythonic, IMHO) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 21 06:21:24 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:21:24 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > I think that at this point the possibilities for doing sets > come down to four options: > > > 1. use lists > 2. use dicts > 3. new type > 4. do nothing 5.
new Python module with a class "Set" (The issues are similar to #3, but this has the advantage of not changing the interpreter) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Tue Mar 21 01:25:09 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 01:25:09 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D6C165.EEF58232@lemburg.com> Mark Hammond wrote: > > I would like to discuss Unicode on the Windows platform, and how it relates > to MBCS that Windows uses. > > My main goal here is to ensure that Unicode on Windows can make a round-trip > to and from native Unicode stores. As an example, let's take the registry - > a Windows user should be able to read a Unicode value from the registry then > write it back. The value written back should be _identical_ to the value > read. Ditto for the file system: If the filesystem is Unicode, then I would > expect the following code: > for fname in os.listdir(): > f = open(fname + ".tmp", "w") > > To create filenames on the filesystem with the exact base name even when the > basename contains non-ascii characters. > > However, the Unicode patches do not appear to make this possible. open() > uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically > convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded > string to the C runtime fopen function. Right. The idea with open() was to write a special version (using #ifdefs) for use on Windows platforms which does all the needed magic to convert Unicode to whatever the native format and locale is... Using parser markers for this is obviously *not* the right way to get to the core of the problem. Basically, you will have to write a helper which takes a string, Unicode or some other "t" compatible object as name object and then converts it to the system's view of things. 
I think we had a private discussion about this a few months ago: there was some way to convert Unicode to a platform independent format which then got converted to MBCS -- don't remember the details though. > The end result of all this is that we end up with UTF-8 encoded names in the > registry/on the file system. It does not seem possible to get a true > Unicode string onto either the file system or in the registry. > > Unfortunately, Im not experienced enough to know the full ramifications, but > it _appears_ that on Windows the default "unicode to string" translation > should be done via the WideCharToMultiByte() API. This will then pass an > MBCS encoded ascii string to Windows, and the "right thing" should magically > happen. Unfortunately, MBCS encoding is dependant on the current locale > (ie, one MBCS sequence will mean completely different things depending on > the locale). I dont see a portability issue here, as the documentation > could state that "Unicode->ASCII conversions use the most appropriate > conversion for the platform. If the platform is not Unicode aware, then > UTF-8 will be used." No, no, no... :-) The default should be (and is) UTF-8 on all platforms -- whether the platform supports Unicode or not. If a platform uses a different encoding, an encoder should be used which applies the needed transformation. > This issue is the final one before I release the win32reg module. It seems > _critical_ to me that if Python supports Unicode and the platform supports > Unicode, then Python unicode values must be capable of being passed to the > platform. For the win32reg module I could quite possibly hack around the > problem, but the more general problem (categorized by the open() example > above) still remains... > > Any thoughts? Can't you use the wchar_t interfaces for the task (see the unicodeobject.h file for details) ? Perhaps you can first transfer Unicode to wchar_t and then on to MBCS using a win32 API ?! 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Mar 21 12:54:30 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 21 Mar 2000 12:54:30 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Tue, 21 Mar 2000 01:25:09 +0100 , <38D6C165.EEF58232@lemburg.com> Message-ID: <20000321115430.88A11370CF2@snelboot.oratrix.nl> I guess we need another format specifier than "s" here. "s" does the conversion to standard-python-utf8 for wide strings, and we'd need another format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I assume that that would also come in handy for MacOS, where we'll have the same problem (filenames are in Apple's proprietary 8bit encoding).
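The helper being circled around here can be paraphrased at the Python level. This is only a sketch of the policy Jack and Marc-Andre describe, not the proposed C API: the codec names are modern stand-ins chosen for illustration, and the fall-back-to-UTF-8 rule is the documented default mentioned earlier in the thread.

```python
import sys

def os_filename(name, platform=None):
    """Encode a unicode filename in the OS-native convention, else UTF-8."""
    platform = platform or sys.platform
    if platform.startswith("win"):
        native = "mbcs"       # Windows: WideCharToMultiByte, via the mbcs codec
    elif platform == "mac":
        native = "mac-roman"  # classic MacOS: Apple's proprietary 8-bit encoding
    else:
        native = "utf-8"      # no native convention: fall back to UTF-8
    try:
        return name.encode(native)
    except LookupError:       # codec not available on this build
        return name.encode("utf-8")

assert os_filename("caf\u00e9", platform="linux") == b"caf\xc3\xa9"
assert os_filename("caf\u00e9", platform="mac") == b"caf\x8e"
```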
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Tue Mar 21 13:14:54 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 13:14:54 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> Message-ID: <38D767BE.C45F8286@lemburg.com> Jack Jansen wrote: > > I guess we need another format specifier than "s" here. "s" does the > conversion to standard-python-utf8 for wide strings, Actually, "t" does the UTF-8 conversion... "s" will give you the raw internal UTF-16 representation in platform byte order. > and we'd need another > format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I'd suggest adding some kind of generic PyOS_FilenameFromObject(PyObject *v, void *buffer, int buffer_len) API for the conversion of strings, Unicode and text buffers to an OS dependent filename buffer. And/or perhaps specific APIs for each OS... e.g. PyOS_MBCSFromObject() (only on WinXX) PyOS_AppleFromObject() (only on Mac ;) > I assume that that would also come in handy for MacOS, where we'll have the > same problem (filenames are in Apple's proprietary 8bit encoding). Is that encoding already supported by the encodings package ? If not, could you point me to a map file for the encoding ?
Date: Tue, 21 Mar 2000 09:56:47 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D767BE.C45F8286@lemburg.com> References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> Message-ID: <14551.36271.33825.841965@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > And/or perhaps sepcific APIs for each OS... e.g. > > PyOS_MBCSFromObject() (only on WinXX) > PyOS_AppleFromObject() (only on Mac ;) Another approach may be to add some format modifiers: te -- text in an encoding specified by a C string (somewhat similar to O&) tE -- text, encoding specified by a Python object (probably a string passed as a parameter or stored from some other call) (I'd prefer the [eE] before the t, but the O modifiers follow, so consistency requires this ugly construct.) This brings up the issue of using a hidden conversion function which may create a new object that needs the same lifetime guarantees as the real parameters; we discussed this issue a month or two ago. Somewhere, there's a call context that includes the actual parameter tuple. PyArg_ParseTuple() could have access to a "scratch" area where it could place objects constructed during parameter parsing. This area could just be a hidden tuple. When the C call returns, the scratch area can be discarded. The difficulty is in giving PyArg_ParseTuple() access to the scratch area, but I don't know how hard that would be off the top of my head. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Tue Mar 21 18:14:07 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 21 Mar 2000 12:14:07 -0500 (EST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.44511.805860.808811@goon.cnri.reston.va.us> >>>>> "MAL" == M -A Lemburg writes: MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.) The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application. Jeremy From skip at mojam.com Tue Mar 21 18:25:57 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 11:25:57 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.45221.447838.534003@beluga.mojam.com> Marc> Perhaps someone could take Aaron's kjbuckets and write a Python Marc> emulation for it ... Any reason why kjbuckets and friends have never been placed in the core? 
If, as it seems from the discussion, a set type is a good thing to add to the core, it seems to me that Aaron's code would be a good candidate implementation/foundation. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From bwarsaw at cnri.reston.va.us Tue Mar 21 18:47:49 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 21 Mar 2000 12:47:49 -0500 (EST) Subject: [Python-Dev] Set options References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> Message-ID: <14551.46533.918688.13801@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Any reason why kjbuckets and friends have never been placed in SM> the core? If, as it seems from the discussion, a set type is SM> a good thing to add to the core, it seems to me that Aaron's SM> code would be a good candidate implementation/foundation. It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we? -Barry From mhammond at skippinet.com.au Tue Mar 21 18:48:06 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 21 Mar 2000 09:48:06 -0800 Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D6C165.EEF58232@lemburg.com> Message-ID: > > Right. The idea with open() was to write a special version (using > #ifdefs) for use on Windows platforms which does all the needed > magic to convert Unicode to whatever the native format and locale > is... That works for open() - but what about other extension modules? This seems to imply that any Python extension on Windows that wants to pass a Unicode string to an external function can not use PyArg_ParseTuple() with anything other than "O", and perform the magic themselves. This just seems a little back-to-front to me. Platforms that have _no_ native Unicode support have useful utilities for working with Unicode. 
Platforms that _do_ have native Unicode support can not make use of these utilities. Is this by design, or simply a sad side-effect of the design? So - it is trivial to use Unicode on platforms that don't support it, but quite difficult on platforms that do. > Using parser markers for this is obviously *not* the right way > to get to the core of the problem. Basically, you will have to > write a helper which takes a string, Unicode or some other > "t" compatible object as name object and then converts it to > the system's view of things. Why "obviously"? What on earth does the existing mechanism buy me on Windows, other than grief that I can not use it? > I think we had a private discussion about this a few months ago: > there was some way to convert Unicode to a platform independent > format which then got converted to MBCS -- don't remember the details > though. There is a Win32 API function for this. However, as you succinctly pointed out, not many people are going to be aware of its name, or how to use the multitude of flags offered by these conversion functions, or know how to deal with the memory management, etc. > Can't you use the wchar_t interfaces for the task (see > the unicodeobject.h file for details) ? Perhaps you can > first transfer Unicode to wchar_t and then on to MBCS > using a win32 API ?! Sure - I can. But can everyone who writes interfaces to Unicode functions? You wrote the Python Unicode support but don't know its name - pity the poor Joe Average trying to write an extension. It seems to me that, on Windows, the Python Unicode support as it stands is really internal. I can not think of a single time that an extension writer on Windows would ever want to use the "t" markers - am I missing something? I don't believe that a single Unicode-aware function in the Windows extensions (of which there are _many_) could be changed to use the "t"
It still seems to me that the Unicode support works well on platforms with no Unicode support, and is fairly useless on platforms with the support. I dont believe that any extension on Windows would want to use the "t" marker - so, as Fred suggested, how about providing something for us that can help us interface to the platform's Unicode? This is getting too hard for me - I will release my windows registry module without Unicode support, and hope that in the future someone cares enough to address it, and to add a large number of LOC that will be needed simply to get Unicode talking to Unicode... Mark. From skip at mojam.com Tue Mar 21 19:04:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:04:11 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: <14551.47515.648064.969034@beluga.mojam.com> BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we? Yes (though I would quibble with your use of the word "quibbling" ;-). If new syntax is in the offing as some have proposed, why not go for a more efficient implementation at the same time? I believe Aaron has maintained that kjbuckets is generally more efficient than Python's dictionary object. Skip From mal at lemburg.com Tue Mar 21 18:44:11 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 18:44:11 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> <14551.36271.33825.841965@weyr.cnri.reston.va.us> Message-ID: <38D7B4EB.66DAEBF3@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. 
Lemburg writes: > > And/or perhaps sepcific APIs for each OS... e.g. > > > > PyOS_MBCSFromObject() (only on WinXX) > > PyOS_AppleFromObject() (only on Mac ;) > > Another approach may be to add some format modifiers: > > te -- text in an encoding specified by a C string (somewhat > similar to O&) > tE -- text, encoding specified by a Python object (probably a > string passed as a parameter or stored from some other > call) > > (I'd prefer the [eE] before the t, but the O modifiers follow, so > consistency requires this ugly construct.) > This brings up the issue of using a hidden conversion function which > may create a new object that needs the same lifetime guarantees as the > real parameters; we discussed this issue a month or two ago. > Somewhere, there's a call context that includes the actual parameter > tuple. PyArg_ParseTuple() could have access to a "scratch" area where > it could place objects constructed during parameter parsing. This > area could just be a hidden tuple. When the C call returns, the > scratch area can be discarded. > The difficulty is in giving PyArg_ParseTuple() access to the scratch > area, but I don't know how hard that would be off the top of my head. Some time ago, I considered adding "U+" with builtin auto-conversion to the tuple parser... after some discussion about the error handling issues involved with this I quickly dropped that idea again and used the standard "O" approach plus a call to a helper function which then applied the conversion. (Note the "+" behind "U": this was intended to indicate that the returned object has had the refcount incremented and that the caller must take care of decrementing it again.) The "O" + helper approach is a little clumsy, but works just fine. Plus it doesn't add any more overhead to the already convoluted PyArg_ParseTuple(). BTW, what other external char formats are we talking about ? E.g. how do you handle MBCS or DBCS under WinXX ? 
Are there routines to have wchar_t buffers converted into the two ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm at hypernet.com Tue Mar 21 19:25:43 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 21 Mar 2000 13:25:43 -0500 Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <1258459347-36172889@hypernet.com> Jeremy wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Nah. Sets are pretty unambiguous. They're also easy, and boring. The interesting stuff is graphs and operations like composition, closure and transpositions. That's also where stuff gets ambiguous. E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God. I would love this stuff, and as a faithful worshipper of Our Lady of Corrugated Ironism, I could probably live with whatever rules are arrived at; but I'm afraid I would have to considerably enlarge my kill file. - Gordon From gstein at lyra.org Tue Mar 21 19:40:20 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 21 Mar 2000 10:40:20 -0800 (PST) Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> Message-ID: On Tue, 21 Mar 2000, Jeremy Hylton wrote: > >>>>> "MAL" == M -A Lemburg writes: > MAL> Perhaps someone could take Aaron's kjbuckets and write a Python > MAL> emulation for it (I think he's even already done something like > MAL> this for gadfly). Then the emulation could go into the core and > MAL> if people want speed they can install his extension (the > MAL> emulation would have to detect this and use the real thing > MAL> then). 
> > I've been waiting for Tim Peters to say something about sets, but I'll > chime in with what I recall him saying last time a discussion like > this came up on c.l.py. (I may misremember, in which case I'll at > least draw him into the discussion in order to correct me <0.5 wink>.) > > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. > A set is pretty simple to build from a list or a dictionary, so we > leave it to application writers to write the one that is appropriate > for their application. Yah... +1 on what Jeremy said. Leave them out of the distro since we can't do them Right for all people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Tue Mar 21 19:34:56 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:34:56 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > BAW> It would seem to me that distutils is a better way to go for > BAW> kjbuckets. The core already has basic sets (via dictionaries). > BAW> We're pretty much just quibbling about efficiency, API, and syntax, > BAW> aren't we? > > If new syntax is in the offing as some have proposed, FWIW, I'm against new syntax. The core-language has changed quite a lot between 1.5.2 and 1.6 -- * strings have grown methods * there are unicode strings * "in" operator overloadable The second change even includes a syntax change (u"some string") whose variants I'm still not familiar enough to comment on (ru"some\string"? ur"some\string"? Both legal?). 
I feel too many changes destabilize the language (this might seem a bit extreme, considering I pushed towards one of the changes), and we should try to improve on things other than the core -- one of these is a more hierarchical standard library, and a standard distribution mechanism, to rival CPAN -- then anyone could import data.sets.kjbuckets with only a trivial >>> import dist >>> dist.install("data.sets.kjbuckets") > why not go for a more efficient implementation at the same time? Because Python dicts are "pretty efficient", and it is not a trivial question to check optimality in this area: tests can be rigged to prove almost anything with the right test-cases, and there's no promise we'll choose the "right ones". -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 21 19:38:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:38:02 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <1258459347-36172889@hypernet.com> Message-ID: On Tue, 21 Mar 2000, Gordon McMillan wrote: > E.g., what's the right behavior when you > invert {'a':1,'b':1}? Hint: any answer you give will be met by the > wrath of God. Isn't "wrath of God", translated into Python, "an exception"? raise ValueError("dictionary is not 1-1") seems fine to me. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Tue Mar 21 19:42:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:42:55 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: References: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: <14551.49839.377385.99637@beluga.mojam.com> Skip> If new syntax is in the offing as some have proposed, Moshe> FWIW, I'm against new syntax.
The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 -- I thought we were talking about Py3K, where syntax changes are somewhat more expected. Just to make things clear, the syntax change I was referring to was the value-less dict syntax that someone proposed a few days ago: myset = {"a", "b", "c"} Note that I wasn't necessarily supporting the proposal, only acknowledging that it had been made. In general, I think we need to keep straight where people feel various proposals are going to fit. When a thread goes for more than a few messages it's easy to forget. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Mar 21 14:07:51 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 21 Mar 2000 07:07:51 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: Jeremy Hylton wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. For me, anyway, this is not about trying to engineer a universally perfect solution into Python -- it's about providing some simple, basic, easy-to-understand functionality that takes care of the common case. For example, dictionaries are simple, their workings are easy enough to understand, and they aren't written to efficiently support things like inversion and composition because most of the time no one needs to do these things. The same holds true for sets. All i would want is something i can put things into, and take things out of, and ask about what's inside. Barry Warsaw wrote: > It would seem to me that distutils is a better way to go for > kjbuckets. The core already has basic sets (via dictionaries). We're > pretty much just quibbling about efficiency, API, and syntax, aren't we? 
Efficiency: Hashtables have proven quite adequate for dicts, so i think they're quite adequate for sets. API and syntax: I believe the goal is obvious, because Python already has very nice notation ("in", "not in") -- it just doesn't work quite the way one would want. It works semantically right on lists, but they're a little slow. It doesn't work on dicts, but we can make it so. Here is where my "explanation metric" comes into play. How much additional explaining do you have to do in each case to answer the question "what do i do when i need a set"? 1. Use lists. Explain that "include()" means "append if not already present", and "exclude()" means "remove if present". You are done. 2. Use dicts. Explain that "for x in dict" iterates over the keys, and "if x in dict" looks for a key. Explain what happens when you write "{1, 2, 3}", and the special non-printing value constant. Explain how to add elements to a set and remove elements from a set. 3. Create a new type. Explain that there exists another type "set" with methods "insert" and "remove". Explain how to construct sets. Explain how "in" and "not in" work, where this type fits in with the other types, and when to choose this type over other types. 4. Do nothing. Explain that dictionaries can be used as sets if you assign keys a dummy value, use "del" to remove keys, iterate over "dict.keys()", and use "dict.has_key()" to test membership. This is what motivated my proposal for using lists: it requires by far the least explanation. This is no surprise because a lot of things about lists have been explained already. My preference in terms of elegance is about equal for 1, 2, 3, with 4 distinctly behind; but my subjective ranking of "explanation complexity" (as in "how to get there from here") is 1 < 4 < 3 < 2. 
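To make the comparison concrete, option 1 can be sketched in a few lines; the class and names below are only illustrative of the include/exclude semantics described above, not a concrete proposal:

```python
class ListSet:
    # Option 1 sketched: a "set" backed by a plain list.
    def __init__(self, items=()):
        self._items = []
        for x in items:
            self.include(x)

    def include(self, x):
        # Append only if not already present.
        if x not in self._items:
            self._items.append(x)

    def exclude(self, x):
        # Remove if present; do nothing otherwise.
        if x in self._items:
            self._items.remove(x)

    def __contains__(self, x):
        return x in self._items

    def __len__(self):
        return len(self._items)

s = ListSet(["a", "b", "a"])   # duplicate "a" is dropped
s.include("c")
s.exclude("b")
```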
-- ?!ng From tismer at tismer.com Tue Mar 21 21:13:38 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 21 Mar 2000 21:13:38 +0100 Subject: [Python-Dev] Unicode Database Compression Message-ID: <38D7D7F2.14A2FBB5@tismer.com> Hi, I have spent the last four days on compressing the Unicode database. With little decoding effort, I can bring the data down to 25kb. This would still be very fast, since codes are randomly accessible, although there are some simple shifts and masks. With a bit more effort, this can be squeezed down to 15kb by some more aggressive techniques like common prefix elimination. Speed would be *slightly* worse, since a small loop (average 8 cycles) is performed to obtain a character from a packed nybble. This is just all the data which is in Marc's unicodedatabase.c file. I checked efficiency by creating a delimited file like the original database text file with only these columns and ran PkZip over it. The result was 40kb. This says that I found a lot of correlations which automatic compressors cannot see. Now, before generating the final C code, I'd like to ask some questions: What is more desirable: Low compression and blinding speed? Or high compression and less speed, since we always want to unpack a whole code page? Then, what about the other database columns? There are a couple of extra attributes which I find coded as switch statements elsewhere. Should I try to pack these codes into my squeezy database, too? And last: There are also two quite elaborated columns with textual descriptions of the codes (the uppercase blah version of character x). Do we want these at all? And if so, should I try to compress them as well? Should these perhaps go into a different source file as a dynamic module, since they will not be used so often? waiting for directives - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From moshez at math.huji.ac.il Wed Mar 22 06:44:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 22 Mar 2000 07:44:00 +0200 (IST) Subject: [1.x] Re: [Python-Dev] Set options In-Reply-To: <14551.49839.377385.99637@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > Skip> If new syntax is in the offing as some have proposed, > > Moshe> FWIW, I'm against new syntax. The core-language has changed quite > Moshe> a lot between 1.5.2 and 1.6 -- > > I thought we were talking about Py3K My argument was strictly a 1.x argument. I'm hoping to get sets in 1.7 or 1.8. > In general, I think we need to keep straight where people feel various > proposals are going to fit. You're right. I'll start prefixing my posts accordingly. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Wed Mar 22 11:11:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 11:11:25 +0100 Subject: [Python-Dev] Re: Unicode Database Compression References: <38D7D7F2.14A2FBB5@tismer.com> Message-ID: <38D89C4D.370C19D@lemburg.com> Christian Tismer wrote: > > Hi, > > I have spent the last four days on compressing the > Unicode database. Cool :-) > With little decoding effort, I can bring the data down to 25kb. > This would still be very fast, since codes are randomly > accessible, although there are some simple shifts and masks. > > With a bit more effort, this can be squeezed down to 15kb > by some more aggressive techniques like common prefix > elimination. Speed would be *slightly* worse, since a small > loop (average 8 cycles) is performed to obtain a character > from a packed nybble.
> > This is just all the data which is in Marc's unicodedatabase.c > file. I checked efficiency by creating a delimited file like > the original database text file with only these columns and > ran PkZip over it. The result was 40kb. This says that I found > a lot of correlations which automatic compressors cannot see. Not bad ;-) > Now, before generating the final C code, I'd like to ask some > questions: > > What is more desirable: Low compression and blinding speed? > Or high compression and less speed, since we always want to > unpack a whole code page? I'd say high speed and less compression. The reason is that the Asian codecs will need fast access to the database. With their large mapping table sizes, the few more kB don't hurt, I guess. > Then, what about the other database columns? > There are a couple of extra attributes which I find coded > as switch statements elsewhere. Should I try to pack these > codes into my squeezy database, too? You basically only need to provide the APIs (and columns) defined in the unicodedata Python API, e.g. the character description column is not needed. > And last: There are also two quite elaborated columns with > textual descriptions of the codes (the uppercase blah version > of character x). Do we want these at all? And if so, should > I try to compress them as well? Should these perhaps go > into a different source file as a dynamic module, since they > will not be used so often? I guess you are talking about the "Unicode 1.0 Name" and the "10646 comment field" -- see above, there's no need to include these descriptions in the database... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 22 12:04:32 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 22 Mar 2000 12:04:32 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D8A8C0.66123F2C@lemburg.com> Mark Hammond wrote: > > > > > Right. The idea with open() was to write a special version (using > > #ifdefs) for use on Windows platforms which does all the needed > > magic to convert Unicode to whatever the native format and locale > > is... > > That works for open() - but what about other extension modules? > > This seems to imply that any Python extension on Windows that wants to pass > a Unicode string to an external function can not use PyArg_ParseTuple() with > anything other than "O", and perform the magic themselves. > > This just seems a little back-to-front to me. Platforms that have _no_ > native Unicode support have useful utilities for working with Unicode. > Platforms that _do_ have native Unicode support can not make use of these > utilities. Is this by design, or simply a sad side-effect of the design? > > So - it is trivial to use Unicode on platforms that dont support it, but > quite difficult on platforms that do. The problem is that Windows seems to use a completely different internal Unicode format than most of the rest of the world. As I've commented on in a different post, the only way to have PyArg_ParseTuple() perform auto-conversion is by allowing it to return objects which are garbage collected by the caller. The problem with this is error handling, since PyArg_ParseTuple() will have to keep track of all objects it created until the call returns successfully. An alternative approach is sketched below. Note that *all* platforms will have to use this approach... not only Windows or other platforms with Unicode support. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. 
Basically, you will have to > > write a helper which takes a string, Unicode or some other > > "t" compatible object as name object and then converts it to > > the system's view of things. > > Why "obviously"? What on earth does the existing mechanism buy me on > Windows, other than grief that I can not use it? Sure, you can :-) Just fetch the object, coerce it to Unicode and then encode it according to your platform needs (PyUnicode_FromObject() takes care of the coercion part for you). > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. > > There is a Win32 API function for this. However, as you succinctly pointed > out, not many people are going to be aware of its name, or how to use the > multitude of flags offered by these conversion functions, or know how to > deal with the memory management, etc. > > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! > > Sure - I can. But can everyone who writes interfaces to Unicode functions? > You wrote the Python Unicode support but don't know its name - pity the poor > Joe Average trying to write an extension. Hey, Mark... I'm not a Windows geek. How can I know which APIs are available and which of them to use ? And that's my point: add conversion APIs and codecs for the different OSes which make the extension writer's life easier. > It seems to me that, on Windows, the Python Unicode support as it stands is > really internal. I can not think of a single time that an extension writer > on Windows would ever want to use the "t" markers - am I missing something?
> I don't believe that a single Unicode-aware function in the Windows > extensions (of which there are _many_) could be changed to use the "t" > markers. "t" is intended to return a text representation of a buffer interface aware type... this happens to be UTF-8 for Unicode objects -- what other encoding would you have expected ? > It still seems to me that the Unicode support works well on platforms with > no Unicode support, and is fairly useless on platforms with the support. I > don't believe that any extension on Windows would want to use the "t" > marker - so, as Fred suggested, how about providing something for us that > can help us interface to the platform's Unicode? That's exactly what I'm talking about all the time... there currently are PyUnicode_AsWideChar() and PyUnicode_FromWideChar() to interface to the compiler's wchar_t type. I have no problem adding more of these APIs for the various OSes -- but they would have to be coded by someone with Unicode skills on each of those platforms, e.g. PyUnicode_AsMBCS() and PyUnicode_FromMBCS() on Windows. > This is getting too hard for me - I will release my Windows registry module > without Unicode support, and hope that in the future someone cares enough to > address it, and to add a large number of LOC that will be needed simply to > get Unicode talking to Unicode... I think you're getting this wrong: I'm not arguing against adding better support for Windows. The only way I can think of using parser markers in this context would be by having PyArg_ParseTuple() *copy* data into a given data buffer rather than only passing a reference to it. This would enable PyArg_ParseTuple() to apply whatever conversion is needed while still keeping the temporary objects internal.
Hmm, sketching a little: "es#",&encoding,&buffer,&buffer_len -- could mean: coerce the object to Unicode, then encode it using the given encoding and then copy at most buffer_len bytes of data into buffer and update buffer_len to the number of bytes copied. This costs some cycles for copying data, but gets rid of the problems involved in cleaning up after errors. The caller will have to ensure that the buffer is large enough and that the encoding fits the application's needs. Error handling will be poor since the caller can't take any action other than to pass on the error generated by PyArg_ParseTuple(). Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 22 14:40:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 14:40:23 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322113129.5E67C370CF2@snelboot.oratrix.nl> Message-ID: <38D8CD47.E573A246@lemburg.com> Jack Jansen wrote: > > > "es#",&encoding,&buffer,&buffer_len > > -- could mean: coerce the object to Unicode, then > > encode it using the given encoding and then > > copy at most buffer_len bytes of data into > > buffer and update buffer_len to the number of bytes > > copied > > This is a possible solution, but I think I would really prefer to also have > "eS", &encoding, &buffer_ptr > -- coerce the object to Unicode, then encode it using the given > encoding, malloc() a buffer to put the result in and return that. > > I don't mind doing something like > > { > char *filenamebuffer = NULL; > > if ( PyArg_ParseTuple(args, "eS", &macencoding, &filenamebuffer) > ... > open(filenamebuffer, ....); > PyMem_XDEL(filenamebuffer); > ... > } > > I think this would be much less error-prone than having fixed-length buffers > all over the place.
PyArg_ParseTuple() should probably raise an error in case the data doesn't fit into the buffer.

> And if this is indeed going to be used mainly in open()
> calls and such the cost of the extra malloc()/free() is going to be dwarfed by
> what the underlying OS call is going to use.

Good point. You'll still need the buffer_len output parameter though -- otherwise you wouldn't be able to tell the size of the allocated buffer (the returned data may not be terminated).

How about this:

    "es#", &encoding, &buffer, &buffer_len
    -- both buffer and buffer_len are in/out parameters
    -- if **buffer is non-NULL, copy the data into it (at most
       buffer_len bytes) and update buffer_len on output;
       truncation produces an error
    -- if **buffer is NULL, malloc() a buffer of size buffer_len
       and return it through *buffer; if buffer_len is -1, the
       allocated buffer should be large enough to hold all data;
       again, truncation is an error
    -- apply coercion and encoding as described above

(could be that I've got the '*'s wrong, but you get the picture...:)

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Wed Mar 22 14:46:50 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 22 Mar 2000 14:46:50 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Wed, 22 Mar 2000 14:40:23 +0100 , <38D8CD47.E573A246@lemburg.com> Message-ID: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl>

> > [on the user-supplies-buffer interface]
> > I think this would be much less error-prone than having fixed-length buffers
> > all over the place.
> 
> PyArg_ParseTuple() should probably raise an error in case the
> data doesn't fit into the buffer.

Ah, that's right, that solves most of that problem.

> > [on the malloced interface]
> Good point.
You'll still need the buffer_len output parameter
> though -- otherwise you wouldn't be able to tell the size of the
> allocated buffer (the returned data may not be terminated).

Are you sure? I would expect the "eS" format to be used to obtain 8-bit data in some local encoding, and I would expect that all 8-bit encodings of unicode data would still allow for null-termination. Or are there 8-bit encodings out there where a zero byte is a normal occurrence and where it can't be used as a terminator?

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Wed Mar 22 17:31:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 17:31:26 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> Message-ID: <38D8F55E.6E324281@lemburg.com> Jack Jansen wrote:
> 
> > > [on the user-supplies-buffer interface]
> > > I think this would be much less error-prone than having fixed-length buffers
> > > all over the place.
> >
> > PyArg_ParseTuple() should probably raise an error in case the
> > data doesn't fit into the buffer.
> 
> Ah, that's right, that solves most of that problem.
> 
> > > [on the malloced interface]
> > Good point. You'll still need the buffer_len output parameter
> > though -- otherwise you wouldn't be able to tell the size of the
> > allocated buffer (the returned data may not be terminated).
> 
> Are you sure? I would expect the "eS" format to be used to obtain 8-bit data
> in some local encoding, and I would expect that all 8-bit encodings of unicode
> data would still allow for null-termination. Or are there 8-bit encodings out
> there where a zero byte is a normal occurrence and where it can't be used as
> a terminator?

Not sure whether these exist or not, but they are certainly a possibility to keep in mind.
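A quick sketch in present-day Python (not the 1.6 C API under discussion) shows why null-termination is fragile as a length signal: strictly 8-bit charsets rarely use 0x00, but as soon as a wider encoding flows through the same interface, embedded NUL bytes are routine.

```python
# UTF-16 puts a NUL byte inside the encoding of every ASCII character,
# so the encoded result cannot be treated as a C string.
data = "abc".encode("utf-16-le")
print(data)        # b'a\x00b\x00c\x00'
print(0 in data)   # True -- embedded zero bytes are a normal occurrence
```

So an explicit length output stays necessary for any interface that may carry non-8-bit encodings.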
Perhaps adding "es#" and "es" (with 0-byte check) would be ideal ?!

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 22 17:54:42 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 22 Mar 2000 17:54:42 +0100 (MET) Subject: [Python-Dev] Nitpicking on UserList implementation Message-ID: Hi!

Please have a look at the following method cited from Lib/UserList.py:

    def __radd__(self, other):
        if isinstance(other, UserList):                  # <-- ?
            return self.__class__(other.data + self.data)  # <-- ?
        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(list(other) + self.data)

The reference manual tells about the __r*__ methods: """These functions are only called if the left operand does not support the corresponding operation.""" So if the left operand is a UserList instance, it should always have a __add__ method, which will be called instead of the right operand's __radd__. So I think the condition 'isinstance(other, UserList)' in __radd__ above will always evaluate to False and so the two lines marked with '# <-- ?' seem to be superfluous. But 'UserList' is so mature: Please tell me what I've overlooked before I make a fool of myself and submit a patch removing these two lines.
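Peter's reading of the manual can be checked with a small stand-in class (a minimal sketch in modern Python; MiniList is just enough of UserList to watch the dispatch):

```python
class MiniList:
    """Minimal stand-in for UserList, enough to observe + dispatch."""
    def __init__(self, data):
        self.data = list(data)
    def __add__(self, other):
        return "__add__ ran"
    def __radd__(self, other):
        return "__radd__ ran"

a, b = MiniList([1]), MiniList([2])
print(a + b)     # left operand is a MiniList: its __add__ wins
print([0] + b)   # left operand is a plain list: falls back to b.__radd__
```

The first print shows "__add__ ran" and the second "__radd__ ran" -- i.e. whenever __radd__ runs, the left operand was *not* an instance of the class, which is exactly why the isinstance branch looks dead.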
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson at nevex.com Thu Mar 23 18:10:16 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:10:16 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods Message-ID: [The following passed the Ping test, so I'm posting it here]

If None becomes a keyword, I would like to ask whether it could be used to signal that a method is a class method, as opposed to an instance method:

    class Ping:

        def __init__(self, arg):
            ...as usual...

        def method(self, arg):
            ...no change...

        def classMethod(None, arg):
            ...equivalent of C++ 'static'...

    p = Ping("thinks this is cool")   # as always
    p.method("who am I to argue?")    # as always
    Ping.classMethod("hey, cool!")    # no 'self'
    p.classMethod("hey, cool!")       # also selfless

I'd also like to ask (separately) that assignment to None be defined as a no-op, so that programmers can write:

    year, month, None, None, None, None, weekday, None, None = gmtime(time())

instead of having to create throw-away variables to fill in slots in tuples that they don't care about. I think both behaviors are readable; the first provides genuinely new functionality, while I often found the second handy when I was doing logic programming.

Greg From jim at digicool.com Thu Mar 23 18:18:29 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:18:29 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA51E5.B39D3E7B@digicool.com> gvwilson at nevex.com wrote:
> 
> [The following passed the Ping test, so I'm posting it here]
> 
> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:
> 
> class Ping:
> 
>     def __init__(self, arg):
>         ...as usual...
> 
>     def method(self, arg):
>         ...no change...
>     def classMethod(None, arg):
>         ...equivalent of C++ 'static'...

(snip)

As a point of jargon, please let's call this thing a "static method" (or an instance function, or something) rather than a "class method". The distinction between "class methods" and "static methods" has been discussed at length in the types sig (over a year ago). If this proposal goes forward and the name "class method" is used, I'll have to argue strenuously, and I really don't want to do that. :] So, if you can live with the term "static method", you could save us a lot of trouble by just saying "static method".

Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gvwilson at nevex.com Thu Mar 23 18:21:48 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:21:48 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA51E5.B39D3E7B@digicool.com> Message-ID:

> As a point of jargon, please let's call this thing a "static method"
> (or an instance function, or something) rather than a "class method".

I'd call it a penguin if that was what it took to get something like this implemented... :-)

greg From jim at digicool.com Thu Mar 23 18:28:25 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:28:25 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA5439.F5FE8FE6@digicool.com> gvwilson at nevex.com wrote:
> 
> > As a point of jargon, please let's call this thing a "static method"
> > (or an instance function, or something) rather than a "class method".
> 
> I'd call it a penguin if that was what it took to get something like this
> implemented... :-)

That's a great name. Let's go with penguin. :)

Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mhammond at skippinet.com.au Thu Mar 23 18:29:53 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 23 Mar 2000 09:29:53 -0800 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: ...

> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:
> 
> def classMethod(None, arg):
>     ...equivalent of C++ 'static'...

...

> I'd also like to ask (separately) that assignment to None be defined as a
> no-op, so that programmers can write:
> 
> year, month, None, None, None, None, weekday, None, None =
>     gmtime(time())

In the vernacular of a certain Mr Stein...

+2 on both of these :-)

[Although I do believe "static method" is a better name than "penguin" :-]

Mark. From ping at lfw.org Thu Mar 23 18:47:47 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 23 Mar 2000 09:47:47 -0800 (PST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson at nevex.com wrote:
> 
> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:
> 
> class Ping: [...]

Ack! I've been reduced to a class with just three methods. Oh well, i never really considered it such a bad thing to be called "simple-minded".
:) > def classMethod(None, arg): > ...equivalent of C++ 'static'... Yeah, i agree with Jim; you might as well call this a "static method" as opposed to a "class method". I like the way "None" is explicitly stated here, so there's no confusion about what the method does. (Without it, there's the question of whether the first argument will get thrown in, or what...) Hmm... i guess this also means one should ask what def function(None, arg): ... does outside a class definition. I suppose that should simply be illegal. > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. For what it's worth, i sometimes use "_" for this purpose (shades of Prolog!) but i can't make much of an argument for its readability... -- ?!ng I never dreamt that i would get to be The creature that i always meant to be But i thought, in spite of dreams, You'd be sitting somewhere here with me. From fdrake at acm.org Thu Mar 23 19:11:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 23 Mar 2000 13:11:39 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.24155.948286.451340@weyr.cnri.reston.va.us> gvwilson at nevex.com writes: > p.classMethod("hey, cool!") # also selfless This is the example that I haven't seen before (I'm not on the types-sig, so it may have been presented there), and I think this is what makes it interesting; a method in a module isn't quite sufficient here, since a subclass can override or extend the penguin this way. (Er, if we *do* go with penguin, does this mean it only works on Linux? ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From pf at artcom-gmbh.de Thu Mar 23 19:25:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 19:25:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from "gvwilson@nevex.com" at "Mar 23, 2000 12:10:16 pm" Message-ID: Hi!

gvwilson at nevex.com:
> I'd also like to ask (separately) that assignment to None be defined as a
> no-op, so that programmers can write:
> 
> year, month, None, None, None, None, weekday, None, None = gmtime(time())

You can already do this today with 1.5.2, if you use a 'del None' statement:

    Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> from time import time, gmtime
    >>> year, month, None, None, None, None, weekday, None, None = gmtime(time())
    >>> print year, month, None, weekday
    2000 3 0 3
    >>> del None
    >>> print year, month, None, weekday
    2000 3 None 3
    >>>

if None will become a keyword in Py3K this pyidiom should better be written as

    year, month, None, None, None, None, ... = ...
    if sys.version[0] == '1':
        del None

or

    try:
        del None
    except SyntaxError:
        pass # Wow running Py3K here!

I wonder how much existing code the None --> keyword change would break.

Regards, Peter From paul at prescod.net Thu Mar 23 19:47:55 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:47:55 -0800 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA66DB.635E8731@prescod.net> gvwilson at nevex.com wrote:
> 
> [The following passed the Ping test, so I'm posting it here]
> 
> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:

+1

Idea is good, but I'm not really happy with any of the proposed terminology...Python doesn't really have static anything.
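The call patterns from Greg's example can be sketched in later Python, which eventually grew a spelling for exactly this behaviour (staticmethod -- an addition well after this thread, shown here only to illustrate the semantics being voted on):

```python
class Ping:
    def __init__(self, arg):
        self.arg = arg

    def method(self, arg):
        return ("instance", arg)

    @staticmethod
    def class_method(arg):          # the role of 'def classMethod(None, arg)'
        return ("selfless", arg)

p = Ping("thinks this is cool")
print(p.method("who am I to argue?"))    # ('instance', 'who am I to argue?')
print(Ping.class_method("hey, cool!"))   # ('selfless', 'hey, cool!') -- no 'self'
print(p.class_method("hey, cool!"))      # ('selfless', 'hey, cool!') -- also selfless
```

Both the class and an instance can invoke the selfless method, matching all four call sites in the original proposal.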
I would vote at the same time to make self a keyword and signal an error if the first argument is not one of None or self. Even now, one of my most common Python mistakes is in forgetting self. I expect it happens to anyone who shifts between other languages and Python.

Why does None have an upper case "N"? Maybe the keyword version should be lower-case...

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 19:57:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 13:57:00 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.26876.514559.320219@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes:

    gvwilson> If None becomes a keyword, I would like to ask whether
    gvwilson> it could be used to signal that a method is a class
    gvwilson> method, as opposed to an instance method:

It still seems mildly weird that None would be a special kind of keyword, one that has a value and is used in ways that no other keyword is used. Greg gives an example, and here's a few more:

    def baddaboom(x, y, z=None):
        ...

    if z is None:
        ...

try substituting `else' for `None' in these examples. ;)

Putting that issue aside, Greg's suggestion for static method definitions is interesting.

    class Ping:
        # would this be a SyntaxError?
        def __init__(None, arg):
            ...

        def staticMethod(None, arg):
            ...

    p = Ping()

    Ping.staticMethod(p, 7)  # TypeError
    Ping.staticMethod(7)     # This is fine
    p.staticMethod(7)        # So's this
    Ping.staticMethod(p)     # and this !!

-Barry From paul at prescod.net Thu Mar 23 19:52:25 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:52:25 -0800 Subject: [Python-Dev] dir() Message-ID: <38DA67E9.AA593B7A@prescod.net> Can someone explain why dir(foo) does not return all of foo's methods?
I know it's documented that way, I just don't know why it *is* that way. I'm also not clear on why instances don't have auto-populated __methods__ and __members__ members. If there isn't a good reason (there probably is) then I would advocate that these functions and members should be more comprehensive.

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 20:00:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 14:00:57 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.27113.546575.170565@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes:

    | try:
    |     del None
    | except SyntaxError:
    |     pass # Wow running Py3K here!

I know how to break your Py3K code: stick None=None somewhere higher up :)

    PF> I wonder how much existing code the None --> keyword change
    PF> would break.

Me too.

-Barry From gvwilson at nevex.com Thu Mar 23 20:01:06 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:01:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.26876.514559.320219@anthem.cnri.reston.va.us> Message-ID:

> class Ping:
>     # would this be a SyntaxError?
>     def __init__(None, arg):
>         ...

Absolutely a syntax error; ditto any of the other special names (e.g. __add__).

Greg From akuchlin at mems-exchange.org Thu Mar 23 20:06:33 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 14:06:33 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.27449.69043.924322@amarok.cnri.reston.va.us> Barry A.
Warsaw writes:

>>>>>> "PF" == Peter Funk writes:
>     PF> I wonder how much existing code the None --> keyword change
>     PF> would break.
> Me too.

I can't conceive of anyone using None as a function name or a variable name, except through a bug or thinking that 'None, useful, None = 1,2,3' works. Even though None isn't a fixed constant, it might as well be. How much C code have you seen lately that starts with int function(void *NULL) ? Being able to do "None = 2" also smacks a bit of those legendary Fortran compilers that let you accidentally change 2 into 4. +1 on this change for Py3K, and I doubt it would cause breakage even if introduced into 1.x.

-- A.M. Kuchling http://starship.python.net/crew/amk/ Principally I played pedants, idiots, old fathers, and drunkards. As you see, I had a narrow escape from becoming a professor. -- Robertson Davies, "Shakespeare over the Port" From paul at prescod.net Thu Mar 23 20:02:33 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 11:02:33 -0800 Subject: [Python-Dev] Unicode character names Message-ID: <38DA6A49.A60E405B@prescod.net> Here's a feature I like from Perl's Unicode support:

"""
Support for interpolating named characters

The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a unicode smiley face at the end.
"""

I get really tired of looking up the Unicode character for "ndash" or "right dagger". Does our Unicode database have enough information to make something like this possible? Obviously using the official (English) name is only really helpful for people who speak English, so we should not remove the numeric option.
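For the record, the Perl feature being quoted did eventually land in Python's Unicode string literals; in a modern interpreter the example works verbatim:

```python
s = "Hi! \N{WHITE SMILING FACE}"
print(s[-1], hex(ord(s[-1])))   # the named character is U+263A
```

The name is resolved from the Unicode database at compile time, which is exactly the dependency Andrew raises below.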
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From tismer at tismer.com Thu Mar 23 20:27:53 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 20:27:53 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7039.B7CDC6FF@tismer.com> Mark Hammond wrote:
> 
> ...
> > If None becomes a keyword, I would like to ask whether it could be used to
> > signal that a method is a class method, as opposed to an instance method:
> >
> > def classMethod(None, arg):
> >     ...equivalent of C++ 'static'...
> ...
> 
> > I'd also like to ask (separately) that assignment to None be defined as a
> > no-op, so that programmers can write:
> >
> > year, month, None, None, None, None, weekday, None, None =
> >     gmtime(time())
> 
> In the vernacular of a certain Mr Stein...
> 
> +2 on both of these :-)

me 2, äh 1.5...

The assignment no-op seems to be ok. Having None as a placeholder for static methods creates the problem that we lose compatibility with ordinary functions. What I would propose instead is: make the parameter name "self" mandatory for methods, and turn everything else into a static method. This does not change function semantics, but just the way the method binding works.

> [Although I do believe "static method" is a better name than "penguin" :-]

pynguin

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gvwilson at nevex.com Thu Mar 23 20:33:47 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:33:47 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA7039.B7CDC6FF@tismer.com> Message-ID: Hi, Christian; thanks for your mail. > What I would propose instead is: > make the parameter name "self" mandatory for methods, and turn > everything else into a static method. In my experience, significant omissions (i.e. something being important because it is *not* there) often give beginners trouble. For example, in C++, you can't tell whether: int foo::bar(int bah) { return 0; } belongs to instances, or to the class as a whole, without referring back to the header file [1]. To quote the immortal Jeremy Hylton: Pythonic design rules #2: Explicit is better than implicit. Also, people often ask why 'self' is required as a method argument in Python, when it is not in C++ or Java; this proposal would (retroactively) answer that question... Greg [1] I know this isn't a problem in Java or Python; I'm just using it as an illustration. From skip at mojam.com Thu Mar 23 21:09:00 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 23 Mar 2000 14:09:00 -0600 (CST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> <14554.27449.69043.924322@amarok.cnri.reston.va.us> Message-ID: <14554.31196.387213.472302@beluga.mojam.com> AMK> +1 on this change for Py3K, and I doubt it would cause breakage AMK> even if introduced into 1.x. Or if it did, it's probably code that's marginally broken already... 
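Worth noting alongside the breakage debate: the effect being requested needs no language change at all if a throwaway name is used, as Ping mentioned earlier with "_". A sketch in modern Python:

```python
from time import gmtime, time

# Bind only the slots we care about; "_" is an ordinary name that is
# simply rebound for each slot we want to discard.
year, month, _, _, _, _, weekday, _, _ = gmtime(time())
print(year, month, weekday)
```

Unlike assignment-to-None, this stays within existing semantics and breaks nothing.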
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From tismer at tismer.com Thu Mar 23 21:21:09 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 21:21:09 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7CB5.87D62E14@tismer.com> Yo,

gvwilson at nevex.com wrote:
> 
> Hi, Christian; thanks for your mail.
> 
> > What I would propose instead is:
> > make the parameter name "self" mandatory for methods, and turn
> > everything else into a static method.
> 
> In my experience, significant omissions (i.e. something being important
> because it is *not* there) often give beginners trouble. For example,
> in C++, you can't tell whether:
> 
> int foo::bar(int bah)
> {
>     return 0;
> }
> 
> belongs to instances, or to the class as a whole, without referring back
> to the header file [1]. To quote the immortal Jeremy Hylton:
> 
>     Pythonic design rules #2:
>         Explicit is better than implicit.

Sure. I am explicitly *not* using self if I want no self. :-)

> Also, people often ask why 'self' is required as a method argument in
> Python, when it is not in C++ or Java; this proposal would (retroactively)
> answer that question...

You prefer to use the explicit keyword None? How would you then deal with

    def outside(None, blah):
        pass # stuff

I believe one answer about the explicit "self" is that it should be simple and compatible with ordinary functions. Guido just had to add the semantics that in methods the first parameter automatically binds to the instance. The None gives me a bit of trouble, but not much.

What I would like to spell is

    ordinary functions                      (as it is now)
    functions which are instance methods    (with the immortal self)
    functions which are static methods      ???
    functions which are class methods       !!!

Static methods can work either with the "1st param==None" rule or with the "1st paramname!=self" rule or whatever.
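Christian's wish list includes class methods that receive their class; later Python spelled exactly this as classmethod() (again, an addition well after this thread -- sketched here only to show the semantics he is asking for):

```python
class Base:
    @classmethod
    def describe(cls, x):
        return (cls.__name__, x)

class Derived(Base):
    pass

print(Base.describe(1))       # ('Base', 1)
print(Derived.describe(2))    # ('Derived', 2) -- the *derived* class is passed in
print(Derived().describe(3))  # ('Derived', 3) -- works on instances too
```

The interesting property is the second call: the class passed in follows the lookup, which is what distinguishes a class method from a static method.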
But how would you do class methods, which IMHO should have their class passed in as first parameter? Do you see a clean syntax for this? I thought of some weirdness like

    def meth(self, ...
    def static(self=None, ...   # eek
    def classm(self=class, ...  # ahem

but this breaks the rule of default argument order.

ciao - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 23 21:27:41 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) Subject: [Python-Dev] Unicode character names In-Reply-To: <38DA6A49.A60E405B@prescod.net> References: <38DA6A49.A60E405B@prescod.net> Message-ID: <14554.32317.730574.967165@amarok.cnri.reston.va.us> Paul Prescod writes:

>The new \N escape interpolates named characters within strings. For
>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a
>unicode smiley face at the end.

Cute idea, and it certainly means you can avoid looking up Unicode numbers. (You can look up names instead. :) ) Note that this means the Unicode database is no longer optional if this is done; it has to be around at code-parsing time. Python could import it automatically, as exceptions.py is imported. Christian's work on compressing unicodedatabase.c is therefore really important. (Is Perl5.6 actually dragging around the Unicode database in the binary, or is it read out of some external file or data structure?)

-- A.M. Kuchling http://starship.python.net/crew/amk/ About ten days later, it being the time of year when the National collected down and outs to walk on and understudy I arrived at the head office of the National Theatre in Aquinas Street in Waterloo.
-- Tom Baker, in his autobiography From bwarsaw at cnri.reston.va.us Thu Mar 23 21:39:43 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 15:39:43 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33039.4390.591036@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> belongs to instances, or to the class as a whole, gvwilson> without referring back to the header file [1]. To quote gvwilson> the immortal Jeremy Hylton: Not to take anything away from Jeremy, who has contributed some wonderfully Pythonic quotes of his own, but this one is taken from Tim Peters' Zen of Python http://www.python.org/doc/Humor.html#zen timbot-is-the-only-one-who's-gonna-outlive-his-current-chip-set- around-here-ly y'rs, -Barry From jeremy at cnri.reston.va.us Thu Mar 23 21:55:25 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 23 Mar 2000 15:55:25 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33590.844200.145871@walden> >>>>> "GVW" == gvwilson writes: GVW> To quote the immortal Jeremy Hylton: GVW> Pythonic design rules #2: GVW> Explicit is better than implicit. I wish I could take credit for that :-). Tim Peters posted a list of 20 Pythonic theses to comp.lang.python under the title "The Python Way." I'll collect them all here in hopes of future readers mistaking me for Tim again . Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. 
There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those! See http://x27.deja.com/getdoc.xp?AN=485548918&CONTEXT=953844380.1254555688&hitnum=9 for the full post. to-be-immortal-i'd-need-to-be-a-bot-ly y'rs Jeremy From jeremy at alum.mit.edu Thu Mar 23 22:01:01 2000 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 23 Mar 2000 16:01:01 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.34037.232728.670271@walden> >>>>> "GVW" == gvwilson writes:

    GVW> I'd also like to ask (separately) that assignment to None be
    GVW> defined as a no-op, so that programmers can write:

    GVW> year, month, None, None, None, None, weekday, None, None =
    GVW> gmtime(time())

    GVW> instead of having to create throw-away variables to fill in
    GVW> slots in tuples that they don't care about. I think both
    GVW> behaviors are readable; the first provides genuinely new
    GVW> functionality, while I often found the second handy when I was
    GVW> doing logic programming.

-1 on this proposal

Pythonic design rule #8: Special cases aren't special enough to break the rules.

I think it's confusing to have assignment mean pop the top of the stack for the special case that the name is None. If Py3K makes None a keyword, then it would also be the only keyword that can be used in an assignment. Finally, we'd need to explain to the rare newbie who used None as a variable name why they assigned 12 to None but that its value was its name when it was later referenced. (Think 'print None'.)

When I need to ignore some of the return values, I use the name nil.
year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) I think that's just as clear, only a whisker less efficient, and requires no special cases. Heck, it's even less typing <0.5 wink>. Jeremy From gvwilson at nevex.com Thu Mar 23 21:59:41 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 15:59:41 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.33590.844200.145871@walden> Message-ID: > GVW> To quote the immortal Jeremy Hylton: > GVW> Pythonic design rules #2: > GVW> Explicit is better than implicit. > > I wish I could take credit for that :-). Tim Peters posted a list of > 20 Pythonic theses to comp.lang.python under the title "The Python > Way." Traceback (innermost last): File "", line 1, in ? AttributionError: insight incorrectly ascribed From paul at prescod.net Thu Mar 23 22:26:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 13:26:42 -0800 Subject: [Python-Dev] None as a keyword / class methods References: <14554.34037.232728.670271@walden> Message-ID: <38DA8C12.DFFD63D5@prescod.net> Jeremy Hylton wrote: > > ... > year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) So you're proposing nil as a new keyword? I like it. +2 -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "No, I'm not QUITE that stupid", Paul Prescod From pf at artcom-gmbh.de Thu Mar 23 22:46:49 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 22:46:49 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 23, 2000 2: 0:57 pm" Message-ID: Hi Barry! > >>>>> "PF" == Peter Funk writes: > > | try: > | del None > | except SyntaxError: > | pass # Wow running Py3K here! Barry A. Warsaw: > I know how to break your Py3K code: stick None=None some where higher > up :) Hmm.... I must admit, that I don't understand your argument. 
In Python <= 1.5.2 'del None' works fine, iff it follows any assignment to None in the same scope, regardless of whether there has been a None=None in the surrounding scope or in the same scope before this. Since something like 'del for' or 'del import' raises a SyntaxError exception in Py152, I expect 'del None' to raise the same exception in Py3K, after None has become a keyword. Right? Regards, Peter From andy at reportlab.com Thu Mar 23 22:54:23 2000 From: andy at reportlab.com (Andy Robinson) Date: Thu, 23 Mar 2000 21:54:23 GMT Subject: [Python-Dev] Unicode Character Names In-Reply-To: <20000323202533.ABDB31CEF8@dinsdale.python.org> References: <20000323202533.ABDB31CEF8@dinsdale.python.org> Message-ID: <38da90b4.756297@post.demon.co.uk> >Message: 20 >From: "Andrew M. Kuchling" >Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) >To: "python-dev at python.org" >Subject: Re: [Python-Dev] Unicode character names > >Paul Prescod writes: >>The new \N escape interpolates named characters within strings. For >>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>unicode smiley face at the end. > >Cute idea, and it certainly means you can avoid looking up Unicode >numbers. (You can look up names instead. :) ) Note that this means the >Unicode database is no longer optional if this is done; it has to be >around at code-parsing time. Python could import it automatically, as >exceptions.py is imported. Christian's work on compressing >unicodedatabase.c is therefore really important. (Is Perl5.6 actually >dragging around the Unicode database in the binary, or is it read out >of some external file or data structure?) I agree - the names are really useful. If you are doing conversion work, often you want to know what a character is, but don't have a complete Unicode font handy. Being able to get the description for a Unicode character is useful, as well as being able to use the description as a constructor for it.
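The description round trip Andy describes is exactly what the standard unicodedata module later exposed; as a sketch against the released API (not the code under discussion in this thread), it looks like this:

```python
import unicodedata

# Character -> description: useful when you can't display the glyph.
name = unicodedata.name("\u263a")
print(name)  # WHITE SMILING FACE

# Description -> character: the name acts as a "constructor".
assert unicodedata.lookup("WHITE SMILING FACE") == "\u263a"
```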
Also, there are some language specific things that might make it useful to have the full character descriptions in Christian's database. For example, we'll have an (optional, not in the standard library) Japanese module with functions like isHalfWidthKatakana(), isFullWidthKatakana() to help normalize things. Parsing the database and looking for strings in the descriptions is one way to build this - not the only one, but it might be useful. So I'd vote to put names in at first, and give us a few weeks to see how useful they are before a final decision. - Andy Robinson From paul at prescod.net Thu Mar 23 23:09:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 14:09:42 -0800 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA9626.8B62DB77@prescod.net> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) More important, though, the code is "self documenting". You never have to go from the number back to the name. > Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. I don't like the idea enough to exclude support for small machines or anything like that. We should weigh the costs of requiring the Unicode database at compile time. > (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) I have no idea.
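Both ideas in this exchange can be sketched with the unicodedata module as it eventually shipped: the \N escape resolves database names inside string literals, and Andy's proposed katakana check can be built by scanning database names (is_halfwidth_katakana below is an illustrative name, not a real API):

```python
import unicodedata

# The \N{...} escape, as it later landed: the parser resolves the
# database name at compile time.
s = "Hi! \N{WHITE SMILING FACE}"
assert s[-1] == "\u263a"

# A name-scanning check in the spirit of Andy's isHalfWidthKatakana():
def is_halfwidth_katakana(ch):
    return "HALFWIDTH KATAKANA" in unicodedata.name(ch, "")

assert is_halfwidth_katakana("\uff71")   # HALFWIDTH KATAKANA LETTER A
assert not is_halfwidth_katakana("a")
```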
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From pf at artcom-gmbh.de Thu Mar 23 23:12:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:12:25 +0100 (MET) Subject: [Python-Dev] Py3K: True and False builtin or keyword? Message-ID: Regarding the discussion about None becoming a keyword in Py3K: Recently the truth values True and False have been mentioned. Should they become builtin values --like None is now-- or should they become keywords? Nevertheless: for the time being I came up with the following weird idea: If you put this in front of the main module of a Python app:

#!/usr/bin/env python
if __name__ == "__main__":
    import sys
    if sys.version[0] <= '1':
        __builtins__.True = 1
        __builtins__.False = 0
    del sys
# --- continue with your app from here: ---
import foo, bar, ...
....

Now you can start to use False and True in any imported module as if they were already builtins. Of course this is no surprise here and Python is really fun, Peter. From mal at lemburg.com Thu Mar 23 22:07:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 23 Mar 2000 22:07:35 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA8797.F16301E4@lemburg.com> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. Python could import it automatically, as > exceptions.py is imported.
Christian's work on compressing > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) Sorry to disappoint you guys, but the Unicode name and comments are *not* included in the unicodedatabase.c file Christian is currently working on. The reason is simple: it would add huge amounts of string data to the file. So this is a no-no for the core distribution... Still, the above is easily possible by inventing a new encoding, say unicode-with-smileys, which then reads in a file containing the Unicode names and applies the necessary magic to decode/encode data as Paul described above. Would probably make a cool fun-project for someone who wants to dive into writing codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Fri Mar 24 00:02:06 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 23 Mar 2000 18:02:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Hi Peter! >>>>> "PF" == Peter Funk writes: PF> Since something like 'del for' or 'del import' raises a PF> SyntaxError exception in Py152, I expect 'del None' to raise PF> the same exception in Py3K, after None has become a keyword. PF> Right? I misread your example the first time through, but it still doesn't quite parse on my second read. 
-------------------- snip snip --------------------
pyvers = '2k'
try:
    del import
except SyntaxError:
    pyvers = '3k'
-------------------- snip snip --------------------
% python /tmp/foo.py
  File "/tmp/foo.py", line 3
    del import
            ^
SyntaxError: invalid syntax
-------------------- snip snip --------------------

See, you can't catch that SyntaxError because it doesn't happen at run-time. Maybe you meant to wrap the try suite in an exec? Here's a code sample that ought to work with 1.5.2 and the mythical Py3K-with-a-None-keyword.

-------------------- snip snip --------------------
pyvers = '2k'
try:
    exec "del None"
except SyntaxError:
    pyvers = '3k'
except NameError:
    pass
print pyvers
-------------------- snip snip --------------------

Cheers, -Barry From klm at digicool.com Fri Mar 24 00:05:08 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:05:08 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 pf at artcom-gmbh.de wrote: > Hi Barry!
>
> > >>>>> "PF" == Peter Funk writes:
> >
> > | try:
> > |     del None
> > | except SyntaxError:
> > |     pass # Wow running Py3K here!
>
> Barry A. Warsaw:
> > I know how to break your Py3K code: stick None=None somewhere higher
> > up :)

Huh. Does anyone really think we're going to catch SyntaxError at runtime, ever? Seems like the code fragment above wouldn't work in the first place. But i suppose, with most of a millennium to emerge, py3k could have more fundamental changes than i could even imagine...-) Ken klm at digicool.com From pf at artcom-gmbh.de Thu Mar 23 23:53:34 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:53:34 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Mar 23, 2000 2: 6:33 pm" Message-ID: Hi! > Barry A. Warsaw writes: > >>>>>> "PF" == Peter Funk writes: > > PF> I wonder, how much existing code the None --> keyword change > > PF> would break. > >Me too. Andrew M. Kuchling: > I can't conceive of anyone using None as a function name or a variable > name, except through a bug or thinking that 'None, useful, None = > 1,2,3' works. Even though None isn't a fixed constant, it might as > well be. How much C code have you seen lately that starts with int > function(void *NULL) ? I agree. urban legend: Once upon a time someone found the following neat snippet of C source hidden in some header file of a very, very huge piece of software, after he had spent some nights trying to figure out why some simple edits he made in order to make the code more readable broke the system:

#ifdef TRUE
/* eat this: you arrogant Quiche Eaters */
#undef TRUE
#undef FALSE
#define TRUE (0)
#define FALSE (1)
#endif

Obviously the poor guy would have found this particular small piece of evil code much earlier, if he had simply 'grep'ed for comments... there were not so many in this system. ;-) > Being able to do "None = 2" also smacks a bit of those legendary > Fortran compilers that let you accidentally change 2 into 4. +1 on > this change for Py3K, and I doubt it would cause breakage even if > introduced into 1.x. We'll see: those "Real Programmers" never die. Fortunately they prefer Perl over Python. <0.5 grin> Regards, Peter From klm at digicool.com Fri Mar 24 00:15:42 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:15:42 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Message-ID: On Thu, 23 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > See, you can't catch that SyntaxError because it doesn't happen at > run-time. Maybe you meant to wrap the try suite in an exec? Here's a Huh.
Guess i should have read barry's re-response before i posted mine: Desperately desiring to redeem myself, and contribute something to the discussion, i'll settle the class/static method naming quandary with the obvious alternative:

> > p.classMethod("hey, cool!") # also selfless

These should be called buddha methods - no self, samadhi, one with everything, etc. There, now i feel better. :-) Ken klm at digicool.com A Zen monk walks up to a hotdog vendor and says "make me one with everything." Ha. But that's not all. He gets the hot dog and pays with a ten. After several moments waiting, he says to the vendor, "i was expecting change", and the vendor says, "you of all people should know, change comes from inside." That's all. From bwarsaw at cnri.reston.va.us Fri Mar 24 00:19:28 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 18:19:28 -0500 (EST) Subject: [Python-Dev] Py3K: True and False builtin or keyword? References: Message-ID: <14554.42624.213027.854942@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> Now you can start to use False and True in any imported PF> module as if they were already builtins. Of course this is no PF> surprise here and Python is really fun, Peter. You /can/ do this, but that doesn't mean you /should/ :) Mucking with builtins is fun the way huffing dry erase markers is fun. Things are very pretty at first, but eventually the brain cell lossage will more than outweigh that cheap thrill. I've seen a few legitimate uses for hacking builtins. In Zope, I believe Jim hacks get_transaction() or somesuch into builtins because that way it's easy to get at without passing it through the call tree. And in Zope it makes sense since this is a fancy database application and your current transaction is a central concept. I've occasionally wrapped an existing builtin because I needed to extend its functionality while keeping its semantics and API unchanged.
An example of this was my pre-Python-1.5.2 open_ex() in Mailman's CGI driver script. Before builtin open() would print the failing file name, my open_ex() -- shown below -- would hack that into the exception object. But one of the things about Python that I /really/ like is that YOU KNOW WHERE THINGS COME FROM. If I suddenly start seeing True and False in your code, I'm going to look for function locals and args, then module globals, then from ... import *'s. If I don't see it in any of those, I'm going to put down my dry erase markers, look again, and then utter a loud "huh?" :) -Barry

realopen = open

def open_ex(filename, mode='r', bufsize=-1, realopen=realopen):
    from Mailman.Utils import reraise
    try:
        return realopen(filename, mode, bufsize)
    except IOError, e:
        strerror = e.strerror + ': ' + filename
        e.strerror = strerror
        e.filename = filename
        e.args = (e.args[0], strerror)
        reraise(e)

import __builtin__
__builtin__.__dict__['open'] = open_ex

From pf at artcom-gmbh.de Fri Mar 24 00:23:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 24 Mar 2000 00:23:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from Ken Manheimer at "Mar 23, 2000 6: 5: 8 pm" Message-ID: Hi!

> > > | try:
> > > |     del None
> > > | except SyntaxError:
> > > |     pass # Wow running Py3K here!
> >
> > Barry A. Warsaw:
> > > I know how to break your Py3K code: stick None=None somewhere higher
> > > up :)
> Ken Manheimer:
> Huh. Does anyone really think we're going to catch SyntaxError at
> runtime, ever? Seems like the code fragment above wouldn't work in the
> first place.

Ouuppps... Unfortunately I had no chance to test this with Py3K before making a fool of myself by posting this silly example. Now I understand what Barry meant. So if None really becomes a keyword in Py3K we can be sure to catch all those imaginary 'del None' statements very quickly.
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From billtut at microsoft.com Fri Mar 24 03:46:06 2000 From: billtut at microsoft.com (Bill Tutt) Date: Thu, 23 Mar 2000 18:46:06 -0800 Subject: [Python-Dev] Re: Unicode character names Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> MAL wrote: >"Andrew M. Kuchling" wrote: >> >> Paul Prescod writes: >>>The new \N escape interpolates named characters within strings. For >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>>unicode smiley face at the end. >> >> Cute idea, and it certainly means you can avoid looking up Unicode >> numbers. (You can look up names instead. :) ) Note that this means the >> Unicode database is no longer optional if this is done; it has to be >> around at code-parsing time. Python could import it automatically, as >> exceptions.py is imported. Christian's work on compressing >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually >> dragging around the Unicode database in the binary, or is it read out >> of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... > Ok, now you're just being silly. It's possible to put the character names in a separate structure so that they don't automatically get paged in with the normal unicode character property data. If you never use it, it won't get paged in, it's that simple.... Looking up the Unicode code value from the Unicode character name smells like a good time to use gperf to generate a perfect hash function for the character names. Esp. for the Unicode 3.0 character namespace.
Then you can just store the hashkey -> Unicode character mapping, and hardly ever need to page in the actual full character name string itself. I haven't looked at what the comment field contains, so I have no idea how useful that info is. *waits while gperf crunches through the ~10,550 Unicode characters where this would be useful* Bill From akuchlin at mems-exchange.org Fri Mar 24 03:51:25 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 21:51:25 -0500 (EST) Subject: [Python-Dev] 1.6 job list Message-ID: <200003240251.VAA19921@newcnri.cnri.reston.va.us> I've written up a list of things that need to get done before 1.6 is finished. This is my vision of what needs to be done, and doesn't have an official stamp of approval from GvR or anyone else. So it's very probably wrong. http://starship.python.net/crew/amk/python/1.6-jobs.html Here's the list formatted as text. The major outstanding things at the moment seem to be sre and Distutils; once they go in, you could probably release an alpha, because the other items are relatively minor.

Still to do

* XXX Revamped import hooks (or is this a post-1.6 thing?)
* Update the documentation to match 1.6 changes.
* Document more undocumented modules
* Unicode: Add Unicode support for open() on Windows
* Unicode: Compress the size of unicodedatabase
* Unicode: Write \N{SMILEY} codec for Unicode
* Unicode: the various XXX items in Misc/unicode.txt
* Add module: Distutils
* Add module: Jim Ahlstrom's zipfile.py
* Add module: PyExpat interface
* Add module: mmapfile
* Add module: sre
* Drop cursesmodule and package it separately. (Any other obsolete modules that should go?)
* Delete obsolete subdirectories in Demo/ directory
* Refurbish Demo subdirectories to be properly documented, match modern coding style, etc.
* Support Unicode strings in PyExpat interface
* Fix ./ld_so_aix installation problem on AIX
* Make test.regrtest.py more usable outside of the Python test suite
* Conservative garbage collection of cycles (maybe?)
* Write friendly "What's New in 1.6" document/article

Done

Nothing at the moment.

After 1.7

* Rich comparisons
* Revised coercions
* Parallel for loop (for i in L; j in M: ...),
* Extended slicing for all sequences.
* GvR: "I've also been thinking about making classes be types (not as huge a change as you think, if you don't allow subclassing built-in types), and adding a built-in array type suitable for use by NumPy."

--amk From esr at thyrsus.com Fri Mar 24 04:30:53 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 22:30:53 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 09:51:25PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <20000323223053.J28880@thyrsus.com> Andrew Kuchling : > * Drop cursesmodule and package it separately. (Any other obsolete > modules that should go?) Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel configuration system I'm writing. Why is it on the hit list? -- Eric S. Raymond Still, if you will not fight for the right when you can easily win without bloodshed, if you will not fight when your victory will be sure and not so costly, you may come to the moment when you will have to fight with all the odds against you and only a precarious chance for survival. There may be a worse case. You may have to fight when there is no chance of victory, because it is better to perish than to live as slaves. --Winston Churchill From dan at cgsoftware.com Fri Mar 24 04:52:54 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 22:52:54 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: "Eric S.
Raymond"'s message of "Thu, 23 Mar 2000 22:30:53 -0500" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> Message-ID: <4s9x6n3d.fsf@dan.resnet.rochester.edu> "Eric S. Raymond" writes: > Andrew Kuchling : > > * Drop cursesmodule and package it separately. (Any other obsolete > > modules that should go?) > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > configuration system I'm writing. Why is it on the hit list? IIRC, it's because nobody really maintains it, and those that care about it, use a different one (either ncurses module, or a newer cursesmodule). So from what i understand, you get complaints, but no real advantage to having it there. I'm just trying to summarize, not fall on either side (some people get touchy about issues like this). --Dan From esr at thyrsus.com Fri Mar 24 05:11:37 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:11:37 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <4s9x6n3d.fsf@dan.resnet.rochester.edu>; from Daniel Berlin+list.python-dev on Thu, Mar 23, 2000 at 10:52:54PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> Message-ID: <20000323231137.U28880@thyrsus.com> Daniel Berlin+list.python-dev : > > Andrew Kuchling : > > > * Drop cursesmodule and package it separately. (Any other obsolete > > > modules that should go?) > > > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > > configuration system I'm writing. Why is it on the hit list? > > IIRC, it's because nobody really maintains it, and those that care > about it, use a different one (either ncurses module, or a newer cursesmodule). > So from what i understand, you get complaints, but no real advantage > to having it there. OK. 
Then what I guess I'd like is for a maintained equivalent of this to join the core -- the ncurses module you referred to, for choice. I'm not being random. I'm trying to replace the mess that currently constitutes the kbuild system -- but I'll need to support an equivalent of menuconfig. -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From akuchlin at mems-exchange.org Fri Mar 24 05:33:24 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 23:33:24 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <20000323231137.U28880@thyrsus.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> Message-ID: <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Eric S. Raymond writes: >OK. Then what I guess I'd like is for a maintained equivalent of this >to join the core -- the ncurses module you referred to, for choice. See the "Whither cursesmodule" thread in the python-dev archives: http://www.python.org/pipermail/python-dev/2000-February/003796.html One possibility was to blow off backward compatibility; are there any systems that only have BSD curses, not SysV curses / ncurses? Given that Pavel Curtis announced he was dropping BSD curses maintenance some years ago, I expect even the *BSDs use ncurses these days. However, Oliver Andrich doesn't seem interested in maintaining his ncurses module, and someone just started a SWIG-generated interface (http://pyncurses.sourceforge.net), so it's not obvious which one you'd use. (I *would* be willing to take over maintaining Andrich's code; maintaining the BSD curses version just seems pointless these days.)
--amk From dan at cgsoftware.com Fri Mar 24 05:43:51 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 23:43:51 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Andrew Kuchling's message of "Thu, 23 Mar 2000 23:33:24 -0500 (EST)" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: Andrew Kuchling writes: > Eric S. Raymond writes: > >OK. Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. Yes, they do.

ls /usr/src/lib/libncurses/
Makefile    ncurses_cfg.h    pathnames.h    termcap.c

grep 5\.0 /usr/src/contrib/ncurses/*

At least, this is FreeBSD. So there is no need for BSD curses anymore, on FreeBSD's account. > --amk > From esr at thyrsus.com Fri Mar 24 05:47:56 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:47:56 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14554.61460.311650.599253@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 11:33:24PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: <20000323234756.A29775@thyrsus.com> Andrew Kuchling : > Eric S. Raymond writes: > >OK.
Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. BSD curses was officially declared dead by its maintainer, Keith Bostic, in early 1995. Keith and I conspired to kill it off in favor of ncurses :-). -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation. That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From andy at reportlab.com Fri Mar 24 11:14:44 2000 From: andy at reportlab.com (Andy Robinson) Date: Fri, 24 Mar 2000 10:14:44 GMT Subject: [Python-Dev] Unicode character names In-Reply-To: <20000324024913.B8C3A1CF22@dinsdale.python.org> References: <20000324024913.B8C3A1CF22@dinsdale.python.org> Message-ID: <38db3fc6.7370137@post.demon.co.uk> On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote: >Sorry to disappoint you guys, but the Unicode name and comments >are *not* included in the unicodedatabase.c file Christian >is currently working on. The reason is simple: it would add >huge amounts of string data to the file. So this is a no-no >for the core distribution...
You're right about what is compiled into the core. I have to keep reminding myself to distinguish three places functionality can live:

1. What is compiled into the Python core
2. What is in the standard Python library relating to encodings.
3. Completely separate add-on packages, maintained outside of Python, to provide extra functionality for (e.g.) Asian encodings.

It is clear that both the Unicode database, and the mapping tables and other files at unicode.org, are a great resource; but they could be placed in (2) or (3) easily, along with scripts to unpack them. It probably makes sense for the i18n-sig to kick off a separate 'CodecKit' project for now, and we can see what good emerges from it before thinking about what should go into the library. >Still, the above is easily possible by inventing a new >encoding, say unicode-with-smileys, which then reads in >a file containing the Unicode names and applies the necessary >magic to decode/encode data as Paul described above. >Would probably make a cool fun-project for someone who wants >to dive into writing codecs. Yup. Prime candidate for CodecKit. - Andy From mal at lemburg.com Fri Mar 24 09:52:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 09:52:36 +0100 Subject: [Python-Dev] Re: Unicode character names References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <38DB2CD4.CAD9F0E2@lemburg.com> Bill Tutt wrote: > > MAL wrote: > > >"Andrew M. Kuchling" wrote: > >> > >> Paul Prescod writes: > >>>The new \N escape interpolates named characters within strings. For > >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >>>unicode smiley face at the end. > >> > >> Cute idea, and it certainly means you can avoid looking up Unicode > >> numbers. (You can look up names instead. :) ) Note that this means the > >> Unicode database is no longer optional if this is done; it has to be > >> around at code-parsing time.
Python could import it automatically, as > >> exceptions.py is imported. Christian's work on compressing > >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually > >> dragging around the Unicode database in the binary, or is it read out > >> of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > > > Ok, now you're just being silly. Its possible to put the character names in > a separate structure so that they don't automatically get paged in with the > normal unicode character property data. If you never use it, it won't get > paged in, its that simple.... Sure, but it would still cause the interpreter binary or DLL to increase in size considerably... that caused some major noise a few days ago due to the fact that the unicodedata module adds some 600kB to the interpreter -- even though it would only get swapped in when needed (the interpreter itself doesn't use it). > Looking up the Unicode code value from the Unicode character name smells > like a good time to use gperf to generate a perfect hash function for the > character names. Esp. for the Unicode 3.0 character namespace. Then you can > just store the hashkey -> Unicode character mapping, and hardly ever need to > page in the actual full character name string itself. Great idea, but why not put this into separate codec module ? > I haven't looked at what the comment field contains, so I have no idea how > useful that info is. Probably not worth looking at... 
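The "separate codec module" MAL suggests and the gperf table Bill describes both boil down to a name -> character mapping kept out of the core. A pure-Python stand-in for the perfect hash, using a plain dict (the 0x3000 range is an arbitrary demo cutoff, not the real table size):

```python
import unicodedata

# Unicode character name -> character. gperf would generate a perfect
# hash for this; a dict is the plain Python equivalent for demonstration.
table = {}
for code in range(0x3000):          # demo range only
    name = unicodedata.name(chr(code), None)
    if name is not None:
        table[name] = chr(code)

assert table["WHITE SMILING FACE"] == "\u263a"
```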
> *waits while gperf crunches through the ~10,550 Unicode characters where > this would be useful* -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 24 11:37:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 11:37:53 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> <38D8F55E.6E324281@lemburg.com> Message-ID: <38DB4581.EB5315E0@lemburg.com> Ok, I've just added two new parser markers to PyArg_ParseTuple() which will hopefully make life a little easier for extension writers. The new code will be in the next patch set which I will release early next week. Here are the docs: Internal Argument Parsing: -------------------------- These markers are used by the PyArg_ParseTuple() APIs: "U": Check for Unicode object and return a pointer to it "s": For Unicode objects: auto convert them to UTF-8 and return a pointer to the object's buffer. "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the Internal Format). "es": Takes two parameters: encoding (const char **) and buffer (char **). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. On output, a buffer of the needed size is allocated and returned through *buffer as a NULL-terminated string. The encoded string may not contain embedded NULL characters. The caller is responsible for free()ing the allocated *buffer after usage.
"es#": Takes three parameters: encoding (const char **), buffer (char **) and buffer_len (int *). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) on input. Output is then copied to *buffer. If *buffer is NULL, a buffer of the needed size is allocated and output copied into it. *buffer is then updated to point to the allocated memory area. The caller is responsible for free()ing *buffer after usage. In both cases *buffer_len is updated to the number of characters written (excluding the trailing NULL-byte). The output buffer is assured to be NULL-terminated. Examples: Using "es#" with auto-allocation: static PyObject * test_parser(PyObject *self, PyObject *args) { PyObject *str; const char *encoding = "latin-1"; char *buffer = NULL; int buffer_len = 0; if (!PyArg_ParseTuple(args, "es#:test_parser", &encoding, &buffer, &buffer_len)) return NULL; if (!buffer) { PyErr_SetString(PyExc_SystemError, "buffer is NULL"); return NULL; } str = PyString_FromStringAndSize(buffer, buffer_len); free(buffer); return str; } Using "es" with auto-allocation returning a NULL-terminated string: static PyObject * test_parser(PyObject *self, PyObject *args) { PyObject *str; const char *encoding = "latin-1"; char *buffer = NULL; if (!PyArg_ParseTuple(args, "es:test_parser", &encoding, &buffer)) return NULL; if (!buffer) { PyErr_SetString(PyExc_SystemError, "buffer is NULL"); return NULL; } str = PyString_FromString(buffer); free(buffer); return str; } Using "es#" with a pre-allocated buffer: static PyObject * test_parser(PyObject *self, PyObject *args) { PyObject *str; const char *encoding = "latin-1"; char _buffer[10]; char *buffer = _buffer; int buffer_len = sizeof(_buffer); if (!PyArg_ParseTuple(args, "es#:test_parser", &encoding, &buffer, &buffer_len)) return NULL; if (!buffer) { PyErr_SetString(PyExc_SystemError, "buffer is NULL"); return NULL; } str 
= PyString_FromStringAndSize(buffer, buffer_len); return str; } -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Fri Mar 24 11:54:02 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 02:54:02 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB4581.EB5315E0@lemburg.com> Message-ID: On Fri, 24 Mar 2000, M.-A. Lemburg wrote: >... > "s": For Unicode objects: auto convert them to UTF-8 > and return a pointer to the object's buffer. Guess that I didn't notice this before, but it seems weird that "s" and "s#" return different encodings. Why? > "es": > Takes two parameters: encoding (const char **) and > buffer (char **). >... > "es#": > Takes three parameters: encoding (const char **), > buffer (char **) and buffer_len (int *). I see no reason to make the encoding (const char **) rather than (const char *). We are never returning a value, so this just makes it harder to pass the encoding into ParseTuple. There is precedent for passing in single-ref pointers. For example: PyArg_ParseTuple(args, "O!", &s, PyString_Type) I would recommend using just one pointer level for the encoding. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri Mar 24 12:29:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 12:29:12 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38DB5188.AA580652@lemburg.com> Greg Stein wrote: > > On Fri, 24 Mar 2000, M.-A. Lemburg wrote: > >... > > "s": For Unicode objects: auto convert them to UTF-8 > > and return a pointer to the object's buffer. > > Guess that I didn't notice this before, but it seems weird that "s" and > "s#" return different encodings. > > Why? This is due to the buffer interface being used for "s#". Since "s#" refers to the getreadbuf slot, it returns raw data.
In this case this is UTF-16 in platform dependent byte order. "s" relies on NULL-terminated strings and doesn't use the buffer interface at all. Thus "s" returns NULL-terminated UTF-8 (UTF-16 is full of NULLs). "t#" uses the getcharbuf slot and thus should return character data. UTF-8 is the right encoding here. > > "es": > > Takes two parameters: encoding (const char **) and > > buffer (char **). > >... > > "es#": > > Takes three parameters: encoding (const char **), > > buffer (char **) and buffer_len (int *). > > I see no reason to make the encoding (const char **) rather than > (const char *). We are never returning a value, so this just makes it > harder to pass the encoding into ParseTuple. > > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > > I would recommend using just one pointer level for the encoding. You have a point there... even though it breaks the notion of prepending all parameters with an '&' (ok, except the type check one). OTOH, it would allow passing the encoding right with the PyArg_ParseTuple() call which probably makes more sense in this context. I'll change it... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 24 14:13:02 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 24 Mar 2000 14:13:02 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> Message-ID: <38DB69DE.6D04B084@tismer.com> "M.-A. Lemburg" wrote: > > "Andrew M. Kuchling" wrote: > > > > Paul Prescod writes: > > >The new \N escape interpolates named characters within strings. For > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > >unicode smiley face at the end. 
> > > > Cute idea, and it certainly means you can avoid looking up Unicode > > numbers. (You can look up names instead. :) ) Note that this means the > > Unicode database is no longer optional if this is done; it has to be > > around at code-parsing time. Python could import it automatically, as > > exceptions.py is imported. Christian's work on compressing > > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > > dragging around the Unicode database in the binary, or is it read out > > of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... This is not settled, still an open question. What I have for non-textual data: 25 kb with dumb compression 15 kb with enhanced compression What amounts of data am I talking about? - The whole unicode database text file has size 632 kb. - With PkZip this goes down to 96 kb. Now, I produced another text file with just the currently used data in it, and this sounds so: - the stripped unicode text file has size 216 kb. - PkZip melts this down to 40 kb. Please compare that to my results above: I can do at least twice as good. I hope I can compete for the text sections as well (since this is something where zip is *good* at), but just let me try. Let's target 60 kb for the whole crap, and I'd be very pleased. Then, there is still the question where to put the data. Having one file in the dll and another externally would be an option. I could also imagine to use a binary external file all the time, with maximum possible compression. By loading this structure, this would be partially expanded to make it fast. An advantage is that the compressed Unicode database could become a stand-alone product. 
The size is in fact so crazy small, that I'd like to make this available to any other language. > Still, the above is easily possible by inventing a new > encoding, say unicode-with-smileys, which then reads in > a file containing the Unicode names and applies the necessary > magic to decode/encode data as Paul described above. That sounds reasonable. Compression makes sense as well here, since the expanded stuff makes quite an amount of kb, compared to what it is "worth", compared to, say, the Python dll. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Fri Mar 24 14:41:27 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 14:41:27 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> <38DB69DE.6D04B084@tismer.com> Message-ID: <38DB7087.1B105AC7@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > "Andrew M. Kuchling" wrote: > > > > > > Paul Prescod writes: > > > >The new \N escape interpolates named characters within strings. For > > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > > >unicode smiley face at the end. > > > > > > Cute idea, and it certainly means you can avoid looking up Unicode > > > numbers. (You can look up names instead. :) ) Note that this means the > > > Unicode database is no longer optional if this is done; it has to be > > > around at code-parsing time. Python could import it automatically, as > > > exceptions.py is imported. Christian's work on compressing > > > unicodedatabase.c is therefore really important. 
(Is Perl5.6 actually > > > dragging around the Unicode database in the binary, or is it read out > > > of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > This is not settled, still an open question. Well, ok, depends on how much you can sqeeze out of the text columns ;-) I still think that its better to leave these gimmicks out of the core and put them into some add-on, though. > What I have for non-textual data: > 25 kb with dumb compression > 15 kb with enhanced compression Looks good :-) With these sizes I think we could even integrate the unicodedatabase.c + API into the core interpreter and only have the unicodedata module to access the database from within Python. > What amounts of data am I talking about? > - The whole unicode database text file has size > 632 kb. > - With PkZip this goes down to > 96 kb. > > Now, I produced another text file with just the currently > used data in it, and this sounds so: > - the stripped unicode text file has size > 216 kb. > - PkZip melts this down to > 40 kb. > > Please compare that to my results above: I can do at least > twice as good. I hope I can compete for the text sections > as well (since this is something where zip is *good* at), > but just let me try. > Let's target 60 kb for the whole crap, and I'd be very pleased. > > Then, there is still the question where to put the data. > Having one file in the dll and another externally would > be an option. I could also imagine to use a binary external > file all the time, with maximum possible compression. > By loading this structure, this would be partially expanded > to make it fast. > An advantage is that the compressed Unicode database > could become a stand-alone product. 
The size is in fact > so crazy small, that I'd like to make this available > to any other language. You could take the unicodedatabase.c file (+ header file) and use it everywhere... I don't think it needs to contain any Python specific code. The API names would have to follow the Python naming schemes though. > > Still, the above is easily possible by inventing a new > > encoding, say unicode-with-smileys, which then reads in > > a file containing the Unicode names and applies the necessary > > magic to decode/encode data as Paul described above. > > That sounds reasonable. Compression makes sense as well here, > since the expanded stuff makes quite an amount of kb, compared > to what it is "worth", compared to, say, the Python dll. With 25kB for the non-text columns, I'd suggest simply adding the file to the core. Text columns could then go into a separate module. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 15:14:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 09:14:51 -0500 Subject: [Python-Dev] Hi -- I'm back! Message-ID: <200003241414.JAA11740@eric.cnri.reston.va.us> I'm back from ten days on the road. I'll try to dig through the various mailing list archives over the next few days, but it would be more efficient if you are waiting for me to take action or express an opinion on a particular issue (in *any* Python-related mailing list) to mail me a summary or at least a pointer. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Mar 24 16:01:25 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 24 Mar 2000 16:01:25 +0100 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message by Ka-Ping Yee , Thu, 23 Mar 2000 09:47:47 -0800 (PST) , Message-ID: <20000324150125.7144A370CF2@snelboot.oratrix.nl> > Hmm... 
i guess this also means one should ask what > > def function(None, arg): > ... > > does outside a class definition. I suppose that should simply > be illegal. No, it forces you to call the function with keyword arguments! (initially meant jokingly, but thinking about it for a couple of seconds there might actually be cases where this is useful) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip at mojam.com Fri Mar 24 16:14:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 09:14:11 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <14555.34371.749039.946891@beluga.mojam.com> AMK> I've written up a list of things that need to get done before 1.6 AMK> is finished. This is my vision of what needs to be done, and AMK> doesn't have an official stamp of approval from GvR or anyone else. AMK> So it's very probably wrong. Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules of general usefulness (this is at least generally useful for anyone writing web spiders ;-) shouldn't live in Tools, because it's not always available and users need to do extra work to make them available. I'd be happy to write up some documentation for it and twiddle the module to include doc strings. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Fri Mar 24 16:20:03 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 10:20:03 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: References: <38DB4581.EB5315E0@lemburg.com> Message-ID: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Greg Stein writes: > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) ^^^^^^^^^^^^^^^^^ Feeling ok? I *suspect* these are reversed. :) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Mar 24 16:24:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 10:24:13 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB5188.AA580652@lemburg.com> References: <38DB5188.AA580652@lemburg.com> Message-ID: <14555.34973.303273.716146@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > You have a point there... even though it breaks the notion > of prepending all parameters with an '&' (ok, except the I've never heard of this notion; I hope I didn't just miss it in the docs! The O& also doesn't require a & in front of the name of the conversion function, you just pass the right value. So there are at least two cases where you *typically* don't use a &. (Other cases in the 1.5.2 API are probably just plain weird if they don't!) Changing it to avoid the extra machinery is the Right Thing; you get to feel good today. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Fri Mar 24 17:38:06 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 17:38:06 +0100 Subject: [Python-Dev] Unicode and Windows References: <38DB5188.AA580652@lemburg.com> <14555.34973.303273.716146@weyr.cnri.reston.va.us> Message-ID: <38DB99EE.F5949889@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > You have a point there... 
even though it breaks the notion > > of prepending all parameters with an '&' (ok, except the > > I've never heard of this notion; I hope I didn't just miss it in the > docs! If you scan the parameters list in getargs.c you'll come to this conclusion and thus my notion: I've been programming like this for years now :-) > The O& also doesn't require a & in front of the name of the > conversion function, you just pass the right value. So there are at > least two cases where you *typically* don't use a &. (Other cases in > the 1.5.2 API are probably just plain weird if they don't!) > Changing it to avoid the extra machinery is the Right Thing; you get > to feel good today. ;) Ok, feeling good now ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 21:44:02 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 15:44:02 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 09:14:11 CST." <14555.34371.749039.946891@beluga.mojam.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> Message-ID: <200003242044.PAA00677@eric.cnri.reston.va.us> > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. > > I'd be happy to write up some documentation for it and twiddle the module to > include doc strings. Deal. Soon as we get the docs we'll move it to Lib. 
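[Editorial note: robotparser did make the move Skip proposes -- today it lives in the standard library as urllib.robotparser. A small sketch of what it does; parse() accepts the lines of a robots.txt directly, so no network access is needed:]

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.modified()  # record a read time, so can_fetch() trusts the parsed data
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Paths outside the disallowed prefix are fetchable; the rest are not.
print(rp.can_fetch("*", "http://example.com/public/page.html"))   # True
print(rp.can_fetch("*", "http://example.com/private/page.html"))  # False
```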
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Mar 24 21:50:43 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 12:50:43 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Fred L. Drake, Jr. wrote: > Greg Stein writes: > > There is precedent for passing in single-ref pointers. For example: > > > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > ^^^^^^^^^^^^^^^^^ > > Feeling ok? I *suspect* these are reversed. :) I just checked the code to ensure that it took a single pointer rather than a double-pointer. I guess that I didn't verify the order :-) Concept is valid, tho... the params do not necessarily require an ampersand. oop! Actually... this does require an ampersand: PyArg_ParseTuple(args, "O!", &PyString_Type, &s) Don't want to pass the whole structure... Well, regardless: I would much prefer to see the encoding passed as a constant string, rather than having to shove the sucker into a variable first, just so that I can insert a useless address-of operator. Cheers, -g -- Greg Stein, http://www.lyra.org/ From akuchlin at mems-exchange.org Fri Mar 24 21:51:56 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 15:51:56 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242044.PAA00677@eric.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> <200003242044.PAA00677@eric.cnri.reston.va.us> Message-ID: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Guido van Rossum writes: >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules >Deal. Soon as we get the docs we'll move it to Lib. What about putting it in a package like 'www' or 'web'? Packagizing the existing library is hard because of backward compatibility, but there's no such constraint for new modules. 
-- A.M. Kuchling http://starship.python.net/crew/amk/ One need not be a chamber to be haunted; / One need not be a house; / The brain has corridors surpassing / Material place. -- Emily Dickinson, "Time and Eternity" From gstein at lyra.org Fri Mar 24 22:00:25 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:00:25 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Andrew M. Kuchling wrote: > Guido van Rossum writes: > >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > >Deal. Soon as we get the docs we'll move it to Lib. > > What about putting it in a package like 'www' or 'web'? Packagizing > the existing library is hard because of backward compatibility, but > there's no such constraint for new modules. Or in the "network" package that was suggested a month ago? And why *can't* we start on repackaging old module? I think the only reason that somebody came up with to NOT do it was "well, if we don't repackage the whole thing, then we should repackage nothing." Which, IMO, is totally bogus. We'll never get anywhere operating under that principle. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Fri Mar 24 22:00:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:00:19 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: <14555.55139.484135.602894@weyr.cnri.reston.va.us> Greg Stein writes: > Or in the "network" package that was suggested a month ago? +1 > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. 
That doesn't bother me, but I tend to be a little conservative (though usually not as conservative as Guido on such matters). I *would* like to decide that 1.7 will be fully packagized, and not wait until 2.0. As long as 1.7 is a "testing the evolutionary path" release, I think that's the right thing to do. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:03:54 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:03:54 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead Message-ID: <200003242103.QAA03288@eric.cnri.reston.va.us> Someone noticed that socket.connect() and a few related functions (connect_ex() and bind()) take either a single (host, port) tuple or two separate arguments, but that only the tuple is documented. Similar to append(), I'd like to close this gap, and I've made the necessary changes. This will probably break lots of code. Similar to append(), I'd like people to fix their code rather than whine -- two-arg connect() has never been documented, although it's found in much code (even the socket module test code :-( ). Similar to append(), I may revert the change if it is shown to cause too much pain during beta testing... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:05:57 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:05:57 -0500 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Your message of "Fri, 24 Mar 2000 12:50:43 PST." References: Message-ID: <200003242105.QAA03543@eric.cnri.reston.va.us> > Well, regardless: I would much prefer to see the encoding passed as a > constant string, rather than having to shove the sucker into a variable > first, just so that I can insert a useless address-of operator. Of course. Use & for output args, not as a matter of principle.
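[Editorial note: at the Python level, what the "es" marker discussed in this thread does to its argument amounts to: coerce to Unicode, encode with the requested codec, and reject embedded NULs. A rough sketch of that behavior -- the function name and details are illustrative, not MAL's actual C code:]

```python
def es_convert(obj, encoding):
    """Rough Python-level analogue of the 'es' parser marker."""
    s = str(obj)               # coerce to Unicode "in the usual way"
    data = s.encode(encoding)  # encode into a byte string
    if b"\x00" in data:
        # The C version hands back a NUL-terminated buffer, so the
        # encoded result may not contain embedded NUL characters.
        raise ValueError("encoded string may not contain embedded NULs")
    return data

print(es_convert("caf\xe9", "latin-1"))  # b'caf\xe9'
print(es_convert("caf\xe9", "utf-8"))    # b'caf\xc3\xa9'
```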
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:11:25 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:11:25 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:00:25 PST." References: Message-ID: <200003242111.QAA04208@eric.cnri.reston.va.us> [Greg] > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. The reason is backwards compatibility. Assume we create a package "web" and move all web related modules into it: httplib, urllib, htmllib, etc. Now for backwards compatibility, we add the web directory to sys.path, so one can write either "import web.urllib" or "import urllib". But that loads the same code twice! And in this (carefully chosen :-) example, urllib actually has some state which shouldn't be replicated. Plus, it's too much work -- I'd rather focus on getting 1.6 out of the door, and there's a lot of other stuff I need to do besides moving modules around. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:15:00 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:15:00 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 16:00:19 EST." <14555.55139.484135.602894@weyr.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> Message-ID: <200003242115.QAA04648@eric.cnri.reston.va.us> > Greg Stein writes: > > Or in the "network" package that was suggested a month ago? [Fred] > +1 Which reminds me of another reason to wait: coming up with the right package hierarchy is hard. (E.g. 
I find network too long; plus, does htmllib belong there?) > That doesn't bother me, but I tend to be a little conservative > (though usually not as conservative as Guido on such matters). I > *would* like to decided theat 1.7 will be fully packagized, and not > wait until 2.0. As long as 1.7 is a "testing the evolutionary path" > release, I think that's the right thing to do. Agreed. At the SD conference I gave a talk about the future of Python, and there was (again) a good suggestion about forwards compatibility. Starting with 1.7 (if not sooner), several Python 3000 features that necessarily have to be incompatible (like 1/2 yielding 0.5 instead of 0) could issue warnings when (or unless?) Python is invoked with a compatibility flag. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 22:21:54 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:21:54 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <14555.56434.974884.832078@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Someone noticed that socket.connect() and a few related GvR> functions (connect_ex() and bind()) take either a single GvR> (host, port) tuple or two separate arguments, but that only GvR> the tuple is documented. GvR> Similar to append(), I'd like to close this gap, and I've GvR> made the necessary changes. This will probably break lots of GvR> code. I don't agree that socket.connect() and friends need this fix. Yes, obviously append() needed fixing because of the application of Tim's Twelfth Enlightenment to the semantic ambiguity. But socket.connect() has no such ambiguity; you may spell it differently, but you know exactly what you mean. My suggestion would be to not break any code, but extend connect's interface to allow an optional second argument. 
Thus all of these calls would be legal: sock.connect(addr) sock.connect(addr, port) sock.connect((addr, port)) One nit on the documentation of the socket module. The second entry says: bind (address) Bind the socket to address. The socket must not already be bound. (The format of address depends on the address family -- see above.) Huh? What "above" part should I see? Note that I'm reading this doc off the web! -Barry From gstein at lyra.org Fri Mar 24 22:27:57 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:27:57 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > [Greg] > > And why *can't* we start on repackaging old module? I think the only > > reason that somebody came up with to NOT do it was "well, if we don't > > repackage the whole thing, then we should repackage nothing." Which, IMO, > > is totally bogus. We'll never get anywhere operating under that principle. > > The reason is backwards compatibility. Assume we create a package > "web" and move all web related modules into it: httplib, urllib, > htmllib, etc. Now for backwards compatibility, we add the web > directory to sys.path, so one can write either "import web.urllib" or > "import urllib". But that loads the same code twice! And in this > (carefully chosen :-) example, urllib actually has some state which > shouldn't be replicated. We don't add it to the path. Instead, we create new modules that look like: ---- httplib.py ---- from web.httplib import * ---- The only backwards-compat issue with this approach is that people who poke values into the module will have problems. I don't believe that any of the modules were designed for that, anyhow, so it would seem acceptable to (effectively) disallow that behavior.
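[Editorial note: Greg's shim idea can be sketched concretely. Here the 'web' package and its HTTP_PORT attribute are faked in sys.modules purely for illustration -- no such package existed; a real shim would simply be a top-level httplib.py containing the one `from web.httplib import *` line:]

```python
import sys
import types

# Simulate the proposed layout: a 'web' package containing 'httplib'.
web = types.ModuleType("web")
web_httplib = types.ModuleType("web.httplib")
web_httplib.HTTP_PORT = 80  # hypothetical attribute, for illustration only
web.httplib = web_httplib
sys.modules["web"] = web
sys.modules["web.httplib"] = web_httplib

# The backwards-compatibility shim: a top-level httplib module whose
# entire body is "from web.httplib import *".
shim = types.ModuleType("httplib")
exec("from web.httplib import *", shim.__dict__)
sys.modules["httplib"] = shim

# Old code keeps working, and only one copy of the real module exists.
import httplib
print(httplib.HTTP_PORT)  # 80
```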
> Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > door, and there's a lot of other stuff I need to do besides moving > modules around. Stuff that *you* need to do, sure. But there *are* a lot of us who can help here, and some who desire to spend their time moving modules. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:32:14 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:32:14 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > Greg Stein writes: > > > Or in the "network" package that was suggested a month ago? > > [Fred] > > +1 > > Which reminds me of another reason to wait: coming up with the right > package hierarchy is hard. (E.g. I find network too long; plus, does > htmllib belong there?) htmllib does not go there. Where does it go? Dunno. Leave it unless/until somebody comes up with a place for it. We package up obvious ones. We don't have to design a complete hierarchy. There seemed to be a general "good feeling" around some kind of network (protocol) package. Call it "net" if "network" is too long. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:27:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:27:51 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 16:21:54 EST." 
<14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <200003242127.QAA06269@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Someone noticed that socket.connect() and a few related > GvR> functions (connect_ex() and bind()) take either a single > GvR> (host, port) tuple or two separate arguments, but that only > GvR> the tuple is documented. > > GvR> Similar to append(), I'd like to close this gap, and I've > GvR> made the necessary changes. This will probably break lots of > GvR> code. > > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. > > My suggestion would be to not break any code, but extend connect's > interface to allow an optional second argument. Thus all of these > calls would be legal: > > sock.connect(addr) > sock.connect(addr, port) > sock.connect((addr, port)) You probably meant: sock.connect(addr) sock.connect(host, port) sock.connect((host, port)) since (host, port) is equivalent to (addr). > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Fred typically directs latex2html to break all sections apart. 
It's in the previous section: Socket addresses are represented as a single string for the AF_UNIX address family and as a pair (host, port) for the AF_INET address family, where host is a string representing either a hostname in Internet domain notation like 'daring.cwi.nl' or an IP address like '100.50.200.5', and port is an integral port number. Other address families are currently not supported. The address format required by a particular socket object is automatically selected based on the address family specified when the socket object was created. This also explains the reason for requiring a single argument: when using AF_UNIX, the second argument makes no sense! Frankly, I'm not sure what to do here -- it's more correct to require a single address argument always, but it's more convenient to allow two sometimes. Note that sendto(data, addr) only accepts the tuple form: you cannot write sendto(data, host, port). --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:28:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:28:32 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: <14555.56832.336242.378838@weyr.cnri.reston.va.us> Greg Stein writes: > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Would it make sense for one of these people with time on their hands to propose a specific mapping from old->new names? I think that would be a good first step, regardless of the implementation timing. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:29:44 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:29:44 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:27:57 PST."
References: Message-ID: <200003242129.QAA06510@eric.cnri.reston.va.us> > We don't add it to the path. Instead, we create new modules that look > like: > > ---- httplib.py ---- > from web.httplib import * > ---- > > The only backwards-compat issue with this approach is that people who poke > values into the module will have problems. I don't believe that any of the > modules were designed for that, anyhow, so it would seem acceptable to > (effectively) disallow that behavior. OK, that's reasonable. I'll have to invent a different reason why I don't want this -- because I really don't! > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Hm. Moving modules requires painful and arcane CVS manipulations that can only be done by the few of us here at CNRI -- and I'm the only one left who's full time on Python. I'm still not convinced that it's a good plan. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:32:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:32:39 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <14555.57079.187670.916002@weyr.cnri.reston.va.us> Barry A. Warsaw writes: > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. Crock.
The address representations have been fairly well defined for quite a while. Be explicit. > sock.connect(addr) This is the only legal signature. (host, port) is simply the form of addr for a particular address family. > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Definitely written for the paper document! Remind me about this again in a month and I'll fix it, but I don't want to play games with this little stuff until the 1.5.2p2 and 1.6 trees have been merged. Harrumph. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Fri Mar 24 22:37:41 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:37:41 -0800 (PST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Message-ID: On Fri, 24 Mar 2000, Greg Stein wrote: >... > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. I just want to emphasize this point some more. Python 1.6 has a defined timeline, with a defined set of minimal requirements. However! I don't believe that a corollary of that says we MUST ignore everything else. If those other options fit within the required timeline, then why not? (assuming we have adequate testing and doc to go with the changes) There are ample people who have time and inclination to contribute. If those contributions add positive benefit, then I see no reason to exclude them (other than on pure merit, of course). Note that some of the problems stem from CVS access.
Much Guido-time could be saved by a commit-then-review model, rather than a review-then-Guido-commits model. Fred does this very well with the Doc/ area. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:38:48 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:38:48 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: >... > > We don't add it to the path. Instead, we create new modules that look > > like: > > > > ---- httplib.py ---- > > from web.httplib import * > > ---- > > > > The only backwards-compat issue with this approach is that people who poke > > values into the module will have problems. I don't believe that any of the > > modules were designed for that, anyhow, so it would seem acceptable to > > (effectively) disallow that behavior. > > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Fair enough. > > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > > door, and there's a lot of other stuff I need to do besides moving > > > modules around. > > > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > > help here, and some who desire to spend their time moving modules. > > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. I'm still not convinced that it's a > good plan. There are a number of ways to do this, and I'm familiar with all of them. It is a continuing point of strife in the Apache CVS repositories :-) But... it is premised on accepting the desire to move them, of course.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:38:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:38:51 -0500 Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Your message of "Fri, 24 Mar 2000 13:37:41 PST." References: Message-ID: <200003242138.QAA07621@eric.cnri.reston.va.us> > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- > commits model. Fred does this very well with the Doc/ area. Actually, I'm experimenting with this already: Unicode, list.append() and socket.connect() are done in this way! For renames it is really painful though, even if someone else at CNRI can do it. I'd like to see a draft package hierarchy please? Also, if you have some time, please review the bugs in the bugs list. Patches submitted with a corresponding PR# will be treated with priority! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Mar 24 22:40:48 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:40:48 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 Message-ID: <38DBE0E0.76A298FE@lemburg.com> Attached you find the latest update of the Unicode implementation. The patch is against the current CVS version. It includes the fix I posted yesterday for the core dump problem in codecs.c (was introduced by my previous patch set -- sorry), adds more tests for the codecs and two new parser markers "es" and "es#". 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Thu Mar 23 23:58:41 2000 +++ Python+Unicode/Lib/codecs.py Fri Mar 17 23:51:01 2000 @@ -46,7 +46,7 @@ handling schemes by providing the errors argument. These string values are defined: - 'strict' - raise an error (or a subclass) + 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/output/test_unicode Python+Unicode/Lib/test/output/test_unicode --- CVS-Python/Lib/test/output/test_unicode Fri Mar 24 22:21:26 2000 +++ Python+Unicode/Lib/test/output/test_unicode Sat Mar 11 00:23:21 2000 @@ -1,5 +1,4 @@ test_unicode Testing Unicode comparisons... done. -Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing unicodedata module... done. 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Thu Mar 23 23:58:47 2000 +++ Python+Unicode/Lib/test/test_unicode.py Fri Mar 24 00:29:43 2000 @@ -293,3 +293,33 @@ assert unicodedata.combining(u'\u20e1') == 230 print 'done.' + +# Test builtin codecs +print 'Testing builtin codecs...', + +assert unicode('hello','ascii') == u'hello' +assert unicode('hello','utf-8') == u'hello' +assert unicode('hello','utf8') == u'hello' +assert unicode('hello','latin-1') == u'hello' + +assert u'hello'.encode('ascii') == 'hello' +assert u'hello'.encode('utf-8') == 'hello' +assert u'hello'.encode('utf8') == 'hello' +assert u'hello'.encode('utf-16-le') == 'h\000e\000l\000l\000o\000' +assert u'hello'.encode('utf-16-be') == '\000h\000e\000l\000l\000o' +assert u'hello'.encode('latin-1') == 'hello' + +u = u''.join(map(unichr, range(1024))) +for encoding in ('utf-8', 'utf-16', 'utf-16-le', 'utf-16-be', + 'raw_unicode_escape', 'unicode_escape', 'unicode_internal'): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(256))) +for encoding in ('latin-1',): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(128))) +for encoding in ('ascii',): + assert unicode(u.encode(encoding),encoding) == u + +print 'done.' 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Thu Mar 23 23:58:48 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 24 22:29:35 2000 @@ -715,21 +715,126 @@ These markers are used by the PyArg_ParseTuple() APIs: - 'U': Check for Unicode object and return a pointer to it + "U": Check for Unicode object and return a pointer to it - 's': For Unicode objects: auto convert them to the + "s": For Unicode objects: auto convert them to the and return a pointer to the object's buffer. - 's#': Access to the Unicode object via the bf_getreadbuf buffer interface + "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). - 't#': Access to the Unicode object via the bf_getcharbuf buffer interface + "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the ). + "es": + Takes two parameters: encoding (const char *) and + buffer (char **). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + On output, a buffer of the needed size is allocated and + returned through *buffer as NULL-terminated string. + The encoded may not contain embedded NULL characters. + The caller is responsible for free()ing the allocated *buffer + after usage. + + "es#": + Takes three parameters: encoding (const char *), + buffer (char **) and buffer_len (int *). 
+ + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) + on input. Output is then copied to *buffer. + + If *buffer is NULL, a buffer of the needed size is + allocated and output copied into it. *buffer is then + updated to point to the allocated memory area. The caller + is responsible for free()ing *buffer after usage. + + In both cases *buffer_len is updated to the number of + characters written (excluding the trailing NULL-byte). + The output buffer is assured to be NULL-terminated. + +Examples: + +Using "es#" with auto-allocation: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + int buffer_len = 0; + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + free(buffer); + return str; + } + +Using "es" with auto-allocation returning a NULL-terminated string: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + + if (!PyArg_ParseTuple(args, "es:test_parser", + encoding, &buffer)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromString(buffer); + free(buffer); + return str; + } + +Using "es#" with a pre-allocated buffer: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char _buffer[10]; + char *buffer = _buffer; + int buffer_len = sizeof(_buffer); + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + 
PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + return str; + } + File/Stream Output: ------------------- @@ -837,6 +942,7 @@ History of this Proposal: ------------------------- +1.3: Added new "es" and "es#" parser markers 1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and Only in CVS-Python/Objects: .#stringobject.c.2.59 Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/getargs.c Python+Unicode/Python/getargs.c --- CVS-Python/Python/getargs.c Sat Mar 11 10:55:21 2000 +++ Python+Unicode/Python/getargs.c Fri Mar 24 20:22:26 2000 @@ -178,6 +178,8 @@ } else if (level != 0) ; /* Pass */ + else if (c == 'e') + ; /* Pass */ else if (isalpha(c)) max++; else if (c == '|') @@ -654,6 +656,122 @@ break; } + case 'e': /* encoded string */ + { + char **buffer; + const char *encoding; + PyObject *u, *s; + int size; + + /* Get 'e' parameter: the encoding name */ + encoding = (const char *)va_arg(*p_va, const char *); + if (encoding == NULL) + return "(encoding is NULL)"; + + /* Get 's' parameter: the output buffer to use */ + if (*format != 's') + return "(unkown parser marker combination)"; + buffer = (char **)va_arg(*p_va, char **); + format++; + if (buffer == NULL) + return "(buffer is NULL)"; + + /* Convert object to Unicode */ + u = PyUnicode_FromObject(arg); + if (u == NULL) + return "string, unicode or text buffer"; + + /* Encode object; use default error handling */ + s = PyUnicode_AsEncodedString(u, + encoding, + NULL); + Py_DECREF(u); + if (s == NULL) + return "(encoding failed)"; + if (!PyString_Check(s)) { + Py_DECREF(s); + return 
"(encoder failed to return a string)"; + } + size = PyString_GET_SIZE(s); + + /* Write output; output is guaranteed to be + 0-terminated */ + if (*format == '#') { + /* Using buffer length parameter '#': + + - if *buffer is NULL, a new buffer + of the needed size is allocated and + the data copied into it; *buffer is + updated to point to the new buffer; + the caller is responsible for + free()ing it after usage + + - if *buffer is not NULL, the data + is copied to *buffer; *buffer_len + has to be set to the size of the + buffer on input; buffer overflow is + signalled with an error; buffer has + to provide enough room for the + encoded string plus the trailing + 0-byte + + - in both cases, *buffer_len is + updated to the size of the buffer + /excluding/ the trailing 0-byte + + */ + int *buffer_len = va_arg(*p_va, int *); + + format++; + if (buffer_len == NULL) + return "(buffer_len is NULL)"; + if (*buffer == NULL) { + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + } else { + if (size + 1 > *buffer_len) { + Py_DECREF(s); + return "(buffer overflow)"; + } + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + *buffer_len = size; + } else { + /* Using a 0-terminated buffer: + + - the encoded string has to be + 0-terminated for this variant to + work; if it is not, an error raised + + - a new buffer of the needed size + is allocated and the data copied + into it; *buffer is updated to + point to the new buffer; the caller + is responsible for free()ing it + after usage + + */ + if (strlen(PyString_AS_STRING(s)) != size) + return "(encoded string without "\ + "NULL bytes)"; + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + } + Py_DECREF(s); + break; + } + case 'S': /* string object */ { PyObject **p = va_arg(*p_va, PyObject **); From fdrake at acm.org Fri Mar 24 22:40:38 2000 From: 
fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:40:38 -0500 (EST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: References: Message-ID: <14555.57558.939236.363358@weyr.cnri.reston.va.us> Greg Stein writes: > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- This is a non-problem; I'm willing to do the arcane CVS manipulations if the issue is Guido's time. What I will *not* do is do it piecemeal without a cohesive plan that Guido approves of at least 95%, and I'll be really careful to do that last 5% when he's not in the office. ;) > commits model. Fred does this very well with the Doc/ area. Thanks for the vote of confidence! The model that I use for the Doc/ area is more like "Fred reviews, Fred commits, and Guido can read it on python.org like everyone else." Works for me! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Fri Mar 24 22:45:38 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:45:38 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.57858.824301.693390@anthem.cnri.reston.va.us> One thing you can definitely do now which breaks no code: propose a package hierarchy for the standard library. From akuchlin at mems-exchange.org Fri Mar 24 22:46:28 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 16:46:28 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <14555.57908.151946.182639@amarok.cnri.reston.va.us> Here's a strawman codec for doing the \N{NULL} thing. Questions: 0) Is the code below correct? 1) What the heck would this encoding be called? 2) What does .encode() do? 
(Right now it escapes \N as \N{BACKSLASH}N.) 3) How can we store all those names? The resulting dictionary makes a 361K .py file; Python dumps core trying to parse it. (Another bug...) 4) What do you do with the error \N{...... no closing right bracket. Right now it stops at that point, and never advances any farther. Maybe it should assume it's an error if there's no } within the next 200 chars or some similar limit? 5) Do we need StreamReader/Writer classes, too? I've also added a script that parses the names out of the NameList.txt file at ftp://ftp.unicode.org/Public/UNIDATA/. --amk namecodec.py: ============= import codecs #from _namedict import namedict namedict = {'NULL': 0, 'START OF HEADING' : 1, 'BACKSLASH':ord('\\')} class NameCodec(codecs.Codec): def encode(self,input,errors='strict'): # XXX what should this do? Escape the # sequence \N as '\N{BACKSLASH}N'? return input.replace( '\\N', '\\N{BACKSLASH}N' ) def decode(self,input,errors='strict'): output = unicode("") last = 0 index = input.find( u'\\N{' ) while index != -1: output = output + unicode( input[last:index] ) used = index r_bracket = input.find( '}', index) if r_bracket == -1: # No closing bracket; bail out... break name = input[index + 3 : r_bracket] code = namedict.get( name ) if code is not None: output = output + unichr(code) elif errors == 'strict': raise ValueError, 'Unknown character name %s' % repr(name) elif errors == 'ignore': pass elif errors == 'replace': output = output + unichr( 0xFFFD ) last = r_bracket + 1 index = input.find( '\\N{', last) else: # Finally failed gently, no longer finding a \N{...
output = output + unicode( input[last:] ) return len(input), output # Otherwise, we hit the break for an unterminated \N{...} return index, output if __name__ == '__main__': c = NameCodec() for s in [ r'b\lah blah \N{NULL} asdf', r'b\l\N{START OF HEADING}\N{NU' ]: used, s2 = c.decode(s) print repr( s2 ) s3 = c.encode(s) _, s4 = c.decode(s3) print repr(s3) assert s4 == s print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' )) print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' )) makenamelist.py =============== # Hack to extract character names from NamesList.txt # Output the repr() of the resulting dictionary import re, sys, string namedict = {} while 1: L = sys.stdin.readline() if L == "": break m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L) if m is not None: last_char = int(m.group(1), 16) if m.group(2) is not None: name = string.upper( m.group(2) ) if name not in ['', '']: namedict[ name ] = last_char # print name, last_char m = re.match('\t=\s*(.*)\s*(;.*)?', L) if m is not None: name = string.upper( m.group(1) ) names = string.split(name, ',') names = map(string.strip, names) for n in names: namedict[ n ] = last_char # print n, last_char # XXX and do what with this dictionary? print namedict From mal at lemburg.com Fri Mar 24 22:50:19 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:50:19 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 References: <38DBE0E0.76A298FE@lemburg.com> Message-ID: <38DBE31B.BCB342CA@lemburg.com> Oops, sorry, the patch file wasn't supposed to go to python-dev. 
Anyway, Greg's wish is included in there and MarkH should be happy now -- at least I hope he is ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Jasbahr at origin.EA.com Fri Mar 24 22:49:35 2000 From: Jasbahr at origin.EA.com (Asbahr, Jason) Date: Fri, 24 Mar 2000 15:49:35 -0600 Subject: [Python-Dev] Memory Management Message-ID: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Greetings! We're working on integrating our own memory manager into our project and the current challenge is figuring out how to make it play nice with Python (and SWIG). The approach we're currently taking is to patch 1.5.2 and augment the PyMem* macros to call external memory allocation functions that we provide. The idea is to easily allow the addition of third party memory management facilities to Python. Assuming 1) we get it working :-), and 2) we sync to the latest Python CVS and patch that, would this be a useful patch to give back to the community? Has anyone run up against this before? Thanks, Jason Asbahr Origin Systems, Inc. jasbahr at origin.ea.com From bwarsaw at cnri.reston.va.us Fri Mar 24 22:53:01 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 16:53:01 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> Message-ID: <14555.58301.790774.159381@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> You probably meant: | sock.connect(addr) | sock.connect(host, port) | sock.connect((host, port)) GvR> since (host, port) is equivalent to (addr). Doh, yes. :) GvR> Fred typically directs latex2html to break all sections GvR> apart.
It's in the previous section: I know, I was being purposefully dense for effect :) Fred, is there some way to make the html contain a link to the previous section for the "see above" text? That would solve the problem I think. GvR> This also explains the reason for requiring a single GvR> argument: when using AF_UNIX, the second argument makes no GvR> sense! GvR> Frankly, I'm not sure what do here -- it's more correct to GvR> require a single address argument always, but it's more GvR> convenient to allow two sometimes. GvR> Note that sendto(data, addr) only accepts the tuple form: you GvR> cannot write sendto(data, host, port). Hmm, that /does/ complicate things -- it makes explaining the API more difficult. Still, in this case I think I'd lean toward liberal acceptance of input parameters. :) -Barry From bwarsaw at cnri.reston.va.us Fri Mar 24 22:57:01 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:57:01 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14555.58541.207868.496747@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> OK, that's reasonable. I'll have to invent a different GvR> reason why I don't want this -- because I really don't! Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't be persuaded to change your mind :) -Barry From fdrake at acm.org Fri Mar 24 23:10:41 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:10:41 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.58301.790774.159381@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> Message-ID: <14555.59361.460705.258859@weyr.cnri.reston.va.us> bwarsaw at cnri.reston.va.us writes: > I know, I was being purposefully dense for effect :) Fred, is there > some way to make the html contain a link to the previous section for > the "see above" text? That would solve the problem I think. No. I expect this to no longer be a problem when we push to SGML/XML, so I won't waste any time hacking around it. On the other hand, lots of places in the documentation refer to "above" and "below" in the traditional sense used in paper documents, and that doesn't work well for hypertext, even in the strongly traditional book-derivation way the Python manuals are done. As soon as it's not in the same HTML file, "above" and "below" break for a lot of people. So it still should be adjusted at an appropriate time. > Hmm, that /does/ complicate things -- it makes explaining the API more > difficult. Still, in this case I think I'd lean toward liberal > acceptance of input parameters. :) No -- all the more reason to be strict and keep the descriptions as simple as reasonable. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 23:10:32 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:10:32 -0500 Subject: [Python-Dev] Memory Management In-Reply-To: Your message of "Fri, 24 Mar 2000 15:49:35 CST." 
<11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <200003242210.RAA11434@eric.cnri.reston.va.us> > We're working on integrating our own memory manager into our project > and the current challenge is figuring out how to make it play nice > with Python (and SWIG). The approach we're currently taking is to > patch 1.5.2 and augment the PyMem* macros to call external memory > allocation functions that we provide. The idea is to easily allow > the addition of third party memory management facilities to Python. > Assuming 1) we get it working :-), and 2) we sync to the latest Python > CVS and patch that, would this be a useful patch to give back to the > community? Has anyone run up against this before? Check out the archives for patches at python.org looking for posts by Vladimir Marangozov. Vladimir has produced several rounds of patches with a very similar goal in mind. We're still working out some details -- but it shouldn't be too long, and I hope that his patches are also suitable for you. If not, discussion is required! --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 23:12:35 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 17:12:35 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <14555.59475.802130.434345@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> No -- all the more reason to be strict and keep the Fred> descriptions as simple as reasonable. At the expense of (IMO unnecessarily) breaking existing code? 
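To make the two calling conventions in this thread concrete, here is a minimal, hypothetical sketch (not the real socket module -- the helper name is made up for illustration) of how a method can accept both the documented tuple form and the undocumented two-argument form:

```python
# Hypothetical helper, for illustration only -- not the real socket API.
# It accepts both connect((host, port)) and connect(host, port), the
# liberal behaviour under debate in this thread.
def normalize_address(*args):
    """Return a (host, port) pair from either calling convention."""
    if len(args) == 1:
        host, port = args[0]   # documented form: a single (host, port) tuple
    elif len(args) == 2:
        host, port = args      # undocumented form: two separate arguments
    else:
        raise TypeError("expected (host, port) tuple or host, port")
    return host, port
```

Dropping the two-argument branch is exactly the breakage being proposed: code that calls connect(host, port) would then raise an exception instead of connecting.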
From mal at lemburg.com Fri Mar 24 23:13:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 23:13:04 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DBE870.D88915B5@lemburg.com> "Andrew M. Kuchling" wrote: > > Here's a strawman codec for doing the \N{NULL} thing. Questions: > > 0) Is the code below correct? Some comments below. > 1) What the heck would this encoding be called? Ehm, 'unicode-with-smileys' I guess... after all that's what motivated the thread ;-) Seriously, I'd go with 'unicode-named'. You can then stack it on top of 'unicode-escape' and get the best of both worlds... > 2) What does .encode() do? (Right now it escapes \N as > \N{BACKSLASH}N.) .encode() should translate Unicode to a string. Since the named char thing is probably only useful on input, I'd say: don't do anything, except maybe return input.encode('unicode-escape'). > 3) How can we store all those names? The resulting dictionary makes a > 361K .py file; Python dumps core trying to parse it. (Another bug...) I've made the same experience with the large Unicode mapping tables... the trick is to split the dictionary definition in chunks and then use dict.update() to paste them together again. > 4) What do you with the error \N{...... no closing right bracket. > Right now it stops at that point, and never advances any farther. > Maybe it should assume it's an error if there's no } within the > next 200 chars or some similar limit? I'd suggest to take the upper bound of all Unicode name lengths as limit. > 5) Do we need StreamReader/Writer classes, too? If you plan to have it registered with a codec search function, yes. 
No big deal though, because you can use the Codec class as basis for them: class StreamWriter(Codec,codecs.StreamWriter): pass class StreamReader(Codec,codecs.StreamReader): pass ### encodings module API def getregentry(): return (Codec().encode,Codec().decode,StreamReader,StreamWriter) Then you can drop the scripts into the encodings package dir and it should be useable via unicode(r'\N{SMILEY}','unicode-named') and u":-)".encode('unicode-named'). > I've also add a script that parses the names out of the NameList.txt > file at ftp://ftp.unicode.org/Public/UNIDATA/. > > --amk > > namecodec.py: > ============= > > import codecs > > #from _namedict import namedict > namedict = {'NULL': 0, 'START OF HEADING' : 1, > 'BACKSLASH':ord('\\')} > > class NameCodec(codecs.Codec): > def encode(self,input,errors='strict'): > # XXX what should this do? Escape the > # sequence \N as '\N{BACKSLASH}N'? > return input.replace( '\\N', '\\N{BACKSLASH}N' ) You should return a string on output... input will be a Unicode object and the return value too if you don't add e.g. an .encode('unicode-escape'). > def decode(self,input,errors='strict'): > output = unicode("") > last = 0 > index = input.find( u'\\N{' ) > while index != -1: > output = output + unicode( input[last:index] ) > used = index > r_bracket = input.find( '}', index) > if r_bracket == -1: > # No closing bracket; bail out... > break > > name = input[index + 3 : r_bracket] > code = namedict.get( name ) > if code is not None: > output = output + unichr(code) > elif errors == 'strict': > raise ValueError, 'Unknown character name %s' % repr(name) This could also be UnicodeError (it's a subclass of ValueError). > elif errors == 'ignore': pass > elif errors == 'replace': > output = output + unichr( 0xFFFD ) '\uFFFD' would save a call. > last = r_bracket + 1 > index = input.find( '\\N{', last) > else: > # Finally failed gently, no longer finding a \N{...
> output = output + unicode( input[last:] ) > return len(input), output > > # Otherwise, we hit the break for an unterminated \N{...} > return index, output Note that .decode() must only return the decoded data. The "bytes read" integer was removed in order to make the Codec APIs compatible with the standard file object APIs. > if __name__ == '__main__': > c = NameCodec() > for s in [ r'b\lah blah \N{NULL} asdf', > r'b\l\N{START OF HEADING}\N{NU' ]: > used, s2 = c.decode(s) > print repr( s2 ) > > s3 = c.encode(s) > _, s4 = c.decode(s3) > print repr(s3) > assert s4 == s > > print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' )) > print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' )) > > makenamelist.py > =============== > > # Hack to extract character names from NamesList.txt > # Output the repr() of the resulting dictionary > > import re, sys, string > > namedict = {} > > while 1: > L = sys.stdin.readline() > if L == "": break > > m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L) > if m is not None: > last_char = int(m.group(1), 16) > if m.group(2) is not None: > name = string.upper( m.group(2) ) > if name not in ['', > '']: > namedict[ name ] = last_char > # print name, last_char > > m = re.match('\t=\s*(.*)\s*(;.*)?', L) > if m is not None: > name = string.upper( m.group(1) ) > names = string.split(name, ',') > names = map(string.strip, names) > for n in names: > namedict[ n ] = last_char > # print n, last_char > > # XXX and do what with this dictionary? > print namedict > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Mar 24 23:12:42 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:12:42 -0500 (EST) Subject: [Python-Dev] Memory Management In-Reply-To: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <14555.59482.61317.992089@weyr.cnri.reston.va.us> Asbahr, Jason writes: > community? Has anyone run up against this before? You should talk to Vladimir Marangozov; he's done a fair bit of work dealing with memory management in Python. You probably want to read the chapter he contributed to the Python/C API document for the release earlier this week. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Fri Mar 24 23:19:50 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 16:19:50 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.59910.631130.241930@beluga.mojam.com> Guido> Which reminds me of another reason to wait: coming up with the Guido> right package hierarchy is hard. (E.g. I find network too long; Guido> plus, does htmllib belong there?) Ah, another topic for python-dev. Even if we can't do the packaging right away, we should be able to hash out the structure. Skip From guido at python.org Fri Mar 24 23:25:01 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:25:01 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 17:10:41 EST." 
<14555.59361.460705.258859@weyr.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <200003242225.RAA13408@eric.cnri.reston.va.us> > bwarsaw at cnri.reston.va.us writes: > > I know, I was being purposefully dense for effect :) Fred, is there > > some way to make the html contain a link to the previous section for > > the "see above" text? That would solve the problem I think. [Fred] > No. I expect this to no longer be a problem when we push to > SGML/XML, so I won't waste any time hacking around it. > On the other hand, lots of places in the documentation refer to > "above" and "below" in the traditional sense used in paper documents, > and that doesn't work well for hypertext, even in the strongly > traditional book-derivation way the Python manuals are done. As soon > as it's not in the same HTML file, "above" and "below" break for a lot > of people. So it still should be adjusted at an appropriate time. My approach to this: put more stuff on the same page! I personally favor putting an entire chapter on one page; even if you split the top-level subsections this wouldn't have happened. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Fri Mar 24 23:40:54 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:40:54 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: Guido wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! I'm glad this organize-the-library-in-packages initiative seems to be moving towards concentrating on the organization, rather than just starting to put obvious things in the obvious places. 
Personally, i *crave* sensible, discoverable organization. The only thing i like less than complicated disorganization is complicated misorganization - and i think that just diving in and doing the "obvious" placements would have the terrible effect of making it harder, not easier, to move eventually to the right arrangement. Ken klm at digicool.com From akuchlin at mems-exchange.org Fri Mar 24 23:45:20 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 17:45:20 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DBE870.D88915B5@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> Message-ID: <14555.61440.613940.50492@amarok.cnri.reston.va.us> M.-A. Lemburg writes: >.encode() should translate Unicode to a string. Since the >named char thing is probably only useful on input, I'd say: >don't do anything, except maybe return input.encode('unicode-escape'). Wait... then you can't stack it on top of unicode-escape, because it would already be Unicode escaped. >> 4) What do you with the error \N{...... no closing right bracket. >I'd suggest to take the upper bound of all Unicode name >lengths as limit. Seems like a hack. >Note that .decode() must only return the decoded data. >The "bytes read" integer was removed in order to make >the Codec APIs compatible with the standard file object >APIs. Huh? Why does Misc/unicode.txt describe decode() as "Decodes the object input and returns a tuple (output object, length consumed)"? Or are you talking about a different .decode() method? -- A.M. Kuchling http://starship.python.net/crew/amk/ "Ruby's dead?" "Yes." "Ah me. That's the trouble with mortals. They do that. Not to worry, eh?" 
-- Dream and Pharamond, in SANDMAN #46: "Brief Lives:6" From gmcm at hypernet.com Fri Mar 24 23:50:12 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 24 Mar 2000 17:50:12 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <1258184279-6957124@hypernet.com> [Guido] > Someone noticed that socket.connect() and a few related functions > (connect_ex() and bind()) take either a single (host, port) tuple or > two separate arguments, but that only the tuple is documented. > > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. This will indeed cause great wailing and gnashing of teeth. I've been criticized for using the tuple form in the Sockets HOWTO (in fact I foolishly changed it to demonstrate both forms). > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I say give 'em something to whine about. put-sand-in-the-vaseline-ly y'rs - Gordon From klm at digicool.com Fri Mar 24 23:55:43 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:55:43 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > > >>>>> "GvR" == Guido van Rossum writes: > > GvR> OK, that's reasonable. I'll have to invent a different > GvR> reason why I don't want this -- because I really don't! 
> > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) Maybe i'm just a slave to my organization mania, but i'd suggest the following order change of 5 and 6, plus an addition; from: 5 now: Flat is better than nested. 6 now: Sparse is better than dense. to: 5 Sparse is better than dense. 6 Flat is better than nested 6.5 until it gets too dense. or-is-it-me-that-gets-too-dense'ly yrs, ken klm at digicool.com (And couldn't the humor page get hooked up a bit better? That was definitely a fun part of maintaining python.org...) From gstein at lyra.org Sat Mar 25 02:19:18 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 17:19:18 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > One thing you can definitely do now which breaks no code: propose a > package hierarchy for the standard library. I already did! http://www.python.org/pipermail/python-dev/2000-February/003761.html *grumble* -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 25 05:19:33 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:33 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <001001bf9611$52e960a0$752d153f@tim> [GregS proposes a partial packaging of std modules for 1.6, Guido objects on spurious grounds, GregS refutes that, Guido agrees] > I'll have to invent a different reason why I don't want this -- because > I really don't! This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you to fill in . All you have to do now is come up with a pithy way to say "if it's something Guido is so interested in that he wants to be deeply involved in it himself, but it comes at a time when he's buried under prior commitments, then tough tulips, it waits". 
shades-of-the-great-renaming-ly y'rs - tim From tim_one at email.msn.com Sat Mar 25 05:19:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:36 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: <001101bf9611$544239e0$752d153f@tim> [Guido] > OK, that's reasonable. I'll have to invent a different > reason why I don't want this -- because I really don't! [Barry] > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) No no no no no: "namespaces are one honking great idea ..." is the controlling one here: Guido really *does* want this! It's a question of timing, in the sense of "never is often better than *right* now", but to be eventually modified by "now is better than never". These were carefully designed to support any position whatsoever, you know . although-in-any-particular-case-there's-only-one-true-interpretation-ly y'rs - tim From guido at python.org Sat Mar 25 05:19:41 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 23:19:41 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 17:19:18 PST." References: Message-ID: <200003250419.XAA25751@eric.cnri.reston.va.us> > > One thing you can definitely do now which breaks no code: propose a > > package hierarchy for the standard library. > > I already did! > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > *grumble* You've got to be kidding. That's not a package hierarchy proposal, it's just one package (network). Without a comprehensive proposal I'm against a partial reorganization: without a destination we can't start marching. Naming things is very contentious -- everybody has an opinion. To pick the right names you must see things in perspective. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Sat Mar 25 09:45:28 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 10:45:28 +0200 (IST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson at nevex.com wrote: > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: I'd like to know what you mean by "class" method. (I do know C++ and Java, so I have some idea...). Specifically, my question is: how does a class method access class variables? They can't be totally unqualified (because that's very unpythonic). If they are qualified by the class's name, I see it as a very mild improvement on the current situation. You could suggest, for example, to qualify class variables by "class" (so you'd do things like: class.x = 1), but I'm not sure I like it. On the whole, I think the much bigger issue is how we denote class methods. Also, one slight problem with your method of denoting class methods: currently, it is possible to add an instance method at run time to a class by something like class C: pass def foo(self): pass C.foo = foo In your suggestion, how do you view the possibility of adding class methods to a class? (Note that "foo", above, is also perfectly usable as a plain function). I want to note that Edward suggested denotation by a separate namespace: C.foo = foo # foo is an instance method C.__methods__.foo = foo # foo is a class method The biggest problem with that suggestion is that it doesn't address the common case of defining it textually inside the class definition.
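The runtime attachment described above can be shown concretely; this is a small sketch of the existing behaviour (class and function names are made up for illustration):

```python
# Existing behaviour: a plain function assigned onto a class at run
# time becomes an ordinary instance method.  Nothing in this mechanism
# distinguishes a would-be "class method".
class C:
    pass

def foo(self):
    return self.__class__.__name__

C.foo = foo   # attached at run time; foo is now an instance method of C
```

Any new syntax for class methods would need an equally spellable runtime form, which is the gap being pointed out here.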
> I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. Currently, I use "_" for that purpose, after I heard the idea from Fredrik Lundh. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sat Mar 25 10:26:23 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:26:23 -0800 (PST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: <200003250419.XAA25751@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > > One thing you can definitely do now which breaks no code: propose a > > > package hierarchy for the standard library. > > > > I already did! > > > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > > > *grumble* > > You've got to be kidding. That's not a package hierarchy proposal, > it's just one package (network). > > Without a comprehensive proposal I'm against a partial reorganization: > without a destination we can't start marching. Not kidding at all. I said before that I don't think we can do everything all at once. I *do* think this is solvable with a greedy algorithm rather than waiting for some nebulous completion point. > Naming things is very contentious -- everybody has an opinion. To > pick the right names you must see things in perspective. Sure. And those diverse opinions are why I don't believe it is possible to do all at once. The task is simply too large to tackle in one shot. IMO, it must be solved incrementally. I'm not even going to attempt to try to define a hierarchy for all those modules. I count 137 on my local system. Let's say that I *do* try... 
some are going to end up "forced" rather than obeying some obvious grouping. If you do it a chunk at a time, then you get the obvious, intuitive groupings. Try for more, and you just bung it all up. For discussion's sake: can you provide a rationale for doing it all at once? In the current scenario, modules just appear at some point. After a partial reorg, some modules appear at a different point. "No big whoop." Just because module A is in a package doesn't imply that module B must also be in a package. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 25 10:35:39 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:35:39 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <001001bf9611$52e960a0$752d153f@tim> Message-ID: On Fri, 24 Mar 2000, Tim Peters wrote: > [GregS proposes a partial packaging of std modules for 1.6, Guido objects on > spurious grounds, GregS refutes that, Guido agrees] > > > I'll have to invent a different reason why I don't want this -- because > > I really don't! > > This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you > to fill in . All you have to do now is come up with a pithy way to > say "if it's something Guido is so interested in that he wants to be deeply > involved in it himself, but it comes at a time when he's buried under prior > commitments, then tough tulips, it waits". No need for Pythonic Theses. I don't see anybody disagreeing with the end goal. The issue comes up with *how* to get there. I say "do it incrementally" while others say "do it all at once." Personally, I don't think it is possible to do all at once. As a corollary, if you can't do it all at once, but you *require* that it be done all at once, then you have effectively deferred the problem. To put it another way, Guido has already invented a reason to not do it: he just requires that it be done all at once. Result: it won't be done. [ not saying this was Guido's intent or desire... 
but this is how I read the result :-) ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 25 10:55:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 11:55:12 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.34371.749039.946891@beluga.mojam.com> Message-ID: On Fri, 24 Mar 2000, Skip Montanaro wrote: > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. You're right, but I'd like this to be a 1.7 change. It's just that I plan to suggest a great-renaming-fest for 1.7 modules, and then the namespace wouldn't be cluttered when you don't need it. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sat Mar 25 11:16:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 12:16:23 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Here's a reason: there shouldn't be changes we'll retract later -- we need to come up with the (more or less) right hierarchy the first time, or we'll do a lot of work for nothing. > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. Hmmmmm....this is a big problem. Maybe we need to have more people with access to the CVS? -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Sat Mar 25 11:47:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 25 Mar 2000 11:47:30 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> <14555.61440.613940.50492@amarok.cnri.reston.va.us> Message-ID: <38DC9942.3C4E4B92@lemburg.com> "Andrew M. Kuchling" wrote: > > M.-A. Lemburg writes: > >.encode() should translate Unicode to a string. Since the > >named char thing is probably only useful on input, I'd say: > >don't do anything, except maybe return input.encode('unicode-escape'). > > Wait... then you can't stack it on top of unicode-escape, because it > would already be Unicode escaped. Sorry for the mixup (I guess yesterday wasn't my day...). I had stream codecs in mind: these are stackable, meaning that you can wrap one codec around another. And its also their interface API that was changed -- not the basic stateless encoder/decoder ones. Stacking of .encode()/.decode() must be done "by hand" in e.g. the way I described above. Another approach would be subclassing the unicode-escape Codec and then calling the base class method. > >> 4) What do you with the error \N{...... no closing right bracket. > >I'd suggest to take the upper bound of all Unicode name > >lengths as limit. > > Seems like a hack. It is... but what other way would there be ? > >Note that .decode() must only return the decoded data. > >The "bytes read" integer was removed in order to make > >the Codec APIs compatible with the standard file object > >APIs. > > Huh? Why does Misc/unicode.txt describe decode() as "Decodes the > object input and returns a tuple (output object, length consumed)"? > Or are you talking about a different .decode() method? You're right... I was thinking about .read() and .write(). 
.decode() should return a tuple, just as documented in unicode.txt. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Sat Mar 25 14:20:59 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 26 Mar 2000 00:20:59 +1100 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: [Greg writes] > I'm not even going to attempt to try to > define a hierarchy for all those modules. I count 137 on my local system. > Let's say that I *do* try... some are going to end up "forced" rather than > obeying some obvious grouping. If you do it a chunk at a time, then you > get the obvious, intuitive groupings. Try for more, and you just bung it > all up. ... > Just because module A is in a package doesn't imply that module B must > also be in a package. I agree with Greg - every module will not fit into a package. But I also agree with Guido - we _should_ attempt to go through the 137 modules and put the ones that fit into logical groupings. Greg is probably correct with his selection for "net", but a general evaluation is still a good thing. A view of the bigger picture will help to quell debates over the structure, and only leave us with the squabbles over the exact spelling :-) +2 on ... err .... -1 on ... errr - awww - screw that--ly, Mark. From tismer at tismer.com Sat Mar 25 14:35:50 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 25 Mar 2000 14:35:50 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DCC0B6.2A7D0EF1@tismer.com> "Andrew M. Kuchling" wrote: ... > 3) How can we store all those names? The resulting dictionary makes a > 361K .py file; Python dumps core trying to parse it. (Another bug...)
This is simply not the place to use a dictionary. You don't need fast lookup from names to codes, but something that supports incremental search. This would enable PythonWin to show a pop-up list after you typed the first letters. I'm working on a common substring analysis that makes each entry into 3 to 5 small integers. You then encode these in an order-preserving way. That means, the resulting code table is still lexically ordered, and access to the sentences is done via bisection. Takes me some more time to get that, but it will not be larger than 60k, or I drop it. Also note that all the names use uppercase letters and space only. An opportunity to use simple context encoding and use just 4 bits most of the time. ... > I've also add a script that parses the names out of the NameList.txt > file at ftp://ftp.unicode.org/Public/UNIDATA/. Is there any reason why you didn't use the UnicodeData.txt file, I mean do I cover everything if I continue to use that? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov at inrialpes.fr Sat Mar 25 15:59:55 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 25 Mar 2000 15:59:55 +0100 (CET) Subject: [Python-Dev] Windows and PyObject_NEW Message-ID: <200003251459.PAA09181@python.inrialpes.fr> For MarkH, Guido and the Windows experienced: I've been reading Jeffrey Richter's "Advanced Windows" last night in order to try to understand better why PyObject_NEW is implemented differently for Windows. Again, I feel uncomfortable with this, especially now, when I'm dealing with the memory aspect of Python's object constructors/destructors.
Some time ago, Guido elaborated on why PyObject_NEW uses malloc() on the user's side, before calling _PyObject_New (on Windows, cf. objimpl.h): [Guido] > I can explain the MS_COREDLL business: > > This is defined on Windows because the core is in a DLL. Since the > caller may be in another DLL, and each DLL (potentially) has a > different default allocator, and (in pre-Vladimir times) the > type-specific deallocator typically calls free(), we (Mark & I) > decided that the allocation should be done in the type-specific > allocator. We changed the PyObject_NEW() macro to call malloc() and > pass that into _PyObject_New() as a second argument. While I agree with this, from reading chapters 5-9 of (a French copy of) the book (translated backwards here): 5. Win32 Memory Architecture 6. Exploring Virtual Memory 7. Using Virtual Memory in Your Applications 8. Memory Mapped Files 9. Heaps I can't find any radical Windows specificities for memory management. On Windows, like the rest of the OSes, the (virtual & physical) memory allocated for a process is common and seems to be accessible from all DLLs involved in an executable. Things like page sharing, copy-on-write, private process mem, etc. are conceptually all the same on Windows and Unix. Now, the backwards binary compatibility argument aside (assuming that extensions get recompiled when a new Python version comes out), my concern is that with the introduction of PyObject_NEW *and* PyObject_DEL, there's no point in having separate implementations for Windows and Unix any more (or I'm really missing something and I fail to see what it is). User objects would be allocated *and* freed by the core DLL (at least the object headers). Even if several DLLs use different allocators, this shouldn't be a problem if what's obtained via PyObject_NEW is freed via PyObject_DEL. This Python memory would be allocated from Python's core DLL regions/pages/heaps.
And I believe that the memory allocated by the core DLL is accessible from the other DLL's of the process. (I haven't seen evidence on the opposite, but tell me if this is not true) I thought that maybe Windows malloc() uses different heaps for the different DLLs, but that's fine too, as long as the _NEW/_DEL symmetry is respected and all heaps are accessible from all DLLs (which seems to be the case...), but: In the beginning of Chapter 9, Heaps, I read the following: """ ...About Win32 heaps (compared to Win16 heaps)... * There is only one kind of heap (it doesn't have any particular name, like "local" or "global" on Win16, because it's unique) * Heaps are always local to a process. The contents of a process heap is not accessible from the threads of another process. A large number of Win16 applications use the global heap as a way of sharing data between processes; this change in the Win32 heaps is often a source of problems for porting Win16 applications to Win32. * One process can create several heaps in its addressing space and can manipulate them all. * A DLL does not have its own heap. It uses the heaps as part of the addressing space of the process. However, a DLL can create a heap in the addressing space of a process and reserve it for its own use. Since several 16-bit DLLs share data between processes by using the local heap of a DLL, this change is a source of problems when porting Win16 apps to Win32... """ This last paragraph confuses me. On one hand, it's stated that all heaps can be manipulated by the process, and OTOH, a DLL can reserve a heap for personal use within that process (implying the heap is r/w protected for the other DLLs ?!?). The rest of this chapter does not explain how this "private reservation" is or can be done, so some of you would probably want to chime in and explain this to me. 
Going back to PyObject_NEW, if it turns out that all heaps are accessible from all DLLs involved in the process, I would probably lobby for unifying the implementation of _PyObject_NEW/_New and _PyObject_DEL/_Del for Windows and Unix. Actually on Windows, object allocation does not depend on a central, Python core memory allocator. Therefore, with the patches I'm working on, changing the core allocator would work (would be changed for real) only for platforms other than Windows. Next, if it's possible to unify the implementation, it would also be possible to expose and officialize in the C API a new function set: PyObject_New() and PyObject_Del() (without leading underscores) For now, due to the implementation difference on Windows, we're forced to use the macro versions PyObject_NEW/DEL. Clearly, please tell me what would be wrong on Windows if a) & b) & c): a) we have PyObject_New(), PyObject_Del() b) their implementation is platform independent (no MS_COREDLL diffs, we retain the non-Windows variant) c) they're both used systematically for all object types -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm at hypernet.com Sat Mar 25 16:46:01 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Sat, 25 Mar 2000 10:46:01 -0500 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: <1258123323-10623548@hypernet.com> Vladimir Marangozov > ... And I believe that the memory allocated > by the core DLL is accessible from the other DLL's of the process. > (I haven't seen evidence on the opposite, but tell me if this is not true) This is true. Or, I should say, it all boils down to HeapAlloc( heap, flags, bytes) and malloc is going to use the _crtheap. > In the beginning of Chapter 9, Heaps, I read the following: > > """ > ...About Win32 heaps (compared to Win16 heaps)...
> > * There is only one kind of heap (it doesn't have any particular name, > like "local" or "global" on Win16, because it's unique) > > * Heaps are always local to a process. The contents of a process heap is > not accessible from the threads of another process. A large number of > Win16 applications use the global heap as a way of sharing data between > processes; this change in the Win32 heaps is often a source of problems > for porting Win16 applications to Win32. > > * One process can create several heaps in its addressing space and can > manipulate them all. > > * A DLL does not have its own heap. It uses the heaps as part of the > addressing space of the process. However, a DLL can create a heap in > the addressing space of a process and reserve it for its own use. > Since several 16-bit DLLs share data between processes by using the > local heap of a DLL, this change is a source of problems when porting > Win16 apps to Win32... > """ > > This last paragraph confuses me. On one hand, it's stated that all heaps > can be manipulated by the process, and OTOH, a DLL can reserve a heap for > personal use within that process (implying the heap is r/w protected for > the other DLLs ?!?). At any time, you can create a new Heap handle HeapCreate(options, initsize, maxsize) Nothing special about the "dll" context here. On Win9x, only someone who knows about the handle can manipulate the heap. (On NT, you can enumerate the handles in the process.) I doubt very much that you would break anybody's code by removing the Windows specific behavior. But it seems to me that unless Python always uses the default malloc, those of us who write C++ extensions will have to override operator new? I'm not sure. I've used placement new to allocate objects in a memory mapped file, but I've never tried to muck with the global memory policy of a C++ program.
- Gordon From akuchlin at mems-exchange.org Sat Mar 25 18:58:56 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Sat, 25 Mar 2000 12:58:56 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DCC0B6.2A7D0EF1@tismer.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DCC0B6.2A7D0EF1@tismer.com> Message-ID: <14556.65120.22727.524616@newcnri.cnri.reston.va.us> Christian Tismer writes: >This is simply not the place to use a dictionary. >You don't need fast lookup from names to codes, >but something that supports incremental search. >This would enable PythonWin to show a pop-up list after >you typed the first letters. Hmm... one could argue that PythonWin or IDLE should provide their own database for incremental searching; I was planning on following Bill Tutt's suggestion of generating a perfect minimal hash for the names. gperf isn't up to the job, but I found an algorithm that should be OK. Just got to implement it now... But, if your approach pays off it'll be superior to a perfect hash. >Is there any reason why you didn't use the UnicodeData.txt file, >I mean do I cover everything if I continue to use that? Oops; I saw the NameList file and just went for it; maybe it should use the full UnicodeData.txt. --amk From moshez at math.huji.ac.il Sat Mar 25 19:10:44 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:10:44 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Mark Hammond wrote: > But I also agree with Guido - we _should_ attempt to go through the 137 Where did you come up with that number? I counted much more -- not quite sure, but certainly more. Well, here's a tentative suggestion I worked out today. This is just to have something to quibble about.
In the interest of rushing it out of the door, there are a few modules (explicitly mentioned) which I have said nothing about.

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath
        posixpath
        macpath
        nturl2path
        ntpath
        macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
exceptions
os
types
UserDict
UserList
user
site
locale
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase

========== Modules not handled ============
formatter
getopt
pprint
pty
repr
tty
errno
operator
pure
readline
resource
select
signal
socket
struct
syslog
termios

Well, if you got this far, you certainly deserve... congratulations-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA at ActiveState.com Sat Mar 25 19:28:30 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:28:30 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm This made me think of one issue which is worth considering -- is there a mechanism for third-party packages to hook into the standard naming hierarchy? It'd be weird not to have the oracle and sybase modules within the db toplevel package, for example. --david ascher From moshez at math.huji.ac.il Sat Mar 25 19:30:26 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:30:26 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > This made me think of one issue which is worth considering -- is there a > mechanism for third-party packages to hook into the standard naming > hierarchy? It'd be weird not to have the oracle and sybase modules within > the db toplevel package, for example. My position is that any 3rd party module decides for itself where it wants to live -- once we formalized the framework. Consider PyGTK/PyGnome, PyQT/PyKDE -- they should live in the UI package too... From DavidA at ActiveState.com Sat Mar 25 19:50:14 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:50:14 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase > modules within > > the db toplevel package, for example. 
> > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... That sounds good in theory, but I can see possible problems down the line: 1) The current mapping between package names and directory structure means that installing a third party package hierarchy in a different place on disk than the standard library requires some work on the import mechanisms (this may have been discussed already) and a significant amount of user education. 2) We either need a 'registration' mechanism whereby people can claim a name in the standard hierarchy or expect conflicts. As far as I can gather, in the Perl world registration occurs by submission to CPAN. Correct? One alternative is to go the Java route, which would then mean, I think, that some core modules are placed very high in the hierarchy (the equivalent of the java. subtree), and some others are deprecated to lower subtree (the equivalent of com.sun). Anyway, I agree with Guido on this one -- naming is a contentious issue fraught with long-term implications. Let's not rush into a decision just yet. --david From guido at python.org Sat Mar 25 19:56:20 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Mar 2000 13:56:20 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Sat, 25 Mar 2000 01:35:39 PST." References: Message-ID: <200003251856.NAA09636@eric.cnri.reston.va.us> > I say "do it incrementally" while others say "do it all at once." > Personally, I don't think it is possible to do all at once. As a > corollary, if you can't do it all at once, but you *require* that it be > done all at once, then you have effectively deferred the problem. To put > it another way, Guido has already invented a reason to not do it: he just > requires that it be done all at once. Result: it won't be done. Bullshit, Greg.
(I don't normally like to use such strong words, but since you're being confrontational here...) I'm all for doing it incrementally -- but I want the plan for how to do it made up front. That doesn't require all the details to be worked out -- but it requires a general idea about what kind of things we will have in the namespace and what kinds of names they get. An organizing principle, if you like. If we were to decide later that we go for a Java-like deep hierarchy, the network package would have to be moved around again -- what a waste. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Sat Mar 25 20:35:37 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 21:35:37 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > That sounds good in theory, but I can see possible problems down the line: > > 1) The current mapping between package names and directory structure means > that installing a third party package hierarchy in a different place on disk > than the standard library requires some work on the import mechanisms (this > may have been discussed already) and a significant amount of user education. Ummmm.... 1.a) If the work of the import-sig produces something (which I suspect it will), it's more complicated -- you could have JAR-like files with hierarchies inside. 1.b) Installation is the domain of the distutils-sig. I seem to remember Greg Ward saying something about installing packages. > 2) We either need a 'registration' mechanism whereby people can claim a name > in the standard hierarchy or expect conflicts. As far as I can gather, in > the Perl world registration occurs by submission to CPAN. 
Correct? Yes. But this is no worse than the current situation, where people pick a toplevel name . I agree a registration mechanism would be helpful. > One alternative is to go the Java route, which would then mean, I think, > that some core modules are placed very high in the hierarchy (the equivalent > of the java. subtree), and some others are deprecated to lower subtree (the > equivalent of com.sun). Personally, I *hate* the Java mechanism -- see Stallman's position on why GNU Java packages use gnu.* rather than org.gnu.* for some of the reasons. I really, really, like the Perl mechanism, and I think we would do well to think if something like that wouldn't suit us, with minor modifications. (Remember that lwall copied the Pythonic module mechanism, so Perl and Python modules are quite similar) > Anyway, I agree with Guido on this one -- naming is a contentious issue > fraught with long-term implications. Let's not rush into a decision just > yet. I agree. That's why I pushed out the straw-man proposal. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Sat Mar 25 21:07:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 15:07:27 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: <14557.7295.451011.36533@anthem.cnri.reston.va.us> I guess I was making a request for a more comprehensive list. People are asking to packagize the entire directory, so I'd like to know what organization they'd propose for all the modules. -Barry From bwarsaw at cnri.reston.va.us Sat Mar 25 21:20:09 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Sat, 25 Mar 2000 15:20:09 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14557.8057.896921.693908@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Hmmmmm....this is a big problem. Maybe we need to have more MZ> people with access to the CVS? To make changes like this, you don't just need write access to CVS, you need physical access to the repository filesystem. It's not possible to provide this access to non-CNRI'ers. -Barry From gstein at lyra.org Sat Mar 25 21:40:59 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 12:40:59 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Barry A. Warsaw wrote: > >>>>> "MZ" == Moshe Zadka writes: > > MZ> Hmmmmm....this is a big problem. Maybe we need to have more > MZ> people with access to the CVS? > > To make changes like this, you don't just need write access to CVS, > you need physical access to the repository filesystem. It's not > possible to provide this access to non-CNRI'ers. Unless the CVS repository was moved to, say, SourceForge. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at cnri.reston.va.us Sat Mar 25 22:00:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 25 Mar 2000 16:00:39 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) References: Message-ID: <14557.10487.736544.336550@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Personally, I *hate* the Java mechanism -- see Stallman's MZ> position on why GNU Java packages use gnu.* rather then MZ> org.gnu.* for some of the reasons. Actually, it's Per Bothner's position: http://www.gnu.org/software/java/why-gnu-packages.txt and I agree with him. I kind of wished that JimH had chosen simply `python' as JPython's top level package hierarchy, but that's too late to change now.
-Barry From bwarsaw at cnri.reston.va.us Sat Mar 25 22:03:08 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 16:03:08 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: <14557.10636.504088.517078@anthem.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GS> Unless the CVS repository was moved to, say, SourceForge. I didn't want to rehash that, but yes, you're absolutely right! -Barry From gstein at lyra.org Sat Mar 25 22:13:00 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:13:00 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.10636.504088.517078@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > >>>>> "GS" == Greg Stein writes: > > GS> Unless the CVS repository was moved to, say, SourceForge. > > I didn't want to rehash that, but yes, you're absolutely right! Me neither, ergo the smiley :-) Just felt inclined to mention it, and I think the conversation stopped last time at that point; not sure it ever was "hashed" :-). But it is only a discussion to raise if checkins-via-CNRI-guys becomes a true bottleneck. Which it hasn't and doesn't look to be. Constrained? Yes. Bottleneck? No. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Sat Mar 25 22:22:09 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 25 Mar 2000 16:22:09 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: References: Message-ID: <14557.4689.858620.578102@walden> >>>>> "MH" == Mark Hammond writes: MH> [Greg writes] >> I'm not even going to attempt to try to define a hierarchy for >> all those modules. I count 137 on my local system. Let's say >> that I *do* try... some are going to end up "forced" rather than >> obeying some obvious grouping. If you do it a chunk at a time, >> then you get the obvious, intuitive groupings. 
Try for more, and >> you just bung it all up. MH> I agree with Greg - every module will not fit into a package. Sure. No one is arguing with that :-). Where I disagree with Greg is that we shouldn't approach this piecemeal. A greedy algorithm can lead to a locally optimal solution that isn't right for the whole library. A name or grouping might make sense on its own, but isn't sufficiently clear when taking all 137-odd modules into account. MH> But I also agree with Guido - we _should_ attempt to go through MH> the 137 modules and put the ones that fit into logical MH> groupings. Greg is probably correct with his selection for MH> "net", but a general evaluation is still a good thing. A view MH> of the bigger picture will help to quell debates over the MH> structure, and only leave us with the squabbles over the exact MH> spelling :-) x1.5 on this. I'm not sure which direction you ended up thinking this was (+ or -), but whichever direction it was I like it. Jeremy From gstein at lyra.org Sat Mar 25 22:40:48 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:40:48 -0800 (PST) Subject: [Python-Dev] voting numbers Message-ID: Hey... just thought I'd drop off a description of the "formal" mechanism that the ASF uses for voting since it has been seen here and there on this group :-)

+1 "I'm all for it. Do it!"
+0 "Seems cool and acceptable, but I can also live without it"
-0 "Not sure this is the best thing to do, but I'm not against it."
-1 "Veto. And here is my reasoning."

Strictly speaking, there is no vetoing here, other than by Guido. For changes to Apache (as opposed to bug fixes), it depends on where the development is. Early stages, it is reasonably open and people work straight against CVS (except for really big design changes). Late stage, it requires three +1 votes during discussion of a patch before it goes in. Here on python-dev, it would seem that the votes are a good way to quickly let Guido know people's feelings about topic X or Y.
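The scheme Greg describes is mechanical enough to sketch in a few lines of Python. This is purely illustrative: the function name, the numeric weights, and the veto rule as coded here are my own reading of the description above, not a tool python-dev actually used.

```python
# Illustrative tally of ASF-style votes as described above.
# The integer weights (+0 and -0 both count as zero) are an assumption.
WEIGHTS = {"+1": 1, "+0": 0, "-0": 0, "-1": -1}

def tally(votes):
    """Return (score, vetoed) for a list of votes like ["+1", "-0"].

    A -1 acts as a veto: it blocks the change regardless of the
    numeric total, until the voter lifts it or the patch is reworked.
    """
    score = sum(WEIGHTS[v] for v in votes)
    vetoed = "-1" in votes
    return score, vetoed

# Three +1 votes, no veto: under the late-stage Apache rule, good to go.
assert tally(["+1", "+1", "+1"]) == (3, False)
# One -1 blocks the patch even with positive support.
assert tally(["+1", "+1", "-1"]) == (1, True)
```

The point of the two return values is exactly the distinction Greg draws: the score summarizes sentiment for Guido, while the veto flag is a hard gate on committing.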
On the patches mailing list, the voting could actually be quite a useful measure for the people with CVS commit access. If a patch gets -1, then its commit should wait until reason X has been resolved. Note that it can be resolved in two ways: the person lifts their veto (after some amount of persuasion or explanation), or the patch is updated to address the concerns (well, unless the veto is against the concept of the patch entirely :-). If a patch gets a few +1 votes, then it can probably go straight in. Note that the Apache guys sometimes say things like "+1 on concept" meaning they like the idea, but haven't reviewed the code. Do we formalize on using these? Not really suggesting that. But if myself (and others) drop these things into mail notes, then we may as well have a description of just what the heck is going on :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sun Mar 26 00:27:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:27:18 +0200 (IST) Subject: [Python-Dev] Q: repr.py vs. pprint.py Message-ID: Is there any reason to keep two separate modules with simple-formatting functions? I think pprint is somewhat more sophisticated, but in the worst case, we can just dump them both in the same file (the only thing would be that pprint would export "repr", in addition to "saferepr" (among others)). (Just bumped into this in my reorg suggestion) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 00:32:38 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:32:38 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 Message-ID: Here's a second version of the straw man proposal for the reorganization of modules in packages. Note that I'm treating it as a strictly 1.7 proposal, so I don't care a "lot" about backwards compatibility.
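On the backwards-compatibility point: keeping an old top-level module importable under a new package name costs only a few lines. A minimal sketch in today's terms -- the "parse" package name is borrowed from the straw man purely as an example, and the shim itself is hypothetical, not part of any proposal:

```python
# Hypothetical back-compat shim: expose an existing top-level module
# under a new package name, so both import styles keep working while
# code migrates.  "parse" stands in for a proposed package.
import sys
import types
import shlex  # an existing stdlib module slated for "parse" in the straw man

parse = types.ModuleType("parse")
parse.shlex = shlex
sys.modules["parse"] = parse  # register the package stand-in

# Old-style and new-style imports now yield the very same module object.
from parse import shlex as new_shlex
assert new_shlex is shlex
```

Because both names resolve to one module object in sys.modules, state (compiled patterns, module-level caches) stays shared, which is what makes this kind of aliasing safe during a transition.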
I'm down to 4 unhandled modules, which means that if no one objects (and I'm sure someone will ), this can be a plan of action. So get your objections ready guys!

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath
        posixpath
        macpath
        nturl2path
        ntpath
        macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
    lowlevel
        socket
        select
    terminal
        termios
        pty
        tty
        readline
    syslog
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase
exceptions
os
types
UserDict
UserList
user
site
locale
pure
formatter
getopt
signal
pprint

========== Modules not handled ============
errno
resource
operator
struct

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA at ActiveState.com Sun Mar 26 00:39:51 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 15:39:51 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > I really, really, like the Perl mechanism, and I think we would do well > to think if something like that wouldn't suit us, with minor > modifications. The biggest modification which I think is needed to a Perl-like organization is that IMO there is value in knowing what packages are 'blessed' by Guido. In other words, some sort of Q/A mechanism would be good, if it can be kept simple. [Alternatively, let's not put a Q/A mechanism in place and my employer can make money selling that information, the way they do for Perl! =)] > (Remember that lwall copied the Pythonic module mechanism, > so Perl and Python modules are quite similar) That's stretching things a bit (the part after the 'so' doesn't follow from the part before), as there is a lot more to the nature of module systems, but the point is well taken. --david From moshez at math.huji.ac.il Sun Mar 26 06:44:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 06:44:02 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > The biggest modification which I think is needed to a Perl-like organization > is that IMO there is value in knowing what packages are 'blessed' by Guido. > In other words, some sort of Q/A mechanism would be good, if it can be kept > simple. You got a point. Anyone knows how the perl-porters decide what modules to put in source.tar.gz? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 07:01:58 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 21:01:58 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > Here's a second version of the straw man proposal for the reorganization > of modules in packages. Note that I'm treating it as a strictly 1.7 > proposal, so I don't care a "lot" about backwards compatiblity. Hey, this looks pretty good. For the most part i agree with your layout. Here are a few notes... > net [...] > server [...] Good. > text [...] > xml > whatever the xml-sig puts here > mail > rfc822 > mime > MimeWriter > mimetools > mimify > mailcap > mimetypes > base64 > quopri > mailbox > mhlib > binhex I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) For example, why text.binhex but text.mail.mime.base64? > parse > string > re > regex > reconvert > regex_syntax > regsub > shlex > ConfigParser > linecache > multifile > netrc The "re" module, in particular, will get used a lot, and it's not clear why these all belong under "parse". I suggest dropping "parse" and moving these up. What's "multifile" doing here instead of with the rest of the mail/mime stuff? > bin [...] I like this. Good idea. > gzip > zlib > aifc Shouldn't "aifc" be under "sound"? > image [...] > sound [...] > db [...] Yup. > math [...] > time [...] Looks good. > interpreter [...] How about just "interp"? > security [...] > file [...] > lowlevel > socket > select Why the separate "lowlevel" branch? Why doesn't "socket" go under "net"? > terminal > termios > pty > tty > readline Why does "terminal" belong under "file"? Maybe it could go under "ui"? Hmm... "pty" doesn't really belong. > syslog Hmm... 
> serialize > pickle > cPickle > shelve > xdrlib > copy > copy_reg "copy" doesn't really fit here under "serialize", and "serialize" is kind of a long name. How about a "data types" package? We could then put "struct", "UserDict", "UserList", "pprint", and "repr" here. data copy copy_reg pickle cPickle shelve xdrlib struct UserDict UserList pprint repr On second thought, maybe "struct" fits better under "bin". > threads [...] > ui [...] Uh huh. > internal > _codecs > _locale > _tkinter > pcre > strop > posix Not sure this is a good idea. It means the Unicode work lives under both "unicode" and "internal._codecs", Tk is split between "ui" and "internal._tkinter", regular expressions are split between "text.re" and "internal.pcre". I can see your motivation for getting "posix" out of the way, but i suspect this is likely to confuse people. > users > pwd > grp > nis Hmm. Yes, i suppose so. > sgi [...] > unicode [...] Indeed. > os > UserDict > UserList > exceptions > types > operator > user > site Yeah, these are all top-level (except maybe UserDict and UserList, see above). > locale I think "locale" belongs under "math" with "fpformat" and the others. It's for numeric formatting. > pure What the heck is "pure"? > formatter This probably goes under "text". > struct See above under "data". I can't decide whether "struct" should be part of "data" or "bin". Hmm... probably "bin" -- since, unlike the serializers under "data", "struct" does not actually specify a serialization format, it only provides fairly low-level operations. Well, this leaves a few system-like modules that didn't really fit elsewhere for me: pty tty termios syslog select getopt signal errno resource They all seem to be Unix-related. How about putting these in a "unix" or "system" package? -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." 
-- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 07:58:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 07:58:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > I'm not convinced "mime" needs a separate branch here. > (This is the deepest part of the tree, and at three levels > small alarm bells went off in my head.) I've had my problems with that too, but it seemed too many modules were mime specific. > For example, why text.binhex but text.mail.mime.base64? Actually, I thought about this (this isn't random at all): base64 encoding is part of the mime standard, together with quoted-printable. Binhex isn't. I don't know if you find it reason enough, and it may be smarter just having a text.encode.{quopri,uu,base64,binhex} > > parse > > string > > re > > regex > > reconvert > > regex_syntax > > regsub > > shlex > > ConfigParser > > linecache > > multifile > > netrc > > The "re" module, in particular, will get used a lot, and "from text.parse import re" doesn't seem too painful. > and it's not clear why these all belong under "parse". These are all used for parsing data (which does not have some pre-written parser). I had problems with the name too... > What's "multifile" doing here instead of with the rest > of the mail/mime stuff? It's also useful generally. > Shouldn't "aifc" be under "sound"? You're right. > > interpreter > [...] > > How about just "interp"? I've no *strong* feelings, just a vague "don't abbrev." hunch. > Why the separate "lowlevel" branch? Because it is -- most Python code will use one of the higher level modules. > Why doesn't "socket" go under "net"? What about UNIX domain sockets? Again, no *strong* opinion, though. > > terminal > > termios > > pty > > tty > > readline > > Why does "terminal" belong under "file"?
Because it is (a special kind of file). > > serialize > > > pickle > > cPickle > > shelve > > xdrlib > > copy > > copy_reg > > "copy" doesn't really fit here under "serialize", and > "serialize" is kind of a long name. I beg to disagree -- "copy" is frequently close to serialization, both in the model (serializing to a "data structure") and in real life (that's the way people copy stuff in Java, and UNIX too: think tar cvf - | tar xvf -). What's more, copy_reg is used both for copy and for pickle. I do like the idea of a "data-types" package, but it needs to be ironed out a bit. > > internal > > _codecs > > _locale > > _tkinter > > pcre > > strop > > posix > > Not sure this is a good idea. It means the Unicode > work lives under both "unicode" and "internal._codecs", > Tk is split between "ui" and "internal._tkinter", > regular expressions are split between "text.re" and > "internal.pcre". I can see your motivation for getting > "posix" out of the way, but i suspect this is likely to > confuse people. You mistook my motivation -- I just want unadvertised modules (AKA internal use modules) to live in a carefully segregated section of the namespace. How would this confuse people? No one imports _tkinter or pcre, so no one would notice the change. > > locale > > I think "locale" belongs under "math" with "fpformat" and > the others. It's for numeric formatting. Only? And anyway, I doubt many people will think like that. > > pure > > What the heck is "pure"? A module that helps work with Purify. > > formatter > > This probably goes under "text". You're right. > Well, this leaves a few system-like modules that didn't > really fit elsewhere for me: > > pty > tty > termios > syslog > select > getopt > signal > errno > resource > > They all seem to be Unix-related. How about putting these > in a "unix" or "system" package? "select", "signal" aren't UNIX specific. "getopt" is used for generic argument processing, so it isn't really UNIX specific.
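To make that concrete, here is a quick sketch (spelled with today's flat module names) that runs unchanged on UNIX and on Win32 -- it sticks to sockets, since sockets are all that select() handles on Windows anyway:

```python
# select() across platforms: a listening socket becomes readable when a
# connection is pending, and a freshly connected socket is writable.
import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # pick any free port
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())   # now a connection is pending

readable, writable, _ = select.select([server], [client], [], 5.0)
print(server in readable, client in writable)   # True True

client.close()
server.close()
```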
And I don't like the name "system" either. But I have no constructive proposals about those either. so-i'll-just-shut-up-now-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From dan at cgsoftware.com Sun Mar 26 08:05:44 2000 From: dan at cgsoftware.com (Daniel Berlin) Date: Sat, 25 Mar 2000 22:05:44 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > "select", "signal" aren't UNIX specific. Huh? How not? Can you name a non-UNIX that is providing them? (BeOS wouldn't count, select is broken, and nobody uses signals.) and if you can, is it providing them for something other than "UNIX/POSIX compatibility"? > "getopt" is used for generic argument processing, so it isn't really UNIX specific. It's a POSIX.2 function. I consider that UNIX. > And I don't like the name "system" either. But I have no > constructive proposals about those either. > > so-i'll-just-shut-up-now-ly y'rs, Z. > -- just-picking-nits-ly y'rs, Dan From moshez at math.huji.ac.il Sun Mar 26 08:32:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 08:32:33 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Daniel Berlin wrote: > > > "select", "signal" aren't UNIX specific. > Huh? > How not? > Can you name a non-UNIX that is providing them? Win32. Both of them. I've even used select there. > and if you can, is it providing them for something other than "UNIX/POSIX > compatibility"? I don't know what it provides them for, but I've *used* *select* on *WinNT*. I don't see why Python should make me feel bad when I'm doing that. > > "getopt" is used for generic argument processing, so it isn't really UNIX > > specific. > > It's a POSIX.2 function. > I consider that UNIX. Well, the argument style it processes is not unheard of in other OSes, and it's nice to have command line apps that have a common ui.
That's it! "getopt" belongs in the ui package! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:23:45 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:23:45 -0800 (PST) Subject: [Python-Dev] cPickle and cStringIO Message-ID: Are there any objections to including

    try:
        from cPickle import *
    except:
        pass

in pickle and

    try:
        from cStringIO import *
    except:
        pass

in StringIO? -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 09:14:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 09:14:10 +0200 (IST) Subject: [Python-Dev] cPickle and cStringIO In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Are there any objections to including > > try: > from cPickle import * > except: > pass > > in pickle and > > try: > from cStringIO import * > except: > pass > > in StringIO? Yes, until Python types are subclassable. Currently, one can inherit from pickle.Pickler/Unpickler and StringIO.StringIO. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:37:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:37:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Okay, here's another shot at it.
Notice a few things: - no text.mime package - encoders moved to text.encode - Unix stuff moved to unix package (no file.lowlevel, file.terminal) - aifc moved to bin.sound package - struct moved to bin package - locale moved to math package - linecache moved to interp package - data-type stuff moved to data package - modules in internal package moved to live with their friends Modules that are deprecated or not really intended to be imported are listed in parentheses (to give a better idea of the "real" size of each package). cStringIO and cPickle are parenthesized in hopeful anticipation of agreement on my last message... net urlparse urllib ftplib gopherlib imaplib poplib nntplib smtplib telnetlib httplib cgi server BaseHTTPServer CGIHTTPServer SimpleHTTPServer SocketServer asynchat asyncore text re # general-purpose parsing sgmllib htmllib htmlentitydefs xml whatever the xml-sig puts here mail rfc822 mailbox mhlib encode # i'm also ok with moving text.encode.* to text.* binhex uu base64 quopri MimeWriter mimify mimetools mimetypes multifile mailcap # special-purpose file parsing shlex ConfigParser netrc formatter (string, strop, pcre, reconvert, regex, regex_syntax, regsub) bin gzip zlib chunk struct image imghdr colorsys # a bit unsure, but doesn't go anywhere else imageop imgfile rgbimg yuvconvert sound aifc sndhdr toaiff audiodev sunau sunaudio wave audioop sunaudiodev db anydbm whichdb bsddb dbm dbhash dumbdbm gdbm math math # library functions cmath fpectl # type-related fpetest array mpz fpformat # formatting locale bisect # algorithm: also unsure, but doesn't go anywhere else random # randomness whrandom crypt # cryptography md5 rotor sha time calendar time tzparse sched timing interp new linecache # handling .py files py_compile code # manipulating internal objects codeop dis traceback compileall keyword # interpreter constants token symbol tokenize # parsing parser bdb # development pdb profile pyclbr tabnanny pstats rlcompleter # this might go in 
"ui"... security Bastion rexec ihooks file dircache path -- a virtual module which would do a from path import * nturl2path macurl2path filecmp fileinput StringIO glob fnmatch stat statcache statvfs tempfile shutil pipes popen2 commands dl (dospath, posixpath, macpath, ntpath, cStringIO) data pickle shelve xdrlib copy copy_reg UserDict UserList pprint repr (cPickle) threads thread threading Queue mutex ui _tkinter curses Tkinter cmd getpass getopt readline users pwd grp nis sgi al cd cl fl fm gl misc (what used to be sgimodule.c) sv unicode _codecs codecs unicodedata unicodedatabase unix errno resource signal posix posixfile socket select syslog fcntl termios pty tty _locale exceptions sys os types user site pure operator -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:40:27 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:40:27 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Hey, while we're at it... as long as we're renaming modules, what do you all think of getting rid of that "lib" suffix? As in: > net > urlparse > url > ftp > gopher > imap > pop > nntp > smtp > telnet > http > cgi > server [...] > text > re # general-purpose parsing > sgml > html > htmlentitydefs [...] "import net.ftp" seems nicer to me than "import ftplib". We could also just stick htmlentitydefs.entitydefs in html and deprecate htmlentitydefs. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:53:06 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:53:06 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > > For example, why text.binhex but text.mail.mime.base64? 
> > Actually, I thought about this (this isn't random at all): base64 encoding > is part of the mime standard, together with quoted-printable. Binhex > isn't. I don't know if you find it reason enough, and it may be smarter > just having a text.encode.{quopri,uu,base64,binhex} I think i'd like that better, yes. > > and it's not clear why these all belong under "parse". > > These are all used for parsing data (which does not have some pre-written > parser). I had problems with the name too... And parsing is what the "text" package is about anyway. I say move them up. (See the layout in my other message. Notice most of the regular-expression stuff is deprecated anyway, so it's not like there are really that many.) > > Why doesn't "socket" go under "net"? > > What about UNIX domain sockets? Again, no *strong* opinion, though. Bleck, you're right. Well, i think we just have to pick one or the other here, and i think most people would guess "net" first. (You can think of it as IPC, and file IPC-related things under then "net" category...?) > > Why does "terminal" belong under "file"? > > Because it is (a special kind of file) Only in Unix. It's Unix that likes to think of all things, including terminals, as files. > I do like the idea of "data-types" package, but it needs to be ironed > out a bit. See my other message for a possible suggested hierarchy... > > > internal [...] > You mistook my motivation -- I just want unadvertised modules (AKA > internal use modules) to live in a carefully segregate section of the > namespace. How would this confuse people? No one imports _tkinter or pcre, > so no one would notice the change. I think it makes more sense to classify modules by their topic rather than their exposure. (For example, you wouldn't move deprecated modules to a "deprecated" package.) Keep in mind that (well, at least to me) the main point of any naming hierarchy is to avoid name collisions. "internal" doesn't really help that purpose. 
You also want to be sure (or as sure as you can) that modules will be obvious to find in the hierarchy. An "internal" package creates a distinction orthogonal to the topic-matter distinction we're using for the rest of the packages, which *potentially* introduces the question "well... is this module internal or not?" for every other module. Yes, admittedly this is only "potentially", but i hope you see the abstract point i'm trying to make... > > > locale > > > > I think "locale" belongs under "math" with "fpformat" and > > the others. It's for numeric formatting. > > Only? And anyway, I doubt many people will think like that. Yeah, it is pretty much only for numeric formatting. The more generic locale stuff seems to be in _locale. > > They all seem to be Unix-related. How about putting these > > in a "unix" or "system" package? > > "select", "signal" aren't UNIX specific. Yes, but when they're available on other systems they're an attempt to emulate Unix or Posix functionality, aren't they? > Well, the argument style it processes is not unheard of in other OSes, and > it's nice to have command line apps that have a common ui. That's it! > "getopt" belongs in the ui package! I like ui.getopt. It's a pretty good idea. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 10:05:49 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:05:49 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: +1. I've had minor nits, but nothing is perfect, and this is definitely "good enough". Now we'll just have to wait until the BDFL says something... -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:06:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Hey, while we're at it... as long as we're renaming modules, > what do you all think of getting rid of that "lib" suffix? +0 -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:19:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:19:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > > "select", "signal" aren't UNIX specific. > > Yes, but when they're available on other systems they're an > attempt to emulate Unix or Posix functionality, aren't they? I think "signal" is ANSI C, but I'm not sure. no-other-comments-ly y'rs, Z. From gstein at lyra.org Sun Mar 26 13:52:53 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 03:52:53 -0800 (PST) Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <1258123323-10623548@hypernet.com> Message-ID: On Sat, 25 Mar 2000, Gordon McMillan wrote: >... > I doubt very much that you would break anybody's code by > removing the Windows specific behavior. > > But it seems to me that unless Python always uses the > default malloc, those of us who write C++ extensions will have > to override operator new? I'm not sure. I've used placement > new to allocate objects in a memory mapped file, but I've never > tried to muck with the global memory policy of a C++ program. Actually, the big problem arises when you have debug vs. non-debug DLLs. malloc() uses different heaps based on the debug setting.
As a result, it is a bad idea to call malloc() from a debug DLL and free() it from a non-debug DLL. If the allocation pattern is fixed, then things may be okay. IF. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:02:40 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:02:40 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: >... > [ tree ] This is a great start. I have two comments: 1) keep it *very* shallow. depth just makes it conceptually difficult. 2) you're pushing too hard. modules do not *have* to go into a package. there are some placements that you've made which are very questionable... it appears they are done for movement's sake rather than for being "right" I'm off to sleep, but will look into specific comments tomorrow or so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:14:32 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:14:32 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003251856.NAA09636@eric.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Guido van Rossum wrote: > > I say "do it incrementally" while others say "do it all at once." > > Personally, I don't think it is possible to do all at once. As a > > corollary, if you can't do it all at once, but you *require* that it be > > done all at once, then you have effectively deferred the problem. To put > > it another way, Guido has already invented a reason to not do it: he just > > requires that it be done all at once. Result: it won't be done. > > Bullshit, Greg. (I don't normally like to use such strong words, but > since you're being confrontational here...) Fair enough, and point accepted. Sorry. I will say, tho, that you've taken this slightly out of context. The next paragraph explicitly stated that I don't believe you had this intent. 
I just felt that coming up with a complete plan before doing anything would be prone to failure. You asked to invent a new reason :-), so I said you had one already :-) Confrontational? Yes, guilty as charged. I was a bit frustrated. > I'm all for doing it incrementally -- but I want the plan for how to > do it made up front. That doesn't require all the details to be > worked out -- but it requires a general idea about what kind of things > we will have in the namespace and what kinds of names they get. An > organizing principle, if you like. If we were to decide later that we > go for a Java-like deep hierarchy, the network package would have to > be moved around again -- what a waste. All righty. So I think there is probably a single question that I have here: Moshe posted a large breakdown of how things could be packaged. He and Ping traded a number of comments, and more will be coming as soon as people wake up :-) However, if you are only looking for a "general idea", then should python-dev'ers nit pick the individual modules, or just examine the general breakdown and hierarchy? thx, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sun Mar 26 14:09:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:09:02 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > This is a great start. I have two comments: > > 1) keep it *very* shallow. depth just makes it conceptually difficult. I tried, and Ping shallowed it even more. BTW: Anyone who cares to comment, please comment on Ping's last suggestion. I pretty much agree with the changes he made. > 2) you're pushing too hard. modules do not *have* to go into a package. > there are some placements that you've made which are very > questionable... 
it appears they are done for movement's sake rather > than for being "right" Well, I'm certainly sorry I gave that impression -- the reason I wasn't "right" wasn't that, it was more my desire to be "fast" -- I wanted to have some proposal out the door, since it is harder to argue about something concrete. The biggest proof that we all agree is that no one seriously objected to anything -- there were just some minor nits to pick. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 14:11:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:11:10 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > Moshe posted a large breakdown of how things could be packaged. He and > Ping traded a number of comments, and more will be coming as soon as > people wake up :-) Just a general comment -- it's so much fun to live in a different time zone than all of you guys. just-wasting-time-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sun Mar 26 14:23:57 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:23:57 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > On Sun, 26 Mar 2000, Greg Stein wrote: >... > > 2) you're pushing too hard. modules do not *have* to go into a package. > > there are some placements that you've made which are very > > questionable... it appears they are done for movement's sake rather > > than for being "right" > > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete.
> The biggest proof that we all agree is that no one seriously objected to anything -- there were just some minor nits to pick. Not something to apologize for! :-) Well, the indicator was the line in your original post about "unhandled modules" and the conversation between you and Ping with statements along the lines of "wasn't sure where to put this." I say just leave it then :-) If a module does not make *obvious* sense to be in a package, then it should not be there. For example: locale. That is not about numbers or about text. It has general utility. If there was an i18n package, then it would go there. Otherwise, don't force it somewhere else. Other packages are similar, so don't single out my comment about locale. Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Sun Mar 26 20:09:15 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 26 Mar 2000 10:09:15 -0800 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I maintain that a general principle regarding the aim of this reorg is needed before the partitioning of the space can make sense. What Moshe and Ping have is a good stab at partitioning of a subspace of the total space of Python modules and packages, i.e., the standard library. If we limit the aim of the reorg to cover just that subspace, then that's fine and Ping's proposal seems grossly fine to me.
If we want to have a Perl-like packaging, then we _need_ to take into account all known Python modules of general utility, such as the database modules, the various GUI packages, the mx* packages, Aaron's work, PIL, etc., etc. Ignoring those means that the dataset used to decide the partitioning function is highly biased. Given the larger dataset, locale might very well fit in a not-toplevel location. I know that any organizational scheme is going to be optimal at best at its inception, and that as history happens, it will become suboptimal. However, it's important to know what the space being partitioned is supposed to look like. A final comment: there's a history and science to this kind of organization, which is part of library science. I suspect there is quite a bit of knowledge available as to organizing principles to do it right. It would be nice if someone could research it a bit and summarize the basic principles to the rest of us. I agree with Greg that we need high-level input from Guido on this. --david 'academic today' ascher From ping at lfw.org Sun Mar 26 22:34:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 26 Mar 2000 12:34:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I goofed. I apologize. Moshe and Greg are right: locale isn't just about numbers. I just read the comment at the top of locale.py: "Support for number formatting using the current locale settings" and didn't notice the from _locale import * a couple of lines down. 
"import locale; dir(locale)" didn't work for me because for some reason there's no _locale built-in on my system (Red Hat 6.1, python-1.5.1-10). So i looked for 'def's and they all looked like they had to do with numeric formatting. My mistake. "locale", at least, belongs at the top level. Other candidates for top-level: bisect # algorithm struct # more general than "bin" or "data" colorsys # not really just for image file formats yuvconvert # not really just for image file formats rlcompleter # not really part of the interpreter dl # not really just about files Alternatively, we could have: ui.rlcompleter, unix.dl (It would be nice, by the way, to replace "bisect" with an "algorithm" module containing some nice pedagogical implementations of things like bisect, quicksort, heapsort, Dijkstra's algorithm etc.) The following also could be left at the top-level, since they seem like applications (i.e. they probably won't get imported by code, only interactively). No strong opinion on this. bdb pdb pyclbr tabnanny profile pstats Also... i was avoiding calling the "unix" package "posix" because we already have a "posix" module. But wait... the proposed tree already contains "math" and "time" packages. If there is no conflict (is there a conflict?) then the "unix" package should probably be named "posix". -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From moshez at math.huji.ac.il Mon Mar 27 07:35:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 07:35:23 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Ka-Ping Yee wrote: > The following also could be left at the top-level, since > they seem like applications (i.e. they probably won't > get imported by code, only interactively). No strong > opinion on this. 
> > bdb > pdb > pyclbr > tabnanny > profile > pstats Let me just state my feelings about the interpreter package: since Python programs are probably the most suited to reasoning about Python programs (among other things, thanks to the strong introspection capabilities of Python), many Python modules were written to supply a convenient interface to that introspection. These modules are *only* needed by programs dealing with Python programs, and hence should live in a well defined part of the namespace. I regret calling it "interpreter" though: "Python" is a better name (something like the java.lang package). > Also... i was avoiding calling the "unix" package "posix" > because we already have a "posix" module. But wait... the > proposed tree already contains "math" and "time" packages. Yes. That was a hard decision I made, and I'm sort of waiting for Guido to veto it: it would negate the easy backwards compatible path of providing a toplevel module for each module which is moved somewhere else which does "from <new location> import *". > If there is no conflict (is there a conflict?) then the > "unix" package should probably be named "posix". I hardly agree. "dl", for example, is a common function on unices, but it is not part of the POSIX standard. I think the "posix" module should have POSIX functions, and the "unix" package should deal with functionality available on real-life unices. standards-are-fun-aren't-they-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf at artcom-gmbh.de Mon Mar 27 08:52:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 08:52:25 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Moshe Zadka at "Mar 27, 2000 7:35:23 am" Message-ID: Hi! Moshe Zadka wrote: > Yes.
> That was a hard decision I made, and I'm sort of waiting for Guido to > veto it: it would negate the easy backwards compatible path of providing > a toplevel module for each module which is moved somewhere else which does > "from <new location> import *". If the result of this renaming initiative will be that I can't use

    import sys, os, time, re, struct, cPickle, parser
    import Tkinter; Tk=Tkinter; del Tkinter

anymore in Python 1.x and instead I have to change this into (for example):

    from posix import time
    from text import re
    from bin import struct
    from Python import parser
    from ui import Tkinter; ...

I would really really *HATE* this change! [side note: The 'from MODULE import ...' form is evil and I have abandoned its use in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 programs got bigger and bigger. With 20+ software developers working on a ~1,000,000 LOC Modula-2 software system, this decision proved itself well. The situation with Python is comparable. Avoiding 'from ... import' rewards itself later, when your software has grown bigger and when it comes to maintenance by people not familiar with the used modules. ] Maybe I didn't understand what this new subdivision of the standard library should achieve. The library documentation provides an existing logical subdivision into chapters, which group the library into several kinds of services. IMO this subdivision could be discussed and possibly revised. But at the moment I got the impression that it was simply ignored. Why? What's so bad with it? Why is a subdivision on the documentation level not sufficient? Why should modules be moved into packages? I don't get it.
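For what it's worth, the "backwards compatible path" quoted above would, as I understand it, leave behind one stub per moved module. Here is a toy sketch of the idea -- the package name "text" and the one-function stand-in API are invented, since nothing has been decided:

```python
# Build a pretend new-style package in memory: "text.re" stands in for an
# invented new home of the re module, with a fake compile() as its API.
import sys
import types

pkg = types.ModuleType("text")
new_home = types.ModuleType("text.re")
new_home.compile = lambda pattern: ("compiled", pattern)  # stand-in, not the real API
pkg.re = new_home
sys.modules["text"] = pkg
sys.modules["text.re"] = new_home

# The real compatibility stub would be a one-line file re.py containing
# just "from text.re import *"; this loop imitates that by hand.
stub = types.ModuleType("re")
for name, value in vars(new_home).items():
    if not name.startswith("_"):
        setattr(stub, name, value)
sys.modules["re"] = stub

import re                          # the old spelling keeps working
print(re.compile("a*") == ("compiled", "a*"))   # True
```

Multiply that by every module in the proposed tree and you see how many stub files would be needed.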
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Mon Mar 27 09:09:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 09:09:18 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Peter Funk wrote: > If the result of this renaming initiative will be that I can't use > import sys, os, time, re, struct, cPickle, parser > import Tkinter; Tk=Tkinter; del Tkinter > anymore in Python 1.x and instead I have to change this into (for example): > from posix import time > from text import re > from bin import struct > from Python import parser > from ui import Tkinter; ... Yes. > I would really really *HATE* this change! Well, I'm sorry to hear that -- I've been waiting for this change to happen for a long time. > [side note: > The 'from MODULE import ...' form is evil and I have abandoned its use > in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 > programs got bigger and bigger. With 20+ software developers working > on a ~1,000,000 LOC Modula-2 software system, this decision > proved itself well. Well, yes. Though syntactically equivalent, from package import module is the recommended way to use packages, unless there is a specific need. > Maybe I didn't understand what this new subdivision of the standard > library should achieve. Namespace cleanup. Too many toplevel names seem evil to some of us. > Why is a subdivision on the documentation level not sufficient? > Why should modules be moved into packages? I don't get it. To allow a greater number of modules to live without worrying about namespace collision. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Mon Mar 27 10:08:57 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 27 Mar 2000 00:08:57 -0800 (PST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: Hi, Peter. Your question as to the purpose of module reorganization is well worth asking, and perhaps we should stand back for a while and try to really answer it well first. I think that my answers for your question would be: 1. To alleviate potential namespace collision. 2. To permit talking about packages as a unit. I hereby solicit other reasons from the rest of the group... Reason #1 is not a serious problem yet, but i think i've seen a few cases where it might start to be an issue. Reason #2 has to do with things like assigning people responsibility for taking care of a particular package, or making commitments about which packages will be available with which distributions or platforms. Hence, for example, the idea of the "unix" package. Neither of these reasons necessitates a deep and holy hierarchy, so we certainly want to keep it shallow and simple if we're going to do this at all. > If the result of this renaming initiative will be that I can't use > import sys, os, time, re, struct, cPickle, parser > import Tkinter; Tk=Tkinter; del Tkinter > anymore in Python 1.x and instead I have to change this into (for example): > from posix import time > from text import re > from bin import struct > from Python import parser > from ui import Tkinter; ... Won't import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser also work? ...i hope? > The library documentation provides an existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I get the impression that it was simply ignored. > Why? What's so bad with it?
I did look at the documentation for some guidance in arranging the modules, though admittedly it didn't direct me much. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From pf at artcom-gmbh.de Mon Mar 27 10:35:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 10:35:50 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Ka-Ping Yee at "Mar 27, 2000 0: 8:57 am" Message-ID: Hi! > > import sys, os, time, re, struct, cPickle, parser [...] Ka-Ping Yee: > Won't > > import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser > > also work? ...i hope? That is even worse: not only would the 'import' sections, which I usually keep at the top of my modules, have to be changed -- 're.compile(...' would also have to become 'text.re.compile(...' all over the place, possibly breaking the 'Maximum Line Length' style guide rule. Regards, Peter From pf at artcom-gmbh.de Mon Mar 27 12:16:48 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 12:16:48 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? Message-ID: String objects have grown methods since 1.5.2. So it makes sense to provide a class 'UserString' similar to 'UserList' and 'UserDict', so that there is a standard base class to inherit from, if someone has the desire to extend the string methods. What do you think? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From fdrake at acm.org Mon Mar 27 17:12:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 27 Mar 2000 10:12:55 -0500 (EST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.31351.783771.472320@weyr.cnri.reston.va.us> Moshe Zadka writes: > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete. The biggest proof of concept that we all agree is that > no one seriously objected to anything -- there were just some minor > nits to pick. It's *really easy* to argue about something concrete. ;) It's just harder to misunderstand the specifics of the proposal. It's too early to say what people think; not enough people have had time to look at the proposals yet. On the other hand, I think it's great that we have a proposal to discuss. I'll make my comments once I've had time to read through the last version posted. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 27 18:20:43 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 11:20:43 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Peter Funk said: > The library documentation provides an existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I get the impression that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. The library reference is pretty well disorganized at this point. I want to improve that for the 1.6 docs.
I received a suggestion a few months back, but haven't had a chance to dig into it, or even respond to the email. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 27 19:14:46 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 27 Mar 2000 12:14:46 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.38662.835289.499610@goon.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> That is even worse. So not only the 'import' sections, which I PF> usually keep at the top of my modules, have to be changed: This PF> way for example 're.compile(...' has to be changed into PF> 'text.re.compile(...' all over the place possibly breaking the PF> 'Maximum Line Length' styleguide rule. There is nothing wrong with changing only the import statement: from text import re The only problematic use of from ... import ... is from text.re import * which adds an unspecified set of names to the current namespace. Jeremy From moshez at math.huji.ac.il Mon Mar 27 19:59:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 19:59:34 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Message-ID: Peter Funk said: > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. Fred L. Drake, Jr. writes: > The library reference is pretty well disorganized at this point. I > want to improve that for the 1.6 docs. 
Let me just mention where my inspirations came from: shame of shames, it came from Perl. It's hard to use Perl's organization as-is, because Perl doesn't view itself as a general-purpose language: so things like CGI.pm are toplevel, and regexes are part of the syntax. However, there are a lot of good hints there. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm at digicool.com Mon Mar 27 20:31:01 2000 From: klm at digicool.com (Ken Manheimer) Date: Mon, 27 Mar 2000 13:31:01 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Jeremy Hylton wrote: > >>>>> "PF" == Peter Funk writes: > > PF> That is even worse. So not only the 'import' sections, which I > PF> usually keep at the top of my modules, have to be changed: This > PF> way for example 're.compile(...' has to be changed into > PF> 'text.re.compile(...' all over the place possibly breaking the > PF> 'Maximum Line Length' style guide rule. > > There is nothing wrong with changing only the import statement: > from text import re > > The only problematic use of from ... import ... is > from text.re import * > which adds an unspecified set of names to the current namespace. Actually, i think there's another important gotcha with from .. import which may be contributing to Peter's sense of concern, but which i don't think needs to in this case. I also thought we had discussed providing transparency in general, at least for the 1.x series. ? The other gotcha i mean applies when the thing you're importing is a terminal, ie a non-module. Then, changes to the assignments of the names in the original module aren't reflected in the names you've imported - they're decoupled from the namespace of the original module.
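Ken's "terminal" gotcha is easy to demonstrate in a few lines. The module object below is synthetic (built with types.ModuleType purely for illustration), but the binding behaviour is the same for any real module:

```python
import types

mod = types.ModuleType("mod")   # stands in for some real module
mod.value = 1

value = mod.value               # effectively what "from mod import value" does
mod.value = 2                   # the original module later rebinds the name

print(value)                    # 1 -- the imported name is decoupled
print(mod.value)                # 2 -- attribute access tracks the rebind
```

The imported name is an independent binding to the old object; only attribute access through the module sees subsequent reassignments.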
When the thing you're importing is, itself, a module, the same kind of thing *can* happen, but you're more generally concerned with tracking revisions to the contents of those modules, which is tracked ok in the thing you "from .. import"ed. I thought the other problem Peter was objecting to, having to change the import sections in the first place, was going to be avoided in the 1.x series (if we do this kind of thing) by inherently extending the import path to include all the packages, so people need not change their code? Seems like most of this would be fairly transparent w.r.t. the operation of existing applications. Have i lost track of the discussion? Ken klm at digicool.com From moshez at math.huji.ac.il Mon Mar 27 20:55:35 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 20:55:35 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Ken Manheimer wrote: > I also thought we had discussed providing > transparency in general, at least for the 1.x series. ? Yes, but it would be clearly marked as deprecated in 1.7, print out error messages in 1.8 and won't work at all in 3000. (That's my view on the point, but I got the feeling this is where the wind is blowing.) So the transparency mechanism is intended only to be "something backwards compatible"...it's not supposed to be a reason why things are ugly (I don't think they are, though). BTW: the transparency mechanism I suggested was not pushing things into the import path, but rather having toplevel modules which "from import *" from the modules that were moved. E.g., re.py would contain # Deprecated: don't import re, it won't work in future releases from text.re import * -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Mon Mar 27 21:34:39 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:34:39 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.47055.604042.381126@beluga.mojam.com> Peter> The library documentation provides a existing logical subdivision Peter> into chapters, which group the library into several kinds of Peter> services. Perhaps it makes sense to revise the library reference manual's documentation to reflect the proposed package hierarchy once it becomes concrete. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From skip at mojam.com Mon Mar 27 21:52:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:52:08 -0600 (CST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.48104.34263.680278@beluga.mojam.com> Responding to an early item in this thread and trying to adapt to later items... Ping wrote: I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) It's not clear that mime should be beneath text/mail. Moshe moved it up a level, but not the way I would have done it. I think the mime stuff still belongs in a separate mime package. I wouldn't just sprinkle the modules under text. I see two possibilities: text>mime net>mime I prefer net>mime, because MIME and its artifacts are used heavily in networked applications where the content being transferred isn't text. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Mon Mar 27 22:05:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:05:32 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? 
In-Reply-To: <14559.47055.604042.381126@beluga.mojam.com> References: <14559.47055.604042.381126@beluga.mojam.com> Message-ID: <14559.48908.354425.313775@weyr.cnri.reston.va.us> Skip Montanaro writes: > Perhaps it makes sense to revise the library reference manual's > documentation to reflect the proposed package hierarchy once it becomes > concrete. I'd go for this. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Mon Mar 27 22:43:06 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 15:43:06 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? Message-ID: <200003272043.PAA18445@eric.cnri.reston.va.us> The _tkinter.c source code is littered with #ifdefs that mostly center around distinguishing between Tcl/Tk 8.0 and older versions. The two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. Would it be reasonable to assume that everybody is using at least Tcl/Tk version 8.0? This would simplify the code somewhat. Or should I ask this in a larger forum? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 27 22:59:04 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:59:04 -0500 (EST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Guido van Rossum writes: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. Simplify! It's more important that the latest versions are supported than pre-8.0 versions. -Fred -- Fred L. 
Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Mon Mar 27 23:31:30 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 27 Mar 2000 13:31:30 -0800 (PST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > The _tkinter.c source code is littered with #ifdefs that mostly center > > around distinguishing between Tcl/Tk 8.0 and older versions. The > > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > > > Would it be reasonable to assume that everybody is using at least > > Tcl/Tk version 8.0? This would simplify the code somewhat. > > Simplify! It's more important that the latest versions are > supported than pre-8.0 versions. I strongly agree. My motto is, "if the latest Python version doesn't work for you, then don't upgrade!" This is also Open Source -- they can easily get the source to the old _Tkinter if they want new Python + 7.x support. If you ask in a larger forum, then you are certain to get somebody to say, "yes... I need that support." Then you have yourself a quandary :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Mar 27 23:46:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 27 Mar 2000 23:46:50 +0200 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <009801bf9835$f85b87e0$34aab5d4@hagrid> Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. yes. 
if people are using older versions, they can always use the version shipped with 1.5.2. (has anyone actually tested that one with pre-8.0 versions, btw?) > Or should I ask this in a larger forum? maybe. maybe not. From jack at oratrix.nl Mon Mar 27 23:58:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 27 Mar 2000 23:58:56 +0200 Subject: [Python-Dev] 1.6 job list In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 12:16:23 +0200 (IST) , Message-ID: <20000327215901.ABA08F58C1@oratrix.oratrix.nl> Recently, Moshe Zadka said: > Here's a reason: there shouldn't be changes we'll retract later -- we > need to come up with the (more or less) right hierarchy the first time, > or we'll do a lot of work for nothing. I think I disagree here (hmm, it's probably better to say that I agree, but I agree on a tangent:-). I think we can be 100% sure that we're wrong the first time around, and we should plan for that. One of the reasons why we're wrong is because the world is moving on. A module that at this point in time will reside at some level in the hierarchy may in a few years (or sooner) be one of a large family and be better off elsewhere in the hierarchy. It would be silly if it had to stay where it was because of backward compatibility. If we plan for being wrong we can make the mistakes less painful. I think that a simple scheme where a module can say "I'm expecting the Python 1.6 namespace layout" would make transition to a completely different Python 1.7 namespace layout a lot less painful, because some agent could do the mapping. This can either happen at runtime (through a namespace, or through an import hook, or probably through other tricks as well) or optionally by a script that would do the translations. Of course this doesn't mean we should go off and hack in a couple of namespaces (hence my "agreeing on a tangent"), but it does mean that I think Greg's idea of not wanting to change everything at once has merit.
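The runtime "agent" Jack describes could be as small as a table of aliases installed into sys.modules. Everything named below (re_old, text.re, the compile stub) is invented for the sketch, standing in for a module that was relocated into a package:

```python
import sys
import types

# Stand-in for a module that moved to a new package location.
text = types.ModuleType("text")
text.re = types.ModuleType("text.re")
text.re.compile = lambda pattern: ("compiled", pattern)
sys.modules["text"] = text
sys.modules["text.re"] = text.re

# The mapping agent: old top-level names alias their new homes.
RENAMES = {"re_old": "text.re"}
for old, new in RENAMES.items():
    sys.modules[old] = sys.modules[new]

import re_old                   # old-style import keeps working
print(re_old.compile("x"))      # ('compiled', 'x')
```

Because import consults sys.modules first, the alias makes the old name and the new name refer to the very same module object, so old code runs untouched against the new layout.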
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pf at artcom-gmbh.de Tue Mar 28 00:11:39 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 00:11:39 +0200 (MEST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 27, 2000 3:43: 6 pm" Message-ID: Guido van Rossum: > Or should I ask this in a larger forum? Don't ask. Simply tell the people on comp.lang.python that support for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. Period. ;-) Regards, Peter From guido at python.org Tue Mar 28 00:17:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 17:17:33 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 00:11:39 +0200." References: Message-ID: <200003272217.RAA28910@eric.cnri.reston.va.us> > Don't ask. Simply tell the people on comp.lang.python that support > for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. > Period. ;-) OK, I'm convinced. We will drop pre-8.0 support. Could someone submit a set of patches? It would make sense to use #error if a pre-8.0 version is detected at compile-time!
--Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Mar 28 01:02:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 09:02:21 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: Sorry for the delay, but Gordon's reply was accurate so should have kept you going ;-) > I've been reading Jeffrey Richter's "Advanced Windows" last night in order > to try understanding better why PyObject_NEW is implemented > differently for > Windows. So that is where the heaps discussion came from :-) The problem is simply "too many heaps are available". > Again, I feel uncomfortable with this, especially now, when > I'm dealing with the memory aspect of Python's object > constructors/destructors. It is for this exact reason that it was added in the first place. I believe this code predates the "_d" convention on Windows. AFAIK, this could be removed today and everything should work (but see below why it probably won't). MSVC allows you to choose from a number of CRT versions. Only in one of these versions is the CRTL completely shared between the .EXE and all the various .DLLs in the application. What was happening is that this macro ended up causing the "malloc" for a new object to occur in Python15.dll, but the Python type system meant that tp_dealloc() (to clean up the object) was called in the DLL implementing the new type. Unless Python15.dll and our extension DLL shared the same CRTL (and hence the same malloc heap, fileno table etc) things would die. The DLL version of "free()" would complain, as it had never seen the pointer before. This change meant the malloc() and the free() were both implemented in the same DLL/EXE. This was particularly true with Debug builds. MSVC's debug CRTL implementations have some very nice debugging features (guard-blocks, block validity checks with debugger breakpoints when things go wrong, leak tracking, etc).
However, this means they use yet another heap. Mixing debug builds with release builds in Python is a recipe for disaster. Theoretically, the problem has largely gone away now that a) we have separate "_d" versions and b) the "official" position is to use the same CRTL as Python15.dll. However, it is still a minor FAQ on comp.lang.python why PyRun_ExecFile (or whatever) fails with mysterious errors - the reason is exactly the same - they are using a different CRTL, so the CRTL can't map the file pointers correctly, and we get unexplained IO errors. But now that this macro hides the malloc problem, there may be plenty of "home grown" extensions out there that do use a different CRTL and don't see any problems - mainly because they aren't throwing file handles around! Finally getting to the point of all this: We now also have the PyMem_* functions. This problem also doesn't exist if extension modules use these functions instead of malloc()/free(). We only ask them to change the PyObject allocations and deallocations, not the rest of their code, so it is no real burden. IMO, we should adopt these functions for most internal object allocations and the extension samples/docs. Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() type functions, that are simply a thin layer over the fopen/fclose functions. If extension writers used these instead of fopen/fclose we would gain a few fairly intangible things - lose the minor FAQ, platforms that don't have fopen at all (eg, CE) would love you, etc. Mark. From mhammond at skippinet.com.au Tue Mar 28 03:04:11 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 11:04:11 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: Message-ID: [I wrote] > Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() Maybe I had something like PyFile_FromString in mind!! That-damn-time-machine-again-ly, Mark.
From moshez at math.huji.ac.il Tue Mar 28 07:36:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:36:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <14559.48104.34263.680278@beluga.mojam.com> Message-ID: On Mon, 27 Mar 2000, Skip Montanaro wrote: > Responding to an early item in this thread and trying to adapt to later > items... > > Ping wrote: > > I'm not convinced "mime" needs a separate branch here. (This is the > deepest part of the tree, and at three levels small alarm bells went off > in my head.) > > It's not clear that mime should be beneath text/mail. Moshe moved it up a > level, Actually, Ping moved it up a level. I only decided to agree with him retroactively... > I think the mime stuff still > belongs in a separate mime package. I wouldn't just sprinkle the modules > under text. I see two possibilities: > > text>mime > net>mime > > I prefer net>mime, I don't. MIME is not a "wire protocol" like all the other things in net -- it's used inside another wire protocol, like RFC822 or HTTP. If at all, I'd go for having a net/ mail/ mime/ Package, but Ping would yell at me again for nesting 3 levels. I could live with text/mime, because the mime format basically *is* text. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 28 07:47:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:47:13 +0200 (IST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. 
> > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. I want to ask a different question: when is Python going to officially support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate having several libraries of Tcl/Tk on my machine. (I assume you know the joke about Jews always answering a question with a question ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jack at oratrix.nl Tue Mar 28 10:55:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 10:55:56 +0200 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message by Ka-Ping Yee , Sat, 25 Mar 2000 23:37:11 -0800 (PST) , Message-ID: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> > Okay, here's another shot at it. Notice a few things: > ... > bin > ... > image ... > sound > ... These I don't like, I think image and sound should be either at toplevel, or otherwise in a separate package (mm?). I know images and sounds are customarily stored in binary files, but so are databases and other things. Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and chunk definitely belong together, but struct is a wholly different beast. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Tue Mar 28 11:01:51 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 11:01:51 +0200 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 20:30:26 +0200 (IST) , Message-ID: <20000328090151.86B59370CF2@snelboot.oratrix.nl> > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... For separate modules, yes. For packages this is different. As a case in point, think of MacPython: it could stuff all mac-specific packages under the toplevel "mac", but it would probably be nicer if it could extend the existing namespace. It is a bit silly if mac users have to do "from mac.text.encoding import macbinary" but "from text.encoding import binhex", just because BinHex support happens to live in the core (purely for historical reasons). But maybe this holds only for the platform distributions, then it shouldn't be as much of a problem as there aren't that many.
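The namespace extension Jack wants is possible because a package's __path__ is just a list a distribution can append to, so imports from two install locations all resolve through one package. Everything below (the temp directories, the text_demo package, the module names) is fabricated for the sketch:

```python
import importlib
import os
import sys
import tempfile

core = tempfile.mkdtemp()    # pretend: the core distribution
third = tempfile.mkdtemp()   # pretend: a platform add-on (e.g. MacPython)

# Give each location a "text_demo" package directory with one module.
for base, name, body in [(core, "binhex_demo", "NAME = 'core'"),
                         (third, "macbinary_demo", "NAME = 'third'")]:
    pkg = os.path.join(base, "text_demo")
    os.makedirs(pkg, exist_ok=True)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, name + ".py"), "w") as f:
        f.write(body)

sys.path.insert(0, core)
importlib.invalidate_caches()
import text_demo

# The graft: the add-on's directory joins the package's search path.
text_demo.__path__.append(os.path.join(third, "text_demo"))

from text_demo import binhex_demo, macbinary_demo
print(binhex_demo.NAME, macbinary_demo.NAME)   # core third
```

With this, platform modules import as text_demo.macbinary_demo rather than from a parallel "mac" hierarchy, which is the transparency Jack is after.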
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From moshez at math.huji.ac.il Tue Mar 28 11:24:14 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 11:24:14 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > These I don't like, I think image and sound should be either at toplevel, or > otherwise in a separate package (mm?). I know images and sounds are > customarily stored in binary files, but so are databases and other things. Hmmm...I think of "bin" as "interface to binary files". Agreed that I don't have a good reason for seperating gdbm from zlib. > Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and > chunk definitely belong together, but struct is a wholly different beast. I think Ping and I decided to move struct to toplevel. Ping, would you like to take your last proposal and fold into it the consensual changes,, or should I? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Tue Mar 28 11:44:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:44:14 +0200 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <02c101bf989a$2ee35860$34aab5d4@hagrid> Guido van Rossum wrote: > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. > > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). 
> > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... proposal: if anyone changes the API for a fundamental module, and fails to update the standard library, the change is automatically "minus one'd" for each major module that no longer works :-) (in this case, that would be -5 or so...) From effbot at telia.com Tue Mar 28 11:55:19 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:55:19 +0200 Subject: [Python-Dev] Great Renaming? What is the goal? References: Message-ID: <02c901bf989b$be203d80$34aab5d4@hagrid> Peter Funk wrote: > Why should modules be moved into packages? I don't get it. fwiw, neither do I... I'm not so sure that Python really needs a simple reorganization of the existing set of standard library modules. just moving the modules around won't solve the real problems with the 1.5.2 std library... > IMO this subdivision could be discussed and possibly revised. here's one proposal: http://www.pythonware.com/people/fredrik/librarybook-contents.htm From gstein at lyra.org Tue Mar 28 12:09:44 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 28 Mar 2000 02:09:44 -0800 (PST) Subject: [Python-Dev] 3rd parties in the hierarchy (was: module reorg) In-Reply-To: <20000328090151.86B59370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > For separate modules, yes. For packages this is different. 
As a point in case > think of MacPython: it could stuff all mac-specific packages under the > toplevel "mac", but it would probably be nicer if it could extend the existing > namespace. It is a bit silly if mac users have to do "from mac.text.encoding > import macbinary" but "from text.encoding import binhex", just because BinHex > support happens to live in the core (purely for historical reasons). > > But maybe this holds only for the platform distributions, then it shouldn't be > as much of a problem as there aren't that many. Assuming that you use an archive like those found in my "small" distro or Gordon's distro, then this is no problem. The archive simply recognizes and maps "text.encoding.macbinary" to its own module. Another way to say it: stop thinking in terms of the filesystem as the sole mechanism for determining placement in the package hierarchy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Tue Mar 28 15:38:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:38:12 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 07:47:13 +0200." References: Message-ID: <200003281338.IAA29532@eric.cnri.reston.va.us> > I want to ask a different question: when is Python going to officially > support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate > having several libraries of Tcl/Tk on my machine. This is already in the CVS tree, except for the Windows installer. Python 1.6 will not install a separate complete Tcl installation; instead, it will install the needed Tcl/Tk files (Tcl/Tk 8.3 or newer) in the Python tree, so it won't affect existing Tcl/Tk installations. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 15:57:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:57:02 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 28 Mar 2000 11:44:14 +0200." <02c101bf989a$2ee35860$34aab5d4@hagrid> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <02c101bf989a$2ee35860$34aab5d4@hagrid> Message-ID: <200003281357.IAA29621@eric.cnri.reston.va.us> > proposal: if anyone changes the API for a fundamental module, and > fails to update the standard library, the change is automatically "minus > one'd" for each major module that no longer works :-) > > (in this case, that would be -5 or so...) Oops. Sigh. While we're pretending that this change goes in, could you point me to those five modules? Also, we need to add test cases to the standard test suite that would have found these! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Tue Mar 28 17:04:47 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 10:04:47 -0500 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: ; from ping@lfw.org on Sat, Mar 25, 2000 at 11:37:11PM -0800 References: Message-ID: <20000328100446.A2586@cnri.reston.va.us> On 25 March 2000, Ka-Ping Yee said: > Okay, here's another shot at it. Notice a few things: Damn, I started writing a response to Moshe's original proposal -- and *then* saw this massive thread. Oh well. Turns out I still have a few useful things to say: First, any organization scheme for the standard library (or anything else, for that matter) should have a few simple guidelines. Here are two: * "deep hierarchies considered harmful": ie. avoid sub-packages if at all possible * "everything should have a purpose": every top-level package should be describable with a single, clear sentence of plain language. 
Eg.: net - Internet protocols, data formats, and client/server infrastructure unix - Unix-specific system calls, protocols, and conventions And two somewhat open issues: * "as long as we're renaming...": maybe this would be a good time to standardize naming conventions, eg. "cgi" -> "cgilib" *or* "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> "mimewriter", etc. * "shared namespaces vs system namespaces": the Perl model of "nothing belongs to The System; anyone can add a module in Text:: or Net:: or whatever" works there because Perl doesn't have __init__ files or anything to distinguish module namespaces; they just are. Python's import mechanism would have to change to support this, and the fact that __init__ files may contain arbitrary code makes this feel like a very tricky change to make. Now specific comments... > net > urlparse > urllib > ftplib > gopherlib > imaplib > poplib > nntplib > smtplib > telnetlib > httplib > cgi Rename? Either cgi -> cgilib or foolib -> foo? > server > BaseHTTPServer > CGIHTTPServer > SimpleHTTPServer > SocketServer > asynchat > asyncore This is one good place for a sub-package. It's also a good place to rename: the convention for Python module names seems to be all-lowercase; and "Server" is redundant when you're in the net.server package. How about: net.server.base_http net.server.cgi_http net.server.simple_http net.server.socket Underscores negotiable. They don't seem to be popular in module names, although sometimes they would be real life-savers. > text I think "text" should mean "plain old unstructured, un-marked-up ASCII text", where "unstructured, un-marked-up" really means "not structured or marked up in a well-known standard way". Or maybe not. I'm just trying to come up with an excuse for moving xml to top-level, which I think is where it belongs.
Maybe the excuse should just be, "XML is really important and visible, and anyways Paul Prescod will raise a stink if it isn't put at top-level in Python package-space". > re # general-purpose parsing Top-level: this is a fundamental module that should be treated on a par with 'string'. (Well, except for building RE methods into strings... hmmMMmm...maybe... [no, I'm kidding!]) > sgmllib > htmllib > htmlentitydefs Not sure what to do about these. Someone referred somewhere to a "web" top-level package, which seems to have disappeared. If it reappars, it would be a good place for the HTML modules (not to mention a big chunk of "net") -- this would mainly be for "important and visible" (ie. PR) reasons, rather than sound technical reasons. > xml > whatever the xml-sig puts here Should be top-level. > mail > rfc822 > mailbox > mhlib "mail" should either be top-level or under "net". (Yes, I *know* it's not a wire-level protocol: that's what net.smtplib is for. But last time I checked, email is pretty useless without a network. And vice-versa.) Or maybe these all belong in a top-level "data" package: I'm starting to warm to that. > bin > gzip > zlib > chunk > struct > image > imghdr > colorsys # a bit unsure, but doesn't go anywhere else > imageop > imgfile > rgbimg > yuvconvert > sound > aifc > sndhdr > toaiff > audiodev > sunau > sunaudio > wave > audioop > sunaudiodev I agree with Jack: image and sound (audio?) should be top-level. I don't think I like the idea of an intervening "mm" or "multimedia" or "media" or what-have-you package, though. The other stuff in "bin" is kind of a grab-bag: "chunk" and "struct" might belong in the mythical "data" package. > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm Yup. 
# math > math # library functions > cmath > fpectl # type-related > fpetest > array > mpz > fpformat # formatting > locale > bisect # algorithm: also unsure, but doesn't go anywhere else > random # randomness > whrandom > crypt # cryptography > md5 > rotor > sha Hmmm. "locale" has already been dealt with; obviously it should be top-level. I think "array" should be top-level or under the mythical "data". Six crypto-related modules seem like enough to justify a top-level "crypt" package, though. > time > calendar > time > tzparse > sched > timing Yup. > interp > new > linecache # handling .py files [...] > tabnanny > pstats > rlcompleter # this might go in "ui"... I like "python" for this one. (But I'm not sure if tabnanny and rlcompleter belong there.) > security > Bastion > rexec > ihooks What does ihooks have to do with security? > file > dircache > path -- a virtual module which would do a from path import * > nturl2path > macurl2path > filecmp > fileinput > StringIO Lowercase for consistency? > glob > fnmatch > stat > statcache > statvfs > tempfile > shutil > pipes > popen2 > commands > dl No problem until these last two -- 'commands' is a Unix-specific thing that has very little to do with the filesystem per se, and 'dl' is (as I understand it) deep ju-ju with sharp edges that should probably be hidden away in the 'python' ('sys'?) package. Oh yeah, "dl" should be elsewhere -- "python" maybe? Top-level? Perhaps we need a "deepmagic" package for "dl" and "new"? ;-) > data > pickle > shelve > xdrlib > copy > copy_reg > UserDict > UserList > pprint > repr > (cPickle) Oh hey, it's *not* a mythical package! Guess I didn't read far enough ahead. I like it, but would add more stuff to it (obviously): 'struct', 'chunk', 'array' for starters. Should cPickle be renamed to fastpickle? > threads > thread > threading > Queue Lowercase? > ui > _tkinter > curses > Tkinter > cmd > getpass > getopt > readline > users > pwd > grp > nis These belong in "unix".
Possibly "nis" belongs in "net" -- do any non-Unix OSes use NIS? > sgi > al > cd > cl > fl > fm > gl > misc (what used to be sgimodule.c) > sv Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there are a significant number of Sun/Solaris modules. Note that the respective trademark holders might get very antsy about who gets to put names in those namespaces -- that's exactly what happened with Sun, Solaris 8, and Perl. I believe the compromise they arrived at was that the "Solaris::" namespace remains open, but Sun gets the "Sun::" namespace. There should probably be a win32 package, for core registry access stuff if nothing else. There might someday be a "linux" package; it's highly unlikely there would be a "pc" or "alpha" package though. All of those argue over "irix" and "solaris" instead of "sgi" and "sun". Greg From gvwilson at nevex.com Tue Mar 28 17:45:10 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 28 Mar 2000 10:45:10 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: > > Greg Wilson > > If None becomes a keyword, I would like to ask whether it could be > > used to signal that a method is a class method, as opposed to an > > instance method: > I'd like to know what you mean by "class" method. (I do know C++ and > Java, so I have some idea...). Specifically, my question is: how does > a class method access class variables? They can't be totally > unqualified (because that's very unpythonic). If they are qualified by > the class's name, I see it as a very mild improvement on the current > situation. You could suggest, for example, to qualify class variables > by "class" (so you'd do things like: > > class.x = 1 > > ), but I'm not sure I like it. On the whole, I think it is a much > bigger issue on how be denote class methods. 
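[Editor's note: Moshe's question — how a class method would get at class variables — can be pinned down with code. Python of this era had no spelling for class methods; the sketch below shows the explicit class-name qualification under debate, and, purely as a point of comparison, the classmethod builtin the language eventually grew in 2.2:]

```python
class Parent:
    foo = 3

class Child(Parent):
    foo = 9

    def both(self):
        # Qualifying by class name reaches either binding explicitly.
        return Child.foo, Parent.foo

assert Child().both() == (9, 3)

# What the language later adopted: the method receives the class
# itself (conventionally named 'cls') instead of an instance.
class Counter:
    count = 0

    @classmethod
    def bump(cls):
        cls.count += 1

Counter.bump()
Counter.bump()
assert Counter.count == 2
```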
I don't like overloading the word 'class' this way, as it makes it difficult to distinguish a parent's 'foo' member and a child's 'foo' member: class Parent: foo = 3 ...other stuff... class Child(Parent): foo = 9 def test(): print class.foo # obviously 9, but how to get 3? I think that using the class's name instead of 'self' will be easy to explain, will look like it belongs in the language, will be unlikely to lead to errors, and will handle multiple inheritance with ease: class Child(Parent): foo = 9 def test(): print Child.foo # 9 print Parent.foo # 3 > Also, one slight problem with your method of denoting class methods: > currently, it is possible to add an instance method at run time to a > class by something like > > class C: > pass > > def foo(self): > pass > > C.foo = foo > > In your suggestion, how do you view the possibility of adding class > methods to a class? (Note that "foo", above, is also perfectly usable > as a plain function). Hm, I hadn't thought of this... :-( > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > > > instead of having to create throw-away variables to fill in slots in > > tuples that they don't care about. > > Currently, I use "_" for that purpose, after I heard the idea from > Fredrik Lundh. I do the same thing when I need to; I just thought that making assignment to "None" special would formalize this in a readable way. From jeremy at cnri.reston.va.us Tue Mar 28 19:31:48 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 12:31:48 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: <14560.60548.74378.613188@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: >> The only problematic use of from ... import ...
is >> from text.re import * >> which adds an unspecified set of names to the current >> namespace. KLM> The other gotcha i mean applies when the thing you're importing KLM> is a terminal, ie a non-module. Then, changes to the KLM> assignments of the names in the original module aren't KLM> reflected in the names you've imported - they're decoupled from KLM> the namespace of the original module. This isn't an import issue. Some people simply don't understand that assignment (and import as form of assignment) is name binding. Import binds an imported object to a name in the current namespace. It does not affect bindings in other namespaces, nor should it. KLM> I thought the other problem peter was objecting to, having to KLM> change the import sections in the first place, was going to be KLM> avoided in the 1.x series (if we do this kind of thing) by KLM> inherently extending the import path to include all the KLM> packages, so people need not change their code? Seems like KLM> most of this would be fairly transparent w.r.t. the operation KLM> of existing applications. I'm not sure if there is consensus on backwards compatibility. I'm not in favor of creating a huge sys.path that includes every package's contents. It would be a big performance hit. Jeremy From moshez at math.huji.ac.il Tue Mar 28 19:36:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 19:36:47 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328100446.A2586@cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Greg Ward wrote: > * "deep hierarchies considered harmful": ie. avoid sub-packages if at > all possible > > * "everything should have a purpose": every top-level package should > be describable with a single, clear sentence of plain language. Good guidelines, but they aren't enough. 
And anyway, rules were meant to be broken <0.9 wink> > * "as long as we're renaming...": maybe this would be a good time to > standardize naming conventions, eg. "cgi" -> "cgilib" *or* > "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> > "mimewriter", etc. +1 > * "shared namespaces vs system namespaces": the Perl model of "nothing > belongs to The System; anyone can add a module in Text:: or Net:: or > whatever" works there because Perl doesn't have __init__ files or > anything to distinguish module namespaces; they just are. Python's > import mechanism would have to change to support this, and the fact > that __init__ files may contain arbitrary code makes this feel > like a very tricky change to make. Indeed. But I still feel that "few things should belong to the system" is quite a useful rule... (That's what I referred to when I said Perl's module system is more suited to CPAN (now there's a surprise)) > Rename? Either cgi -> cgilib or foolib -> foo? Yes. But I wanted the first proposal to be just about placing stuff, because that airs out more disagreements. > This is one good place for a sub-package. It's a also a good place to > rename: the convention for Python module names seems to be > all-lowercase; and "Server" is redundant when you're in the net.server > package. How about: > > net.server.base_http > net.server.cgi_http > net.server.simple_http > net.server.socket Hmmmmm......+0 > Underscores negotiable. They don't seem to be popular in module names, > although sometimes they would be real life-savers. Personally, I prefer underscores to CamelCase. > Or maybe not. I'm just trying to come up with an excuse for moving xml > to top-level, which I think is where it belongs. Maybe the excuse > should just be, "XML is really important and visible, and anyways Paul > Prescod will raise a stink if it isn't put at top-level in Python > package-space". I still think "xml" should be a brother to "html" and "sgml". 
Current political trans not withstanding. > Not sure what to do about these. Someone referred somewhere to a "web" > top-level package, which seems to have disappeared. If it reappars, it > would be a good place for the HTML modules (not to mention a big chunk > of "net") -- this would mainly be for "important and visible" (ie. PR) > reasons, rather than sound technical reasons. I think the "web" package should be reinstated. But you won't like it: I'd put xml in web. > "mail" should either be top-level or under "net". (Yes, I *know* it's > not a wire-level protocol: that's what net.smtplib is for. But last > time I checked, email is pretty useless without a network. And > vice-versa.) Ummmm.....I'd disagree, but I lack the strength and the moral conviction. Put it under net and we'll call it a deal > Or maybe these all belong in a top-level "data" package: I'm starting to > warm to that. Ummmm...I don't like the "data" package personally. It seems to disobey your second guideline. > I agree with Jack: image and sound (audio?) should be top-level. I > don't think I like the idea of an intervening "mm" or "multimedia" or > "media" or what-have-you package, though. Definitely multimedia. Okay, I'm bought. > Six crypto-related modules seems like enough to justify a top-level > "crypt" package, though. It seemed obvious to me that "crypt" should be under "math". But maybe that's just the mathematician in me speaking. > I like "python" for this one. (But I'm not sure if tabnanny and > rlcompleter belong there.) I agree, and I'm not sure about rlcompleter, but am sure about tabnanny. > What does ihooks have to do with security? Well, it was more or less written to support rexec. A weak argument, admittedly > No problem until these last two -- 'commands' is a Unix-specific thing > that has very little to do with the filesystem per se Hmmmmm...it is on the same level with popen. Why not move popen too? 
>, and 'dl' is (as I > understand it) deep ju-ju with sharp edges that should probably be > hidden away Ummmmmm.....not in the "python" package: it doesn't have anything to do with the interpreter. > Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there > are a significant number of Sun/Solaris modules. Note that the > respective trademark holders might get very antsy about who gets to put > names in those namespaces -- that's exactly what happened with Sun, > Solaris 8, and Perl. I believe the compromise they arrived at was that > the "Solaris::" namespace remains open, but Sun gets the "Sun::" > namespace. Ummmmm.....I don't see how they have any legal standing. I for one refuse to care about what Sun Microsystem thinks about names for Python packages. > There should probably be a win32 package, for core registry access stuff > if nothing else. And for all the other extensions in win32all Yep! (Just goes to show what happens when you decide to package based on a UNIX system) > All of those > argue over "irix" and "solaris" instead of "sgi" and "sun". Fine with me -- just wanted to move them out of my face -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From andy at reportlab.com Tue Mar 28 20:13:02 2000 From: andy at reportlab.com (Andy Robinson) Date: Tue, 28 Mar 2000 18:13:02 GMT Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <20000327170031.693531CDF6@dinsdale.python.org> References: <20000327170031.693531CDF6@dinsdale.python.org> Message-ID: <38e0f4cf.24247656@post.demon.co.uk> On Mon, 27 Mar 2000 12:00:31 -0500 (EST), Peter Funk wrote: > Do we need a UserString class? This will probably be useful on top of the i18n stuff in due course, so I'd like it. Something Mike Da Silva and I have discussed a lot is implementing a higher-level 'typed string' library on top of the Unicode stuff. 
A 'typed string' is like a string, but knows what encoding it is in - possibly Unicode, possibly a native encoding and embodies some basic type safety and convenience notions, like not being able to add a Shift-JIS and an EUC string together. Iteration would always be per character, not per byte; and a certain amount of magic would say that if the string was (say) Japanese, it would acquire a few extra methods for doing some Japan-specific things like expanding half-width katakana. Of course, we can do this anyway, but I think defining the API clearly in UserString is a great idea. - Andy Robinson From guido at python.org Tue Mar 28 21:22:43 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:22:43 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 18:13:02 GMT." <38e0f4cf.24247656@post.demon.co.uk> References: <20000327170031.693531CDF6@dinsdale.python.org> <38e0f4cf.24247656@post.demon.co.uk> Message-ID: <200003281922.OAA03113@eric.cnri.reston.va.us> > > Do we need a UserString class? > > This will probably be useful on top of the i18n stuff in due course, > so I'd like it. > > Something Mike Da Silva and I have discussed a lot is implementing a > higher-level 'typed string' library on top of the Unicode stuff. > A 'typed string' is like a string, but knows what encoding it is in - > possibly Unicode, possibly a native encoding and embodies some basic > type safety and convenience notions, like not being able to add a > Shift-JIS and an EUC string together. Iteration would always be per > character, not per byte; and a certain amount of magic would say that > if the string was (say) Japanese, it would acquire a few extra methods > for doing some Japan-specific things like expanding half-width > katakana. > > Of course, we can do this anyway, but I think defining the API clearly > in UserString is a great idea. Agreed. Please somebody send a patch! 
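[Editor's note: the typed-string idea Andy describes can be roughed out on top of a UserString-style base class. A sketch against today's collections.UserString — the TypedString name and its API are made up for illustration, not the patch Guido asked for:]

```python
from collections import UserString

class TypedString(UserString):
    """A string that knows which encoding its text came from."""

    def __init__(self, data, encoding):
        super().__init__(data)
        self.encoding = encoding

    def __add__(self, other):
        # The basic type safety Andy mentions: refuse to mix encodings.
        if isinstance(other, TypedString) and other.encoding != self.encoding:
            raise TypeError("cannot add %s string to %s string"
                            % (other.encoding, self.encoding))
        return TypedString(self.data + str(other), self.encoding)

ok = TypedString("katakana", "shift_jis") + TypedString("!", "shift_jis")
assert str(ok) == "katakana!" and ok.encoding == "shift_jis"

mixed_rejected = False
try:
    TypedString("a", "shift_jis") + TypedString("b", "euc_jp")
except TypeError:
    mixed_rejected = True
assert mixed_rejected
```

[Encoding-specific conveniences like the half-width katakana expansion Andy mentions would hang off subclasses or be enabled per-encoding; the point here is only that the encoding travels with the data.]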
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 21:25:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:25:39 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 Message-ID: <200003281925.OAA03287@eric.cnri.reston.va.us> I'm hoping to release a first, rough alpha of Python 1.6 by April 1st (no joke!). Not everything needs to be finished by then, but I hope to have the current versions of distutil, expat, and sre in there. Anything else that needs to go into 1.6 and isn't ready yet? (Small stuff doesn't matter, everything currently in the patches queue can probably go in if it isn't rejected by then.) --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Tue Mar 28 21:40:24 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 11:40:24 -0800 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: > Anything else that needs to go into 1.6 and isn't ready yet? No one seems to have found time to figure out the mmap module support. --david From guido at python.org Tue Mar 28 21:33:29 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:33:29 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: Your message of "Tue, 28 Mar 2000 11:40:24 PST." References: Message-ID: <200003281933.OAA04896@eric.cnri.reston.va.us> > > Anything else that needs to go into 1.6 and isn't ready yet? > > No one seems to have found time to figure out the mmap module support. I wasn't even aware that that was a priority. If someone submits it, it will go in -- alpha 1 is not a total feature freeze, just a "testing the waters". 
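[Editor's note: for the record, the mmap module that did eventually land exposes a mapped file as a mutable byte buffer. A small sketch against the modern stdlib module, not the 1.6-era patch under discussion here:]

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"hello world")
    f.flush()
    # Length 0 means "map the whole file".
    m = mmap.mmap(f.fileno(), 0)
    assert m[:5] == b"hello"   # slice access, like a mutable bytes object
    m[:5] = b"HELLO"           # assignment writes through the mapping
    pos = m.find(b"world")     # plus some file-like methods
    snapshot = bytes(m)
    m.close()

assert pos == 6
assert snapshot == b"HELLO world"
```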
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Tue Mar 28 21:49:17 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 28 Mar 2000 21:49:17 +0200 Subject: [Python-Dev] First alpha release of Python 1.6 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <38E10CBD.C6B71D50@tismer.com> Guido van Rossum wrote: ... > Anything else that needs to go into 1.6 and isn't ready yet? Stackless Python of course, but it *is* ready yet. Just kidding. I will provide a compressed unicode database in a few days. That will be a non-Python-specific module, and (Marc or I) will provide a Python specific wrapper. This will probably not get ready until April 1. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Tue Mar 28 21:51:29 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 14:51:29 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <14561.3393.761177.776684@amarok.cnri.reston.va.us> David Ascher writes: >> Anything else that needs to go into 1.6 and isn't ready yet? >No one seems to have found time to figure out the mmap module support. The issue there is cross-platform compatibility; the Windows and Unix versions take completely different constructor arguments, so how should we paper over the differences? Unix arguments: (file descriptor, size, flags, protection) Win32 arguments:(filename, tagname, size) We could just say, "OK, the args are completely different between Win32 and Unix, despite it being the same function name". 
Maybe that's best, because there seems no way to reconcile those two different sets of arguments. -- A.M. Kuchling http://starship.python.net/crew/amk/ I'm here for the FBI, not the _Weekly World News_. -- Scully in X-FILES #1 From DavidA at ActiveState.com Tue Mar 28 22:06:09 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 12:06:09 -0800 Subject: [Python-Dev] mmapfile module In-Reply-To: <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I guess my approach would be to provide two platform-specific modules, and to figure out a high-level Python module which could provide a reasonable platform-independent interface on top of it. One problem with that approach is that I think that there is also great value in having a portable mmap interface in the C layer, where i see lots of possible uses in extension modules (much like the threads API). --david From guido at python.org Tue Mar 28 22:00:57 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 15:00:57 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Tue, 28 Mar 2000 12:06:09 PST." References: Message-ID: <200003282000.PAA11988@eric.cnri.reston.va.us> > > The issue there is cross-platform compatibility; the Windows and Unix > > versions take completely different constructor arguments, so how > > should we paper over the differences? 
> > > > Unix arguments: (file descriptor, size, flags, protection) > > Win32 arguments:(filename, tagname, size) > > > > We could just say, "OK, the args are completely different between > > Win32 and Unix, despite it being the same function name". Maybe > > that's best, because there seems no way to reconcile those two > > different sets of arguments. > > I guess my approach would be to provide two platform-specific modules, and > to figure out a high-level Python module which could provide a reasonable > platform-independent interface on top of it. One problem with that approach > is that I think that there is also great value in having a portable mmap > interface in the C layer, where i see lots of possible uses in extension > modules (much like the threads API). I don't know enough about this, but it seems that there might be two steps: *creating* a mmap object is necessarily platform-specific; but *using* a mmap object could be platform-neutral. What is the API for mmap objects? --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Tue Mar 28 22:07:25 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 15:07:25 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14560.60548.74378.613188@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > >>>>> "KLM" == Ken Manheimer writes: > > >> The only problematic use of from ... import ... is > >> from text.re import * > >> which adds an unspecified set of names to the current > >> namespace. > > KLM> The other gotcha i mean applies when the thing you're importing > KLM> is a terminal, ie a non-module. Then, changes to the > KLM> assignments of the names in the original module aren't > KLM> reflected in the names you've imported - they're decoupled from > KLM> the namespace of the original module. > > This isn't an import issue. 
Some people simply don't understand > that assignment (and import as form of assignment) is name binding. > Import binds an imported object to a name in the current namespace. > It does not affect bindings in other namespaces, nor should it. I know that - i was addressing the asserted evilness of from ... import ... and how it applied - and didn't - w.r.t. packages. > KLM> I thought the other problem peter was objecting to, having to > KLM> change the import sections in the first place, was going to be > KLM> avoided in the 1.x series (if we do this kind of thing) by > KLM> inherently extending the import path to include all the > KLM> packages, so people need not change their code? Seems like > KLM> most of this would be fairly transparent w.r.t. the operation > KLM> of existing applications. > > I'm not sure if there is consensus on backwards compatibility. I'm > not in favor of creating a huge sys.path that includes every package's > contents. It would be a big performance hit. Yes, someone reminded me that the other (better, i think) option is stub modules in the current places that do the "from ... import *" for the right values of "...". py3k finishes the migration by eliminating the stubs. Ken klm at digicool.com From gward at cnri.reston.va.us Tue Mar 28 22:29:55 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 15:29:55 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us>; from guido@python.org on Tue, Mar 28, 2000 at 02:25:39PM -0500 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <20000328152955.A3136@cnri.reston.va.us> On 28 March 2000, Guido van Rossum said: > I'm hoping to release a first, rough alpha of Python 1.6 by April 1st > (no joke!). > > Not everything needs to be finished by then, but I hope to have the > current versions of distutil, expat, and sre in there. 
We just need to do a bit of CVS trickery to put Distutils under the Python tree. I'd *like* for Distutils to have its own CVS existence at least until 1.6 is released, but it's not essential. Two of the big Distutils to-do items that I enumerated at IPC8 have been knocked off: the "dist" command has been completely redone (and renamed "sdist", for "source distribution"), as has the "install" command. The really major to-do items left for Distutils are: * implement the "bdist" command with enough marbles to generate RPMs and some sort of Windows installer (Wise?); Solaris packages, Debian packages, and something for the Mac would be nice too. * documentation (started, but only just) And there are some almost-as-important items: * Mac OS support; this has been started, at least for the unfashionable and clunky sounding MPW compiler; CodeWarrior support (via AppleEvents, I think) would be nice * test suite -- at least the fundamental Distutils marbles should get a good exercise; it would also be nice to put together a bunch of toy module distributions and make sure that "build" and "install" on them do the right things... all automatically, of course! * reduce number of tracebacks: right now, certain errors in the setup script or on the command line can result in a traceback, when they should just result in SystemExit with "error in setup script: ..." or "error on command line: ..." * fold in Finn Bock's JPython compat. patch * fold in Michael Muller's "pkginfo" patch * finish and fold in my Python 1.5.1 compat. patch (only necessary as long as Distutils has a life of its own, outside Python) Well, I'd better get cracking ... Guido, we can do the CVS thing any time; I guess I'll mosey on downstairs. 
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From effbot at telia.com Tue Mar 28 21:46:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 21:46:17 +0200 Subject: [Python-Dev] mmapfile module References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: <003501bf98ee$50097a20$34aab5d4@hagrid> Andrew M. Kuchling wrote: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I don't get this. Why expose low-level implementation details to the user (flags, protection, tagname)? (And how come the Windows implementation doesn't support read-only vs. read/write flags?) Unless the current implementation uses something radically different from mmap/MapViewOfFile, wouldn't an interface like: (filename, mode="rb", size=entire file, offset=0) be sufficient? (where mode can be "wb" or "wb+" or "rb+", optionally without the "b") From donb at init.com Tue Mar 28 22:46:06 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 15:46:06 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <200003282046.PAA18822@zippy.init.com> ...sorry to jump in on the middle of this one, but. A while back I put a lot of thought into how to support class methods and class attributes. 
I feel that I solved the problem in a fairly complete way though the solution does have some warts. Here's an example: >>> class foo(base): ... value = 10 # this is an instance attribute called 'value' ... # as usual, it is shared between all instances ... # until explicitly set on a particular instance ... ... def set_value(self, x): ... print "instance method" ... self.value = x ... ... # ... # here come the weird part ... # ... class __class__: ... value = 5 # this is a class attribute called value ... ... def set_value(cl, x): ... print "class method" ... cl.value = x ... ... def set_instance_default_value(cl, x): ... cl._.value = x ... >>> f = foo() >>> f.value 10 >>> foo.value = 20 >>> f.value 10 >>> f.__class__.value 20 >>> foo._.value 10 >>> foo._.value = 1 >>> f.value 1 >>> foo.set_value(100) class method >>> foo.value 100 >>> f.value 1 >>> f.set_value(40) instance method >>> f.value 40 >>> foo._.value 1 >>> ff=foo() >>> foo.set_instance_default_value(15) >>> ff.value 15 >>> foo._.set_value(ff, 5) instance method >>> ff.value 5 >>> Is anyone still with me? The crux of the problem is that in the current python class/instance implementation, classes dont have attributes of their own. All of those things that look like class attributes are really there as defaults for the instances. To support true class attributes a new name space must be invented. Since I wanted class objects to look like any other object, I chose to move the "instance defaults" name space under the underscore attribute. This allows the class's unqualified namespace to refer to its own attributes. Clear as mud, right? In case you are wondering, yes, the code above is a working example. I released it a while back as the 'objectmodule' and just updated it to work with Python-1.5.2. The update has yet to be released. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi... 
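[The "class attributes are really instance defaults" point above is easiest to see in stock Python, without Donald's objectmodule extension. A minimal sketch — class and method names invented for illustration:]

```python
# Plain CPython illustration of the behaviour described above: what looks
# like a class attribute is really a shared default for instances, visible
# until an assignment on a particular instance shadows it.
class Foo:
    value = 10                # looks like a class attribute...

    def set_value(self, x):
        self.value = x        # ...but this binds an *instance* attribute

f = Foo()
assert f.value == 10          # no instance binding yet: falls back to the class
Foo.value = 20
assert f.value == 20          # still shared: the "class attribute" is a live default
f.set_value(1)
assert f.value == 1           # instance binding now shadows the class
assert Foo.value == 20        # the class-level value itself is untouched
```

[This is exactly why a separate namespace (Donald's `_`) is needed before true class methods/attributes can coexist with instance defaults.]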
From akuchlin at mems-exchange.org Tue Mar 28 22:50:18 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 15:50:18 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <003501bf98ee$50097a20$34aab5d4@hagrid> References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> <003501bf98ee$50097a20$34aab5d4@hagrid> Message-ID: <14561.6922.415063.279939@amarok.cnri.reston.va.us> Fredrik Lundh writes: >(And how come the Windows implementation doesn't support >read-only vs. read/write flags?) Good point; that should be fixed. > (filename, mode="rb", size=entire file, offset=0) >be sufficient? (where mode can be "wb" or "wb+" or "rb+", >optionally without the "b") Hmm... maybe we can dispose of the PROT_* argument that way on Unix. But how would you specify MAP_SHARED vs. MAP_PRIVATE, or MAP_ANONYMOUS? (MAP_FIXED seems useless to a Python programmer.) Another character in the mode argument, or a flags argument? Worse, as you pointed out in the same thread, MAP_ANONYMOUS on OSF/1 doesn't want to take a file descriptor at all. Also, the tag name on Windows seems important, from Gordon McMillan's explanation of it: http://www.python.org/pipermail/python-dev/1999-November/002808.html -- A.M. Kuchling http://starship.python.net/crew/amk/ You mustn't kill me. You don't love me. You d-don't even know me. -- The Furies kill Abel, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Tue Mar 28 23:02:04 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:02:04 -0500 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Your message of "Tue, 28 Mar 2000 15:46:06 EST." <200003282046.PAA18822@zippy.init.com> References: <200003282046.PAA18822@zippy.init.com> Message-ID: <200003282102.QAA13041@eric.cnri.reston.va.us> > A while back I put a lot of thought into how to support class methods > and class attributes. 
I feel that I solved the problem in a fairly > complete way though the solution does have some warts. Here's an > example: [...] > Is anyone still with me? > > The crux of the problem is that in the current python class/instance > implementation, classes dont have attributes of their own. All of > those things that look like class attributes are really there as > defaults for the instances. To support true class attributes a new > name space must be invented. Since I wanted class objects to look > like any other object, I chose to move the "instance defaults" name > space under the underscore attribute. This allows the class's > unqualified namespace to refer to its own attributes. Clear as mud, > right? > > In case you are wondering, yes, the code above is a working example. > I released it a while back as the 'objectmodule' and just updated it > to work with Python-1.5.2. The update has yet to be released. This looks like it would break a lot of code. How do you refer to a superclass method? It seems that ClassName.methodName would refer to the class method, not to the unbound instance method. Also, moving the default instance attributes to a different namespace seems to be a semantic change that could change lots of things. I am still in favor of saying "Python has no class methods -- use module-global functions for that". Between the module, the class and the instance, there are enough namespaces -- we don't need another one. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:01:29 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:01:29 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003281922.OAA03113@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 2:22:43 pm" Message-ID: I wrote: > > > Do we need a UserString class? 
> > Andy Robinson: > > This will probably be useful on top of the i18n stuff in due course, > > so I'd like it. > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > higher-level 'typed string' library on top of the Unicode stuff. > > A 'typed string' is like a string, but knows what encoding it is in - > > possibly Unicode, possibly a native encoding and embodies some basic > > type safety and convenience notions, like not being able to add a > > Shift-JIS and an EUC string together. Iteration would always be per > > character, not per byte; and a certain amount of magic would say that > > if the string was (say) Japanese, it would acquire a few extra methods > > for doing some Japan-specific things like expanding half-width > > katakana. > > > > Of course, we can do this anyway, but I think defining the API clearly > > in UserString is a great idea. > Guido van Rossum: > Agreed. Please somebody send a patch! I feel unable to do, what Andy proposed. What I had in mind was a simple wrapper class around the builtin string type similar to UserDict and UserList which can be used to derive other classes from. I use UserList and UserDict quite often and find them very useful. They are simple and powerful and easy to extend. May be the things Andy Robinson proposed above belong into a sub class which inherits from a simple UserString class? Do we need an additional UserUnicode class for unicode string objects? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Tue Mar 28 23:56:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:56:49 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 23:01:29 +0200." 
References: Message-ID: <200003282156.QAA13361@eric.cnri.reston.va.us> [Peter Funk] > > > > Do we need a UserString class? > > > > Andy Robinson: > > > This will probably be useful on top of the i18n stuff in due course, > > > so I'd like it. > > > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > > higher-level 'typed string' library on top of the Unicode stuff. > > > A 'typed string' is like a string, but knows what encoding it is in - > > > possibly Unicode, possibly a native encoding and embodies some basic > > > type safety and convenience notions, like not being able to add a > > > Shift-JIS and an EUC string together. Iteration would always be per > > > character, not per byte; and a certain amount of magic would say that > > > if the string was (say) Japanese, it would acquire a few extra methods > > > for doing some Japan-specific things like expanding half-width > > > katakana. > > > > > > Of course, we can do this anyway, but I think defining the API clearly > > > in UserString is a great idea. > > > Guido van Rossum: > > Agreed. Please somebody send a patch! [PF] > I feel unable to do, what Andy proposed. What I had in mind was a > simple wrapper class around the builtin string type similar to > UserDict and UserList which can be used to derive other classes from. Yes. I think Andy wanted his class to be a subclass of UserString. > I use UserList and UserDict quite often and find them very useful. > They are simple and powerful and easy to extend. Agreed. > May be the things Andy Robinson proposed above belong into a sub class > which inherits from a simple UserString class? Do we need > an additional UserUnicode class for unicode string objects? It would be great if there was a single UserString class which would work with either Unicode or 8-bit strings. I think that shouldn't be too hard, since it's just a wrapper. So why don't you give the UserString.py a try and leave Andy's wish alone? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:47:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:47:59 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> from Fredrik Lundh at "Mar 28, 2000 11:55:19 am" Message-ID: Hi! > Peter Funk wrote: > > Why should modules be moved into packages? I don't get it. > Fredrik Lundh: > fwiw, neither do I... Pheeewww... And I thought I was the only one! ;-) > I'm not so sure that Python really needs a simple reorganization > of the existing set of standard library modules. just moving the > modules around won't solve the real problems with the 1.5.2 std > library... Right. I propose to leave the namespace flat. I like to argue with Brad J. Cox ---the author of the book "Object Oriented Programming - An Evolutionary Approach" Addison Wesley, 1987--- who proposes the idea of what he calls a "Software-IC": He looks closely at the design process of electronic engineers, who usually deal with large data books with prefabricated components. There are often hundreds of them in such a databook and most of them have terse and not very mnemonic names. But the engineers using them all day *know* after a short while that a 7400 chip is a TTL-chip containing 4 NAND gates. Nearly the same holds true for software engineers using Software-ICs like 're' or 'struct' as their daily building blocks. A software engineer who is already familiar with his/her building blocks has absolutely no advantage from a deeply nested namespace. Now for something completely different: Fredrik Lundh about the library documentation: > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Whether 'md5', 'getpass' and 'traceback' fit into a category 'Commonly Used Modules' is ....ummmm.... at least a bit questionable.
But we should really focus the discussion on the structure of the documentation. Since many standard library modules belong into several logical catagories at once, a true tree structured organization is simply not sufficient to describe everything. So it is important to set up pointers between related functionality. For example 'string.replace' is somewhat related to 're.sub' or 'getpass' is related to 'crypt', however 'crypt' is related to 'md5' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From pf at artcom-gmbh.de Wed Mar 29 00:13:02 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 00:13:02 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: <200003282007.PAA12045@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 3: 7: 9 pm" Message-ID: Hi! Guido van Rossum: > Modified Files: > _tkinter.c [...] > *** 491,501 **** > > v->interp = Tcl_CreateInterp(); > - > - #if TKMAJORMINOR == 8001 > - TclpInitLibraryPath(baseName); > - #endif /* TKMAJORMINOR */ > > ! #if defined(macintosh) && TKMAJORMINOR >= 8000 > ! /* This seems to be needed since Tk 8.0 */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > --- 475,481 ---- > > v->interp = Tcl_CreateInterp(); > > ! #if defined(macintosh) > ! /* This seems to be needed */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > *************** Are you sure that the call to 'TclpInitLibraryPath(baseName);' is not required in Tcl/Tk 8.1, 8.2, 8.3 ? I would propose the following: +#if TKMAJORMINOR >= 8001 + TclpInitLibraryPath(baseName); +# endif /* TKMAJORMINOR */ Here I quote from the Tcl8.3 source distribution: /* *--------------------------------------------------------------------------- * * TclpInitLibraryPath -- * * Initialize the library path at startup. 
We have a minor * metacircular problem that we don't know the encoding of the * operating system but we may need to talk to operating system * to find the library directories so that we know how to talk to * the operating system. * * We do not know the encoding of the operating system. * We do know that the encoding is some multibyte encoding. * In that multibyte encoding, the characters 0..127 are equivalent * to ascii. * * So although we don't know the encoding, it's safe: * to look for the last slash character in a path in the encoding. * to append an ascii string to a path. * to pass those strings back to the operating system. * * But any strings that we remembered before we knew the encoding of * the operating system must be translated to UTF-8 once we know the * encoding so that the rest of Tcl can use those strings. * * This call sets the library path to strings in the unknown native * encoding. TclpSetInitialEncodings() will translate the library * path from the native encoding to UTF-8 as soon as it determines * what the native encoding actually is. * * Called at process initialization time. * * Results: * None. */ Sorry, but I don't know enough about this in connection with the unicode patches and if we should pay attention to this. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From akuchlin at mems-exchange.org Wed Mar 29 00:21:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 17:21:07 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Peter Funk quoted: >Fredrik Lundh: >> I'm not so sure that Python really needs a simple reorganization >> of the existing set of standard library modules. 
just moving the >> modules around won't solve the real problems with the 1.5.2 std >> library... >Right. I propose to leave the namespace flat. I third that comment. Arguments against reorganizing for 1.6: 1) I doubt that we have time to do a good job of it for 1.6. (1.7, maybe.) 2) Right now there's no way for third-party extensions to add themselves to a package in the standard library. Once Python finds foo/__init__.py, it won't look for site-packages/foo/__init__.py, so if you grab, say, "crypto" as a package name in the standard library, it's forever lost to third-party extensions. 3) Rearranging the modules is a good chance to break backward compatibility in other ways. If you want to rewrite, say, httplib in a non-compatible way to support HTTP/1.1, then the move from httplib.py to net.http.py is a great chance to do that, and leave httplib.py as-is for old programs. If you just copy httplib.py, rewriting net.http.py is now harder, since you have to either maintain compatibility or break things *again* in the next version of Python. 4) We wanted to get 1.6 out fairly quickly, and therefore limited the number of features that would get in. (Vide the "Python 1.6 timing" thread last ... November, was it?) Packagizing is feature creep that'll slow things down. Maybe we should start a separate list to discuss a package hierarchy for 1.7. But for 1.6, forget it. -- A.M. Kuchling http://starship.python.net/crew/amk/ Posting "Please send e-mail, since I don't read this group": Poster is rendered illiterate by a simple trepanation. -- Kibo, in the Happynet Manifesto From guido at python.org Wed Mar 29 00:24:46 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:24:46 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: Your message of "Wed, 29 Mar 2000 00:13:02 +0200."
References: Message-ID: <200003282224.RAA13573@eric.cnri.reston.va.us> > Are you sure that the call to 'TclpInitLibraryPath(baseName);' > is not required in Tcl/Tk 8.1, 8.2, 8.3 ? > I would propose the following: > > +#if TKMAJORMINOR >= 8001 > + TclpInitLibraryPath(baseName); > +# endif /* TKMAJORMINOR */ It is an internal routine which shouldn't be called at all by the user. I believe it is called internally at the right time. Note that we now call Tcl_FindExecutable(), which *is* intended to be called by the user (and exists in all 8.x versions) -- maybe this causes TclpInitLibraryPath() to be called. I tested it on Solaris, with Tcl/Tk versions 8.0.4, 8.1.1, 8.2.3 and 8.3.0, and it doesn't seem to make any difference, as long as that version of Tcl/Tk has actually been installed. (When it's not installed, TclpInitLibraryPath() doesn't help either.) I still have to check this on Windows -- maybe it'll have to go back in. [...] > Sorry, but I don't know enough about this in connection with the > unicode patches and if we should pay attention to this. It seems to be allright... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 00:25:27 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:25:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Tue, 28 Mar 2000 17:21:07 EST." <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: <02c901bf989b$be203d80$34aab5d4@hagrid> <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: <200003282225.RAA13586@eric.cnri.reston.va.us> > Maybe we should start a separate list to discuss a package hierarchy > for 1.7. But for 1.6, forget it. Yes! Please! 
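[Ken Manheimer's "stub modules" migration idea from earlier in this thread can be sketched concretely. `net.http` is Andrew's illustrative name for a packagized httplib, not a real module; the demo below builds a throwaway package in a temp directory purely to show the old flat name forwarding to the new packaged location:]

```python
# Sketch of the stub-module migration: the old flat module name simply
# re-exports the new package location, so old imports keep working.
# Everything here is hypothetical and built on the fly in a temp dir.
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def write(relpath, text):
    full = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(text)

# The new, packaged home of the code...
write("net/__init__.py", "")
write("net/http.py", "def urlopen(url):\n    return 'fetched %s' % url\n")
# ...and the backward-compatibility stub under the (nearly) old flat name.
# ("httplib_stub" rather than "httplib" only to avoid shadowing anything real.)
write("httplib_stub.py", "from net.http import *\n")

sys.path.insert(0, root)
import httplib_stub

print(httplib_stub.urlopen("http://example.org"))  # old name, new code
```

[Py3k-style cleanup would then just delete the one-line stub files.]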
--Guido van Rossum (home page: http://www.python.org/~guido/) From donb at init.com Wed Mar 29 00:56:03 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 17:56:03 -0500 Subject: [Python-Dev] None as a keyword / class methods References: <200003282046.PAA18822@zippy.init.com> <200003282102.QAA13041@eric.cnri.reston.va.us> Message-ID: <200003282256.RAA21080@zippy.init.com> Guido van Rossum wrote, > This looks like it would break a lot of code. Only if it were to replace the current implementation. Perhaps I inadvertly made that suggestion. It was not my intention. Another way to look at my post is to say that it was intended to point out why we cant have class methods in the current implementation... it's a name space issue. > How do you refer to a superclass method? It seems that > ClassName.methodName would refer to the class method, not to the > unbound instance method. Right. To get at the unbound instance methods you must go through the 'unbound accessor' which is accessed via the underscore. If you wanted to chain to a superclass method it would look like this: class child(parent): def do_it(self, x): z = parent._.do_it(self, x) return z > Also, moving the default instance attributes to a different > namespace seems to be a semantic change that could change lots of > things. I agree... and that's why I wouldnt suggest doing it to the current class/instance implementation. However, for those who insist on having class attributes and methods I think it would be cool to settle on a standard "syntax". > I am still in favor of saying "Python has no class methods -- use > module-global functions for that". Or use a class/instance implementation provided via an extension module rather than the built-in one. The class named 'base' shown in my example is a class designed for that purpose. > Between the module, the class and the instance, there are enough > namespaces -- we don't need another one. 
The topic comes up often enough to make me think some might disagree. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From moshez at math.huji.ac.il Wed Mar 29 01:24:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 01:24:29 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Andrew M. Kuchling wrote: > Peter Funk quoted: > >Fredrik Lundh: > >> I'm not so sure that Python really needs a simple reorganization > >> of the existing set of standard library modules. just moving the > >> modules around won't solve the real problems with the 1.5.2 std > >> library... > >Right. I propose to leave the namespace flat. > > I third that comment. Arguments against reorganizing for 1.6: Let me just note that my original great renaming proposal was titled "1.7". I'm certain I don't want it to affect the 1.6 release -- my god, it's almost alpha time and we don't even know how to reorganize. Strictly 1.7. > 4) We wanted to get 1.6 out fairly quickly, and therefore limited > the number of features that would get in. (Vide the "Python 1.6 > timing" thread last ... November, was it?) Packagizing is feature > creep that'll slow things down Oh yes. I'm waiting for that 1.6....I wouldn't want to stall it for the world. But this is a good chance as any to discuss reasons, before strategies. Here's why I believe we should re-organize Python modules: -- modules fall quite naturally into subpackages. Reducing the number of toplevel modules will lessen the clutter -- it would be easier to synchronize documentation and code (think "automatically generated documentation") -- it would enable us to move toward a CPAN-like module repository, together with the dist-sig efforts. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gmcm at hypernet.com Wed Mar 29 01:44:27 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 28 Mar 2000 18:44:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: Message-ID: <1257835425-27941123@hypernet.com> Andrew M. Kuchling wrote: [snip] > 2) Right now there's no way for third-party extensions to add > themselves to a package in the standard library. Once Python finds > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > if you grab, say, "crypto" as a package name in the standard library, > it's forever lost to third-party extensions. That way lies madness. While I'm happy to carp at Java for requiring "com", "net" or whatever as a top level name, their intent is correct: the names grabbed by the Python standard packages belong to no one but the Python standard packages. If you *don't* do that, upgrades are an absolute nightmare. Marc-Andre grabbed "mx". If (as I rather suspect ) he wants to remake the entire standard lib in his image, he's welcome to - *under* mx. What would happen if he (and everyone else) installed themselves *into* my core packages, then I decided I didn't want his stuff? More than likely I'd have to scrub the damn installation and start all over again. - Gordon From DavidA at ActiveState.com Wed Mar 29 02:01:57 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:01:57 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg Message-ID: I'm thrilled to see the extended call syntax patches go in! One less wart in the language! Jeremy ZitBlaster Hylton and Greg Noxzema Ewing! --david From pf at artcom-gmbh.de Wed Mar 29 01:53:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 01:53:50 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? 
In-Reply-To: <200003282156.QAA13361@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 4:56:49 pm" Message-ID: Hi! > [Peter Funk] > > > > > Do we need a UserString class? [...] Guido van Rossum: > So why don't you give the UserString.py a try and leave Andy's wish alone? Okay. Here we go. Could someone please keep a close eye on this? I've hacked it up in a hurry. ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- #!/usr/bin/env python """A user-defined wrapper around string objects Note: string objects have grown methods in Python 1.6 This module requires Python 1.6 or later. """ import sys # XXX Totally untested and hacked up until 2:00 am with too little sleep ;-) class UserString: def __init__(self, string=""): self.data = string def __repr__(self): return repr(self.data) def __cmp__(self, string): if isinstance(string, UserString): return cmp(self.data, string.data) else: return cmp(self.data, string) def __len__(self): return len(self.data) # methods defined in alphabetical order def capitalize(self): return self.__class__(self.data.capitalize()) def center(self, width): return self.__class__(self.data.center(width)) def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) def encode(self, encoding=None, errors=None): # XXX improve this?
if encoding: if errors: return self.__class__(self.data.encode(encoding, errors)) else: return self.__class__(self.data.encode(encoding)) else: return self.__class__(self.data.encode()) def endswith(self, suffix, start=0, end=sys.maxint): return self.data.endswith(suffix, start, end) def find(self, sub, start=0, end=sys.maxint): return self.data.find(sub, start, end) def index(self, sub, start=0, end=sys.maxint): return self.data.index(sub, start, end) def isdecimal(self): return self.data.isdecimal() def isdigit(self): return self.data.isdigit() def islower(self): return self.data.islower() def isnumeric(self): return self.data.isnumeric() def isspace(self): return self.data.isspace() def istitle(self): return self.data.istitle() def isupper(self): return self.data.isupper() def join(self, seq): return self.data.join(seq) def ljust(self, width): return self.__class__(self.data.ljust(width)) def lower(self): return self.__class__(self.data.lower()) def lstrip(self): return self.__class__(self.data.lstrip()) def replace(self, old, new, maxsplit=-1): return self.__class__(self.data.replace(old, new, maxsplit)) def rfind(self, sub, start=0, end=sys.maxint): return self.data.rfind(sub, start, end) def rindex(self, sub, start=0, end=sys.maxint): return self.data.rindex(sub, start, end) def rjust(self, width): return self.__class__(self.data.rjust(width)) def rstrip(self): return self.__class__(self.data.rstrip()) def split(self, sep=None, maxsplit=-1): return self.data.split(sep, maxsplit) def splitlines(self, keepends=0): return self.data.splitlines(keepends) def startswith(self, prefix, start=0, end=sys.maxint): return self.data.startswith(prefix, start, end) def strip(self): return self.__class__(self.data.strip()) def swapcase(self): return self.__class__(self.data.swapcase()) def title(self): return self.__class__(self.data.title()) def translate(self, table, deletechars=""): return self.__class__(self.data.translate(table, deletechars)) def upper(self): return self.__class__(self.data.upper()) def __add__(self, other): if isinstance(other,
UserString): return self.__class__(self.data + other.data) elif isinstance(other, type(self.data)): return self.__class__(self.data + other) else: return self.__class__(self.data + str(other)) def __radd__(self, other): if isinstance(other, type(self.data)): return self.__class__(other + self.data) else: return self.__class__(str(other) + self.data) def __mul__(self, n): return self.__class__(self.data*n) __rmul__ = __mul__ def _test(): s = UserString("abc") u = UserString(u"efg") # XXX add some real tests here? return [0] if __name__ == "__main__": import sys sys.exit(_test()[0]) From effbot at telia.com Wed Mar 29 01:12:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 01:12:55 +0200 Subject: [Python-Dev] yeah! for Jeremy and Greg References: Message-ID: <012301bf990b$2a494c80$34aab5d4@hagrid> > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! but did he compile before checking in? ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : undeclared identifier (compile.c and opcode.h both mention this identifier, but nobody defines it... should it be CALL_FUNCTION_VAR, perhaps?) From guido at python.org Wed Mar 29 02:07:34 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:07:34 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 01:53:50 +0200." References: Message-ID: <200003290007.TAA16081@eric.cnri.reston.va.us> > > [Peter Funk] > > > > > > Do we need a UserString class? > [...] > Guido van Rossum: > > So why don't you give the UserString.py a try and leave Andy's wish alone? [Peter] > Okay. Here we go. Could someone please have a close eye on this? > I've haccked it up in hurry. Good job! Go get some sleep, and tomorrow morning when you're fresh, compare it to UserList. From visual inpsection, you seem to be missing __getitem__ and __getslice__, and maybe more (of course not __set*__). 
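For reference, the two methods Guido flags can be sketched along the lines of UserList; this is a minimal, hypothetical version only (Peter's revised module later in the thread supplies the real ones):

```python
class UserString:
    """Minimal skeleton, just enough to show the two missing methods."""
    def __init__(self, string=""):
        self.data = string

    def __getitem__(self, index):
        # wrap the result so indexing returns another UserString
        return self.__class__(self.data[index])

    def __getslice__(self, start, end):
        # required for s[i:j] in Python 1.6; later Pythons route slices
        # through __getitem__ instead, so this method simply goes unused
        start = max(start, 0); end = max(end, 0)
        return self.__class__(self.data[start:end])
```

As with UserList, the point of wrapping the result in `self.__class__` is that indexing and slicing keep returning wrapper instances rather than bare strings.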
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ping at lfw.org  Wed Mar 29 02:13:24 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 28 Mar 2000 18:13:24 -0600 (CST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid>
Message-ID: 

On Wed, 29 Mar 2000, Fredrik Lundh wrote:
> > I'm thrilled to see the extended call syntax patches go in!  One less wart
> > in the language!
>
> but did he compile before checking in?

You beat me to it.  I read David's message and got so excited i just
had to try it right away.  So i updated my CVS tree, did "make", and
got the same error:

make[1]: Entering directory `/home/ping/dev/python/dist/src/Python'
gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c compile.c -o compile.o
compile.c: In function `com_call_function':
compile.c:1225: `CALL_FUNCTION_STAR' undeclared (first use in this function)
compile.c:1225: (Each undeclared identifier is reported only once
compile.c:1225: for each function it appears in.)
make[1]: *** [compile.o] Error 1

> (compile.c and opcode.h both mention this identifier, but
> nobody defines it... should it be CALL_FUNCTION_VAR,
> perhaps?)

But CALL_FUNCTION_STAR is mentioned in the comments...

#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
#define MAKE_FUNCTION   132     /* #defaults */
#define BUILD_SLICE     133     /* Number of items */
/* The next 3 opcodes must be contiguous and satisfy
   (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 */
#define CALL_FUNCTION_VAR      140  /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_KW       141  /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_VAR_KW   142  /* #args + (#kwargs<<8) */

The condition (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 doesn't
make much sense, though...

-- ?!ng

From jeremy at cnri.reston.va.us  Wed Mar 29 02:18:54 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Tue, 28 Mar 2000 19:18:54 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid>
References: <012301bf990b$2a494c80$34aab5d4@hagrid>
Message-ID: <14561.19438.157799.810802@goon.cnri.reston.va.us>

>>>>> "FL" == Fredrik Lundh  writes:

  >> I'm thrilled to see the extended call syntax patches go in!  One
  >> less wart in the language!

  FL> but did he compile before checking in?

Indeed, but not often enough :-).

  FL> ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' :
  FL> undeclared identifier

  FL> (compile.c and opcode.h both mention this identifier, but nobody
  FL> defines it... should it be CALL_FUNCTION_VAR, perhaps?)

This was a last minute change of names.  I had previously compiled
under the old names.  The Makefile doesn't describe the dependency
between opcode.h and compile.c.  And the compile.o file I had worked,
because the only change was to the name of a macro.

It's too bad the Makefile doesn't have all the dependencies.  It seems
that it's necessary to do a make clean before checking in a change
that affects many files.

Jeremy

From klm at digicool.com  Wed Mar 29 02:30:05 2000
From: klm at digicool.com (Ken Manheimer)
Date: Tue, 28 Mar 2000 19:30:05 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: 
Message-ID: 

On Tue, 28 Mar 2000, David Ascher wrote:
> I'm thrilled to see the extended call syntax patches go in!  One less wart
> in the language!

Me too!  Even the lisps i used to know (albeit ancient, according to
eric) couldn't get it as tidy as this.  (Silly me, now i'm imagining
we're going to see operator assignments just around the bend.  "Give
them a tasty morsel, they ask for your dinner..."-)

Ken
klm at digicool.com

From ping at lfw.org  Wed Mar 29 02:35:54 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 28 Mar 2000 18:35:54 -0600 (CST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <14561.19438.157799.810802@goon.cnri.reston.va.us>
Message-ID: 

On Tue, 28 Mar 2000, Jeremy Hylton wrote:
> It's too bad the Makefile doesn't have all the dependencies.  It seems
> that it's necessary to do a make clean before checking in a change
> that affects many files.

I updated again and rebuilt.

>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> sum(2,3,4)
9
>>> sum(*[2,3,4])
9
>>> x = (2,3,4)
>>> sum(*x)
9
>>> def func(a, b, c):
...     print a, b, c
...
>>> func(**{'a':2, 'b':1, 'c':6})
2 1 6
>>> func(**{'c':8, 'a':1, 'b':9})
1 9 8
>>>

*cool*.

So does this completely obviate the need for "apply", then?

    apply(x, y, z)  <==>  x(*y, **z)

-- ?!ng

From guido at python.org  Wed Mar 29 02:35:17 2000
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Mar 2000 19:35:17 -0500
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: Your message of "Tue, 28 Mar 2000 18:35:54 CST."
References: 
Message-ID: <200003290035.TAA16278@eric.cnri.reston.va.us>

> *cool*.
>
> So does this completely obviate the need for "apply", then?
>
>     apply(x, y, z)  <==>  x(*y, **z)

I think so (except for backwards compatibility).  The 1.6 docs for
apply should point this out!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From DavidA at ActiveState.com  Wed Mar 29 02:42:20 2000
From: DavidA at ActiveState.com (David Ascher)
Date: Tue, 28 Mar 2000 16:42:20 -0800
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: 
Message-ID: 

> I updated again and rebuilt.
>
> >>> def sum(*args):
> ...     s = 0
> ...     for x in args: s = s + x
> ...     return s
> ...
> >>> sum(2,3,4)
> 9
> >>> sum(*[2,3,4])
> 9
> >>> x = (2,3,4)
> >>> sum(*x)
> 9
> >>> def func(a, b, c):
> ...     print a, b, c
> ...
> >>> func(**{'a':2, 'b':1, 'c':6})
> 2 1 6
> >>> func(**{'c':8, 'a':1, 'b':9})
> 1 9 8
> >>>
>
> *cool*.
But most importantly, IMO:

    class SubClass(Class):
        def __init__(self, a, *args, **kw):
            self.a = a
            Class.__init__(self, *args, **kw)

Much neater.

From bwarsaw at cnri.reston.va.us  Wed Mar 29 02:46:11 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Tue, 28 Mar 2000 19:46:11 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.19438.157799.810802@goon.cnri.reston.va.us>
Message-ID: <14561.21075.637108.322536@anthem.cnri.reston.va.us>

Uh oh.  Fresh CVS update and make clean, make:

-------------------- snip snip --------------------
Python 1.5.2+ (#20, Mar 28 2000, 19:37:38)  [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> class Nums:
...     def __getitem__(self, i):
...         if i >= 10 or i < 0: raise IndexError
...         return i
...
>>> n = Nums()
>>> for i in n: print i
...
0
1
2
3
4
5
6
7
8
9
>>> sum(*n)
Traceback (innermost last):
  File "", line 1, in ?
SystemError: bad argument to internal function
-------------------- snip snip --------------------

-Barry

From bwarsaw at cnri.reston.va.us  Wed Mar 29 03:02:16 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Tue, 28 Mar 2000 20:02:16 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.19438.157799.810802@goon.cnri.reston.va.us>
	<14561.21075.637108.322536@anthem.cnri.reston.va.us>
Message-ID: <14561.22040.383370.283163@anthem.cnri.reston.va.us>

Changing the definition of class Nums to

    class Nums:
        def __getitem__(self, i):
            if 0 <= i < 10: return i
            raise IndexError
        def __len__(self):
            return 10

I.e. adding the __len__() method avoids the SystemError.  Either the
*arg call should not depend on the sequence being length-able, or it
should error check that the length calculation doesn't return -1 or
raise an exception.  Looking at PySequence_Length() though, it seems
that m->sq_length(s) can return -1 without setting a type_error.
So the fix is either to include a check for a -1 return in
PySequence_Length() when calling sq_length, or instance_length()
should set a TypeError when it has no __len__() method and returns -1.

I gotta run so I can't follow this through -- I'm sure I'll see the
right solution from someone in tomorrow morning's email :)

-Barry

From ping at lfw.org  Wed Mar 29 03:17:27 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 28 Mar 2000 19:17:27 -0600 (CST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <14561.22040.383370.283163@anthem.cnri.reston.va.us>
Message-ID: 

On Tue, 28 Mar 2000, Barry A. Warsaw wrote:
> Changing the definition of class Nums to
>
>     class Nums:
>         def __getitem__(self, i):
>             if 0 <= i < 10: return i
>             raise IndexError
>         def __len__(self):
>             return 10
>
> I.e. adding the __len__() method avoids the SystemError.

It should be noted that "apply" has the same problem, with a
different counterintuitive error message:

>>> n = Nums()
>>> apply(sum, n)
Traceback (innermost last):
  File "", line 1, in ?
AttributeError: __len__

-- ?!ng

From jeremy at cnri.reston.va.us  Wed Mar 29 04:59:26 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Tue, 28 Mar 2000 21:59:26 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: 
References: 
Message-ID: <14561.29070.940238.542509@bitdiddle.cnri.reston.va.us>

>>>>> "DA" == David Ascher  writes:

  DA> But most importantly, IMO:

  DA> class SubClass(Class):
  DA>     def __init__(self, a, *args, **kw):
  DA>         self.a = a
  DA>         Class.__init__(self, *args, **kw)

  DA> Much neater.

This version of method overloading was what I liked most about Greg's
patch.  Note that I also prefer:

    class SubClass(Class):
        super_init = Class.__init__

        def __init__(self, a, *args, **kw):
            self.a = a
            self.super_init(*args, **kw)

I've been happy to have all the overridden methods explicitly labelled
at the top of a class lately.  It is much easier to change the class
hierarchy later.
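Jeremy's variant can be written out as a small, runnable sketch (the concrete class names here are invented for illustration):

```python
class Base:
    def __init__(self, b, c):
        self.b = b
        self.c = c

class Sub(Base):
    super_init = Base.__init__   # overridden method labelled at the top

    def __init__(self, a, *args, **kw):
        self.a = a
        # everything beyond 'a' is forwarded with the extended call syntax
        self.super_init(*args, **kw)

obj = Sub(1, 2, c=3)             # ends up with a=1, b=2, c=3
```

Because `super_init` is an ordinary class attribute, swapping in a different base class later means editing one line at the top of the class rather than hunting through every method.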
Jeremy

From gward at cnri.reston.va.us  Wed Mar 29 05:15:00 2000
From: gward at cnri.reston.va.us (Greg Ward)
Date: Tue, 28 Mar 2000 22:15:00 -0500
Subject: [Python-Dev] __debug__ and py_compile
Message-ID: <20000328221500.A3290@cnri.reston.va.us>

Hi all --

a particularly active member of the Distutils-SIG brought the global
'__debug__' flag to my attention, since I (and thus my code) didn't
know if calling 'py_compile.compile()' would result in a ".pyc" or a
".pyo" file.  It appears that, using __debug__, you can determine what
you're going to get.  Cool!

However, it doesn't look like you can *choose* what you're going to
get.  Is this correct?  Ie. does the presence/absence of -O when the
interpreter starts up *completely* decide how code is compiled?

Also, can I rely on __debug__ being there in the future?  How about in
the past?  I still occasionally ponder making Distutils compatible
with Python 1.5.1.

Thanks --

        Greg

From guido at python.org  Wed Mar 29 06:08:12 2000
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Mar 2000 23:08:12 -0500
Subject: [Python-Dev] __debug__ and py_compile
In-Reply-To: Your message of "Tue, 28 Mar 2000 22:15:00 EST."
References: <20000328221500.A3290@cnri.reston.va.us>
Message-ID: <200003290408.XAA17991@eric.cnri.reston.va.us>

> a particularly active member of the Distutils-SIG brought the global
> '__debug__' flag to my attention, since I (and thus my code) didn't
> know if calling 'py_compile.compile()' would result in a ".pyc" or a
> ".pyo" file.  It appears that, using __debug__, you can determine
> what you're going to get.  Cool!
>
> However, it doesn't look like you can *choose* what you're going to
> get.  Is this correct?  Ie. does the presence/absence of -O when the
> interpreter starts up *completely* decide how code is compiled?

Correct.  You (currently) can't change the opt setting of the compiler.
(It was part of the compiler restructuring to give more freedom here;
this has been pushed back to 1.7.)

> Also, can I rely on __debug__ being there in the future?  How about in
> the past?  I still occasionally ponder making Distutils compatible with
> Python 1.5.1.

__debug__ is as old as the assert statement, going back to at least
1.5.0.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From moshez at math.huji.ac.il  Wed Mar 29 07:35:51 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 07:35:51 +0200 (IST)
Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: <1257835425-27941123@hypernet.com>
Message-ID: 

On Tue, 28 Mar 2000, Gordon McMillan wrote:
> What would happen if he (and everyone else) installed
> themselves *into* my core packages, then I decided I didn't
> want his stuff?  More than likely I'd have to scrub the damn
> installation and start all over again.

I think Greg Stein answered that objection, by reminding us that the
filesystem isn't the only way to set up a package hierarchy.  In
particular, even with Python's current module system, there is no need
to scrub installations: Python core modules go (under UNIX) in
/usr/local/lib/python1.5, and 3rd party modules go in
/usr/local/lib/python1.5/site-packages.

Need to remove stuff?  Remove whatever is in
/usr/local/lib/python1.5/site-packages.  Need to upgrade?  Just back up
/usr/local/lib/python1.5/site-packages, remove /usr/local/lib/python1.5/,
install, and move the 3rd party modules back from the backup.  This
becomes even easier if the standard installation is in a JAR-like file,
and 3rd party modules are also in a JAR-like file, but specified to be
in their natural place.

Wow!  That was a long rant!

Anyway, I already expressed my preference for the Perl way over the
Java way.  For one thing, I don't want to have to register a domain
just so I could distribute Python code.

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From bwarsaw at cnri.reston.va.us  Wed Mar 29 07:42:34 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Wed, 29 Mar 2000 00:42:34 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.19438.157799.810802@goon.cnri.reston.va.us>
	<14561.21075.637108.322536@anthem.cnri.reston.va.us>
Message-ID: <14561.38858.41246.28460@anthem.cnri.reston.va.us>

>>>>> "BAW" == Barry A Warsaw  writes:

  BAW> Uh oh.  Fresh CVS update and make clean, make:

  >>> sum(*n)
  | Traceback (innermost last):
  |   File "", line 1, in ?
  | SystemError: bad argument to internal function

Here's a proposed patch that will cause a TypeError to be raised
instead.

-Barry

-------------------- snip snip --------------------
Index: abstract.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/abstract.c,v
retrieving revision 2.33
diff -c -r2.33 abstract.c
*** abstract.c	2000/03/10 22:55:18	2.33
--- abstract.c	2000/03/29 05:36:21
***************
*** 860,866 ****
  	PyObject *s;
  {
  	PySequenceMethods *m;
! 
  	if (s == NULL) {
  		null_error();
  		return -1;
--- 860,867 ----
  	PyObject *s;
  {
  	PySequenceMethods *m;
! 	int size = -1;
! 
  	if (s == NULL) {
  		null_error();
  		return -1;
***************
*** 868,877 ****
  	m = s->ob_type->tp_as_sequence;
  	if (m && m->sq_length)
! 		return m->sq_length(s);
! 	type_error("len() of unsized object");
! 	return -1;
  }
  
  PyObject *
--- 869,879 ----
  	m = s->ob_type->tp_as_sequence;
  	if (m && m->sq_length)
! 		size = m->sq_length(s);
! 	if (size < 0)
! 		type_error("len() of unsized object");
! 
  	return size;
  }
  
  PyObject *
Index: ceval.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/ceval.c,v
retrieving revision 2.169
diff -c -r2.169 ceval.c
*** ceval.c	2000/03/28 23:49:16	2.169
--- ceval.c	2000/03/29 05:39:00
***************
*** 1636,1641 ****
--- 1636,1649 ----
  				break;
  			}
  			nstar = PySequence_Length(stararg);
+ 			if (nstar < 0) {
+ 				if (!PyErr_Occurred())
+ 					PyErr_SetString(
+ 						PyExc_TypeError,
+ 						"len() of unsized object");
+ 				x = NULL;
+ 				break;
+ 			}
  		}
  		if (nk > 0) {
  			if (kwdict == NULL) {

From bwarsaw at cnri.reston.va.us  Wed Mar 29 07:46:19 2000
From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us)
Date: Wed, 29 Mar 2000 00:46:19 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.22040.383370.283163@anthem.cnri.reston.va.us>
Message-ID: <14561.39083.748093.694726@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee  writes:

  | It should be noted that "apply" has the same problem, with a
  | different counterintuitive error message:

  >> n = Nums()
  >> apply(sum, n)
  | Traceback (innermost last):
  |   File "", line 1, in ?
  | AttributeError: __len__

The patch I just posted fixes this too.  The error message ain't
great, but at least it's consistent with the direct call.

-Barry

-------------------- snip snip --------------------
Traceback (innermost last):
  File "/tmp/doit.py", line 15, in ?
    print apply(sum, n)
TypeError: len() of unsized object

From pf at artcom-gmbh.de  Wed Mar 29 08:30:22 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 08:30:22 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 7:44:42 am"
Message-ID: 

Hi!

> On Wed, 29 Mar 2000, Peter Funk wrote:
> > class UserString:
> >     def __init__(self, string=""):
> >         self.data = string
                       ^^^^^^^
Moshe Zadka wrote:
> Why do you feel there is a need to default?  Strings are immutable

I had something like this in my mind:

class MutableString(UserString):
    """Python strings are immutable objects.  But of course this can
    be changed in a derived class implementing the missing methods.

    >>> s = MutableString()
    >>> s[0:5] = "HUH?"
    """
    def __setitem__(self, index, char):
        ....
    def __setslice__(self, i, j, substring):
        ....

> What about __int__, __long__, __float__, __str__, __hash__?
> And what about __getitem__ and __contains__?
> And __complex__?

I was obviously too tired and too eager to get this out!  Thanks for
reviewing and responding so quickly.  I will add them.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From moshez at math.huji.ac.il  Wed Mar 29 08:51:30 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 08:51:30 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> Moshe Zadka wrote:
> > Why do you feel there is a need to default?  Strings are immutable
>
> I had something like this in my mind:
>
> class MutableString(UserString):
>     """Python strings are immutable objects.  But of course this can
>     be changed in a derived class implementing the missing methods.

Then add the default in the constructor for MutableString....

eagerly-waiting-for-UserString.py-ly y'rs, Z.
-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From moshez at math.huji.ac.il  Wed Mar 29 09:03:53 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 09:03:53 +0200 (IST)
Subject: [Python-Dev] 1.5.2->1.6 Changes
Message-ID: 

I'm starting to compile a list of changes from 1.5.2 to 1.6.
Here's what I came up with so far:

-- string objects now have methods (though they are still immutable)

-- unicode support: Unicode strings are marked with u"string", and
   there is support for arbitrary encoders/decoders

-- "in" operator can now be overridden in user-defined classes to mean
   anything: it calls the magic method __contains__

-- SRE is the new regular expression engine.  re.py became an interface
   to the same engine.  The new engine fully supports unicode regular
   expressions.

-- Some methods which would take multiple arguments and treat them as a
   tuple were fixed: list.{append, insert, remove, count}, socket.connect

-- Some modules were made obsolete

-- filecmp.py (supersedes the old cmp.py and dircmp.py modules)

-- tabnanny.py (make sure the source file doesn't assume a specific
   tab-width)

-- win32reg (win32 registry editor)

-- unicode module, and codecs package

-- New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)

-- _tkinter now uses the object, rather than string, interface to Tcl.

Please e-mail me personally if you think of any other changes, and I'll
try to integrate them into a complete "changes" document.

Thanks in advance
-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From esr at thyrsus.com  Wed Mar 29 09:21:29 2000
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 29 Mar 2000 02:21:29 -0500
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: ; from Moshe Zadka on Wed, Mar 29, 2000 at 09:03:53AM +0200
References: 
Message-ID: <20000329022129.A15539@thyrsus.com>

Moshe Zadka :
> -- _tkinter now uses the object, rather than string, interface to Tcl.

Hm, does this mean that the annoying requirement to do explicit gets
and sets to move data between the Python world and the Tcl/Tk world is
gone?
--
		Eric S. Raymond

"A system of licensing and registration is the perfect device to deny
gun ownership to the bourgeoisie."
	-- Vladimir Ilyich Lenin

From moshez at math.huji.ac.il  Wed Mar 29 09:22:54 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 09:22:54 +0200 (IST)
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: <20000329022129.A15539@thyrsus.com>
Message-ID: 

On Wed, 29 Mar 2000, Eric S. Raymond wrote:
> Moshe Zadka :
> > -- _tkinter now uses the object, rather than string, interface to Tcl.
>
> Hm, does this mean that the annoying requirement to do explicit gets and
> sets to move data between the Python world and the Tcl/Tk world is gone?

I doubt it.  It's just that Python and Tcl have such a different outlook
on variables that I don't think it can be glossed over.
-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From pf at artcom-gmbh.de  Wed Mar 29 11:16:17 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 11:16:17 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 8:51:30 am"
Message-ID: 

Hi!

Moshe Zadka:
> eagerly-waiting-for-UserString.py-ly y'rs, Z.

Well, I've added the missing methods.  Unfortunately I ran out of time
now, and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py'
is still missing.

Regards, Peter

---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----

#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
""" from types import StringType, UnicodeType import sys class UserString: def __init__(self, string): self.data = string def __str__(self): return str(self.data) def __repr__(self): return repr(self.data) def __int__(self): return int(self.data) def __long__(self): return long(self.data) def __float__(self): return float(self.data) def __hash__(self): return hash(self.data) def __cmp__(self, string): if isinstance(string, UserString): return cmp(self.data, string.data) else: return cmp(self.data, string) def __contains__(self, char): return char in self.data def __len__(self): return len(self.data) def __getitem__(self, index): return self.__class__(self.data[index]) def __getslice__(self, start, end): start = max(start, 0); end = max(end, 0) return self.__class__(self.data[start:end]) def __add__(self, other): if isinstance(other, UserString): return self.__class__(self.data + other.data) elif isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(self.data + other) else: return self.__class__(self.data + str(other)) def __radd__(self, other): if isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(other + self.data) else: return self.__class__(str(other) + self.data) def __mul__(self, n): return self.__class__(self.data*n) __rmul__ = __mul__ # the following methods are defined in alphabetical order: def capitalize(self): return self.__class__(self.data.capitalize()) def center(self, width): return self.__class__(self.data.center(width)) def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) def encode(self, encoding=None, errors=None): # XXX improve this? 
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self, suffix, start=0, end=sys.maxint):
        return self.data.endswith(suffix, start, end)
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self):
        return self.data.isdecimal()
    def isdigit(self):
        return self.data.isdigit()
    def islower(self):
        return self.data.islower()
    def isnumeric(self):
        return self.data.isnumeric()
    def isspace(self):
        return self.data.isspace()
    def istitle(self):
        return self.data.istitle()
    def isupper(self):
        return self.data.isupper()
    def join(self, seq):
        return self.data.join(seq)
    def ljust(self, width):
        return self.__class__(self.data.ljust(width))
    def lower(self):
        return self.__class__(self.data.lower())
    def lstrip(self):
        return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width):
        return self.__class__(self.data.rjust(width))
    def rstrip(self):
        return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self):
        return self.__class__(self.data.strip())
    def swapcase(self):
        return self.__class__(self.data.swapcase())
    def title(self):
        return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self):
        return self.__class__(self.data.upper())

class MutableString(UserString):
    """mutable string objects

    Python strings are immutable objects.  This has the advantage that
    strings may be used as dictionary keys.  If this property isn't
    needed and you insist on changing string values in place instead,
    you may cheat and use MutableString.

    But the purpose of this class is an educational one: to prevent
    people from inventing their own mutable string class derived from
    UserString and thereby forgetting to remove (override) the __hash__
    method inherited from UserString.  This would lead to errors that
    would be very hard to track down.

    A faster and better solution is to rewrite the program using lists."""
    def __init__(self, string=""):
        self.data = string
    def __hash__(self):
        raise TypeError, "unhashable type (it is mutable)"
    def __setitem__(self, index, sub):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + sub + self.data[index+1:]
    def __delitem__(self, index):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + self.data[index+1:]
    def __setslice__(self, start, end, sub):
        start = max(start, 0); end = max(end, 0)
        if isinstance(sub, UserString):
            self.data = self.data[:start]+sub.data+self.data[end:]
        elif isinstance(sub, StringType) or isinstance(sub, UnicodeType):
            self.data = self.data[:start]+sub+self.data[end:]
        else:
            self.data = self.data[:start]+str(sub)+self.data[end:]
    def __delslice__(self, start, end):
        start = max(start, 0); end = max(end, 0)
        self.data = self.data[:start] + self.data[end:]
    def immutable(self):
        return UserString(self.data)

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return 0

if __name__ == "__main__":
    sys.exit(_test())

From mal at lemburg.com  Wed Mar 29 11:34:21 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 29 Mar 2000 11:34:21 +0200
Subject: [Python-Dev] Great Renaming? What is the goal?
References: <1257835425-27941123@hypernet.com>
Message-ID: <38E1CE1D.7899B1BC@lemburg.com>

Gordon McMillan wrote:
>
> Andrew M. Kuchling wrote:
> [snip]
> > 2) Right now there's no way for third-party extensions to add
> > themselves to a package in the standard library.  Once Python finds
> > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so
> > if you grab, say, "crypto" as a package name in the standard library,
> > it's forever lost to third-party extensions.
>
> That way lies madness.  While I'm happy to carp at Java for
> requiring "com", "net" or whatever as a top level name, their
> intent is correct: the names grabbed by the Python standard
> packages belong to no one but the Python standard
> packages.  If you *don't* do that, upgrades are an absolute
> nightmare.
>
> Marc-Andre grabbed "mx".  If (as I rather suspect ) he
> wants to remake the entire standard lib in his image, he's
> welcome to - *under* mx.

Right, that's the way I see it too.

BTW, where can I register the "mx" top-level package name ?  Should
these be registered in the NIST registry ?  Will the names registered
there be honored ?

> What would happen if he (and everyone else) installed
> themselves *into* my core packages, then I decided I didn't
> want his stuff?  More than likely I'd have to scrub the damn
> installation and start all over again.

That's a no-no, IMHO.  Unless explicitly allowed, packages should
*not* install themselves as subpackages to other existing top-level
packages.  If they do, it's their problem if the hierarchy changes...

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From moshez at math.huji.ac.il  Wed Mar 29 11:59:47 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 11:59:47 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> Hi!
>
> Moshe Zadka:
> > eagerly-waiting-for-UserString.py-ly y'rs, Z.
>
> Well, I've added the missing methods.  Unfortunately I ran out of time
> now and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py'
> is still missing.

Great work, Peter!  I really like UserString.  However, I have two
issues with MutableString:

1. It shouldn't share implementation with UserString, otherwise your
   algorithms are not behaving with the correct big-O properties.  It
   should probably use a char-array (from the array module) as the
   internal representation.

2. It shouldn't share interface with UserString, since it doesn't have
   a proper implementation of __hash__.

All in all, I probably disagree with making MutableString a subclass
of UserString.  If I have time later today, I'm hoping to be able to
make my own MutableString.

From pf at artcom-gmbh.de  Wed Mar 29 12:35:32 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 12:35:32 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 11:59:47 am"
Message-ID: 

Hi!
It shouldn't share interface with UserString, since it doesn't have a > proper implementation of __hash__. What's wrong with my implementation of __hash__ raising a TypeError with the message 'unhashable object'? This is the same behaviour as when you try to use some other mutable object as a key in a dictionary: >>> l = [] >>> d = { l : 'foo' } Traceback (innermost last): File "", line 1, in ? TypeError: unhashable type > All in all, I probably disagree with making MutableString a subclass of > UserString. If I have time later today, I'm hoping to be able to make my > own MutableString As I tried to point out in the docstring of 'MutableString', I don't want people to actually start using the 'MutableString' class. My intention was to prevent people from trying to invent their own, and then probably wrong, MutableString class derived from UserString. Only Newbies will really ever need mutable strings in Python (see FAQ). Maybe my 'MutableString' idea belongs somewhere in the to-be-written src/Doc/libuserstring.tex. But since Newbies tend to ignore docs ... Sigh. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gmcm at hypernet.com Wed Mar 29 13:07:20 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 29 Mar 2000 06:07:20 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <1257835425-27941123@hypernet.com> Message-ID: <1257794452-30405909@hypernet.com> Moshe Zadka wrote: > On Tue, 28 Mar 2000, Gordon McMillan wrote: > > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > I think Greg Stein answered that objection, by reminding us that the > filesystem isn't the only way to set up a package hierarchy.
You mean when Greg said: >Assuming that you use an archive like those found in my "small" distro or > Gordon's distro, then this is no problem. The archive simply recognizes > and maps "text.encoding.macbinary" to its own module. I don't know what this has to do with it. When we get around to the 'macbinary' part, we have already established that 'text.encoding' is the parent which should supply 'macbinary'. > In > particular, even with Python's current module system, there is no need to > scrub installations: Python core modules go (under UNIX) in > /usr/local/lib/python1.5, and 3rd party modules go in > /usr/local/lib/python1.5/site-packages. And if there's a /usr/local/lib/python1.5/text/encoding, there's no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. I believe you could hack up an importer that did allow this, and I think you'd be 100% certifiable if you did. Just look at the surprise factor. Hacking stuff into another package is just as evil as math.pi = 42. > Anyway, I already expressed my preference of the Perl way, over the Java > way. For one thing, I don't want to have to register a domain just so I > could distribute Python code I haven't the foggiest what the "Perl way" is; I wouldn't be surprised if it relied on un-Pythonic sociological factors. I already said the Java mechanics are silly; uniqueness is what matters. When Python packages start selling in the four and five figure range, then a registry mechanism will likely be necessary. - Gordon From moshez at math.huji.ac.il Wed Mar 29 13:21:09 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 13:21:09 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > > 1. It shouldn't share implementation with UserString, otherwise your > > algorithms are not behaving with correct big-O properties.
It should > > probably use a char-array (from the array module) as the internal > > representation. > > Hmm.... I don't understand what you mean by 'big-O properties'. > The internal representation of any object should be considered ... > umm ... internal. Yes, but s[0] = 'a' should take O(1) time, not O(len(s)). > > 2. It shouldn't share interface with UserString, since it doesn't have a > > proper implementation of __hash__. > > What's wrong with my implementation of __hash__ raising a TypeError with > the message 'unhashable object'? A subtype shouldn't change contracts of its supertypes. hash() was implicitly contracted as "raising no exceptions". -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Wed Mar 29 13:30:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 13:30:59 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <1257794452-30405909@hypernet.com> Message-ID: On Wed, 29 Mar 2000, Gordon McMillan wrote: > And if there's a /usr/local/lib/python1.5/text/encoding, there's > no way that /usr/local/lib/python1.5/site-packages/text/encoding > will get searched. Oh my god! I just realized you're right. Well, back to the drawing board. > I haven't the foggiest what the "Perl way" is; I wouldn't be > surprised if it relied on un-Pythonic sociological factors. No, it relies on non-Pythonic (but not unpythonic -- simply different) technical choices. > I > already said the Java mechanics are silly; uniqueness is what > matters. As in all things namespacish ;-) Though I suspect a registry will be needed much sooner. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From guido at python.org Wed Mar 29 14:26:56 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:26:56 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: Your message of "Wed, 29 Mar 2000 02:21:29 EST." <20000329022129.A15539@thyrsus.com> References: <20000329022129.A15539@thyrsus.com> Message-ID: <200003291226.HAA18216@eric.cnri.reston.va.us> > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. Eric Raymond: > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone? Not sure what you are referring to -- this should be completely transparent to Python/Tkinter users. If you are thinking of the way Tcl variables are created and manipulated in Python, no, this doesn't change, alas (Tcl variables aren't objects -- they are manipulated through get and set commands. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:32:16 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:32:16 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Wed, 29 Mar 2000 11:34:21 +0200." <38E1CE1D.7899B1BC@lemburg.com> References: <1257835425-27941123@hypernet.com> <38E1CE1D.7899B1BC@lemburg.com> Message-ID: <200003291232.HAA18234@eric.cnri.reston.va.us> > > Marc-Andre grabbed "mx". If (as I rather suspect) he > > wants to remake the entire standard lib in his image, he's > > welcome to - *under* mx. > > Right, that's the way I see it too. BTW, where can I register > the "mx" top-level package name ? Should these be registered > in the NIST registry ? Will the names registered there be > honored ? I think the NIST registry is a failed experiment -- too cumbersome to maintain or consult.
We can do this the same way as common law handles trade marks: if you have used it as your brand name long enough, even if you didn't register, someone else cannot grab it away from you. > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > That's a no-no, IMHO. Unless explicitly allowed, packages > should *not* install themselves as subpackages to other > existing top-level packages. If they do, it's their problem > if the hierarchy changes... Agreed. Although some people seem to *want* this. Probably because it's okay to do that in Java and (apparently?) in Perl. And C++, probably. It all probably stems back to Lisp. I admit that I didn't see this subtlety when I designed Python's package architecture. It's too late to change (e.g. because of __init__.py). Is it a problem though? Let's be open-minded about this and think about whether we want to allow this or not, and why... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:35:33 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:35:33 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 13:21:09 +0200." References: Message-ID: <200003291235.HAA18249@eric.cnri.reston.va.us> > > What's wrong with my implementation of __hash__ raising a TypeError with > > the message 'unhashable object'? > > A subtype shouldn't change contracts of its supertypes. hash() was > implicitly contracted as "raising no exceptions". Let's not confuse subtypes and subclasses. One of the things implicit in the discussion on types-sig is that not every subclass is a subtype! Yes, this violates something we all learned from C++ -- but it's a great insight.
No time to explain it more, but for me, Peter's subclassing UserString for MutableString to borrow implementation is fine. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Wed Mar 29 15:49:24 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 15:49:24 +0200 (MEST) Subject: [Python-Dev] NIST Registry (was Great Renaming? What is the goal?) In-Reply-To: <200003291232.HAA18234@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 29, 2000 7:32:16 am" Message-ID: Hi! Guido van Rossum: > I think the NIST registry is a failed experiment -- too cumbersome to > maintain or consult. The WEB frontend of the NIST registry is not that bad --- if you are even aware of the fact that such a beast exists! I have used Python since 1994 and discovered the NIST registry incidentally a few weeks ago, when I was really looking for something about the Win32 registry and used the search engine on www.python.org. My first thought was: What a neat clever idea! I think this is an example of how the Python community suffers from poor advertising of good ideas. > We can do this the same way as common law > handles trade marks: if you have used it as your brand name long > enough, even if you didn't register, someone else cannot grab it away > from you. Okay. But a more formal registry wouldn't hurt. Something like the global module index from the current docs supplemented with all contributed modules which can currently be found at www.vex.net would be a useful resource. Regards, Peter From moshez at math.huji.ac.il Wed Mar 29 16:15:36 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 16:15:36 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003291235.HAA18249@eric.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Guido van Rossum wrote: > Let's not confuse subtypes and subclasses.
One of the things implicit > in the discussion on types-sig is that not every subclass is a > subtype! Yes, this violates something we all learned from C++ -- but > it's a great insight. No time to explain it more, but for me, Peter's > subclassing UserString for MutableString to borrow implementation is > fine. Oh, I agree with this. An earlier argument which got snipped in the discussion is why it's a bad idea to borrow implementation (a totally different argument) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fdrake at acm.org Wed Mar 29 18:02:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 11:02:13 -0500 (EST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: References: Message-ID: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Moshe Zadka writes: > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) Weren't these in 1.5.2? I think filecmp is documented in the released docs... ah, no, I'm safe. ;) > Please e-mail me personally if you think of any other changes, and I'll > try to integrate them into a complete "changes" document. The documentation is updated. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Wed Mar 29 18:57:51 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 29 Mar 2000 10:57:51 -0600 Subject: [Python-Dev] CVS woes... Message-ID: <200003291657.KAA22177@beluga.mojam.com> Does anyone else besides me have trouble getting their Python tree to sync with the CVS repository? I've tried all manner of flags to "cvs update", most recently "cvs update -d -A ." with no success. There are still some files I know Fred Drake has patched that show up as different and it refuses to pick up Lib/robotparser.py. I'm going to blast my current tree and start anew after saving one or two necessary files. 
Any thoughts you might have would be much appreciated. (Private emails please, unless for some reason you think this should be a python-dev topic. I only post here because I suspect most of the readers use CVS to keep in frequent sync and may have some insight.) Thx, -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Wed Mar 29 19:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 19:06:59 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Fred L. Drake, Jr. wrote: > > Moshe Zadka writes: > > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) > > Weren't these in 1.5.2? I think filecmp is documented in the > released docs... ah, no, I'm safe. ;) Tabnanny wasn't a module, and filecmp didn't exist at all. > The documentation is updated. ;) Yes, but it was released as a late part of 1.5.2. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Wed Mar 29 18:38:00 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 18:38:00 +0200 Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <01b701bf999d$267b6740$34aab5d4@hagrid> Skip wrote: > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. note that robotparser doesn't show up on cvs.python.org either. maybe cnri's cvs admins should look into this...
From fdrake at acm.org Wed Mar 29 20:20:14 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 13:20:14 -0500 (EST) Subject: [Python-Dev] CVS woes... In-Reply-To: <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <14562.18782.465814.696099@seahag.cnri.reston.va.us> Skip Montanaro writes: > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses You should be aware that many of the more recent documentation patches have been in the 1.5.2p2 branch (release-1.5.2p1-patches, I think), rather than the development head. I'm hoping to begin the merge in the next week. I also have a few patches that I haven't had time to look at yet, and I'm not inclined to make any changes until I've merged the 1.5.2p2 docs with the 1.6 tree, mostly to keep the merge from being any more painful than I already expect it to be. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Wed Mar 29 20:22:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 13:22:57 -0500 (EST) Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> <01b701bf999d$267b6740$34aab5d4@hagrid> Message-ID: <14562.18945.407398.812930@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> note that robotparser doesn't show up on cvs.python.org FL> either. maybe cnri's cvs admins should look into this... I've just resync'd python/dist and am doing a fresh checkout now. Looks like Lib/robotparser.py is there now. -Barry From guido at python.org Wed Mar 29 20:23:38 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 13:23:38 -0500 Subject: [Python-Dev] CVS woes... In-Reply-To: Your message of "Wed, 29 Mar 2000 10:57:51 CST." 
<200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <200003291823.NAA20134@eric.cnri.reston.va.us> > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. My bad. When I move or copy a file around in the CVS repository directly instead of using cvs commit, I have to manually call a script that updates the mirror. I've done that now, and robotparser.py should now be in the mirror. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Wed Mar 29 21:06:14 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Wed, 29 Mar 2000 14:06:14 -0500 Subject: [Python-Dev] Distutils now in Python CVS tree Message-ID: <20000329140613.A5850@cnri.reston.va.us> Hi all -- Distutils is now available through the Python CVS tree *in addition to its own CVS tree*. That is, if you keep on top of developments in the Python CVS tree, then you will be tracking the latest Distutils code in Lib/distutils. Or, you can keep following the Distutils through its own CVS tree. (This is all done through one itty-bitty little symlink in the CNRI CVS repository, and It Just Works. Cool.) Note that only the 'distutils' subdirectory of the distutils distribution is tracked by Python: that is, changes to the documentation, test suites, and example setup scripts are *not* reflected in the Python CVS tree. If you follow neither Python nor Distutils CVS updates, this doesn't affect you. If you've been following Distutils CVS updates, you can continue to do so as you've always done (and as is documented on the Distutils "Anonymous CVS" web page). 
If you've been following Python CVS updates, then you are now following most Distutils CVS updates too -- as long as you do "cvs update -d", of course. If you're interested in following updates in the Distutils documentation, tests, examples, etc. then you should follow the Distutils CVS tree directly. If you've been following *both* Python and Distutils CVS updates, and hacking on the Distutils, then you should pick one or the other as your working directory. If you submit patches, it doesn't really matter if they're relative to the top of the Python tree, the top of the Distutils tree, or what -- I'll probably figure it out. However, it's probably best to continue sending Distutils patches to distutils-sig at python.org, *or* direct to me (gward at python.net) for trivial patches. Unless Guido says otherwise, I don't see a compelling reason to send Distutils patches to patches at python.org. In related news, the distutils-checkins list is probably going to go away, and all Distutils checkin messages will go python-checkins instead. Let me know if you avidly follow distutils-checkins, but do *not* want to follow python-checkins -- if lots of people respond (doubtful, as distutils-checkins only had 3 subscribers last I checked!), we'll reconsider. Greg From fdrake at acm.org Wed Mar 29 21:28:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 14:28:19 -0500 (EST) Subject: [Python-Dev] Re: [Distutils] Distutils now in Python CVS tree In-Reply-To: <20000329140525.A5842@cnri.reston.va.us> References: <20000329140525.A5842@cnri.reston.va.us> Message-ID: <14562.22867.998809.897214@seahag.cnri.reston.va.us> Greg Ward writes: > Distutils is now available through the Python CVS tree *in addition to > its own CVS tree*. That is, if you keep on top of developments in the > Python CVS tree, then you will be tracking the latest Distutils code in > Lib/distutils. Or, you can keep following the Distutils through its own > CVS tree. 
(This is all done through one itty-bitty little symlink in > the CNRI CVS repository, and It Just Works. Cool.) Greg, You may want to point out the legalese requirements for patches to the Python tree. ;( That means the patches should probably go to patches at python.org or you should ensure an archive of all the legal statements is maintained at CNRI. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping at lfw.org Wed Mar 29 23:44:31 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 29 Mar 2000 15:44:31 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: On Tue, 28 Mar 2000, Fredrik Lundh wrote: > > > IMO this subdivision could be discussed and possibly revised. > > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Wow. I don't think i hardly ever use any of the modules in your "Commonly Used Modules" category. Except traceback, from time to time, but that's really the only one! Hmm. I'd arrange things a little differently, though i do like the category for Data Representation (it should probably go next to Data Storage though). I would prefer a separate group for interpreter-and-development-related things. The "File Formats" group seems weak... to me, its contents would better belong in a "parsing" or "text processing" classification. urlparse definitely goes with urllib. These comments are kind of random, i know... maybe i'll try putting together another grouping if i have any time. -- ?!ng From adustman at comstar.net Thu Mar 30 02:57:06 2000 From: adustman at comstar.net (Andy Dustman) Date: Wed, 29 Mar 2000 19:57:06 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003290150.UAA17819@eric.cnri.reston.va.us> Message-ID: I had to make the following one-line change to socketmodule.c so that it would link properly with openssl-0.9.4. 
In studying the openssl include files, I found: #define SSLeay_add_ssl_algorithms() SSL_library_init() SSL_library_init() seems to be the "correct" call nowadays. I don't know why this isn't being picked up. I also don't know how well the module works, other than it imports, but I sure would like to try it with Zope/ZServer/Medusa... -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" Index: socketmodule.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Modules/socketmodule.c,v retrieving revision 1.98 diff -c -r1.98 socketmodule.c *** socketmodule.c 2000/03/24 20:56:56 1.98 --- socketmodule.c 2000/03/30 00:49:09 *************** *** 2384,2390 **** return; #ifdef USE_SSL SSL_load_error_strings(); ! SSLeay_add_ssl_algorithms(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; --- 2384,2390 ---- return; #ifdef USE_SSL SSL_load_error_strings(); ! SSL_library_init(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; From gstein at lyra.org Thu Mar 30 04:54:27 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 29 Mar 2000 18:54:27 -0800 (PST) Subject: [Python-Dev] installation points (was: Great Renaming? What is the goal?) In-Reply-To: <1257794452-30405909@hypernet.com> Message-ID: On Wed, 29 Mar 2000, Gordon McMillan wrote: > Moshe Zadka wrote: > > On Tue, 28 Mar 2000, Gordon McMillan wrote: > > > What would happen if he (and everyone else) installed > > > themselves *into* my core packages, then I decided I didn't > > > want his stuff? More than likely I'd have to scrub the damn > > > installation and start all over again. 
> > > > I think Greg Stein answered that objection, by reminding us that the > > filesystem isn't the only way to set up a package hierarchy. > > You mean when Greg said: > >Assuming that you use an archive like those found in my "small" distro or > > Gordon's distro, then this is no problem. The archive simply recognizes > > and maps "text.encoding.macbinary" to its own module. > > I don't know what this has to do with it. When we get around > to the 'macbinary' part, we have already established that > 'text.encoding' is the parent which should supply 'macbinary'. good point... > > In > > particular, even with Python's current module system, there is no need to > > scrub installations: Python core modules go (under UNIX) in > > /usr/local/lib/python1.5, and 3rd party modules go in > > /usr/local/lib/python1.5/site-packages. > > And if there's a /usr/local/lib/python1.5/text/encoding, there's > no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. > > I believe you could hack up an importer that did allow this, and > I think you'd be 100% certifiable if you did. Just look at the > surprise factor. > > Hacking stuff into another package is just as evil as math.pi = > 42. Not if the package was designed for it. For a "package" like "net", it would be perfectly acceptable to allow third-parties to define that as their installation point. And yes, assume there is an importer that looks into the installed archives for modules. In the example, the harder part is determining where the "text.encoding" package is loaded from. And yah: it may be difficult to arrange for the text.encoding importer to allow for archive searching.
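The archive importer assumed above can be sketched as follows. This is hedged heavily: it is written against the importlib hooks of much later Python versions (nothing like them existed at the time), and the in-memory dict standing in for an archive file, along with everything in it, is made up for illustration:

```python
import importlib.abc
import importlib.util
import sys

class ArchiveFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Map dotted module names to source text held in an 'archive'."""

    def __init__(self, archive):
        self.archive = archive  # dotted name -> source string

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.archive:
            return None
        # treat a name as a package if the archive holds submodules under it
        is_pkg = any(k.startswith(fullname + '.') for k in self.archive)
        return importlib.util.spec_from_loader(fullname, self, is_package=is_pkg)

    def create_module(self, spec):
        return None  # default module creation is fine

    def exec_module(self, module):
        # run the archived source in the fresh module's namespace
        exec(self.archive[module.__name__], module.__dict__)

archive = {
    'text': '',
    'text.encoding': '',
    'text.encoding.macbinary': 'MAGIC = "mBIN"',
}
sys.meta_path.insert(0, ArchiveFinder(archive))
```

With the finder installed, `import text.encoding.macbinary` resolves entirely out of the archive, which is the "recognizes and maps the name to its own module" behaviour quoted above.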
Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas.heller at ion-tof.com Thu Mar 30 21:30:25 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Thu, 30 Mar 2000 21:30:25 +0200 Subject: [Python-Dev] Metaclasses, customizing attribute access for classes Message-ID: <021c01bf9a7e$662327c0$4500a8c0@thomasnotebook> Dear Python-developers, Recently I played with metaclasses from within python, also with Jim Fulton's ExtensionClass. I even tried to write my own metaclass in a C-extension, using the famous Don Beaudry hook. It seems that ExtensionClass does not do completely what I want. Metaclasses implemented in python are somewhat slow, and writing them is a lot of work. Writing a metaclass in C is even more work... Well, what do I want? Often, I use the following pattern: class X: def __init__ (self): self.delegate = anObjectImplementedInC(...) def __getattr__ (self, key): return self.delegate.dosomething(key) def __setattr__ (self, key, value): self.delegate.doanotherthing(key, value) def __delattr__ (self, key): self.delegate.doevenmore(key) This is too slow (for me). So what I would like to do is: class X: def __init__ (self): self.__dict__ = aMappingObject(...) and now aMappingObject will automatically receive all the setattr, getattr, and delattr calls. The *only* thing which is required for this is to remove the restriction that the __dict__ attribute must be a dictionary. This is only a small change to classobject.c (which unfortunately I have only implemented for 1.5.2, not for the CVS version). The performance impact for this change is unnoticeable in pystone. What do you think? Should I prepare a patch? Any chance that this can be included in a future python version?
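The first pattern above can be sketched with a plain dict standing in for the delegate (anObjectImplementedInC is the hypothetical C object from the message, so an ordinary dict is used here just to make the sketch runnable):

```python
class Delegating:
    """Route attribute access through a delegate mapping object."""

    def __init__(self):
        # store the delegate via __dict__ directly so that the
        # __setattr__ below is not invoked for it
        self.__dict__['delegate'] = {}

    def __getattr__(self, key):
        # called only when normal lookup fails, i.e. for everything
        # except 'delegate' itself and class attributes
        try:
            return self.delegate[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self.delegate[key] = value

    def __delattr__(self, key):
        try:
            del self.delegate[key]
        except KeyError:
            raise AttributeError(key)
```

Every dotted access here pays for a Python-level special-method call, which is exactly the overhead the proposed self.__dict__ = aMappingObject(...) would avoid by letting the mapping object receive the accesses directly.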
Thomas Heller From petrilli at amber.org Thu Mar 30 21:52:02 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Thu, 30 Mar 2000 14:52:02 -0500 Subject: [Python-Dev] Unicode compile Message-ID: <20000330145202.B9078@trump.amber.org> I don't know how much memory other people have in their machines, but on this machine (128Mb), I get the following trying to compile a CVS checkout of Python: gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c ./unicodedatabase.c:53482: virtual memory exhausted I hope that this is a temporary thing, or we ship the database in some other manner, but I would argue that you should be able to compile Python on a machine with 32Mb of RAM at MOST.... for an idea of how much VM this machine has, i have 256Mb of SWAP on top of it. Chris -- | Christopher Petrilli | petrilli at amber.org From guido at python.org Thu Mar 30 22:12:22 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:12:22 -0500 Subject: [Python-Dev] Unicode compile In-Reply-To: Your message of "Thu, 30 Mar 2000 14:52:02 EST." <20000330145202.B9078@trump.amber.org> References: <20000330145202.B9078@trump.amber.org> Message-ID: <200003302012.PAA22062@eric.cnri.reston.va.us> > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it. I'm not sure how to fix this, short of reading the main database from a file. Marc-Andre?
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Thu Mar 30 22:14:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 30 Mar 2000 22:14:55 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> Message-ID: <38E3B5BF.2D00F930@tismer.com> Christopher Petrilli wrote: > > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it. I had similar effects, which made me work on a compressed database (see older messages). Due to time limits, I will not get ready before 1.6.a1 is out. And then quite a lot of other changes will be necessary by Marc, since the API changes quite a bit. But it will definitely be a less than 20 KB module, proven. ciao - chris(2) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 30 22:14:27 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Thu, 30 Mar 2000 15:14:27 -0500 (EST) Subject: [Python-Dev] Unicode compile In-Reply-To: <200003302012.PAA22062@eric.cnri.reston.va.us> References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <14563.46499.555853.413690@amarok.cnri.reston.va.us> Guido van Rossum writes: >I'm not sure how to fix this, short of reading the main database from >a file. Marc-Andre? Turning off optimization may help. (Or it may not -- it might be creating the data structures for a large static table that's the problem.) --amk From akuchlin at mems-exchange.org Thu Mar 30 22:22:02 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 15:22:02 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003282000.PAA11988@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> Message-ID: <14563.46954.70800.706245@amarok.cnri.reston.va.us> Guido van Rossum writes: >I don't know enough about this, but it seems that there might be two >steps: *creating* a mmap object is necessarily platform-specific; but >*using* a mmap object could be platform-neutral. > >What is the API for mmap objects? You create them; Unix wants a file descriptor, and Windows wants a filename. Then they behave like buffer objects, like mutable strings. I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of interface. If someone can suggest a way to handle the extra flags such as MAP_SHARED and the Windows tag argument, I'll happily implement it. Maybe just keyword arguments that differ across platforms? open(filename, mode, [tag = 'foo',] [flags = mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a file descriptor on Unix through a separate openfd() function. I'm also strongly tempted to rename the module from mmapfile to just 'mmap'. 
I'd suggest waiting until the interface is finalized before adding the module to the CVS tree -- which means after 1.6a1 -- but I can add the module as it stands if you like. Guido, let me know if you want me to do that. -- A.M. Kuchling http://starship.python.net/crew/amk/ A Puck is harder by far to hurt than some little lord of malice from the lands of ice and snow. We Pucks are old and hard and wild... -- Robin Goodfellow, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Thu Mar 30 22:23:42 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:23:42 -0500 Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: Your message of "Wed, 29 Mar 2000 19:57:06 EST." References: Message-ID: <200003302023.PAA22350@eric.cnri.reston.va.us> > I had to make the following one-line change to socketmodule.c so that it > would link properly with openssl-0.9.4. In studying the openssl include > files, I found: > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > why this isn't being picked up. I also don't know how well the module > works, other than it imports, but I sure would like to try it with > Zope/ZServer/Medusa... Strange -- the version of OpenSSL I have also calls itself 0.9.4 ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have SSL_library_init(). I wonder what gives... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Mar 30 22:25:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:25:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 15:22:02 EST." 
<14563.46954.70800.706245@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> Message-ID: <200003302025.PAA22367@eric.cnri.reston.va.us> > Guido van Rossum writes: > >I don't know enough about this, but it seems that there might be two > >steps: *creating* a mmap object is necessarily platform-specific; but > >*using* a mmap object could be platform-neutral. > > > >What is the API for mmap objects? [AMK] > You create them; Unix wants a file descriptor, and Windows wants a > filename. Then they behave like buffer objects, like mutable strings. > > I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of > interface. If someone can suggest a way to handle the extra flags > such as MAP_SHARED and the Windows tag argument, I'll happily > implement it. Maybe just keyword arguments that differ across > platforms? open(filename, mode, [tag = 'foo',] [flags = > mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a > file descriptor on Unix through a separate openfd() function. Yes, keyword args seem to be the way to go. To avoid an extra function you could add a fileno=... kwarg, in which case the filename is ignored or required to be "". > I'm > also strongly tempted to rename the module from mmapfile to just > 'mmap'. Sure. > I'd suggest waiting until the interface is finalized before adding the > module to the CVS tree -- which means after 1.6a1 -- but I can add the > module as it stands if you like. Guido, let me know if you want me to > do that. Might as well check it in -- the alpha is going to be rough and I expect another alpha to come out shortly to correct the biggest problems. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Mar 30 22:22:08 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 30 Mar 2000 22:22:08 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <38E3B770.6CD61C37@lemburg.com> Guido van Rossum wrote: > > > I don't know how much memory other people have in their machiens, but > > in this machine (128Mb), I get the following trying to compile a CVS > > checkout of Python: > > > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > > ./unicodedatabase.c:53482: virtual memory exhausted > > > > I hope that this is a temporary thing, or we ship the database some > > other manner, but I would argue that you should be able to compile > > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > > much VM this machine has, i have 256Mb of SWAP on top of it. > > I'm not sure how to fix this, short of reading the main database from > a file. Marc-Andre? Hmm, the file compiles fine on my 64MB Linux machine with about 100MB of swap. What gcc version do you use ? Anyway, once Christian is ready with his compact replacement I think we no longer have to worry about that chunk of static data :-) Reading in the data from a file is not a very good solution, because it would override the OS optimizations for static data in object files (like e.g. swapping in only those pages which are really needed, etc.). An alternative solution would be breaking the large table into several smaller ones and accessing it via a redirection function. 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From adustman at comstar.net Thu Mar 30 23:12:51 2000 From: adustman at comstar.net (Andy Dustman) Date: Thu, 30 Mar 2000 16:12:51 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003302023.PAA22350@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > I had to make the following one-line change to socketmodule.c so that it > > would link properly with openssl-0.9.4. In studying the openssl include > > files, I found: > > > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > > why this isn't being picked up. I also don't know how well the module > > works, other than it imports, but I sure would like to try it with > > Zope/ZServer/Medusa... > > Strange -- the version of OpenSSL I have also calls itself 0.9.4 > ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have > SSL_library_init(). > > I wonder what gives... I don't know. Right after I made the patch, I found that 0.9.5 is available, and I was able to successfully compile against that version (with the patch). -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" From akuchlin at mems-exchange.org Thu Mar 30 23:19:45 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:19:45 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302025.PAA22367@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> Message-ID: <14563.50417.909045.81868@amarok.cnri.reston.va.us> Guido van Rossum writes: >Might as well check it in -- the alpha is going to be rough and I >expect another alpha to come out shortly to correct the biggest >problems. Done -- just doing my bit to ensure the first alpha is rough! :) My next task is to add the Expat module. My understanding is that it's OK to add Expat itself, too; where should I put all that code? Modules/expat/* ? -- A.M. Kuchling http://starship.python.net/crew/amk/ I'll bring the Kindly Ones down on his blasted head. -- Desire, in SANDMAN #31: "Three Septembers and a January" From fdrake at acm.org Thu Mar 30 23:29:58 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:29:58 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <14563.51030.24773.587972@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Done -- just doing my bit to ensure the first alpha is rough! :) > > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Do you have documentation for this? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Thu Mar 30 23:30:35 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:30:35 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51030.24773.587972@seahag.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> Message-ID: <14563.51067.560938.367690@amarok.cnri.reston.va.us> Fred L. Drake, Jr. writes: > Do you have documentation for this? Somewhere at home, I think, but not here at work. I'll try to get it checked in before 1.6alpha1, but don't hold me to that. --amk From guido at python.org Thu Mar 30 23:31:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:31:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:19:45 EST." <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <200003302131.QAA22897@eric.cnri.reston.va.us> > Done -- just doing my bit to ensure the first alpha is rough! :) When the going gets rough, the rough get going :-) > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Whoa... Not sure. This will give issues with Patrice, at least (even if it is pure Open Source -- given the size). I'd prefer to add instructions to Setup.in about where to get it. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Thu Mar 30 23:34:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Thu, 30 Mar 2000 16:34:55 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51067.560938.367690@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> <14563.51067.560938.367690@amarok.cnri.reston.va.us> Message-ID: <14563.51327.190466.477566@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Somewhere at home, I think, but not here at work. I'll try to get it > checked in before 1.6alpha1, but don't hold me to that. The date isn't important; I'm not planning to match alpha/beta releases with Doc releases. I just want to be sure it gets in soon so that the debugging process can kick in for that as well. ;) Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Thu Mar 30 23:34:02 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:34:02 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:31:58 EST." <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <200003302134.QAA22939@eric.cnri.reston.va.us> > Whoa... Not sure. This will give issues with Patrice, at least (even > if it is pure Open Source -- given the size). For those outside CNRI -- Patrice is CNRI's tough IP lawyer. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Mar 30 23:48:13 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:48:13 -0500 (EST) Subject: [Python-Dev] Expat module In-Reply-To: <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <14563.52125.401817.986919@amarok.cnri.reston.va.us> Guido van Rossum writes: >> My next task is to add the Expat module. My understanding is that >> it's OK to add Expat itself, too; where should I put all that code? >> Modules/expat/* ? > >Whoa... Not sure. This will give issues with Patrice, at least (even >if it is pure Open Source -- given the size). I'd prefer to add >instructions to Setup.in about where to get it. Fair enough; I'll just add the module itself, then, and we can always change it later. Should we consider replacing the makesetup/Setup.in mechanism with a setup.py script that uses the Distutils? You'd have to compile a minipython with just enough critical modules -- strop and posixmodule are probably the most important ones -- in order to run setup.py. It's something I'd like to look at for 1.6, because then you could be much smarter in automatically enabling modules. -- A.M. Kuchling http://starship.python.net/crew/amk/ This is the way of Haskell or Design by Contract of Eiffel. This one is like wearing a XV century armor, you walk very safely but in a very tiring way. -- Manuel Gutierrez Algaba, 26 Jan 2000 From guido at python.org Fri Mar 31 00:41:45 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 17:41:45 -0500 Subject: [Python-Dev] Expat module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:48:13 EST." 
<14563.52125.401817.986919@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <200003302241.RAA23050@eric.cnri.reston.va.us> > Fair enough; I'll just add the module itself, then, and we can always > change it later. OK. > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. If you can come up with something that works well enough, that would be great. (Although I'm not sure where the distutils come in.) We still need to use configure/autoconf though. Hardcoding a small complement of modules is no problem. (Why do you think you need strop though? Remember we have string methods!) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Fri Mar 31 01:03:39 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 09:03:39 +1000 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/PC python_nt.rc,1.8,1.9 In-Reply-To: <200003302259.RAA23266@eric.cnri.reston.va.us> Message-ID: This is the version number as displayed by Windows Explorer in the "properties" dialog. Mark. > Modified Files: > python_nt.rc > Log Message: > Seems there was a version string here that still looked > like 1.5.2. 
> > > Index: python_nt.rc > ========================================================== > ========= > RCS file: /projects/cvsroot/python/dist/src/PC/python_nt.rc,v > retrieving revision 1.8 > retrieving revision 1.9 > diff -C2 -r1.8 -r1.9 > *** python_nt.rc 2000/03/29 01:50:50 1.8 > --- python_nt.rc 2000/03/30 22:59:09 1.9 > *************** > *** 29,34 **** > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,5,2,3 > ! PRODUCTVERSION 1,5,2,3 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > --- 29,34 ---- > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,6,0,0 > ! PRODUCTVERSION 1,6,0,0 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > From effbot at telia.com Fri Mar 31 00:40:51 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 00:40:51 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? Message-ID: <00b701bf9a99$022339c0$34aab5d4@hagrid> at this time, SRE uses types instead of classes for compiled patterns and matches. these classes provide a documented interface, and a bunch of internal attributes, for example: RegexObjects: code -- a PCRE code object pattern -- the source pattern groupindex -- maps group names to group indices MatchObjects: regs -- same as match.span()? groupindex -- as above re -- the pattern object used for this match string -- the target string used for this match the problem is that some other modules use these attributes directly. for example, xmllib.py uses the pattern attribute, and other code I've seen uses regs to speed things up. in SRE, I would like to get rid of all these (except possibly for the match.string attribute). opinions? From guido at python.org Fri Mar 31 01:31:43 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 18:31:43 -0500 Subject: [Python-Dev] SRE: what to do with undocumented attributes? 
In-Reply-To: Your message of "Fri, 31 Mar 2000 00:40:51 +0200." <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <200003302331.SAA24895@eric.cnri.reston.va.us> > at this time, SRE uses types instead of classes for compiled > patterns and matches. these classes provide a documented > interface, and a bunch of internal attributes, for example: > > RegexObjects: > > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices > > MatchObjects: > > regs -- same as match.span()? > groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match > > the problem is that some other modules use these attributes > directly. for example, xmllib.py uses the pattern attribute, and > other code I've seen uses regs to speed things up. > > in SRE, I would like to get rid of all these (except possibly for > the match.string attribute). > > opinions? Sounds reasonable. All std lib modules that violate this will need to be fixed once sre.py replaces re.py. (Checkin of sre is next.) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Fri Mar 31 01:40:16 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 18:40:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Fredrik Lundh writes: >RegexObjects: > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices pattern and groupindex are documented in the Library Reference, and they're part of the public interface. .code is not, so you can drop it. >MatchObjects: > regs -- same as match.span()? 
> groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match .re and .string are documented. I don't see a reference to MatchObject.groupindex anywhere, and .regs isn't documented, so those two can be ignored; xmllib or whatever external modules use them are being very naughty, so go ahead and break them. -- A.M. Kuchling http://starship.python.net/crew/amk/ Imagine a thousand thousand fireflies of every shape and color; Oh, that was Baghdad at night in those days. -- From SANDMAN #50: "Ramadan" From effbot at telia.com Fri Mar 31 01:05:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 01:05:15 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> Message-ID: <00e901bf9a9c$6c036240$34aab5d4@hagrid> Andrew wrote: > >RegexObjects: > > code -- a PCRE code object > > pattern -- the source pattern > > groupindex -- maps group names to group indices > > pattern and groupindex are documented in the Library Reference, and > they're part of the public interface. hmm. I could have sworn... guess I didn't look carefully enough (or someone's used his time machine again :-). oh well, more bloat... btw, "pattern" doesn't make much sense in SRE -- who says the pattern object was created by re.compile? guess I'll just set it to None in other cases (e.g. sregex, sreverb, sgema...) From bwarsaw at cnri.reston.va.us Fri Mar 31 02:35:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 19:35:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14563.62148.860971.360871@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> hmm. I could have sworn... 
guess I didn't look carefully FL> enough (or someone's used his time machine again :-). Yep, sorry. If it's documented as in the public interface, it should be kept. Anything else can go (he says without yet grep'ing through his various code bases). -Barry From bwarsaw at cnri.reston.va.us Fri Mar 31 06:34:15 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 23:34:15 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> Message-ID: <14564.10951.90258.729547@anthem.cnri.reston.va.us> >>>>> "Guido" == Guido van Rossum writes: Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted Guido> into 80-char lines by GvR. Can we change the 8-space-tab rule for all new C code that goes in? I know that we can't practically change existing code right now, but for new C code, I propose we use no tab characters, and we use a 4-space block indentation. -Barry From DavidA at ActiveState.com Fri Mar 31 07:07:02 2000 From: DavidA at ActiveState.com (David Ascher) Date: Thu, 30 Mar 2000 21:07:02 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Heretic! 
+1, FWIW =) From bwarsaw at cnri.reston.va.us Fri Mar 31 07:16:48 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:16:48 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <14564.13504.310866.835201@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: DA> Heretic! DA> +1, FWIW =) I hereby offer to so untabify and reformat any C code in the standard distribution that Guido will approve of. -Barry From mhammond at skippinet.com.au Fri Mar 31 07:16:26 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 15:16:26 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Message-ID: +1 for me too. It also brings all source files under the same guidelines (rather than separate ones for .py and .c) Mark. From bwarsaw at cnri.reston.va.us Fri Mar 31 07:40:16 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:40:16 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: Message-ID: <14564.14912.629414.970309@anthem.cnri.reston.va.us> >>>>> "MH" == Mark Hammond writes: MH> +1 for me too. It also brings all source files under the same MH> guidelines (rather than separate ones for .py and .c) BTW, I further propose that if Guido lets me reformat the C code, that we freeze other checkins for the duration and I temporarily turn off the python-checkins email. That is, unless you guys /want/ to be bombarded with boatloads of useless diffs. :) -Barry From pf at artcom-gmbh.de Fri Mar 31 08:45:45 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 08:45:45 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....)
In-Reply-To: <14564.14912.629414.970309@anthem.cnri.reston.va.us> from "bwarsaw@cnri.reston.va.us" at "Mar 31, 2000 0:40:16 am" Message-ID: Hi! sigh :-( > >>>>> "MH" == Mark Hammond writes: > > MH> +1 for me too. It also brings all source files under the same > MH> guidelines (rather than separate ones for .py and .c) bwarsaw at cnri.reston.va.us: > BTW, I further propose that if Guido lets me reformat the C code, that > we freeze other checkins for the duration and I temporarily turn off > the python-checkins email. That is, unless you guys /want/ to be > bombarded with boatloads of useless diffs. :) -1 for C reformatting. The 4-space indentation seems reasonable for Python sources, but I disagree for C code. C is not Python. Let me cite a very prominent member of the open source community (pasted from /usr/src/linux/Documentation/CodingStyle): Chapter 1: Indentation Tabs are 8 characters, and thus indentations are also 8 characters. There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3. Rationale: The whole idea behind indentation is to clearly define where a block of control starts and ends. Especially when you've been looking at your screen for 20 straight hours, you'll find it a lot easier to see how the indentation works if you have large indentations. Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. In short, 8-char indents make things easier to read, and have the added benefit of warning you when you're nesting your functions too deep. Heed that warning. Also, the Python interpreter has no strong relationship with the Linux kernel; I agree with Linus on this topic.
Python source code is another thing: Python identifiers are usually longer due to qualifying and Python operands are often lists, tuples or the like, so lines contain more stuff. disliking-yet-another-white-space-discussion-ly y'rs - peter From mhammond at skippinet.com.au Fri Mar 31 09:11:50 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 17:11:50 +1000 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: > Rationale: The whole idea behind indentation is to > clearly define where > a block of control starts and ends. Especially when Ironically, this statement is a strong argument for insisting on Python using real tab characters! "Clearly define" is upgraded to "used to define". > 80-character terminal screen. The answer to that is > that if you need > more than 3 levels of indentation, you're screwed > anyway, and should fix > your program. Yeah, right! int foo() { // one level for the privilege of being here. switch (bar) { // uh oh - running out of room... case WTF: // Oh no - if I use an "if" statement, // my code is "screwed"?? } } > disliking-yet-another-white-space-discussion-ly y'rs - peter Like-death-and-taxes-ly y'rs - Mark. From moshez at math.huji.ac.il Fri Mar 31 10:04:32 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 10:04:32 +0200 (IST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302134.QAA22939@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > Whoa... Not sure. This will give issues with Patrice, at least (even > > if it is pure Open Source -- given the size). > > For those outside CNRI -- Patrice is CNRI's tough IP lawyer. It was understandable from the context... Personally, I'd rather it were folded in by value, and not by reference: one reason is versioning problems, and another is pure laziness on my part.
what-do-you-have-when-you-got-a-lawyer-up-to-his-neck-in-the-sand-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Fri Mar 31 09:42:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 09:42:04 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <38E456CC.1A49334A@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "Guido" == Guido van Rossum writes: > > Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 > Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted > Guido> into 80-char lines by GvR. > > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Why not just leave new code formatted as it is (except maybe to bring the used TAB width to the standard 8 spaces used throughout the Python C source code) ? BTW, most of the new unicode stuff uses 4-space indents. Unfortunately, it mixes whitespace and tabs since Emacs c-mode doesn't do the python-mode magic yet (is there a way to turn it on ?). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Fri Mar 31 11:14:49 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:14:49 +0200 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) References: Message-ID: <01ae01bf9af1$927b1940$34aab5d4@hagrid> Peter Funk wrote: > Also the Python interpreter has no strong relationship with Linux kernel > a agree with Linus on this topic. 
> Python source code is another thing: > Python identifiers are usually longer due to qualifying and Python > operands are often lists, tuples or the like, so lines contain more stuff. you're just guessing, right? (if you check, you'll find that the actual difference is very small. iirc, that's true for c, c++, java, python, tcl, and probably a few more languages. dunno about perl, though... :-) From effbot at telia.com Fri Mar 31 11:17:42 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:17:42 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <01b501bf9af1$f9b44500$34aab5d4@hagrid> M.-A. Lemburg wrote: > Why not just leave new code formatted as it is (except maybe > to bring the used TAB width to the standard 8 spaces used throughout > the Python C source code) ? > > BTW, most of the new unicode stuff uses 4-space indents. > Unfortunately, it mixes whitespace and tabs since Emacs > c-mode doesn't do the python-mode magic yet (is there a > way to turn it on ?). http://www.jwz.org/doc/tabs-vs-spaces.html contains some hints. From moshez at math.huji.ac.il Fri Mar 31 13:24:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 13:24:05 +0200 (IST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes Message-ID: Here is a new list of things that will change in the next release. Thanks to all the people who gave me hints and information! If you have anything you think I missed, or mistreated, please e-mail me personally -- I'll post an updated version soon.

Obligatory
==========
A lot of bug-fixes, some optimizations, many improvements in the documentation

Core changes
============
Deleting objects is safe even for deeply nested data structures.
Long/int unifications: long integers can be used in seek() calls, as slice indexes.
str(1L) --> '1', not '1L' (repr() is still the same)
Builds on NT Alpha
UnboundLocalError is raised when a local variable is undefined
long, int take optional "base" parameter
string objects now have methods (though they are still immutable)
unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders
"in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__
New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)
Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect

New modules
===========
winreg - Windows registry interface.
Distutils - tools for distributing Python modules
robotparser - parse a robots.txt file (for writing web spiders)
linuxaudio - audio for Linux
mmap - treat a file as a memory buffer
sre - regular expressions (fast, supports unicode)
filecmp - supersedes the old cmp.py and dircmp.py modules
tabnanny - check Python sources for tab-width dependence
unicode - support for unicode
codecs - support for Unicode encoders/decoders

Module changes
==============
re - changed to be a frontend to sre
readline, ConfigParser, cgi, calendar, posix, xmllib, aifc, chunk, wave, random, shelve, nntplib - minor enhancements
socket, httplib, urllib - optional OpenSSL support
_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)

Tool changes
============
IDLE -- complete overhaul

(Andrew, I'm still waiting for the expat support and integration to add to this list -- other than that, please contact me if you want something less telegraphic ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Fri Mar 31 14:01:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 31 Mar 2000 04:01:21 -0800 (PST) Subject: [Python-Dev] Roundup et al.
Message-ID: Hi -- there was some talk on this list earlier about nosy lists, managing patches, and such things, so i just wanted to mention, for anybody interested, that i threw together Roundup very quickly for you to try out. http://www.lfw.org/python/ There's a tar file there -- it's very messy code, and i apologize (it was hastily hacked out of the running prototype implementation), but it should be workable enough to play with. There's a test installation to play with at http://www.lfw.org/ping/roundup/roundup.cgi Dummy user:password pairs are test:test, spam:spam, eggs:eggs. A fancier design, still in the last stages of coming together (which will be my submission to the Software Carpentry contest) is up at http://crit.org/http://www.lfw.org/ping/sctrack.html and i welcome your thoughts and comments on that if you have the spare time (ha!) and generous inclination to contribute them. Thank you and apologies for the interruption. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 31 14:10:45 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 07:10:45 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Your message of "Thu, 30 Mar 2000 23:34:15 EST." <14564.10951.90258.729547@anthem.cnri.reston.va.us> References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <200003311210.HAA29010@eric.cnri.reston.va.us> > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Actually, this one was formatted for 8-space indents but using 4-space tabs, so in my editor it looked like 16-space indents! 
Given that we don't want to change existing code, I'd prefer to stick with 1-tab 8-space indents. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Fri Mar 31 15:10:06 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 15:10:06 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 In-Reply-To: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Guido van Rossum wrote: > + Christian Tismer > + Christian Tismer Ummmmm....I smell something fishy here. Are there two Christian Tismers? That would explain how Christian has so much time to work on Stackless. Well, between the both of them, Guido will have no chance but to put Stackless in the standard distribution. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fredrik at pythonware.com Fri Mar 31 15:16:16 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 15:16:16 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: <000d01bf9b13$4be1db00$0500a8c0@secret.pythonware.com> > Tracy Tims > + Christian Tismer > + Christian Tismer > R Lindsay Todd two christians? From bwarsaw at cnri.reston.va.us Fri Mar 31 15:55:13 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 08:55:13 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <14564.44609.221250.471147@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> BTW, most of the new unicode stuff uses 4-space indents. 
M> Unfortunately, it mixes whitespace and tabs since Emacs M> c-mode doesn't do the python-mode magic yet (is there a M> way to turn it on ?). (setq indent-tabs-mode nil) I could add that to the "python" style. And to zap all your existing tab characters: C-M-h M-x untabify RET -Barry From skip at mojam.com Fri Mar 31 16:04:46 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 08:04:46 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: Message-ID: <14564.45182.460160.589244@beluga.mojam.com> Moshe, I would highlight those bits that are likely to warrant a little closer scrutiny. The list.{append,insert,...} and socket.connect change certainly qualify. Perhaps split the Core Changes section into two subsections, one set of changes likely to require some adaptation and one set that should be backwards-compatible. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Fri Mar 31 16:47:31 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 09:47:31 -0500 Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: Your message of "Fri, 31 Mar 2000 08:04:46 CST." <14564.45182.460160.589244@beluga.mojam.com> References: <14564.45182.460160.589244@beluga.mojam.com> Message-ID: <200003311447.JAA29633@eric.cnri.reston.va.us> See what I've done to Moshe's list: http://www.python.org/1.6/ --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Fri Mar 31 17:28:56 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 09:28:56 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> References: <14564.45182.460160.589244@beluga.mojam.com> <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14564.50232.734778.152933@beluga.mojam.com> Guido> See what I've done to Moshe's list: http://www.python.org/1.6/ Looks good. Attached are a couple nitpicky diffs. 
Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.6.diff Type: application/octet-stream Size: 1263 bytes Desc: diffs to 1.6 Release Notes URL: From guido at python.org Fri Mar 31 17:47:56 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 10:47:56 -0500 Subject: [Python-Dev] Windows installer pre-prelease Message-ID: <200003311547.KAA15538@eric.cnri.reston.va.us> The Windows installer is always hard to get just right. If you have a moment, go to http://www.python.org/1.6/ and download the Windows Installer prerelease. Let me know what works, what doesn't! I've successfully installed it on Windows NT 4.0 and on Windows 98, both with default install target and with a modified install target. I'd love to hear that it also installs cleanly on Windows 95. Please test IDLE from the start menu! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Fri Mar 31 18:18:43 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:18:43 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14563.52125.401817.986919@amarok.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Mar 30, 2000 at 04:48:13PM -0500 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <20000331111842.A8060@cnri.reston.va.us> On 30 March 2000, Andrew M. Kuchling said: > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. 
> It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Obviously, I'd love to see the Distutils used to build parts of the Python library. Some possible problems:

* Distutils relies heavily on the sys, os, string, and re modules, so those would have to be built and included in the mythical mini-python (as would everything they rely on -- strop, pcre, ... ?)

* Distutils currently assumes that it's working with an installed Python -- it doesn't know anything about working in the Python source tree. I think this could be fixed just by tweaking the distutils.sysconfig module, but there might be subtle assumptions elsewhere in the code.

* I haven't written the mythical Autoconf-in-Python yet, so we'd still have to rely on either the configure script or user intervention to find out whether library X is installed, and where its header and library files live (for X in zlib, tcl, tk, ...).

Of course, the configure script would still be needed to build the mini-python, so it's not going away any time soon. Greg From skip at mojam.com Fri Mar 31 18:26:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 10:26:55 -0600 (CST) Subject: [Python-Dev] Distutils for the std.
library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> Message-ID: <14564.53711.803509.962248@beluga.mojam.com> Greg> * Distutils relies heavily on the sys, os, string, and re Greg> modules, so those would have to be built and included in the Greg> mythical mini-python (as would everything they rely on -- Greg> strop, pcre, ... ?) With string methods in 1.6, reliance on the string and strop modules should be lessened or eliminated, right? re and os may need a tweak or two to use string methods themselves. The sys module is always available. Perhaps it would make sense to put sre(module)?.c into the Python directory where sysmodule.c lives. That way, a Distutils-capable mini-python could be built without messing around in the Modules directory at all... -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Fri Mar 31 18:25:11 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 18:25:11 +0200 (IST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Greg Ward wrote: > Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Well, it's not like it's not a lot of work, but it could be done, with liberal interpretation of "mini": include in "mini" Python *all* modules which do not rely on libraries not distributed with the Python core -- zlib, expat and Tkinter go right out the window, but most everything else can stay. That way, Distutils can use all modules it currently uses . 
The other problem, file location, is one I have talked about earlier: it *cannot* be assumed that the default place for putting new libraries is the same place the Python interpreter resides, for many reasons. Why not ask the user explicitly? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward at cnri.reston.va.us Fri Mar 31 18:29:33 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:29:33 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14564.53711.803509.962248@beluga.mojam.com>; from skip@mojam.com on Fri, Mar 31, 2000 at 10:26:55AM -0600 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> <14564.53711.803509.962248@beluga.mojam.com> Message-ID: <20000331112933.B8060@cnri.reston.va.us> On 31 March 2000, Skip Montanaro said: > With string methods in 1.6, reliance on the string and strop modules should > be lessened or eliminated, right? re and os may need a tweak or two to use > string methods themselves. The sys module is always available. Perhaps it > would make sense to put sre(module)?.c into the Python directory where > sysmodule.c lives. That way, a Distutils-capable mini-python could be built > without messing around in the Modules directory at all... But I'm striving to maintain compatibility with (at least) Python 1.5.2 in Distutils. That need will fade with time, but it's not going to disappear the moment Python 1.6 is released. (Guess I'll have to find somewhere else to play with string methods and extended call syntax).
Greg From thomas.heller at ion-tof.com Fri Mar 31 19:09:41 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 31 Mar 2000 19:09:41 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils msvccompiler.py References: <200003311653.LAA08175@thrak.cnri.reston.va.us> Message-ID: <038701bf9b33$e7c49240$4500a8c0@thomasnotebook> > Simplified Thomas Heller's registry patch: just assign all those > HKEY_* and Reg* names once, rather than having near-duplicate code > in the two import attempts. Your change won't work, the function names in win32api and winreg are not the same: Example: win32api.RegEnumValue <-> winreg.EnumValue > > Also dropped the leading underscore on all the imported symbols, > as it's not appropriate (they're not local to this module). Are they used anywhere else? Or do you think they *could* be used somewhere else? Thomas Heller From mal at lemburg.com Fri Mar 31 12:19:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 12:19:58 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> <01b501bf9af1$f9b44500$34aab5d4@hagrid> Message-ID: <38E47BCE.94E4E012@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Why not just leave new code formatted as it is (except maybe > > to bring the used TAB width to the standard 8 spaces used throughout > > the Python C source code) ? > > > > BTW, most of the new unicode stuff uses 4-space indents. > > Unfortunately, it mixes whitespace and tabs since Emacs > > c-mode doesn't do the python-mode magic yet (is there a > > way to turn it on ?). > > http://www.jwz.org/doc/tabs-vs-spaces.html > contains some hints. Ah, cool. 
Thanks :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Fri Mar 31 20:56:40 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 20:56:40 +0200 (MEST) Subject: [Python-Dev] 'make install' should create lib/site-packages IMO In-Reply-To: <200003311513.KAA00790@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 10:13:20 am" Message-ID: Hi! Guido van Rossum: [...] > Modified Files: > Makefile.in > Log Message: > Added distutils and distutils/command to LIBSUBDIRS. Noted by Andrew > Kuchling. [...] > ! LIBSUBDIRS= lib-old lib-tk test test/output encodings \ > ! distutils distutils/command $(MACHDEPS) [...] What about 'site-packages'? SuSE added this to their Python packaging and I think it is a good idea to have an empty 'site-packages' directory installed by default. Regards, Peter From akuchlin at mems-exchange.org Fri Mar 31 22:16:53 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 31 Mar 2000 15:16:53 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00e901bf9a9c$6c036240$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14565.1973.361549.291817@amarok.cnri.reston.va.us> Fredrik Lundh writes: >btw, "pattern" doesn't make much sense in SRE -- who says >the pattern object was created by re.compile? guess I'll just >set it to None in other cases (e.g. sregex, sreverb, sgema...) Good point; I can imagine fabulously complex patterns assembled programmatically, for which no summary could be made. I guess there could be another attribute that also gives the class (module? function?) used to compile the pattern, but more likely, the pattern attribute should be deprecated and eventually dropped. -- A.M. 
Kuchling http://starship.python.net/crew/amk/ You know how she is when she gets an idea into her head. I mean, when one finally penetrates. -- Desire describes Delirium, in SANDMAN #41: "Brief Lives:1" From pf at artcom-gmbh.de Fri Mar 31 22:14:41 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 22:14:41 +0200 (MEST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 9:47:31 am" Message-ID: Hi! Guido van Rossum : > See what I've done to Moshe's list: http://www.python.org/1.6/ Very fine, but I have a few small annotations:

1. 'linuxaudio' has been renamed to 'linuxaudiodev'

2. The following text: "_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)." looks a bit misleading, since it is not explicit about Version 8.0.x. I suggest the following wording: "_tkinter - supports Tcl/Tk from version 8.0 up to the current 8.3. Support for versions older than 8.0 has been dropped."

3. 'src/Tools/i18n/pygettext.py' by Barry should be mentioned. This is a very useful utility. I suggest to append the following text: "New utility pygettext.py -- Python equivalent of xgettext(1). A message text extraction tool used for internationalizing applications written in Python"

Regards, Peter From fdrake at acm.org Fri Mar 31 22:30:00 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 31 Mar 2000 15:30:00 -0500 (EST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14565.2760.665022.206361@seahag.cnri.reston.va.us> Peter Funk writes: > I suggest the following wording: ... > a very useful utility. I suggest to append the following text: Peter, I'm beginning to figure this out -- you really just want to get published! ;) You forgot the legalese. ;( -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From guido at python.org Fri Mar 31 23:30:42 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 16:30:42 -0500 Subject: [Python-Dev] Python 1.6 alpha 1 released Message-ID: <200003312130.QAA04361@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 1 to the Python website: http://www.python.org/1.6/ Probably the biggest news (if you hadn't heard the rumors) is Unicode support. More news on the above webpage. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in a 1.6 final release around June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From gandalf at starship.python.net Fri Mar 31 23:56:16 2000 From: gandalf at starship.python.net (Vladimir Ulogov) Date: Fri, 31 Mar 2000 16:56:16 -0500 (EST) Subject: [Python-Dev] Re: Python 1.6 alpha 1 released In-Reply-To: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: Guido, """where you used to write sock.connect(host, port) you must now write sock.connect((host, port))""" Is it possible to keep the old notation? I understand (according to your past mail about the parameters of connect) this may not be what you have in mind, but we use this notation a lot, and for us it will mean creating a workaround for the socket.connect function. It's inconvenient.
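[Editor's note: the workaround alluded to above can be sketched as a small shim. The helper name `connect_compat` and the dummy socket class are hypothetical, used here only so the sketch runs without a network; a real socket's connect takes a single (host, port) tuple.]

```python
def connect_compat(sock, *args):
    # Accept both the old connect(host, port) spelling and the
    # new connect((host, port)) spelling, forwarding a single
    # address tuple to the underlying socket object.
    if len(args) == 1:
        address = args[0]
    else:
        address = args
    return sock.connect(address)

# Dummy stand-in for a real socket, for illustration only.
class FakeSocket:
    def connect(self, address):
        self.address = address

s = FakeSocket()
connect_compat(s, "localhost", 8080)     # old two-argument style
assert s.address == ("localhost", 8080)

s2 = FakeSocket()
connect_compat(s2, ("localhost", 8080))  # new tuple style
assert s2.address == ("localhost", 8080)
```

Either call style ends up passing the same address tuple downward, which is why such a shim lets old code limp along unchanged.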
In general, I'm thinking the socket.connect(Host, Port) looks prettier :)) than socket.connect((Host, Port)) Vladimir From gstein at lyra.org Wed Mar 1 00:47:55 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 15:47:55 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BC2375.5C832488@tismer.com> Message-ID: On Tue, 29 Feb 2000, Christian Tismer wrote: > Greg Stein wrote: > > +1 on breaking it now, rather than deferring it Yet Again. > > > > IMO, there has been plenty of warning, and there is plenty of time to > > correct the software. > > > > I'm +0 on adding a warning architecture to Python to support issuing a > > warning/error when .append is called with multiple arguments. > > Well, the (bad) effect of this patch is that you cannot run > PythonWin any longer unless Mark either supplies an updated > distribution, or one corrects the two barfing Scintilla > support scripts by hand. Yes, but there is no reason to assume this won't happen. Why don't we simply move forward with the assumption that PythonWin and Scintilla will be updated? If we stand around pointing at all the uses of append that are incorrect and claim that is why we can't move forward, then we won't get anywhere. Instead, let's just *MOVE* and see that software authors update accordingly. It isn't like it is a difficult change to make. Heck, PythonWin and Scintilla could be updated within the week and re-released. *WAY* ahead of the 1.6 release. > Bad for me, since I'm building Stackless Python against 1.5.2+, > and that means the users will see PythonWin barf when installing SLP. If you're building a system using an interim release of Python, then I think you need to take responsibility for that. If you don't want those people to have problems, then you can back out the list.append change. Or you can release patches to PythonWin. I don't think the Python world at large should be hampered because somebody is using an unstable/interim version of Python.
Again: we couldn't move forward. > Adding a warning instead of raising an exception would be nice IMHO, > since the warning could probably contain the file name and line > number to change, and I would leave my users with this easy task. Yes, this would be nice. But somebody has to take the time to code it up. The warning won't appear out of nowhere... Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Wed Mar 1 00:57:38 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 1 Mar 2000 10:57:38 +1100 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: > Why don't we simply move forward with the assumption that PythonWin and > Scintilla will be updated? Done :-) However, I think dropping it now _is_ a little heavy handed. I decided to do a wider search and found a few in, eg, Sam Rushings calldll based ODBC package. Personally, I would much prefer a warning now, and drop it later. _Then_ we can say we have made enough noise about it. It would only be 2 years ago that I became aware that this "feature" of append was not a feature at all - up until then I used it purposely, and habits are sometimes hard to change :-) MArk. From gstein at lyra.org Wed Mar 1 01:12:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:12:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) hehe... > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. 
It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) What's the difference between a warning and an error? If you're running a program and it suddenly spits out a warning about a misuse of list.append, I'd certainly see that as "the program did something unexpected; that is an error." But this is all moot. Guido has already said that we would be amenable to a warning/error infrastructure which list.append could use. His description used some awkward sentences, so I'm not sure (without spending some brain cycles to parse the email) exactly what his desired defaults and behavior are. But hey... the possibility is there, and is just waiting for somebody to code it. IMO, Guido has left an out for people that are upset with the current hard-line approach. One of those people just needs to spend a bit of time coming up with a patch :-) And yes, Guido is also the Benevolent Dictator and can certainly have his mind changed, so people can definitely continue pestering him to back away from the hard-line approach... Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Wed Mar 1 01:20:07 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 29 Feb 2000 18:20:07 -0600 (CST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > > What's the difference between a warning and an error? If you're running a > program and it suddenly spits out a warning about a misuse of list.append, > I'd certainly see that as "the program did something unexpected; that is > an error." A big, big difference. Perhaps to one of us, it's the minor inconvenience of reading the error message and inserting a couple of parentheses in the appropriate file -- but to the end user, it's the difference between the program working (albeit noisily) and *not* working. 
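[Editor's note: the warning-versus-error distinction being argued can be sketched with the warning machinery Python later grew. `checked_append` is a hypothetical helper written for this illustration, not the actual patch under discussion.]

```python
import warnings

def checked_append(lst, *args, strict=False):
    # Hypothetical illustration of the two policies:
    #   strict=False -> keep working, but emit a DeprecationWarning
    #   strict=True  -> refuse the multi-argument form outright
    if len(args) > 1:
        if strict:
            raise TypeError("append() takes exactly one argument "
                            "(%d given)" % len(args))
        warnings.warn("passing multiple arguments to append() is "
                      "deprecated; pass a tuple instead",
                      DeprecationWarning, stacklevel=2)
        args = (args,)
    lst.append(args[0])

items = []
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    checked_append(items, 1, 2)   # warns, but the program keeps running
assert items == [(1, 2)]
assert caught[0].category is DeprecationWarning

try:
    checked_append(items, 1, 2, strict=True)  # the hard-line behaviour
except TypeError:
    pass                          # the program stops here instead
```

With the lenient policy the end user sees noise but a working program; with the strict one, the same call is a dead stop, which is exactly the difference Ping describes.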
When the program throws an exception and stops, it is safe to say most users will declare it broken and give up. We can't assume that they're going to be able to figure out what to edit (or be brave enough to try) just by reading the error message... or even what interpreter flag to give, if errors (rather than warnings) are the default behaviour. -- ?!ng From klm at digicool.com Wed Mar 1 01:37:09 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 19:37:09 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) > > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) I agree with mark. Why the sudden rush?? It seems to me to be unfair to make such a change - one that will break peoples code - without advanced warning, which typically is handled by a deprecation period. There *are* going to be people who won't be informed of the change in the short span of less than a single release. Just because it won't cause you pain isn't a good reason to disregard the pain of those that will suffer, particularly when you can do something relatively low-cost to avoid it. Ken klm at digicool.com From gstein at lyra.org Wed Mar 1 01:57:56 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:57:56 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > I agree with mark. 
Why the sudden rush?? It seems to me to be unfair to > make such a change - one that will break peoples code - without advanced > warning, which typically is handled by a deprecation period. There *are* > going to be people who won't be informed of the change in the short span > of less than a single release. Just because it won't cause you pain isn't > a good reason to disregard the pain of those that will suffer, > particularly when you can do something relatively low-cost to avoid it. Sudden rush?!? Mark said he knew about it for a couple years. Same here. It was a long while ago that .append()'s semantics were specified to "no longer" accept multiple arguments. I see in the HISTORY file, that changes were made to Python 1.4 (October, 1996) to avoid calling append() with multiple arguments. So, that is over three years that append() has had multiple-args deprecated. There was probably discussion even before that, but I can't seem to find something to quote. Seems like plenty of time -- far from rushed. Cheers, -g -- Greg Stein, http://www.lyra.org/ From klm at digicool.com Wed Mar 1 02:02:02 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 20:02:02 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > I agree with mark. Why the sudden rush?? It seems to me to be unfair to > > make such a change - one that will break peoples code - without advanced > > warning, which typically is handled by a deprecation period. There *are* > > going to be people who won't be informed of the change in the short span > > of less than a single release. Just because it won't cause you pain isn't > > a good reason to disregard the pain of those that will suffer, > > particularly when you can do something relatively low-cost to avoid it. > > Sudden rush?!? > > Mark said he knew about it for a couple years. Same here. 
It was a long > while ago that .append()'s semantics were specified to "no longer" accept > multiple arguments. > > I see in the HISTORY file, that changes were made to Python 1.4 (October, > 1996) to avoid calling append() with multiple arguments. > > So, that is over three years that append() has had multiple-args > deprecated. There was probably discussion even before that, but I can't > seem to find something to quote. Seems like plenty of time -- far from > rushed. None the less, for those practicing it, the incorrectness of it will be fresh news. I would be less sympathetic with them if there was recent warning, eg, the schedule for changing it in the next release was part of the current release. But if you tell somebody you're going to change something, and then don't for a few years, you probably need to renew the warning before you make the change. Don't you think so? Why not? Ken klm at digicool.com From paul at prescod.net Wed Mar 1 03:56:33 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 29 Feb 2000 18:56:33 -0800 Subject: [Python-Dev] breaking list.append() References: Message-ID: <38BC86E1.53F69776@prescod.net> Software configuration management is HARD. Every sudden backwards incompatible change (warranted or not) makes it harder. Multi-arg append is not hurting anyone as much as a sudden change to it would. It would be better to leave append() alone and publicize its near-term removal rather than cause random, part-time supported modules to stop working because their programmers may be too busy to update them right now. So no, I'm not stepping up to do it. But I'm also saying that the better "lazy" option is to put something in a prominent place in the documentation and otherwise leave it alone. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world."
- from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From guido at python.org Wed Mar 1 05:11:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Feb 2000 23:11:02 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Your message of "Tue, 29 Feb 2000 18:56:33 PST." <38BC86E1.53F69776@prescod.net> References: <38BC86E1.53F69776@prescod.net> Message-ID: <200003010411.XAA12988@eric.cnri.reston.va.us> > Software configuration management is HARD. Every sudden backwards > incompatible change (warranted or not) makes it harder. Mutli-arg append > is not hurting anyone as much as a sudden change to it would. It would > be better to leave append() alone and publicize its near-term removal > rather than cause random, part-time supported modules to stop working > because their programmers may be too busy to update them right now. I'm tired of this rhetoric. It's not like I'm changing existing Python installations retroactively. I'm planning to release a new version of Python which no longer supports certain long-obsolete and undocumented behavior. If you maintain a non-core Python module, you should test it against the new release and fix anything that comes up. This is why we have an alpha and beta test cycle and even before that the CVS version. If you are a Python user who depends on a 3rd party module, you need to find out whether the new version is compatible with the 3rd party code you are using, or whether there's a newer version available that solves the incompatibility. There are people who still run Python 1.4 (really!) because they haven't upgraded. I don't have a problem with that -- they don't get much support, but it's their choice, and they may not need the new features introduced since then. I expect that lots of people won't upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the other modules/packages they need are compatible with 1.6. 
Multi-arg append probably won't be the only reason why e.g. Digital Creations may need to release an update to Zope for Python 1.6. Zope comes with its own version of Python anyway, so they have control over when they make the switch. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:04:35 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:04:35 -0500 Subject: [Python-Dev] Size of int across machines (was RE: Blowfish in Python?) In-Reply-To: Message-ID: <000201bf833b$a3b01bc0$412d153f@tim> [Markus Stenberg] > ... > speed was horrendous. > > I think the main reason was the fact that I had to use _long ints_ for > calculations, as the normal ints are signed, and apparently the bitwise > operators do not work as advertised when bit32 is set (=number is > negative). [Tim, takes "bitwise operators" to mean & | ^ ~, and expresses surprise] [Markus, takes umbrage, and expresses umbrage ] > Hmm.. As far as I'm concerned, shifts for example do screw up. Do you mean "for example" as in "there are so many let's just pick one at random", or as in "this is the only one I've stumbled into" <0.9 wink>? > i.e. > > 0xffffffff >> 30 > > [64bit Python: 3] > [32bit Python: -1] > > As far as I'm concerned, that should _not_ happen. Or maybe it's just me. I could not have guessed that your complaint was about 64-bit Python from your "when bit32 is set (=number is negative)" description . The behavior shown in a Python compiled under a C in which sizeof(long)==4 matches the Reference Manual (see the "Integer and long integer literals" and "shifting operations" sections). So that can't be considered broken (you may not *like* it, but it's functioning as designed & as documented). The behavior under a sizeof(long)==8 C seems more of an ill-documented (and debatable to me too) feature. 
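The two readings Tim contrasts can be reproduced side by side on a wide-int build by sign-extending explicitly; a small sketch (the helper name is ours, purely illustrative):

```python
def as_int32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

print(0xFFFFFFFF >> 30)            # 3  (the wide-int reading)
print(as_int32(0xFFFFFFFF))        # -1 (the 32-bit reading: "bit 32 set = negative")
print(as_int32(0xFFFFFFFF) >> 30)  # -1 (sign-extending right shift)
```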
The possibility is mentioned in the "The standard type hierarchy" section (under Numbers -> Integers -> Plain integers) but really not fleshed out, and the "Integer and long integer literals" section plainly contradicts it. Python's going to have to clean up its act here -- 64-bit machines are getting more common. There's a move afoot to erase the distinction between Python ints and longs (in the sense of auto-converting from one to the other under the covers, as needed). In that world, your example would work like the "64bit Python" one. There are certainly compatibility issues, though, in that int left shifts are end-off now, and on a 32-bit machine any int for which i & 0x80000000 is true "is negative" (and so sign-extends on a right shift; note that Python guarantees sign-extending right shifts *regardless* of what the platform C does (C doesn't define what happens here -- Python does)). [description of pain getting a fast C-like "mod 2**32 int +" to work too] Python really wasn't designed for high-performance bit-fiddling, so you're (as you've discovered) swimming upstream with every stroke. Given that you can't write a C module here, there's nothing better than to do the ^ & | ~ parts with ints, and fake the rest slowly & painfully. Note that you can at least determine the size of a Python int via inspecting sys.maxint. sympathetically-unhelpfully y'rs - tim From guido at python.org Wed Mar 1 06:44:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 00:44:10 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: Your message of "Tue, 29 Feb 2000 15:34:21 MST." <20000229153421.A16502@acs.ucalgary.ca> References: <20000229153421.A16502@acs.ucalgary.ca> Message-ID: <200003010544.AAA13155@eric.cnri.reston.va.us> [I don't like to cross-post to patches and python-dev, but I think this belongs in patches because it's a followup to Neil's post there and also in -dev because of its longer-term importance.]
Thanks for the new patches, Neil! We had a visitor here at CNRI today, Eric Tiedemann , who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes. Eric had commented earlier on Neil's previous code, and I had used the morning to make myself familiar with Neil's code. This was relatively easy because Neil's code is very clear. Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried. Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. I have a feeling that function objects should also be considered container types, because of the cycle involving globals. Eric's algorithm, then, consists of the following parts. Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".) We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.) All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.) When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object. Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. 
In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.) Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine. We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list. Objects placed on the second list in this phase are considered colored gray (roots). Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray). Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction. (How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)? We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g.
0xffffffff) when we move the object to the second list. During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.) Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them in a way similar to the way Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object. We discussed more things, but not as thoroughly. Eric & Eric stressed the importance of making excellent statistics available about the rate of garbage collection -- probably as data structures that Python code can read rather than debugging print statements. Eric T also sketched an incremental version of the algorithm, usable for real-time applications. This involved keeping the gc_refs field ("external" reference counts) up-to-date at all times, which would require two different versions of the INCREF/DECREF macros: one for adding/deleting a reference from a container, and another for adding/deleting a root reference. Also, a 4th color (red) was added, to distinguish between scanned roots and scanned non-roots. We decided not to work this out in more detail because the overhead cost appeared to be much higher than for the previous algorithm; instead, we recommend that for real-time requirements the whole GC be disabled (there should be run-time controls for this, not just compile-time). We also briefly discussed possibilities for generational schemes.
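As a concreteness check, the passes described above can be modelled in a few lines of Python. This is a sketch only: plain dicts stand in for the doubly-linked lists and the per-object gc_refs fields, and the function and variable names are ours, not from any patch.

```python
def collect(refs, refcnt):
    """refs: {obj: list of container objects it references}
    refcnt: {obj: its total refcount (internal + external)}.
    Returns the set of unreachable (cyclic garbage) objects."""
    # Pass 1: copy each object's refcount into its gc_refs slot.
    gc_refs = dict(refcnt)
    # Pass 2: subtract one for every internal (container-to-container) reference.
    for obj in refs:
        for target in refs[obj]:
            gc_refs[target] -= 1
    # Objects with gc_refs > 0 are roots: something outside still holds them.
    gray = [obj for obj in refs if gc_refs[obj] > 0]
    reachable = set(gray)
    # Scan the gray list, appending any still-white object a root references;
    # the list grows as we iterate, which is exactly the append-and-scan pass.
    for obj in gray:
        for target in refs[obj]:
            if target not in reachable:
                reachable.add(target)
                gray.append(target)
    # Whatever never reached the second list is garbage.
    return set(refs) - reachable

# Demo: a <-> b form an unreferenced cycle; d is kept alive by one
# external reference (its refcount exceeds its internal references).
refs = {'a': ['b'], 'b': ['a'], 'c': ['d'], 'd': []}
refcnt = {'a': 1, 'b': 1, 'c': 1, 'd': 2}
print(sorted(collect(refs, refcnt)))   # ['a', 'b']
```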
The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made. I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?) The rest of the afternoon was spent discussing continuations, coroutines and generators, and the fundamental reason why continuations are so hard (the C stack getting in the way everywhere). But that's a topic for another mail, maybe. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:57:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:57:49 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: <200002291302.IAA04581@eric.cnri.reston.va.us> Message-ID: <000601bf8343$13575040$412d153f@tim> [Tim, runs checkappend.py over the entire CVS tree, comes up with surprisingly many remaining problems, and surprisingly few false hits] [Guido fixes mailerdaemon.py, and argues for nuking Demo\tkinter\www\ (the whole directory) Demo\sgi\video\VcrIndex.py (unclear whether the dir or just the file) Demo\sgi\gl\glstdwin\glstdwin.py (stdwin-related) Demo\ibrowse\ibrowse.py (stdwin-related) > All these are stdwin-related. Stdwin will also go out of service per > 1.6. ] Then the sooner someone nukes them from the CVS tree, the sooner my automated hourly checkappend complaint generator will stop pestering Python-Dev about them . > (Conclusion: most multi-arg append() calls are *very* old, But part of that is because we went thru this exercise a couple years ago too, and you repaired all the ones in the less obscure parts of the distribution then. > or contributed by others. Sigh. I must've given bad examples long > ago...) Na, I doubt that. 
Most people will not read a language defn, at least not until "something doesn't work". If the compiler accepts a thing, they simply *assume* it's correct. It's pretty easy (at least for me!) to make this particular mistake as a careless typo, so I assume that's the "source origin" for many of these too. As soon you *notice* you've done it, and that nothing bad happened, the natural tendencies are to (a) believe it's OK, and (b) save 4 keystrokes (incl. the SHIFTs) over & over again in the glorious indefinite future . Reminds me of a c.l.py thread a while back, wherein someone did stuff like None, x, y, None = function_returning_a_4_tuple to mean that they didn't care what the 1st & 4th values were. It happened to work, so they did it more & more. Eventually a function containing this mistake needed to reference None after that line, and "suddenly for no reason at all Python stopped working". To the extent that you're serious about CP4E, you're begging for more of this, not less . newbies-even-keep-on-doing-things-that-*don't*-work!-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 07:50:44 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 01:50:44 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BBD1A2.CD29AADD@lemburg.com> Message-ID: <000701bf834a$77acdfe0$412d153f@tim> [M.-A. Lemburg] > ... > Currently, mapping tables map characters to Unicode characters > and vice-versa. Now the .translate method will use a different > kind of table: mapping integer ordinals to integer ordinals. You mean that if I want to map u"a" to u"A", I have to set up some sort of dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. > Question: What is more of efficient: having lots of integers > in a dictionary or lots of characters ? My bet is "lots of integers", to reduce both space use and comparison time. > ... > Something else that changed is the way .capitalize() works. 
The > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > on the www.unicode.org site). #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case Mappings"). > Here's the new doc string: > > S.capitalize() -> unicode > > Return a capitalized version of S, i.e. words start with title case > characters, all remaining cased characters have lower case. > > Note that *all* characters are touched, not just the first one. > The change was needed to get it in sync with the .iscapitalized() > method which is based on the Unicode algorithm too. > > Should this change be propagated to the string implementation ? Unicode makes distinctions among "upper case", "lower case" and "title case", and you're trying to get away with a single "capitalize" function. Java has separate toLowerCase, toUpperCase and toTitleCase methods, and that's the way to do it. Whatever you do, leave .capitalize alone for 8-bit strings -- there's no reason to break code that currently works. "capitalize" seems a terrible choice of name for a titlecase method anyway, because of its baggage connotations from 8-bit strings. Since this stuff is complicated, I say it would be much better to use the same names for these things as the Unicode and Java folk do: there's excellent documentation elsewhere for all this stuff, and it's Bad to make users mentally translate unique Python terminology to make sense of the official docs. So my vote is: leave capitalize the hell alone. Do not implement capitalize for Unicode strings. Introduce a new titlecase method for Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode strings should also have methods to get at uppercase and lowercase (as Unicode defines those).
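As it turned out, later Pythons shipped essentially the split Tim votes for here: capitalize() keeps its narrow meaning, while a separate title()/istitle() pair handles per-word titlecasing. In modern spelling:

```python
s = "hello world WORLD"
print(s.capitalize())       # 'Hello world world'  (first char up, rest lowered)
print(s.title())            # 'Hello World World'  (per-word titlecase)
print(s.title().istitle())  # True
```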
From tim_one at email.msn.com Wed Mar 1 08:36:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 02:36:03 -0500 Subject: [Python-Dev] Re: Python / Haskell (fwd) In-Reply-To: Message-ID: <000801bf8350$cc4ec580$412d153f@tim> [Greg Wilson, quoting Philip Wadler] > Well, what I most want is typing. But you already know that. So invite him to contribute to the Types-SIG <0.5 wink>. > Next after typing? Full lexical scoping for closures. I want to write: > > fun x: fun y: x+y > > Not: > > fun x: fun y, x=x: x+y > > Lexically scoped closures would be a big help for the embedding technique > I described [GVW: in a posting to the Software Carpentry discussion list, > archived at > > http://software-carpentry.codesourcery.com/lists/sc-discuss/msg00068.html > > which discussed how to build a flexible 'make' alternative in Python]. So long as we're not deathly concerned over saving a few lines of easy boilerplate code, Python already supports this approach wonderfully well -- but via using classes with __call__ methods instead of lexical closures. I can't make time to debate this now, but suffice it to say dozens on c.l.py would be delighted to . Philip is understandably attached to the "functional way of spelling things", but Python's way is at least as usable for this (and many-- including me --would say more so). > Next after closures? Disjoint sums. E.g., > > fun area(shape) : > switch shape: > case Circle(r): > return pi*r*r > case Rectangle(h,w): > return h*w > > (I'm making up a Python-like syntax.) This is an alternative to the OO > approach. With the OO approach, it is hard to add area, unless you modify > the Circle and Rectangle class definitions. Python allows adding new methods to classes dynamically "from the outside" -- the original definitions don't need to be touched (although it's certainly preferable to add new methods directly!). 
Take this complaint to the extreme, and I expect you end up reinventing multimethods (suppose you need to add an intersection(shape1, shape2) method: N**2 nesting of "disjoint sums" starts to appear ludicrous). In any case, the Types-SIG already seems to have decided that some form of "typecase" stmt will be needed; see the archives for that; I expect the use above would be considered abuse, though; Python has no "switch" stmt of any kind today, and the use above can already be spelled via if isinstance(shape, Circle): etc elif isinstance(shape, Rectangle): etc else: raise TypeError(etc) From gstein at lyra.org Wed Mar 1 08:51:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 23:51:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > None the less, for those practicing it, the incorrectness of it will be > fresh news. I would be less sympathetic with them if there was recent > warning, eg, the schedule for changing it in the next release was part of > the current release. But if you tell somebody you're going to change > something, and then don't for a few years, you probably need to renew the > warning before you make the change. Don't you think so? Why not? I agree. Note that Guido posted a note to c.l.py on Monday. I believe that meets your notification criteria. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 09:10:28 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:10:28 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Guido van Rossum wrote: > I'm tired of this rhetoric. It's not like I'm changing existing > Python installations retroactively. I'm planning to release a new > version of Python which no longer supports certain long-obsolete and > undocumented behavior.
If you maintain a non-core Python module, you > should test it against the new release and fix anything that comes up. > This is why we have an alpha and beta test cycle and even before that > the CVS version. If you are a Python user who depends on a 3rd party > module, you need to find out whether the new version is compatible > with the 3rd party code you are using, or whether there's a newer > version available that solves the incompatibility. > > There are people who still run Python 1.4 (really!) because they > haven't upgraded. I don't have a problem with that -- they don't get > much support, but it's their choice, and they may not need the new > features introduced since then. I expect that lots of people won't > upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the > other modules/packages they need are compatible with 1.6. Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. I wholeheartedly support his approach. Just ask Mark Hammond :-) how many times I've said "let's change the code to make it Right; people aren't required to upgrade [and break their code]." Of course, his counter is that people need to upgrade to fix other, unrelated problems. So I relax and try again later :-). But I still maintain that they can independently grab the specific fixes and leave the other changes we make. Maybe it is grey, but I think this change is quite fine. Especially given Tim's tool. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Wed Mar 1 09:22:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 03:22:06 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: <000b01bf8357$3af08d60$412d153f@tim> [Greg Stein] > ... > Maybe it is grey, but I think this change is quite fine. 
Especially given > Tim's tool. What the heck does Tim's one-eyed trouser snake have to do with this? I know *it* likes to think it's the measure of all things, but, frankly, my tool barely affects the world at all a mere two feet beyond its base . tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- the-best-thing-ly y'rs - tim From effbot at telia.com Wed Mar 1 09:40:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 09:40:01 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Greg Stein wrote: > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. ahem. do you seriously believe that everyone in the Python universe reads comp.lang.python? afaik, most Python programmers don't. ... so as far as I'm concerned, this was officially deprecated with Guido's post. afaik, no official python documentation has explicitly mentioned this (and the fact that it doesn't explicitly allow it doesn't really matter, since the docs don't explicitly allow the x[a, b, c] syntax either. both work in 1.5.2). has anyone checked the recent crop of Python books, btw? the eff-bot guide uses old syntax in two examples out of 320. how about the others? ... sigh. running checkappend over a 50k LOC application, I just realized that it doesn't catch a very common append pydiom. how fun. even though 99% of all append calls are "legal", this "minor" change will break every single application and library we have :-( oh, wait. xmlrpclib isn't affected. always something! From gstein at lyra.org Wed Mar 1 09:43:02 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:43:02 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: On Wed, 1 Mar 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > Note that Guido posted a note to c.l.py on Monday. 
I believe that meets > > your notification criteria. > > ahem. do you seriously believe that everyone in the > Python universe reads comp.lang.python? > > afaik, most Python programmers don't. Now you're simply taking my comments out of context. Not a proper thing to do. Ken said that he wanted notification along certain guidelines. I said that I believed Guido's post did just that. Period. Personally, I think it is fine. I also think that a CHANGES file that arrives with 1.6 that points out the incompatibility is also fine. >... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. And which is that? Care to help out? Maybe just a little bit? Or do you just want to talk about how bad this change is? :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:01:52 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:01:52 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <000b01bf8357$3af08d60$412d153f@tim> Message-ID: On Wed, 1 Mar 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Maybe it is grey, but I think this change is quite fine. Especially given > > Tim's tool. > > What the heck does Tim's one-eyed trouser snake have to do with this? I > know *it* likes to think it's the measure of all things, but, frankly, my > tool barely affects the world at all a mere two feet beyond its base . > > tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- > the-best-thing-ly y'rs - tim Heh. Now how is one supposed to respond to *that* ??! All right. Fine. +3 cool points go to Tim. 
:-) -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:03:32 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:03:32 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > You can already extract this from the updated documetation on the > > website (which has a list of obsolete modules). > > > > But you're righ,t it would be good to be open about this. I'll think > > about it. > > Note that the updated documentation isn't yet "published"; there are > no links to it and it hasn't been checked as much as I need it to be > before announcing it. Isn't the documentation better than what has been released? In other words, if you release now, how could you make things worse? If something does turn up during a check, you can always release again... Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Wed Mar 1 10:13:13 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 10:13:13 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <011001bf835e$600d1da0$34aab5d4@hagrid> Greg Stein wrote: > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > Greg Stein wrote: > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > your notification criteria. > > > > ahem. do you seriously believe that everyone in the > > Python universe reads comp.lang.python? > > > > afaik, most Python programmers don't. > > Now you're simply taking my comments out of context. Not a proper thing to > do. Ken said that he wanted notification along certain guidelines. I said > that I believed Guido's post did just that. Period. my point was that most Python programmers won't see that notification. 
when these people download 1.6 final and find that all their apps just broke, they probably won't be happy with a pointer to dejanews. > And which is that? Care to help out? Maybe just a little bit? this rather common pydiom: append = list.append for x in something: append(...) it's used a lot where performance matters. > Or do you just want to talk about how bad this change is? :-( yes, I think it's bad. I've been using Python since 1.2, and no other change has had the same consequences (wrt. time/money required to fix it) call me a crappy programmer if you want, but I'm sure there are others out there who are nearly as bad. and lots of them won't be aware of this change until someone upgrades the python interpreter on their server. From mal at lemburg.com Wed Mar 1 09:38:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 09:38:52 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> Message-ID: <38BCD71C.3592E6A@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Currently, mapping tables map characters to Unicode characters > > and vice-versa. Now the .translate method will use a different > > kind of table: mapping integer ordinals to integer ordinals. > > You mean that if I want to map u"a" to u"A", I have to set up some sort of > dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. I meant: 'a': u'A' vs. ord('a'): ord(u'A') The latter wins ;-) Reasoning for the first was that it allows character sequences to be handled by the same mapping algorithm. I decided to leave those techniques to some future implementation, since mapping integers has the nice side-effect of also allowing sequences to be used as mapping tables... resulting in some speedup at the cost of memory consumption. BTW, there are now three different ways to do char translations: 1. char -> unicode (char mapping codec's decode) 2. unicode -> char (char mapping codec's encode) 3.
unicode -> unicode (unicode's .translate() method) > > Question: What is more efficient: having lots of integers > > in a dictionary or lots of characters ? > > My bet is "lots of integers", to reduce both space use and comparison time. Right. That's what I found too... it's "lots of integers" now :-) > > ... > > Something else that changed is the way .capitalize() works. The > > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > > on the www.unicode.org site). > > #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case > Mappings"). Dang. You're right. Here's the URL in case someone wants to join in: http://www.unicode.org/unicode/reports/tr21/tr21-2.html > > Here's the new doc string: > > > > S.capitalize() -> unicode > > > > Return a capitalized version of S, i.e. words start with title case > > characters, all remaining cased characters have lower case. > > > > Note that *all* characters are touched, not just the first one. > > The change was needed to get it in sync with the .iscapitalized() > > method which is based on the Unicode algorithm too. > > > > Should this change be propagated to the string implementation ? > > Unicode makes distinctions among "upper case", "lower case" and "title > case", and you're trying to get away with a single "capitalize" function. > Java has separate toLowerCase, toUpperCase and toTitleCase methods, and > that's the way to do it. The Unicode implementation has the corresponding: .upper(), .lower() and .capitalize() They work just like .toUpperCase, .toLowerCase, .toTitleCase resp. (well at least they should ;). > Whatever you do, leave .capitalize alone for 8-bit > strings -- there's no reason to break code that currently works. > "capitalize" seems a terrible choice of name for a titlecase method anyway, > because of its baggage connotations from 8-bit strings.
Since this stuff is > complicated, I say it would be much better to use the same names for these > things as the Unicode and Java folk do: there's excellent documentation > elsewhere for all this stuff, and it's Bad to make users mentally translate > unique Python terminology to make sense of the official docs. Hmm, that's an argument but it breaks the current method naming scheme of all lowercase letters. Perhaps I should simply provide a new method for .toTitleCase(), e.g. .title(), and leave the previous definition of .capitalize() intact... > So my vote is: leave capitalize the hell alone . Do not implement > capitalize for Unicode strings. Introduce a new titlecase method for > Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode > strings should also have methods to get at uppercase and lowercase (as > Unicode defines those). ...looks like you're more or less on the same wavelength here ;-) Here's what I'll do: * implement .capitalize() in the traditional way for Unicode objects (simply convert the first char to uppercase) * implement u.title() to mean the same as Java's toTitleCase() * don't implement s.title(): the reasoning here is that it would confuse the user when she gets different return values for the same string (titlecase chars usually live in higher Unicode code ranges not reachable in Latin-1) Thanks for the feedback, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one at email.msn.com Wed Mar 1 11:06:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:06:58 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: <000e01bf8365$e1e0b9c0$412d153f@tim> [/F] > ... > so as far as I'm concerned, this was officially deprecated > with Guido's post.
afaik, no official python documentation > has explicitly mentioned this (and the fact that it doesn't > explicitly allow it doesn't really matter, since the docs don't > explicitly allow the x[a, b, c] syntax either. both work in > 1.5.2). The "Subscriptions" section of the Reference Manual explicitly allows for dict[a, b, c] and explicitly does not allow for sequence[a, b, c] The "Mapping Types" section of the Library Ref does not explicitly allow for it, though, and if you read it as implicitly allowing for it (based on the Reference Manual's clarification of "key" syntax), you would also have to read the Library Ref as allowing for dict.has_key(a, b, c) Which 1.5.2 does allow, but which Guido very recently patched to treat as a syntax error. > ... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. [And, later, after prodding by GregS] > this rather common pydiom: > > append = list.append > for x in something: > append(...) This limitation was pointed out in checkappend's module docstring. Doesn't make it any easier for you to swallow, but I needed to point out that you didn't *have* to stumble into this the hard way . > how fun. even though 99% of all append calls are "legal", > this "minor" change will break every single application and > library we have :-( > > oh, wait. xmlrpclib isn't affected. always something! What would you like to do, then? The code will be at least as broken a year from now, and probably more so -- unless you fix it. So this sounds like an indirect argument for never changing Python's behavior here. Frankly, I expect you could fix the 50K LOC in less time than it took me to write this naggy response <0.50K wink>. 
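For readers following along, the bound-method idiom under discussion and the breakage it hides can be reproduced in a few lines. This is a minimal sketch in modern Python, where the post-1.6 single-argument behaviour is the only one available; a source scan for multi-argument `.append(` calls presumably never sees the renamed bound method, which is the limitation noted in checkappend's docstring:

```python
# A minimal sketch of the bound-method append idiom under discussion.
# Under 1.5.2, list.append quietly packed extra arguments into a tuple;
# since the 1.6 change (and in every modern Python), it takes exactly one.
items = []
append = items.append        # hoist the attribute lookup, for speed

for x in range(3):
    append(x)                # single argument: fine before and after

try:
    append(4, 5)             # the multi-arg form that 1.6 made an error
except TypeError as exc:
    print("now an error:", exc)

print(items)                 # -> [0, 1, 2]
```

Because the failing call site reads `append(4, 5)` rather than `something.append(4, 5)`, a checker that greps for `.append(` with multiple arguments has nothing to match against.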
embrace-change-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 11:31:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:31:12 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <001001bf8369$453e9fc0$412d153f@tim> [Tim, needing sleep] > dict.has_key(a, b, c) > > Which 1.5.2 does allow, but which Guido very recently patched to > treat as a syntax error. No, a runtime error. haskeynanny.py, anyone? not-me-ly y'rs - tim From fredrik at pythonware.com Wed Mar 1 12:14:18 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 12:14:18 +0100 Subject: [Python-Dev] breaking list.append() References: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <002101bf836f$4a012220$f29b12c2@secret.pythonware.com> Tim Peters wrote: > The "Subscriptions" section of the Reference Manual explicitly allows for > > dict[a, b, c] > > and explicitly does not allow for > > sequence[a, b, c] I'd thought we'd agreed that nobody reads the reference manual ;-) > What would you like to do, then? more time to fix it, perhaps? it's surely a minor code change, but fixing it can be harder than you think (just witness Gerrit's bogus patches) after all, python might be free, but more and more people are investing lots of money in using it [1]. > The code will be at least as broken a year > from now, and probably more so -- unless you fix it. sure. we've already started. but it's a lot of work, and it's quite likely that it will take a while until we can be 100% confident that all the changes are properly done. (not all software has a 100% complete test suite that simply says "yes, this works" or "no, it doesn't") 1) fwiw, some poor soul over here posted a short note to the pythonworks mailing list, mentioning that we've now fixed the price.
a major flamewar erupted, and my mailbox is now full of mail from unknowns telling me that I must be a complete moron that doesn't understand that Python is just a toy system, which everyone uses just because they cannot afford anything better... From tim_one at email.msn.com Wed Mar 1 12:26:21 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 06:26:21 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <001101bf8370$f881dfa0$412d153f@tim> Very briefly: [Guido] > ... > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. Eric expects that this will run faster, although this > obviously needs to be tried. No, it doesn't : it will run faster. > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I > have a feeling that function objects should also be considered > container types, because of the cycle involving globals. Note that the list-migrating steps you sketch later are basically the same as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few years ago, right down to using appending to effect a breadth-first traversal without requiring recursion -- except M&S doesn't have to bother accounting for sources of refcounts. Since *this* scheme does more work per item per scan, to be as fast in the end it has to touch less stuff than M&S. But the more kinds of types you track, the more stuff this scheme will have to chase. The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable meta-point : you balked at M&S the last time around because of the apparent need for two link fields + a bit or two per object of a "chaseable type".
If that's no longer perceived as being a showstopper, M&S should be reconsidered too. I happen to be a fan of both approaches . The worst part of M&S-on-RC (== the one I never had a good answer for) is that a non-cooperating extension type E can't be chased, hence objects reachable only from objects of type E never get marked, so are vulnerable to bogus collection. In the Neil/Toby scheme, objects of type E merely act as sources of "external" references, so the scheme fails safe (in the sense of never doing a bogus collection due to non-cooperating types). Hmm ... if both approaches converge on keeping a list of all chaseable objects, and being careful of uncooperating types, maybe the only real difference in the end is whether the root set is given explicitly (as in traditional M&S) or inferred indirectly (but where "root set" has a different meaning in the scheme you sketched). > ... > In our case, we may need a type-specific "clear" function for containers > in the type object. I think definitely, yes. full-speed-sideways-ly y'rs - tim From mal at lemburg.com Wed Mar 1 11:40:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 11:40:36 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <38BCF3A4.1CCADFCE@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > > Greg Stein wrote: > > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > > your notification criteria. > > > > > > ahem. do you seriously believe that everyone in the > > > Python universe reads comp.lang.python? > > > > > > afaik, most Python programmers don't. > > > > Now you're simply taking my comments out of context. Not a proper thing to > > do. Ken said that he wanted notification along certain guidelines. I said > > that I believed Guido's post did just that. Period.
> > my point was that most Python programmers won't > see that notification. when these people download > 1.6 final and find that all their apps just broke, they > probably won't be happy with a pointer to dejanews. Ditto. Anyone remember the str(2L) == '2' change, BTW ? That one will cost lots of money in case someone implemented an eShop using the common str(2L)[:-1] idiom... There will need to be a big warning sign somewhere that people see *before* finding the download link. (IMHO, anyways.) > > And which is that? Care to help out? Maybe just a little bit? > > this rather common pydiom: > > append = list.append > for x in something: > append(...) > > it's used a lot where performance matters. Same here. checkappend.py doesn't find these (a great tool BTW, thanks Tim; I noticed that it leaks memory badly though). > > Or do you just want to talk about how bad this change is? :-( > > yes, I think it's bad. I've been using Python since 1.2, > and no other change has had the same consequences > (wrt. time/money required to fix it) > > call me a crappy programmer if you want, but I'm sure > there are others out there who are nearly as bad. and > lots of them won't be aware of this change until someone > upgrades the python interpreter on their server. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 1 13:07:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:07:42 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: Your message of "Wed, 01 Mar 2000 00:57:49 EST." <000601bf8343$13575040$412d153f@tim> References: <000601bf8343$13575040$412d153f@tim> Message-ID: <200003011207.HAA13342@eric.cnri.reston.va.us> > To the extent that you're serious about CP4E, you're begging for more of > this, not less .
Which is exactly why I am breaking multi-arg append now -- this is my last chance. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 1 13:27:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:27:10 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Wed, 01 Mar 2000 09:38:52 +0100." <38BCD71C.3592E6A@lemburg.com> References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> Message-ID: <200003011227.HAA13396@eric.cnri.reston.va.us> > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) > * implement u.title() to mean the same as Java's toTitleCase() > * don't implement s.title(): the reasoning here is that it would > confuse the user when she get's different return values for > the same string (titlecase chars usually live in higher Unicode > code ranges not reachable in Latin-1) Huh? For ASCII at least, titlecase seems to map to ASCII; in your current implementation, only two Latin-1 characters (u'\265' and u'\377', I have no easy way to show them in Latin-1) map outside the Latin-1 range. Anyway, I would suggest to add a title() call to 8-bit strings as well; then we can do away with string.capwords(), which does something similar but different, mostly by accident. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Wed Mar 1 13:34:42 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 01 Mar 2000 13:34:42 +0100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Message by Guido van Rossum , Mon, 28 Feb 2000 12:35:12 -0500 , <200002281735.MAA27771@eric.cnri.reston.va.us> Message-ID: <20000301123442.7DEF8371868@snelboot.oratrix.nl> > > What about adding a command-line switch for enabling warnings, as has > > been suggested long ago? 
The .append() change could then print a > > warning in 1.6alphas (and betas?), but still run, and be turned into > > an error later. > > That's better. I propose that the warnings are normally on, and that > there are flags to turn them off or thrn them into errors. Can we then please have an interface to the "give warning" call (in stead of a simple fprintf)? On the mac (and possibly also in PythonWin) it's probably better to pop up a dialog (possibly with a "don't show again" button) than do a printf which may get lost. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Wed Mar 1 13:55:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:55:42 -0500 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Your message of "Wed, 01 Mar 2000 13:34:42 +0100." <20000301123442.7DEF8371868@snelboot.oratrix.nl> References: <20000301123442.7DEF8371868@snelboot.oratrix.nl> Message-ID: <200003011255.HAA13489@eric.cnri.reston.va.us> > Can we then please have an interface to the "give warning" call (in > stead of a simple fprintf)? On the mac (and possibly also in > PythonWin) it's probably better to pop up a dialog (possibly with a > "don't show again" button) than do a printf which may get lost. Sure. All you have to do is code it (or get someone else to code it). <0.9 wink> --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Mar 1 14:32:02 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 01 Mar 2000 14:32:02 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> <200003011227.HAA13396@eric.cnri.reston.va.us> Message-ID: <38BD1BD2.792E9B73@lemburg.com> Guido van Rossum wrote: > > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > * implement u.title() to mean the same as Java's toTitleCase() > > * don't implement s.title(): the reasoning here is that it would > > confuse the user when she gets different return values for > > the same string (titlecase chars usually live in higher Unicode > > code ranges not reachable in Latin-1) > > Huh? For ASCII at least, titlecase seems to map to ASCII; in your > current implementation, only two Latin-1 characters (u'\265' and > u'\377', I have no easy way to show them in Latin-1) map outside the > Latin-1 range. You're right, sorry for the confusion. I was thinking of other encodings like e.g. cp437 which have corresponding characters in the higher Unicode ranges. > Anyway, I would suggest to add a title() call to 8-bit strings as > well; then we can do away with string.capwords(), which does something > similar but different, mostly by accident. Ok, I'll do it this way then: s.title() will use C's toupper() and tolower() for case mapping and u.title() the Unicode routines. This will be in sync with the rest of the 8-bit string world (which is locale aware on many platforms AFAIK), even though it might not return the same string as the corresponding u.title() call. u.capwords() will be disabled in the Unicode implementation...
it wasn't even implemented for the string implementation, so there's no breakage ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin at mems-exchange.org Wed Mar 1 15:59:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Wed, 1 Mar 2000 09:59:07 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <011001bf835e$600d1da0$34aab5d4@hagrid> References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <14525.12347.120543.804804@amarok.cnri.reston.va.us> Fredrik Lundh writes: >yes, I think it's bad. I've been using Python since 1.2, >and no other change has had the same consequences >(wrt. time/money required to fix it) There are more things in 1.6 that might require fixing existing code: str(2L) returning '2', the int/long changes, the Unicode changes, and if it gets added, garbage collection -- and bugs caused by those changes might not be catchable by a nanny. IMHO it's too early to point at the .append() change as breaking too much existing code; there may be changes that break a lot more. I'd wait and see what happens once the 1.6 alphas become available; if c.l.p is filled with shrieks and groans, GvR might decide to back the offending change out. (Or he might not...) -- A.M. Kuchling http://starship.python.net/crew/amk/ I have no skills with machines. I fear them, and because I cannot help attributing human qualities to them, I suspect that they hate me and will kill me if they can. -- Robertson Davies, "Reading" From klm at digicool.com Wed Mar 1 16:37:49 2000 From: klm at digicool.com (Ken Manheimer) Date: Wed, 1 Mar 2000 10:37:49 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >...
> > None the less, for those practicing it, the incorrectness of it will be > > fresh news. I would be less sympathetic with them if there was recent > > warning, eg, the schedule for changing it in the next release was part of > > the current release. But if you tell somebody you're going to change > > something, and then don't for a few years, you probably need to renew the > > warning before you make the change. Don't you think so? Why not? > > I agree. > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. Actually, by "part of the current release", I meant having the deprecation/impending-deletion warning in the release notes for the release before the one where the deletion happens - saying it's being deprecated now, will be deleted next time around. Ken klm at digicool.com I mean, you tell one guy it's blue. He tells his guy it's brown, and it lands on the page sorta purple. Wavy Gravy/Hugh Romney From marangoz at python.inrialpes.fr Wed Mar 1 18:07:07 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 1 Mar 2000 18:07:07 +0100 (CET) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> from "Guido van Rossum" at Mar 01, 2000 12:44:10 AM Message-ID: <200003011707.SAA01310@python.inrialpes.fr> Guido van Rossum wrote: > > Thanks for the new patches, Neil! Thanks from me too! I notice, however, that hash_resize() still uses a malloc call instead of PyMem_NEW. Neil, please correct this in your version immediately ;-) > > We had a visitor here at CNRI today, Eric Tiedemann > , who had a look at your patches before. Eric > knows his way around the Scheme, Lisp and GC literature, and presented > a variant on your approach which takes the bite out of the recursive > passes. Avoiding the recursion is valuable, as long as we're optimizing the implementation of one particular scheme.
It doesn't bother me that Neil's scheme is recursive, because I still perceive his code as a proof of concept. You're presenting here another scheme based on refcount arithmetic, generalized for all container types. The linked list implementation of this generalized scheme is not directly related to the logic. I have some suspicions about the logic, so you'll probably want to elaborate a bit more on it, and convince me that this scheme would actually work. > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. I cannot agree so easily with this statement, but you should have expected this from me :-) If we're about to optimize storage, I have good reasons to believe that we don't need 3 additional slots per container (but 1 for gc_refs, yes). We could certainly envision allocating the containers within memory pools of 4K (just as it is done in pymalloc, and close to what we have for ints & floats). These pools would be labeled as "container's memory", they would obviously be under our control, and we'd have additional slots per pool, not per object. As long as we isolate the containers from the rest, we can enumerate them easily by walking through the pools. But I'm willing to defer this question for now, as it involves the object allocators (the builtin allocators + PyObject_NEW for extension types E -- user objects of type E would be automatically taken into account for GC if there's a flag in the type struct which identifies them as containers). > Eric expects that this will run faster, although this obviously needs > to be tried. Definitely, although I trust Eric & Tim :-) > > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets.
I have a > feeling that function objects should also be considered container > types, because of the cycle involving globals. + other extension container types. And I insist. Don't forget that we're planning to merge types and classes... > > Eric's algorithm, then, consists of the following parts. > > Each container object has three new fields: gc_next, gc_prev, and > gc_refs. (Eric calls the gc_refs "refcount-zero".) > > We color objects white (initial), gray (root), black (scanned root). > (The terms are explained later; we believe we don't actually need bits > in the objects to store the color; see later.) > > All container objects are chained together in a doubly-linked list -- > this is the same as Neil's code except Neil does it only for dicts. > (Eric postulates that you need a list header.) > > When GC is activated, all objects are colored white; we make a pass > over the entire list and set gc_refs equal to the refcount for each > object. Step 1: for all containers, c->gc_refs = c->ob_refcnt > > Next, we make another pass over the list to collect the internal > references. Internal references are (just like in Neil's version) > references from other container types. In Neil's version, this was > recursive; in Eric's version, we don't need recursion, since the list > already contains all containers. So we simply visit the containers in > the list in turn, and for each one we go over all the objects it > references and subtract one from *its* gc_refs field. (Eric left out > the little detail that we need to be able to distinguish between > container and non-container objects amongst those references; this can > be a flag bit in the type field.) Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c I guess that you realize that after this step, gc_refs can be zero or negative. I'm not sure that you collect "internal" references here (references from other container types).
A list referencing 20 containers, being itself referenced by one container + one static variable + two times from the runtime stack, has an initial refcount == 4, so we'll end up with gc_refs == -16. A tuple referencing 1 list, referenced once by the stack, will end up with gc_refs == 0. Neil's scheme doesn't seem to have this "property". > > Now, similar to Neil's version, all objects for which gc_refs == 0 > have only internal references, and are potential garbage; all objects > for which gc_refs > 0 are "roots". These have references to them from > other places, e.g. from globals or stack frames in the Python virtual > machine. > Agreed, some roots have gc_refs > 0. I'm not sure that all of them have it, though... Do they? > We now start a second list, to which we will move all roots. The way > to do this is to go over the first list again and to move each object > that has gc_refs > 0 to the second list. Objects placed on the second > list in this phase are considered colored gray (roots). > Step 3: Roots with gc_refs > 0 go to the 2nd list. All c->gc_refs <= 0 stay in the 1st list. > Of course, some roots will reference some non-roots, which keeps those > non-roots alive. We now make a pass over the second list, where for > each object on the second list, we look at every object it references. > If a referenced object is a container and is still in the first list > (colored white) we *append* it to the second list (colored gray). > Because we append, objects thus added to the second list will > eventually be considered by this same pass; when we stop finding > objects that are still white, we stop appending to the second list, > and we will eventually terminate this pass. Conceptually, objects on > the second list that have been scanned in this pass are colored black > (scanned root); but there is no need to actually make the > distinction. > Step 4: Closure on reachable containers which are all moved to the 2nd list.
(Assuming that the objects are checked only via their type, without involving gc_refs) > (How do we know whether an object pointed to is white (in the first > list) or gray or black (in the second)? Good question :-) > We could use an extra bitfield, but that's a waste of space. > Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when > we move the object to the second list. I doubt that this would work for the reasons mentioned above. > During the meeting, I proposed to set the back pointer to NULL; that > might work too but I think the gc_refs field is more elegant. We could > even just test for a non-zero gc_refs field; the roots moved to the > second list initially all have a non-zero gc_refs field already, and > for the objects with a zero gc_refs field we could indeed set it to > something arbitrary.) Not sure that "arbitrary" is a good choice if the differentiation is based solely on gc_refs. > > Once we reach the end of the second list, all objects still left in > the first list are garbage. We can destroy them in a way similar to the > way Neil does this in his code. Neil calls PyDict_Clear on the > dictionaries, and ignores the rest. Under Neil's assumption that all > cycles (that he detects) involve dictionaries, that is sufficient. In > our case, we may need a type-specific "clear" function for containers > in the type object. Couldn't this be done in the object's dealloc function? Note that both Neil's and this scheme assume that garbage _detection_ and garbage _collection_ are one atomic operation. I must say that I don't mind having some living garbage if it doesn't hurt my work. IOW, the used criterion for triggering the detection phase _may_ eventually differ from the one used for the collection phase. But this is where we reach the incremental approaches, implying different reasoning as a whole.
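The refcount arithmetic being debated here can be checked with a toy model. This is a pure-Python sketch with assumed names throughout (dicts stand in for the chained container list, and `external` stands in for the part of ob_refcnt that comes from outside the container world: globals, the stack, and so on), not the proposed C implementation:

```python
# Toy model of the four-step scheme sketched above -- not the proposed C code.
# Containers are named nodes; `refs` lists the containers each one references,
# and `external` is an assumed count of references from outside the container
# world, so total refcount = external + internal.

refs = {                 # a <-> b is an unreachable cycle; r keeps c alive
    'a': ['b'],
    'b': ['a'],
    'r': ['c'],
    'c': [],
}
external = {'a': 0, 'b': 0, 'r': 1, 'c': 0}

# Step 1: gc_refs starts out as the full reference count.
internal = {k: 0 for k in refs}
for k in refs:
    for child in refs[k]:
        internal[child] += 1
gc_refs = {k: external[k] + internal[k] for k in refs}

# Step 2: subtract one for every container-to-container reference.
for k in refs:
    for child in refs[k]:
        gc_refs[child] -= 1

# Step 3: whatever still has gc_refs > 0 is referenced from outside: a root.
roots = [k for k in refs if gc_refs[k] > 0]

# Step 4: closure -- append referenced containers while scanning the same
# list, giving the breadth-first traversal described above.
reachable = list(roots)
for k in reachable:
    for child in refs[k]:
        if child not in reachable:
            reachable.append(child)

garbage = sorted(k for k in refs if k not in reachable)
print(roots, garbage)    # -> ['r'] ['a', 'b']
```

In this toy model, gc_refs after step 2 equals the external count and so never goes negative; a negative value can only arise if step 2 subtracts a reference that step 1's refcount never included.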
My point is that the introduction of a "clear" function depends on the adopted scheme, whose logic depends on pertinent statistics on memory consumption of the cyclic garbage. To make it simple, we first need stats on memory consumption, then we can discuss objectively how to implement some particular GC scheme. I second Eric on the need for excellent statistics. > > The general opinion was that we should first implement and test the > algorithm as sketched above, and then changes or extensions could be > made. I'd like to see it discussed first in conjunction with (1) the possibility of having a proprietary malloc, (2) the envisioned type/class unification. Perhaps I'm getting too deep, but once something gets in, it's difficult to take it out, even when a better solution is found subsequently. Although I'm enthusiastic about this work on GC, I'm not in a position to evaluate the true benefits of the proposed schemes, as I still don't have a basis for evaluating how much garbage my program generates and whether it hurts the interpreter compared to its overall memory consumption. > > I was pleasantly surprised to find Neil's code in my inbox when we > came out of the meeting; I think it would be worthwhile to compare and > contrast the two approaches. (Hm, maybe there's a paper in it?) I'm all for it!
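As a present-day aside: the kind of cyclic-garbage statistics asked for here can be gathered with the gc module that eventually grew out of this discussion. The API shown below is the modern one and did not exist when this mail was written; this is a sketch, not part of the proposal under discussion.

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

gc.collect()                 # start from a clean slate
for _ in range(10):
    n = Node()
    n.ref = n                # self-referencing cycle
del n                        # now every cycle built above is unreachable

# gc.collect() returns the number of unreachable objects it found,
# which is one crude measure of how much cyclic garbage a program makes.
found = gc.collect()
```

Each Node cycle contributes at least the instance itself (plus its instance dict), so `found` comes out at 10 or more here.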
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jeremy at cnri.reston.va.us Wed Mar 1 18:53:13 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 1 Mar 2000 12:53:13 -0500 (EST) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr> References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <14525.22793.963077.707198@goon.cnri.reston.va.us> >>>>> "VM" == Vladimir Marangozov writes: [">>" == Guido explaining Eric Tiedemann's GC design] >> Next, we make another pass over the list to collect the internal >> references. Internal references are (just like in Neil's >> version) references from other container types. In Neil's >> version, this was recursive; in Eric's version, we don't need >> recursion, since the list already contains all containers. So we >> simply visit the containers in the list in turn, and for each one >> we go over all the objects it references and subtract one from >> *its* gc_refs field. (Eric left out the little detail that we >> need to be able to distinguish between container and >> non-container objects amongst those references; this can be a >> flag bit in the type field.) VM> Step 2: c->gc_refs = c->gc_refs - VM> Nb_referenced_containers_from_c VM> I guess that you realize that after this step, gc_refs can be VM> zero or negative. I think Guido's explanation is slightly ambiguous. When he says, "subtract one from *its* gc_refs field" he means subtract one from the _contained_ object's gc_refs field. VM> I'm not sure that you collect "internal" references here VM> (references from other container types).
A list referencing 20 VM> containers, being itself referenced by one container + one VM> static variable + two times from the runtime stack, has an VM> initial refcount == 4, so we'll end up with gc_refs == -16. The strategy is not that the container's gc_refs is decremented once for each object it contains. Rather, the container decrements each contained object's gc_refs by one. So you should never end up with gc_refs < 0. >> During the meeting, I proposed to set the back pointer to NULL; >> that might work too but I think the gc_refs field is more >> elegant. We could even just test for a non-zero gc_refs field; >> the roots moved to the second list initially all have a non-zero >> gc_refs field already, and for the objects with a zero gc_refs >> field we could indeed set it to something arbitrary.) I believe we discussed this further and concluded that setting the back pointer to NULL would not work. If we make the second list doubly-linked (like the first one), it is trivial to end GC by swapping the first and second lists. If we've zapped the pointers to NULL, then we have to go back and re-set them all. Jeremy From mal at lemburg.com Wed Mar 1 19:44:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 19:44:58 +0100 Subject: [Python-Dev] Unicode Snapshot 2000-03-01 Message-ID: <38BD652A.EA2EB0A3@lemburg.com> There is a new Unicode implementation snapshot available at the secret URL. It contains quite a few small changes to the internal APIs, doc strings for all methods and some new methods (e.g. .title()) on the Unicode and the string objects. The code page mappings are now integer->integer which should make them more performant. Some of the C codec APIs have changed, so you may need to adapt code that already uses these (Fredrik ?!). Still missing is a MSVC project file... haven't gotten around yet to build one. The code does compile on WinXX though, as Finn Bock told me in private mail. Please try out the new stuff...
Most interesting should be the code in Lib/codecs.py as it provides a very high level interface to all those builtin codecs. BTW: I would like to implement a .readline() method using only the .read() method as a basis. Does anyone have a good idea on how this could be done without buffering? (Unicode has a slightly larger choice of line break chars than C; the .splitlines() method will deal with these) Gotta run... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Mar 1 20:20:12 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 20:20:12 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> <14525.12347.120543.804804@amarok.cnri.reston.va.us> Message-ID: <034a01bf83b3$e97c8620$34aab5d4@hagrid> Andrew M. Kuchling wrote: > There are more things in 1.6 that might require fixing existing code: > str(2L) returning '2', the int/long changes, the Unicode changes, and > if it gets added, garbage collection -- and bugs caused by those > changes might not be catchable by a nanny. hey, you make it sound like "1.6" should really be "2.0" ;-) From nascheme at enme.ucalgary.ca Wed Mar 1 20:29:02 2000 From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca) Date: Wed, 1 Mar 2000 12:29:02 -0700 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>; from marangoz@python.inrialpes.fr on Wed, Mar 01, 2000 at 06:07:07PM +0100 References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <20000301122902.B7773@acs.ucalgary.ca> On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote: > Guido van Rossum wrote: > > Once we reach the end of the second list, all objects still left in > > the first list are garbage.
We can destroy them in a way similar to the > > way Neil does this in his code. Neil calls PyDict_Clear on the > > dictionaries, and ignores the rest. Under Neil's assumption that all > > cycles (that he detects) involve dictionaries, that is sufficient. In > > our case, we may need a type-specific "clear" function for containers > > in the type object. > > Couldn't this be done in the object's dealloc function? No, I don't think so. The object still has references to it. You have to be careful about how you break cycles so that memory is not accessed after it is freed. Neil -- "If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie From gvwilson at nevex.com Wed Mar 1 21:19:30 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 1 Mar 2000 15:19:30 -0500 (EST) Subject: [Python-Dev] DDJ article on Python GC Message-ID: Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an article on what's involved in adding garbage collection to Python. Please email me if you're interested in tackling it... Thanks, Greg From fdrake at acm.org Wed Mar 1 21:37:49 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 1 Mar 2000 15:37:49 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: References: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: <14525.32669.909212.716484@weyr.cnri.reston.va.us> Greg Stein writes: > Isn't the documentation better than what has been released? In other > words, if you release now, how could you make things worse? If something > does turn up during a check, you can always release again... Releasing is still somewhat tedious, and I don't want to ask people to do several substantial downloads & installs. So far, a major navigation bug has been found in the test version I posted (just now fixed online); *that's* why I don't like to release too hastily!
I don't think waiting two more weeks is a problem. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Wed Mar 1 23:53:26 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 17:53:26 -0500 Subject: [Python-Dev] DDJ article on Python GC In-Reply-To: Your message of "Wed, 01 Mar 2000 15:19:30 EST." References: Message-ID: <200003012253.RAA16056@eric.cnri.reston.va.us> > Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an > article on what's involved in adding garbage collection to Python. Please > email me if you're interested in tackling it... I might -- although I should get Neil, Eric and Tim as co-authors. I'm halfway through implementing the scheme that Eric showed yesterday. It's very elegant, but I don't have an idea about its performance impact yet. Say hi to Jon -- we've met a few times. I liked his March editorial, having just read the same book and had the same feeling of "wow, an open source project in the 19th century!" --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Mar 2 00:09:23 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 2 Mar 2000 10:09:23 +1100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: <200003011255.HAA13489@eric.cnri.reston.va.us> Message-ID: > > Can we then please have an interface to the "give warning" call (in > > stead of a simple fprintf)? On the mac (and possibly also in > > PythonWin) it's probably better to pop up a dialog (possibly with a > > "don't show again" button) than do a printf which may get lost. > > Sure. All you have to do is code it (or get someone else to code it). How about just having either a "sys.warning" function, or maybe even a sys.stdwarn stream?
Then a simple C API to call this, and we are done :-) sys.stdwarn sounds OK - it just defaults to sys.stdout, so the Mac and Pythonwin etc should "just work" by sending the output wherever sys.stdout goes today... Mark. From tim_one at email.msn.com Thu Mar 2 06:08:39 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:08:39 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BCF3A4.1CCADFCE@lemburg.com> Message-ID: <001001bf8405$5f9582c0$732d153f@tim> [/F] > append = list.append > for x in something: > append(...) [M.-A. Lemburg] > Same here. checkappend.py doesn't find these As detailed in a c.l.py posting, I have yet to find a single instance of this actually called with multiple arguments. Pointing out that it's *possible* isn't the same as demonstrating it's an actual problem. I'm quite willing to believe that it is, but haven't yet seen evidence of it. For whatever reason, people seem much (and, in my experience so far, infinitely ) more prone to make the list.append(1, 2, 3) error than the maybethisisanappend(1, 2, 3) error. > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > though). Which Python? Which OS? How do you know? What were you running it over? Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the total (code + data) virtual memory allocated to it peaked at about 2Mb a few seconds into the run, and actually decreased as time went on. So, akin to the bound method multi-argument append problem, the "checkappend leak problem" is something I simply have no reason to believe . Check your claim again? checkappend.py itself obviously creates no cycles or holds on to any state across files, so if you're seeing a leak it must be a bug in some other part of the version of Python + std libraries you're using. Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us what you were running. Has anyone else seen a leak? 
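For reference, the 1.6-proof spelling of the bound-method idiom quoted at the top of Tim's message is to pass a single tuple argument. A sketch:

```python
squares = []
append = squares.append      # hoist the bound method out of the loop
for x in range(3):
    append((x, x * x))       # one tuple argument: legal in 1.5 and 1.6
# The multi-argument form append(x, x * x) relied on the undocumented
# behavior that 1.6 turns into an error.
```

The extra parentheses are the whole fix; the hoisted bound method keeps its speed advantage.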
From tim_one at email.msn.com Thu Mar 2 06:50:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:50:19 -0500 Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) Message-ID: <001401bf840b$3177ba60$732d153f@tim> Another unsolicited testimonial that countless users are oppressed by auto-repr (as opposed to auto-str) at the interpreter prompt. Just trying to keep a once-hot topic from going stone cold forever . -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org] On Behalf Of Ted Drain Sent: Wednesday, March 01, 2000 5:42 PM To: python-list at python.org Subject: String printing behavior? Hi all, I've got a question about the string printing behavior. If I define a function as:

>>> def foo():
...     return "line1\nline2"
>>> foo()
'line1\012line2'
>>> print foo()
line1
line2
>>>

It seems to me that the default printing behavior for strings should match the behavior of the print routine. I realize that some people may want to see embedded control codes, but I would advocate a separate method for printing raw byte sequences. We are using the python interactive prompt as a pseudo-matlab like user interface and the current printing behavior is very confusing to users. It also means that functions that return text (like help routines) must print the string rather than returning it. Returning the string is much more flexible because it allows the string to be captured easily and redirected. Any thoughts? Ted -- Ted Drain Jet Propulsion Laboratory Ted.Drain at jpl.nasa.gov -- http://www.python.org/mailman/listinfo/python-list From mal at lemburg.com Thu Mar 2 08:42:33 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Thu, 02 Mar 2000 08:42:33 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> Message-ID: <38BE1B69.E0B88B41@lemburg.com> Tim Peters wrote:
> > [/F]
> > append = list.append
> > for x in something:
> > append(...)
> > [M.-A. Lemburg]
> > Same here. checkappend.py doesn't find these
> > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. Haven't had time to check this yet, but I'm pretty sure there are some instances of this idiom in my code. Note that I did in fact code like this on purpose: it saves a tuple construction for every append, which can make a difference in tight loops... > For whatever reason, people seem much (and, in my experience so far, > infinitely ) more prone to make the > > list.append(1, 2, 3) > > error than the > > maybethisisanappend(1, 2, 3) > > error. Of course... still there are hidden instances of the problem which are yet to be revealed. For my own code the situation is even worse, since I sometimes did:

add = list.append
for x in y:
    add(x,1,2)

> > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > though). > > Which Python? Which OS? How do you know? What were you running it over? That's Python 1.5 on Linux2. I let the script run over a large lib directory and my projects directory. In the projects directory the script consumed as much as 240MB of process size. > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > seconds into the run, and actually decreased as time went on.
So, akin to > the bound method multi-argument append problem, the "checkappend leak > problem" is something I simply have no reason to believe . Check your > claim again? checkappend.py itself obviously creates no cycles or holds on > to any state across files, so if you're seeing a leak it must be a bug in > some other part of the version of Python + std libraries you're using. > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > what you were running. I'll try the same thing again using Python 1.5.2 and the CVS version. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Mar 2 08:46:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 02 Mar 2000 08:46:49 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> <38BE1B69.E0B88B41@lemburg.com> Message-ID: <38BE1C69.C8A9E6B0@lemburg.com> "M.-A. Lemburg" wrote: > > > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > > though). > > > > Which Python? Which OS? How do you know? What were you running it over? > > That's Python 1.5 on Linux2. I let the script run over > a large lib directory and my projects directory. In the > projects directory the script consumed as much as 240MB > of process size. > > > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > > seconds into the run, and actually decreased as time went on. So, akin to > > the bound method multi-argument append problem, the "checkappend leak > > problem" is something I simply have no reason to believe . Check your > > claim again?
checkappend.py itself obviously creates no cycles or holds on > > to any state across files, so if you're seeing a leak it must be a bug in > > some other part of the version of Python + std libraries you're using. > > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > > what you were running. > > I'll try the same thing again using Python 1.5.2 and the CVS version. Using the Unicode patched CVS version there's no leak anymore. Couldn't find a 1.5.2 version on my machine... I'll build one later. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Thu Mar 2 16:32:32 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 10:32:32 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? Message-ID: <200003021532.KAA17088@eric.cnri.reston.va.us> I was looking at the code that invokes __del__, with the intent to implement a feature from Java: in Java, a finalizer is only called once per object, even if calling it makes the object live longer. To implement this, we need a flag in each instance that means "__del__ was called". I opened the creation code for instances, looking for the right place to set the flag. I then realized that it might be smart, now that we have this flag anyway, to set it to "true" during initialization. There are a number of exits from the initialization where the object is created but not fully initialized, where the new object is DECREF'ed and NULL is returned. When such an exit is taken, __del__ is called on an incompletely initialized object! Example:

>>> class C:
...     def __del__(self):
...         print "deleting", self
...
>>> x = C(1)
!--> deleting <__main__.C instance at 1686d8>
Traceback (innermost last):
  File "", line 1, in ?
TypeError: this constructor takes no arguments
>>>

Now I have a choice to make.
If the class has an __init__, should I clear the flag only after __init__ succeeds? This means that if __init__ raises an exception, __del__ is never called. This is an incompatibility. It's possible that someone has written code that relies on __del__ being called even when __init__ fails halfway, and then their code would break. But it is just as likely that calling __del__ on a partially uninitialized object is a bad mistake, and I am doing all these cases a favor by not calling __del__ when __init__ failed! Any opinions? If nobody speaks up, I'll make the change. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Thu Mar 2 17:44:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 2 Mar 2000 11:44:00 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <14526.39504.36065.657527@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Now I have a choice to make. If the class has an __init__, GvR> should I clear the flag only after __init__ succeeds? This GvR> means that if __init__ raises an exception, __del__ is never GvR> called. This is an incompatibility. It's possible that GvR> someone has written code that relies on __del__ being called GvR> even when __init__ fails halfway, and then their code would GvR> break. It reminds me of the separation between object allocation and initialization in ObjC. GvR> But it is just as likely that calling __del__ on a partially GvR> uninitialized object is a bad mistake, and I am doing all GvR> these cases a favor by not calling __del__ when __init__ GvR> failed! GvR> Any opinions? If nobody speaks up, I'll make the change. I think you should set the flag right before you call __init__(), i.e. after (nearly all) the C level initialization has occurred. 
Here's why: your "favor" can easily be accomplished by Python constructs in the __init__():

class MyBogo:
    def __init__(self):
        self.get_delified = 0
        do_sumtin_exceptional()
        self.get_delified = 1

    def __del__(self):
        if self.get_delified:
            ah_sweet_release()

-Barry From gstein at lyra.org Thu Mar 2 18:14:35 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 2 Mar 2000 09:14:35 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: On Thu, 2 Mar 2000, Guido van Rossum wrote: >... > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change. +1 on calling __del__ IFF __init__ completes successfully. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Thu Mar 2 18:15:14 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 2 Mar 2000 12:15:14 -0500 (EST) Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) In-Reply-To: <001401bf840b$3177ba60$732d153f@tim> References: <001401bf840b$3177ba60$732d153f@tim> Message-ID: <14526.41378.374653.497993@goon.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Another unsolicited testimonial that countless users are TP> oppressed by auto-repr (as opposed to auto-str) at the TP> interpreter prompt. Just trying to keep a once-hot topic from TP> going stone cold forever . [Signature from the included message:] >> -- Ted Drain Jet Propulsion Laboratory Ted.Drain at jpl.nasa.gov -- This guy is probably a rocket scientist. We want the language to be useful for everybody, not just rocket scientists.
Jeremy From guido at python.org Thu Mar 2 23:45:37 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 17:45:37 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Thu, 02 Mar 2000 11:44:00 EST." <14526.39504.36065.657527@anthem.cnri.reston.va.us> References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> Message-ID: <200003022245.RAA20265@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Now I have a choice to make. If the class has an __init__, > GvR> should I clear the flag only after __init__ succeeds? This > GvR> means that if __init__ raises an exception, __del__ is never > GvR> called. This is an incompatibility. It's possible that > GvR> someone has written code that relies on __del__ being called > GvR> even when __init__ fails halfway, and then their code would > GvR> break. [Barry] > It reminds me of the separation between object allocation and > initialization in ObjC. Is that good or bad? > GvR> But it is just as likely that calling __del__ on a partially > GvR> uninitialized object is a bad mistake, and I am doing all > GvR> these cases a favor by not calling __del__ when __init__ > GvR> failed! > > GvR> Any opinions? If nobody speaks up, I'll make the change. > > I think you should set the flag right before you call __init__(), > i.e. after (nearly all) the C level initialization has occurred. > Here's why: your "favor" can easily be accomplished by Python > constructs in the __init__():
>
> class MyBogo:
>     def __init__(self):
>         self.get_delified = 0
>         do_sumtin_exceptional()
>         self.get_delified = 1
>
>     def __del__(self):
>         if self.get_delified:
>             ah_sweet_release()

But the other behavior (call __del__ even when __init__ fails) can also easily be accomplished in Python:

class C:
    def __init__(self):
        try:
            ...stuff that may fail...
        except:
            self.__del__()
            raise

    def __del__(self):
        ...cleanup...

I believe that in almost all cases the programmer would be happier if __del__ wasn't called when their __init__ fails. This makes it easier to write a __del__ that can assume that all the object's fields have been properly initialized. In my code, typically when __init__ fails, this is a symptom of a really bad bug (e.g. I just renamed one of __init__'s arguments and forgot to fix all references), and I don't care much about cleanup behavior. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Thu Mar 2 23:52:31 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 2 Mar 2000 17:52:31 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> <200003022245.RAA20265@eric.cnri.reston.va.us> Message-ID: <14526.61615.362973.624022@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> But the other behavior (call __del__ even when __init__ GvR> fails) can also easily be accomplished in Python: It's a fair cop. GvR> I believe that in almost all cases the programmer would be GvR> happier if __del__ wasn't called when their __init__ fails. GvR> This makes it easier to write a __del__ that can assume that GvR> all the object's fields have been properly initialized. That's probably fine; I don't have strong feelings either way. -Barry P.S. Interesting what X-Oblique-Strategy was randomly inserted in this message (but I'm not sure which approach is more "explicit" :). -Barry From tim_one at email.msn.com Fri Mar 3 06:38:59 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 00:38:59 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <000001bf84d2$c711e2e0$092d153f@tim> [Guido] > I was looking at the code that invokes __del__, with the intent to > implement a feature from Java: in Java, a finalizer is only called > once per object, even if calling it makes the object live longer. Why? That is, in what way is this an improvement over current behavior? Note that Java is a bit subtle: a finalizer is only called once by magic; explicit calls "don't count". The Java rules add up to quite a confusing mish-mash. Python's rules are *currently* clearer. I deal with possible exceptions in Python constructors the same way I do in C++ and Java: if there's a destructor, don't put anything in __init__ that may raise an uncaught exception. Anything dangerous is moved into a separate .reset() (or .clear() or ...) method. This works well in practice. > To implement this, we need a flag in each instance that means "__del__ > was called". At least . > I opened the creation code for instances, looking for the right place > to set the flag. I then realized that it might be smart, now that we > have this flag anyway, to set it to "true" during initialization. There > are a number of exits from the initialization where the object is created > but not fully initialized, where the new object is DECREF'ed and NULL is > returned. When such an exit is taken, __del__ is called on an > incompletely initialized object! I agree *that* isn't good. Taken on its own, though, it argues for adding an "instance construction completed" flag that __del__ later checks, as if its body were: if self.__instance_construction_completed: body That is, the problem you've identified here could be addressed directly. > Now I have a choice to make. If the class has an __init__, should I > clear the flag only after __init__ succeeds? This means that if > __init__ raises an exception, __del__ is never called. This is an > incompatibility. 
It's possible that someone has written code that > relies on __del__ being called even when __init__ fails halfway, and > then their code would break. > > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change. I'd be in favor of fixing the actual problem; I don't understand the point to the rest of it, especially as it has the potential to break existing code and I don't see a compensating advantage (surely not compatibility w/ JPython -- JPython doesn't invoke __del__ methods at all by magic, right? or is that changing, and that's what's driving this?). too-much-magic-is-dizzying-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 3 06:50:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 3 Mar 2000 00:50:16 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <14527.21144.9421.958311@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> (surely not compatibility w/ JPython -- JPython doesn't invoke TP> __del__ methods at all by magic, right? or is that changing, TP> and that's what's driving this?). No, JPython doesn't invoke __del__ methods by magic, and I don't have any plans to change that. -Barry From ping at lfw.org Fri Mar 3 10:00:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 3 Mar 2000 01:00:21 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Message-ID: On Thu, 2 Mar 2000, Greg Stein wrote: > On Thu, 2 Mar 2000, Guido van Rossum wrote: > >... 
> > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > +1 on calling __del__ IFF __init__ completes successfully. That would be my vote as well. What convinced me of this is the following: If it's up to the implementation of __del__ to deal with a problem that happened during initialization, you only know about the problem with very coarse granularity. It's a pain (or even impossible) to then rediscover the information you need to recover adequately. If on the other hand you deal with the problem in __init__, then you have much better control over what is happening, because you can position try/except blocks precisely where you need them to deal with specific potential problems. Each block can take care of its case appropriately, and re-raise if necessary. In general, it seems to me that what you want to do when __init__ runs afoul is going to be different from what you want to do to take care of object cleanup in __del__. So it doesn't belong there -- it belongs in an except: clause in __init__. Even though it's an incompatibility, i really think this is the right behaviour. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 3 17:13:16 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:13:16 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 00:38:59 EST." 
<000001bf84d2$c711e2e0$092d153f@tim> References: <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <200003031613.LAA21571@eric.cnri.reston.va.us> > [Guido] > > I was looking at the code that invokes __del__, with the intent to > > implement a feature from Java: in Java, a finalizer is only called > > once per object, even if calling it makes the object live longer. [Tim] > Why? That is, in what way is this an improvement over current behavior? > > Note that Java is a bit subtle: a finalizer is only called once by magic; > explicit calls "don't count". Of course. Same in my proposal. But I wouldn't call it "by magic" -- just "on behalf of the garbage collector". > The Java rules add up to quite a confusing mish-mash. Python's rules are > *currently* clearer. I don't find the Java rules confusing. It seems quite useful that the GC promises to call the finalizer at most once -- this can simplify the finalizer logic. (Otherwise it may have to ask itself, "did I clean this already?" and leave notes for itself.) Explicit finalizer calls are always a mistake and thus "don't count" -- the response to that should in general be "don't do that" (unless you have particularly stupid callers -- or very fearful lawyers :-). > I deal with possible exceptions in Python constructors the same way I do in > C++ and Java: if there's a destructor, don't put anything in __init__ that > may raise an uncaught exception. Anything dangerous is moved into a > separate .reset() (or .clear() or ...) method. This works well in practice. Sure, but the rule "if __init__ fails, __del__ won't be called" means that we don't have to program our __init__ or __del__ quite so defensively. Most people who design a __del__ probably assume that __init__ has run to completion. The typical scenario (which has happened to me! And I *implemented* the damn thing!) is this: __init__ opens a file and assigns it to an instance variable; __del__ closes the file. This is tested a few times and it works great. 
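Guido's scenario is easy to reproduce. In today's CPython (new-style classes), __del__ still runs even when __init__ raises, so a __del__ has to be written defensively; the class and attribute names below are illustrative, not from the original thread:

```python
events = []

class Resource:
    def __init__(self, fail):
        self.acquired = False
        if fail:
            raise OSError("open failed")   # bail out halfway through __init__
        self.acquired = True

    def __del__(self):
        # A __del__ in the spirit of Guido's example: it must not assume
        # __init__ finished.  getattr() guards even against a completely
        # uninitialized instance.
        events.append(("del", getattr(self, "acquired", False)))

try:
    r = Resource(fail=True)    # __init__ fails; in modern CPython __del__
except OSError:                # still runs, on the half-built object
    pass

ok = Resource(fail=False)
del ok                         # the normal case: __init__ had completed
```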
Now in production the file somehow unexpectedly fails to be openable. Sure, the programmer should've expected that, but she didn't. Now, at best, the failed __del__ creates an additional confusing error message on top of the traceback generated by IOError. At worst, the failed __del__ could wreck the original traceback.

Note that I'm not proposing to change the C level behavior; when a Py_New() function is halfway through its initialization and decides to bail out, it does a DECREF(self) and you bet that at this point the _dealloc() function gets called (via self->ob_type->tp_dealloc). Occasionally I need to initialize certain fields to NULL so that the dealloc() function doesn't try to free memory that wasn't allocated. Often it's as simple as using XDECREF instead of DECREF in the dealloc() function (XDECREF is safe when the argument is NULL; DECREF dumps core, saving a load-and-test if you are sure its arg is a valid object).

> > To implement this, we need a flag in each instance that means "__del__
> > was called".
>
> At least .
>
> > I opened the creation code for instances, looking for the right place
> > to set the flag.  I then realized that it might be smart, now that we
> > have this flag anyway, to set it to "true" during initialization.  There
> > are a number of exits from the initialization where the object is created
> > but not fully initialized, where the new object is DECREF'ed and NULL is
> > returned.  When such an exit is taken, __del__ is called on an
> > incompletely initialized object!
>
> I agree *that* isn't good.  Taken on its own, though, it argues for adding
> an "instance construction completed" flag that __del__ later checks, as if
> its body were:
>
>     if self.__instance_construction_completed:
>         body
>
> That is, the problem you've identified here could be addressed directly.
Sure -- but I would argue that when __del__ returns, __instance_construction_completed should be reset to false, because the destruction (conceptually, at least) cancels out the construction!

> > Now I have a choice to make.  If the class has an __init__, should I
> > clear the flag only after __init__ succeeds?  This means that if
> > __init__ raises an exception, __del__ is never called.  This is an
> > incompatibility.  It's possible that someone has written code that
> > relies on __del__ being called even when __init__ fails halfway, and
> > then their code would break.
> >
> > But it is just as likely that calling __del__ on a partially
> > uninitialized object is a bad mistake, and I am doing all these cases
> > a favor by not calling __del__ when __init__ failed!
> >
> > Any opinions?  If nobody speaks up, I'll make the change.
>
> I'd be in favor of fixing the actual problem; I don't understand the point
> to the rest of it, especially as it has the potential to break existing code
> and I don't see a compensating advantage (surely not compatibility w/
> JPython -- JPython doesn't invoke __del__ methods at all by magic, right?
> or is that changing, and that's what's driving this?).

JPython's a red herring here. I think that the proposed change probably *fixes* much more code that is subtly wrong than it breaks code that is relying on __del__ being called after a partial __init__. All the rules relating to __del__ are confusing (e.g. what __del__ can expect to survive in its globals).

Also note Ping's observation:

| If it's up to the implementation of __del__ to deal with a problem
| that happened during initialization, you only know about the problem
| with very coarse granularity.  It's a pain (or even impossible) to
| then rediscover the information you need to recover adequately.
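Ping's approach, positioning the try/except inside __init__ at the exact point where you still know which acquisitions succeeded, might be sketched like this (the TeeWriter class and the file paths are made up for illustration):

```python
import os
import tempfile

class TeeWriter:
    """Acquires two resources; recovers precisely if the second fails."""

    def __init__(self, path_a, path_b):
        self.a = open(path_a, "w")          # first acquisition
        try:
            self.b = open(path_b, "w")      # second acquisition may fail
        except OSError:
            # At this exact point we know precisely what exists so far:
            # self.a and nothing else.  Undo it and re-raise.
            self.a.close()
            raise

    def close(self):
        self.b.close()
        self.a.close()

tmp = tempfile.mkdtemp()
t = TeeWriter(os.path.join(tmp, "a.txt"), os.path.join(tmp, "b.txt"))
t.close()

failed = False
try:
    # opening a file in a nonexistent directory raises OSError
    TeeWriter(os.path.join(tmp, "a.txt"),
              os.path.join(tmp, "no_such_dir", "b.txt"))
except OSError:
    failed = True
```

Nothing is left for __del__ to guess at: each except block cleans up exactly the state that is known to exist at that point.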
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Fri Mar 3 17:49:52 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 11:49:52 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031613.LAA21571@eric.cnri.reston.va.us> Message-ID: <000501bf8530$7f8c78a0$b0a0143f@tim> [Tim] >> Note that Java is a bit subtle: a finalizer is only called >> once by magic; explicit calls "don't count". [Guido] > Of course. Same in my proposal. OK -- that wasn't clear. > But I wouldn't call it "by magic" -- just "on behalf of the garbage > collector". Yup, magically called . >> The Java rules add up to quite a confusing mish-mash. Python's >> rules are *currently* clearer. > I don't find the Java rules confusing. "add up" == "taken as a whole"; include the Java spec's complex state machine for cleanup semantics, and the later complications added by three (four?) distinct flavors of weak reference, and I doubt 1 Java programmer in 1,000 actually understands the rules. This is why I'm wary of moving in the Java *direction* here. Note that Java programmers in past c.l.py threads have generally claimed Java's finalizers are so confusing & unpredictable they don't use them at all! Which, in the end, is probably a good idea in Python too <0.5 wink>. > It seems quite useful that the GC promises to call the finalizer at > most once -- this can simplify the finalizer logic. Granting that explicit calls are "use at your own risk", the only user-visible effect of "called only once" is in the presence of resurrection. Now in my Python experience, on the few occasions I've resurrected an object in __del__, *of course* I expected __del__ to get called again if the object is about to die again! 
Typical:

    def __del__(self):
        if oops_i_still_need_to_stay_alive:
            resurrect(self)
        else:
            # really going away
            release(self.critical_resource)

Call __del__ only once, and code like this is busted bigtime. OTOH, had I written __del__ logic that relied on being called only once, switching the implementation to call it more than once would break *that* bigtime. Neither behavior is an obvious all-cases win to me, or even a plausibly most-cases win. But Python already took a stand on this & so I think you need a *good* reason to change semantics now.

> ...
> Sure, but the rule "if __init__ fails, __del__ won't be called" means
> that we don't have to program our __init__ or __del__ quite so
> defensively.  Most people who design a __del__ probably assume that
> __init__ has run to completion. ...

This is (or can easily be made) a separate issue, & I agreed the first time this seems worth fixing (although if nobody has griped about it in a decade of use, it's hard to call it a major bug ).

> ...
> Sure -- but I would argue that when __del__ returns,
> __instance_construction_completed should be reset to false, because
> the destruction (conceptually, at least) cancels out the construction!

In the __del__ above (which is typical of the cases of resurrection I've seen), there is no such implication. Perhaps this is philosophical abuse of Python's intent, but if so it relied only on trusting its advertised semantics.

> I think that the proposed change probably *fixes* much more code that
> is subtly wrong than it breaks code that is relying on __del__ being
> called after a partial __init__.

Yes, again, I have no argument against refusing to call __del__ unless __init__ succeeded. Going beyond that to a new "called at most once" rule is indeed going beyond that, *will* break reasonable old code, and holds no particular attraction that I can see (it trades making one kind of resurrection scenario easier at the cost of making other kinds harder).
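For what it's worth, CPython eventually settled this question Guido's way: since PEP 442 (Python 3.4), __del__ is invoked at most once per object, so a resurrecting __del__ like Tim's runs on the first death only. A minimal demonstration (names illustrative):

```python
log = []
pool = []

class Phoenix:
    def __del__(self):
        log.append("del")
        pool.append(self)   # resurrection: a new reference appears

p = Phoenix()
del p              # refcount hits zero; __del__ runs and resurrects the object
survivor = pool.pop()
del survivor       # dies again -- under PEP 442 (3.4+) __del__ is NOT re-run
```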
If there needs to be incompatible change here, curiously enough I'd be more in favor of making resurrection illegal period (which could *really* simplify gc's headaches).

> All the rules relating to __del__ are confusing (e.g. what __del__ can
> expect to survive in its globals).

Problems unique to final shutdown don't seem relevant here.

> Also note Ping's observation: ...

I can't agree with that yet another time without being quadruply redundant .

From guido at python.org Fri Mar 3 17:50:08 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 03 Mar 2000 11:50:08 -0500
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Your message of "Wed, 01 Mar 2000 00:44:10 EST." <200003010544.AAA13155@eric.cnri.reston.va.us>
References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us>
Message-ID: <200003031650.LAA21647@eric.cnri.reston.va.us>

We now have two implementations of Eric Tiedemann's idea: Neil and I both implemented it. It's too soon to post the patch sets (both are pretty rough) but I've got another design question.

Once we've identified a bunch of objects that are only referring to each other (i.e., one or more cycles) we have to dispose of them. The question is, how? We can't just call free on each of the objects; some may not be allocated with malloc, and some may contain pointers to other malloc'ed memory that also needs to be freed. So we have to get their destructors involved. But how? Calling ob->ob_type->tp_dealloc(ob) for an object whose reference count is nonzero is unsafe -- this will destroy the object while there are still references to it! Those references are all coming from other objects that are part of the same cycle; those objects will also be deallocated and they will reference the deallocated objects (if only to DECREF them).

Neil uses the same solution that I use when finalizing the Python interpreter -- find the dictionaries and call PyDict_Clear() on them.
(In his unpublished patch, he also clears the lists using PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized this so that *every* object can define a tp_clear function in its type object.)

As long as every cycle contains at least one dictionary or list object, this will break cycles reliably and get rid of all the garbage. (If you wonder why: clearing the dict DECREFs the next object(s) in the cycle; if the last dict referencing a particular object is cleared, the last DECREF will deallocate that object, which will in turn DECREF the objects it references, and so forth. Since none of the objects in the cycle has incoming references from outside the cycle, we can prove that this will delete all objects as long as there's a dict or list in each cycle.)

However, there's a snag. It's the same snag as what finalizing the Python interpreter runs into -- it has to do with __del__ methods and the undefined order in which the dictionaries are cleared. For example, it's quite possible that the first dictionary we clear is the __dict__ of an instance, so this zaps all its instance variables. Suppose this breaks the cycle, so then the instance itself gets DECREFed to zero. Its deallocator will be called. If it's got a __del__, this __del__ will be called -- but all the instance variables have already been zapped, so it will fail miserably! It's also possible that the __dict__ of a class involved in a cycle gets cleared first, in which case the __del__ no longer "exists", and again the cleanup is skipped.

So the question is: What to *do*? My solution is to make an extra pass over all the garbage objects *before* we clear dicts and lists, and for those that are instances and have __del__ methods, call their __del__ ("by magic", as Tim calls it in another post). The code in instance_dealloc() already does the right thing here: it calls __del__, then discovers that the reference count is > 0 ("I'm not dead yet" :-), and returns without freeing the object.
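Modern CPython (via PEP 442) resolves the ordering snag essentially this way: the collector runs the finalizers of cyclic garbage first, in an unspecified order, and only then clears the underlying storage. A small demonstration with illustrative names:

```python
import gc

calls = []

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        # runs while self.name still exists: finalizers fire before
        # the collector clears the instance dicts
        calls.append(self.name)

a, b = Node("a"), Node("b")
a.other, b.other = b, a   # the two-instance cycle Guido describes
del a, b                  # unreachable, but refcounts never reach zero
gc.collect()              # the cycle detector must break the cycle itself
```

Each __del__ runs exactly once here; which of the two runs first is deliberately unspecified, matching the "don't depend on the other __del__" advice below.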
(This is also why I want to introduce a flag ensuring that __del__ gets called by instance_dealloc at most once: later when the instance gets DECREFed to 0, instance_dealloc is called again and will correctly free the object; but we don't want __del__ called again.) [Note for Neil: somehow I forgot to add this logic to the code; in_del_called isn't used! The change is obvious though.]

This still leaves a problem for the user: if two class instances reference each other and both have a __del__, we can't predict whose __del__ is called first when they are called as part of cycle collection. The solution is to write each __del__ so that it doesn't depend on the other __del__.

Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever. This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jack at oratrix.nl Fri Mar 3 17:57:54 2000
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 03 Mar 2000 17:57:54 +0100
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Message by Guido van Rossum , Fri, 03 Mar 2000 11:50:08 -0500 , <200003031650.LAA21647@eric.cnri.reston.va.us>
Message-ID: <20000303165755.490EA371868@snelboot.oratrix.nl>

The __init__ rule for calling __del__ has me confused. Is this per-class or per-object? I.e. what will happen in the following case:

    class Purse:
        def __init__(self):
            self.balance = WithdrawCashFromBank(1000)

        def __del__(self):
            PutCashBackOnBank(self.balance)
            self.balance = 0

    class LossyPurse(Purse):
        def __init__(self):
            Purse.__init__(self)
            raise 'kaboo! kaboo!'

If the new scheme means that the __del__ method of Purse isn't called I think I don't like it. In the current scheme I can always program defensively:

    def __del__(self):
        try:
            b = self.balance
            self.balance = 0
        except AttributeError:
            pass
        else:
            PutCashBackOnBank(b)

but in a new scheme with a per-object "__del__ must be called" flag I can't...
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From guido at python.org Fri Mar 3 18:05:00 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 03 Mar 2000 12:05:00 -0500
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: Your message of "Fri, 03 Mar 2000 11:49:52 EST." <000501bf8530$7f8c78a0$b0a0143f@tim>
References: <000501bf8530$7f8c78a0$b0a0143f@tim>
Message-ID: <200003031705.MAA21700@eric.cnri.reston.va.us>

OK, so we're down to this one point: if __del__ resurrects the object, should __del__ be called again later? Additionally, should resurrection be made illegal?

I can easily see how __del__ could *accidentally* resurrect the object as part of its normal cleanup -- e.g. you make a call to some other routine that helps with the cleanup, passing self as an argument, and this other routine keeps a helpful cache of the last argument for some reason. I don't see how we could forbid this type of resurrection. (What are you going to do? You can't raise an exception from instance_dealloc, since it is called from DECREF. You can't track down the reference and replace it with a None easily.) In this example, the helper routine will eventually delete the object from its cache, at which point it is truly deleted. It would be harmful, not helpful, if __del__ was called again at this point.

Now, it is true that the current docs for __del__ imply that resurrection is possible.
The intention of that note was to warn __del__ writers that in the case of accidental resurrection __del__ might be called again. The intention certainly wasn't to allow or encourage intentional resurrection.

Would there really be someone out there who uses *intentional* resurrection? I severely doubt it. I've never heard of this.

[Jack just finds a snag]

> The __init__ rule for calling __del__ has me confused.  Is this per-class or
> per-object?
>
> I.e. what will happen in the following case:
>
>     class Purse:
>         def __init__(self):
>             self.balance = WithdrawCashFromBank(1000)
>
>         def __del__(self):
>             PutCashBackOnBank(self.balance)
>             self.balance = 0
>
>     class LossyPurse(Purse):
>         def __init__(self):
>             Purse.__init__(self)
>             raise 'kaboo! kaboo!'
>
> If the new scheme means that the __del__ method of Purse isn't called I think
> I don't like it.  In the current scheme I can always program defensively:
>
>     def __del__(self):
>         try:
>             b = self.balance
>             self.balance = 0
>         except AttributeError:
>             pass
>         else:
>             PutCashBackOnBank(b)
>
> but in a new scheme with a per-object "__del__ must be called" flag I can't...

Yes, that's a problem. But there are other ways for the subclass to break the base class's invariant (e.g. it could override __del__ without calling the base class' __del__). So I think it's a red herring.

In Python 3000, typechecked classes may declare invariants that are enforced by the inheritance mechanism; then we may need to keep track of which base class constructors succeeded and only call corresponding destructors.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at lemburg.com Fri Mar 3 19:17:11 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 03 Mar 2000 19:17:11 +0100
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us>
Message-ID: <38C001A7.6CF8F365@lemburg.com>

Guido van Rossum wrote:
>
> OK, so we're down to this one point: if __del__ resurrects the object,
> should __del__ be called again later?  Additionally, should
> resurrection be made illegal?

Yes and no :-)

One example comes to mind: implementations of weak references, which manage weak object references themselves (as soon as __del__ is called the weak reference implementation takes over the object).

Another example is that of free-list-like implementations which reduce object creation times by implementing smart object recycling, e.g. objects could keep allocated dictionaries alive or connections to databases open, etc.

As for the second point: Calling __del__ again is certainly needed to keep application logic sane... after all, __del__ should be called whenever the refcount reaches 0 -- and that can happen more than once in the object's lifetime if reanimation occurs.

> I can easily see how __del__ could *accidentally* resurrect the object
> as part of its normal cleanup -- e.g. you make a call to some other
> routine that helps with the cleanup, passing self as an argument, and
> this other routine keeps a helpful cache of the last argument for some
> reason.  I don't see how we could forbid this type of resurrection.
> (What are you going to do?  You can't raise an exception from
> instance_dealloc, since it is called from DECREF.  You can't track
> down the reference and replace it with a None easily.)
> In this example, the helper routine will eventually delete the object
> from its cache, at which point it is truly deleted.  It would be
> harmful, not helpful, if __del__ was called again at this point.

I'd say this is an application logic error -- nothing that the mechanism itself can help with automagically. OTOH, turning multi calls to __del__ off would make certain techniques impossible.
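MAL's free-list idea can be sketched as a tiny recycling pool (all names here are illustrative). Note that under an at-most-once __del__ rule this trick only works for the object's first death, which is exactly the technique his objection is about:

```python
pool = []

class Connection:
    def __init__(self):
        self.handle = object()   # stands in for an expensive resource

    def __del__(self):
        # recycle instead of dying: the resource stays allocated.
        # (Under at-most-once semantics this fires only the first time.)
        pool.append(self)

def get_connection():
    # reuse a recycled instance if one is available
    return pool.pop() if pool else Connection()

c = get_connection()
handle = c.handle
del c                   # resurrected into the pool, resource kept alive
c2 = get_connection()   # reuse: no new resource is allocated
```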
> Now, it is true that the current docs for __del__ imply that > resurrection is possible. The intention of that note was to warn > __del__ writers that in the case of accidental resurrection __del__ > might be called again. The intention certainly wasn't to allow or > encourage intentional resurrection. I don't think that docs are the right argument here ;-) It is simply the reference counting logic that plays its role: __del__ is called when refcount reaches 0, which usually means that the object is about to be garbage collected... unless the object is rereferenced by some other object and thus gets reanimated. > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. BTW, I can't see what the original question has to do with this discussion ... calling __del__ only after successful __init__ is ok, IMHO, but what does this have to do with the way __del__ itself is implemented ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 3 19:30:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:30:36 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <38C004CC.1FE0A501@lemburg.com> [Guido about ways to cleanup cyclic garbage] FYI, I'm using a special protocol for disposing of cyclic garbage: the __cleanup__ protocol. The purpose of this call is probably similar to Neil's tp_clear: it is intended to let objects break possible cycles in their own storage scope, e.g. instances can delete instance variables which they know can cause cyclic garbage. 
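A toy version of such a cycle-breaking protocol might look like the following; the names are illustrative, and the real protocol (including who calls __cleanup__ and when) is defined by mxProxy:

```python
closed = []

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None       # may come to participate in a cycle

    def __cleanup__(self):
        # break cycles in our own storage scope: *we* know which
        # attributes are dangerous, the collector doesn't
        self.peer = None

    def __del__(self):
        # still safe after __cleanup__, because self.name was left alone
        closed.append(self.name)

a, b = Node("a"), Node("b")
a.peer, b.peer = b, a          # reference cycle
a.__cleanup__()                # break the cycle from inside the object
del a, b                       # refcounts now reach zero without gc's help
```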
The idea is simple: give all power to the objects rather than try to solve everything with one magical master plan. The mxProxy package has details on the protocol. The __cleanup__ method is called by the Proxy when the Proxy is about to be deleted. If all references to an object go through the Proxy, the __cleanup__ method call can easily break cycles to have the refcount reach zero in which case __del__ is called. Since the object knows about this scheme it can take precautions to make sure that __del__ still works after __cleanup__ was called. Anyway, just a thought... there are probably many ways to do all this. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 3 19:51:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 03 Mar 2000 19:51:55 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C009CB.72BD49CA@tismer.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? [much stuff] Just a random note: What if we had a __del__ with zombie behavior? Assume an instance that is about to be destructed. Then __del__ is called via normal method lookup. What we want is to let this happen only once. Here the Zombie: After method lookup, place a dummy __del__ into the to-be-deleted instance dict, and we are sure that this does not harm. Kinda "yes its there, but a broken link ". The zombie always works by doing nothing. Makes some sense? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From gstein at lyra.org Sat Mar 4 00:09:48 2000
From: gstein at lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID: 

You may as well remove the entire "vi" concept from ConfigParser. Since "vi" can be *only* a '=' or ':', then you aren't truly checking anything in the "if" statement. Further, "vi" is used nowhere else, so that variable and the corresponding regex group can be nuked altogether.

IMO, I'm not sure why the ";" comment form was initially restricted to just one option format in the first place.

Cheers,
-g

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Update of /projects/cvsroot/python/dist/src/Lib
> In directory bitdiddle:/home/jhylton/python/src/Lib
>
> Modified Files:
>         ConfigParser.py
> Log Message:
> allow comments beginning with ; in key: value as well as key = value
>
>
> Index: ConfigParser.py
> ===================================================================
> RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v
> retrieving revision 1.16
> retrieving revision 1.17
> diff -C2 -r1.16 -r1.17
> *** ConfigParser.py   2000/02/28 23:23:55     1.16
> --- ConfigParser.py   2000/03/03 20:43:57     1.17
> ***************
> *** 359,363 ****
>               optname, vi, optval = mo.group('option', 'vi', 'value')
>               optname = string.lower(optname)
> !             if vi == '=' and ';' in optval:
>                   # ';' is a comment delimiter only if it follows
>                   # a spacing character
> --- 359,363 ----
>               optname, vi, optval = mo.group('option', 'vi', 'value')
>               optname = string.lower(optname)
> !             if vi in ('=', ':') and ';' in optval:
>                   # ';' is a comment delimiter only if it follows
>                   # a spacing character
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://www.python.org/mailman/listinfo/python-checkins
>

--
Greg Stein, http://www.lyra.org/

From jeremy at cnri.reston.va.us Sat Mar 4 00:15:32 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Fri, 3 Mar 2000 18:15:32 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: 
References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>

Thanks for catching that. I didn't look at the context. I'm going to wait, though, until I talk to Fred to mess with the code any more.

General question for python-dev readers: What are your experiences with ConfigParser? I just used it to build a simple config parser for IDLE and found it hard to use for several reasons. The biggest problem was that the file format is undocumented. I also found it clumsy to have to specify section and option arguments. I ended up writing a proxy that specializes on section so that get takes only an option argument.

It sounds like ConfigParser code and docs could use a general cleanup. Are there any other issues to take care of as part of that cleanup?

Jeremy

From gstein at lyra.org Sat Mar 4 00:35:09 2000
From: gstein at lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST)
Subject: [Python-Dev] ConfigParser stuff (was: CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17)
In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>
Message-ID: 

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Thanks for catching that.  I didn't look at the context.  I'm going to
> wait, though, until I talk to Fred to mess with the code any more.

Not a problem. I'm glad that diffs are now posted to -checkins.
:-)

> General question for python-dev readers: What are your experiences
> with ConfigParser?

Love it!

> I just used it to build a simple config parser for
> IDLE and found it hard to use for several reasons.  The biggest
> problem was that the file format is undocumented.

In my most complex use of ConfigParser, I had to override SECTCRE to allow periods in the section name. Of course, that was quite interesting since the variable is __SECTRE in 1.5.2 (i.e. I had to compensate for the munging). I also changed OPTCRE to allow a few more characters ("@" in particular, which even the update doesn't do). Not a problem nowadays since those are public.

My subclass also defines a set() method and a delsection() method. These are used because I write the resulting changes back out to a file. It might be nice to have a method which writes out a config file (with an "AUTOGENERATED BY ConfigParser.py -- DO NOT EDIT BY HAND"; or maybe "... BY ...").

> I also found it
> clumsy to have to specify section and option arguments.

I found these were critical in my application. I also take advantage of the sections in my "edna" application for logical organization.

> I ended up
> writing a proxy that specializes on section so that get takes only an
> option argument.
>
> It sounds like ConfigParser code and docs could use a general cleanup.
> Are there any other issues to take care of as part of that cleanup?

A set() method and a writefile() type of method would be nice.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From tim_one at email.msn.com Sat Mar 4 02:38:43 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Fri, 3 Mar 2000 20:38:43 -0500
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: <200003031650.LAA21647@eric.cnri.reston.va.us>
Message-ID: <000001bf857a$60b45ac0$c6a0143f@tim>

[Guido]
> ...
> Someone (Tim?)
in the past suggested a different solution (probably > found in another language): for objects that are collected as part of > a cycle, the destructor isn't called at all. The memory is freed > (since it's no longer reachable), but the destructor is not called -- > it is as if the object lives on forever. Stroustrup has written in favor of this for C++. It's exactly the kind of overly slick "good argument" he would never accept from anyone else <0.1 wink>. > This is theoretically superior, but not practical: when I have an > object that creates a temp file, I want to be able to reliably delete > the temp file in my destructor, even when I'm part of a cycle! A member of the C++ committee assured me Stroustrup is overwhelmingly opposed on this. I don't even agree it's theoretically superior: it relies on the fiction that gc "may never occur", and that's just silly in practice. You're moving down the Java path. I can't possibly do a better job of explaining the Java rules than the Java Language Spec. does for itself. So pick that up and study section 12.6 (Finalization of Class Instances). The end result makes little sense to users, but is sufficient to guarantee that Java itself never blows up. Note, though, that there is NO good answer to finalizers in cycles! The implementation cannot be made smart enough to both avoid trouble and "do the right thing" from the programmer's POV, because the latter is unknowable. Somebody has to lose, one way or another. Rather than risk doing a wrong thing, the BDW collector lets cycles with finalizers leak. But it also has optional hacks to support exceptions for use with C++ (which sometimes creates self-cycles) and Java. See http://reality.sgi.com/boehm_mti/finalization.html for Boehm's best concentrated thoughts on the subject. The only principled approach I know of comes out of the Scheme world. Scheme has no finalizers, of course. 
But it does have gc, and the concept of "guardians" was invented to address all gc finalization problems in one stroke. It's extremely Scheme-like in providing a perfectly general mechanism with no policy whatsoever. You (the Scheme programmer) can create guardian objects, and "register" other objects with a guardian. At any time, you can ask a guardian whether some object registered with it is "ready to die" (i.e., the only thing keeping it alive is its registration with the guardian). If so, you can ask it to give you one. Everything else is up to you: if you want to run a finalizer, your problem. If there are cycles, also your problem. Even if there are simple non-cyclic dependencies, your problem. Etc. So those are the extremes: BDW avoids blame by refusing to do anything. Java avoids blame by exposing an impossibly baroque implementation-driven finalization model. Scheme avoids blame by refusing to do anything "by magic", but helps you to shoot yourself with the weapon of your choice. The bad news is that I don't know of a scheme *not* at an extreme! It's extremely un-Pythonic to let things leak (despite that it has let things leak for a decade ), but also extremely un-Pythonic to make some wild-ass guess. So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs.
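To make the shape of that proposal concrete, here is a toy sketch of the callback variant. None of these names (register_cycle_handler, report_trash_cycle) exist in any real collector; the gc is simulated by an ordinary function call.

```python
# Toy sketch of the "hand trash cycles back to the programmer" idea.
# Nothing here touches a real collector; register_cycle_handler and
# report_trash_cycle are invented names, and the gc is simulated by hand.

_cycle_handlers = []

def register_cycle_handler(fn):
    """Register a callback to receive unreachable cycles that have finalizers."""
    _cycle_handlers.append(fn)

def report_trash_cycle(objects):
    """Stand-in for the collector: hand a discovered trash cycle to the user."""
    for fn in _cycle_handlers:
        fn(objects)

class Resource:
    def __init__(self, name):
        self.name = name
        self.peer = None        # the link that will form the cycle

    def release(self):
        self.peer = None        # break the link; real code would free resources

def break_cycle(objs):
    # Only the *programmer* knows the right order and policy here;
    # the collector, by the argument above, cannot.
    for obj in objs:
        obj.release()

register_cycle_handler(break_cycle)

a, b = Resource("a"), Resource("b")
a.peer, b.peer = b, a           # create a reference cycle
report_trash_cycle([a, b])      # pretend gc just found it unreachable
```

Once the handler returns, nothing in the cycle refers to anything else, so ordinary reference counting can reclaim both objects.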
IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility. when-a-problem-is-impossible-to-solve-favor-sanity-ly y'rs - tim From gstein at lyra.org Sat Mar 4 03:59:26 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 18:59:26 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > Note, though, that there is NO good answer to finalizers in cycles! The "Note" ?? Not just a note, but I'd say an axiom :-) By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best? >... > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. If a trash cycle > contains a finalizer (my, but that has to be rare in practice, in > well-designed code!), don't guess, but make it available to the user. A > gc.guardian() call could expose such beasts, or perhaps a callback could be > registered, invoked when gc finds one of these things. Anyone crazy enough > to create cyclic trash with finalizers then has to take responsibility for > breaking the cycle themself. This puts the burden on the person creating > the problem, and they can solve it in the way most appropriate to *their* > specific needs. IOW, the only people who lose under this scheme are the > ones begging to lose, and their "loss" consists of taking responsibility. I'm not sure if Tim is saying the same thing, but I'll write down a concrete idea for cleaning garbage cycles. First, a couple observations: * Some objects can always be reliably "cleaned": lists, dicts, tuples.
They just drop their contents, with no invocations against any of them. Note that an instance without a __del__ has no opinion on how it is cleaned. (this is related to Tim's point about whether a cycle has a finalizer) * The other objects may need to *use* their referenced objects in some way to clean out cycles. Since the second set of objects (possibly) need more care during their cleanup, we must concentrate on how to solve their problem. Back up a step: to determine where an object falls, let's define a tp_clean type slot. It returns an integer and takes one parameter: an operation integer. Py_TPCLEAN_CARE_CHECK /* check whether care is needed */ Py_TPCLEAN_CARE_EXEC /* perform the careful cleaning */ Py_TPCLEAN_EXEC /* perform a non-careful cleaning */ Given a set of objects that require special cleaning mechanisms, there is no way to tell where to start first. So... just pick the first one. Call its tp_clean type slot with CARE_EXEC. For instances, this maps to __clean__. If the instance does not have a __clean__, then tp_clean returns FALSE meaning that it could not clean this object. The algorithm moves on to the next object in the set. If tp_clean returns TRUE, then the object has been "cleaned" and is moved to the "no special care needed" list of objects, awaiting its reference count to hit zero. Note that objects in the "care" and "no care" lists may disappear during the careful-cleaning process. If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError. The objects in this set each said they could not be cleaned carefully AND they were not dealloc'd during other objects' cleaning. [ it could be possible to define a *dynamic* CARE_EXEC that will succeed if you call it during a second pass; I'm not sure this is a Good Thing to allow, however. 
] This also implies that a developer should almost *always* consider writing a __clean__ method whenever they write a __del__ method. That method MAY be called when cycles need to be broken; the object should delete any non-essential variables in such a way that integrity is retained (e.g. it fails gracefully when methods are called and __del__ won't raise an error). For example, __clean__ could call a self.close() to shut down its operation. Whatever... you get the idea. At the end of the iteration of the "care" set, then you may have objects remaining in the "no care" set. By definition, these objects don't care about their internal references to other objects (they don't need them during deallocation). We iterate over this set, calling tp_clean(EXEC). For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out the references to other objects (but does not dealloc the object!). Again: objects in the "no care" set will go away during this process. By the end of the iteration over the "no care" set, it should be empty. [ note: the iterations over these sets should probably INCREF/DECREF across the calls; otherwise, the object could be dealloc'd during the tp_clean call. ] [ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible references to other objects; not sure what this means. is it an error? maybe you just force a tp_dealloc on the remaining objects. ] Note that the tp_clean mechanism could probably be used during the Python finalization, where Python does a bunch of special-casing to clean up modules. Specifically: a module does not care about its contents during its deallocation, so it is a "no care" object; it responds to tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they can clear their dict (which contains a module reference which usually causes a loop) during tp_clean(EXEC). 
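At the Python level, the __clean__ method being described might look like this. This is a sketch of the proposed protocol only -- __clean__ is not an existing hook, and the class here is invented for illustration:

```python
# Sketch of the proposed __clean__ protocol: release external resources and
# leave the object inert, so a later __del__ (or a second cleaning pass)
# cannot fail.  __clean__ is a proposal in this thread, not a real hook.

class TempFileHolder:
    def __init__(self, path):
        self.path = path        # illustrative; no file is actually created
        self.closed = False

    def close(self):
        # Real code would remove the temp file here.
        self.closed = True

    def __clean__(self):
        # What tp_clean(CARE_EXEC) would invoke on instances in a trash cycle.
        self.close()
        return True             # TRUE: "I was able to clean myself"

    def __del__(self):
        if not self.closed:     # fails gracefully if __clean__ already ran
            self.close()

holder = TempFileHolder("scratch.tmp")
cleaned = holder.__clean__()
```

After __clean__ runs, the instance is harmless: a later __del__ finds nothing left to release and returns quietly.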
Module cleanup is easy once objects with CARE_CHECK have been handled -- all that funny logic in there is to deal with "care" objects. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 4 04:26:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 22:26:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000401bf8589$7d1364e0$c6a0143f@tim> [Tim] > Note, though, that there is NO good answer to finalizers in cycles! The [Greg Stein] > "Note" ?? Not just a note, but I'd say an axiom :-) An axiom is accepted without proof: we have plenty of proof that there's no thoroughly good answer (i.e., every language that has ever addressed this issue -- along with every language that ever will ). > By definition, you have two objects referring to each other in some way. > How can you *definitely* know how to break the link between them? Do you > call A's finalizer or B's first? If they're instances, do you just whack > their __dict__ and hope for the best? Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this. [a nice implementation sketch, of what seems an overly elaborate scheme, if you believe cycles with finalizers are rare in intelligently designed code) ] Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>. One caution: > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. Since gc "can happen at any time", this is very severe (c.f. 
Guido's objection to making resurrection illegal). Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't). realistically y'rs - tim From gstein at lyra.org Sat Mar 4 09:43:45 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 00:43:45 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] Nah. Quite simple to code up, but a bit longer to explain in English :-) The hardest part is finding the cycles, but Guido already posted a long explanation about that. Once that spits out the doubly-linked list of objects, then you're set. 1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed" objects to a second list 2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is returned, then the object was cleaned and moves to the "no care" list. 3) assert len(care-needed list) == 0 4) scan the no-care list calling tp_clean(EXEC) 5) (questionable) assert len(no-care list) == 0 The background makes it longer. The short description of the algorithm is easy. Step (1) could probably be merged right into one of the scans in the GC algorithm (e.g. 
during the placement into the "these are cyclical garbage" list) > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. hehe... Agreed. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). GCImpossibleError would simply be a subclass of MemoryError. Makes sense to me, and definitely allows for its "spontaneity." > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- anything). I'd > focus on the cycles themselves, not on the types of objects involved. I'm > not pretending to address the "order of finalization at shutdown" question, > though (although I'd agree they're deeply related: how do you follow a > topological sort when there *isn't* one? well, you don't, because you > can't). I disagree. I don't think a Python-level function is going to have a very good idea of what to do. IMO, this kind of semantics belong down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves. I think that standard pattern should be a language definition. Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives? 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 10:50:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 11:50:19 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? One thing that bothered me once: I want to be able to have something like: [section] tag = 1 tag = 2 And be able to retrieve ("section", "tag") -> ["1", "2"]. Can be awfully useful for things that make sense several times. Perhaps there should be two functions, one that reads a single-tag and one that reads a multi-tag? File format: I'm sure I'm going to get yelled at, but why don't we make it XML? Hard to edit, yadda, yadda, but you can easily write a special purpose widget to edit XConfig (that's what we'll call the DTD) files. hopefull-yet-not-naive-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 11:05:15 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 02:05:15 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Fri, 3 Mar 2000, Jeremy Hylton wrote: > > It sounds like ConfigParser code and docs could use a general cleanup. > > Are there any other issues to take care of as part of that cleanup? > > One thing that bothered me once: > > I want to be able to have something like: > > [section] > tag = 1 > tag = 2 > > And be able to retrieve ("section", "tag") -> ["1", "2"]. > Can be awfully useful for things that make sense several times. > Perhaps there should be two functions, one that reads a single-tag and > one that reads a multi-tag?
Structured values would be nice. Several times, I've needed to decompose the right hand side into lists. > File format: I'm sure I'm going to get yelled at, but why don't we > make it XML? Hard to edit, yadda, yadda, but you can easily write a > special purpose widget to edit XConfig (that's what we'll call the DTD) > files. Write a whole new module. ConfigParser is for files that look like the above. There isn't a reason to NOT use XML, but it shouldn't go into ConfigParser. I find the above style much easier for *humans*, than an XML file, to specify options. XML is good for computers; not so good for humans. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 11:46:40 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 12:46:40 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: [Tim Peters] > ...If a trash cycle > contains a finalizer (my, but that has to be rare in practice, in > well-designed code!), This shows something Tim himself has often said -- he never programmed a GUI. It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to release system-allocated resources. So, it's not as rare as we would like to believe, which is the reason I haven't given this answer. which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:16:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:16:19 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do Much better than the Python interpreter...
> Throwing it out to Python won't help > what happens when an unexpected cycle arrives? Don't delete it. It's as simple as that, since it's a bug. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:29:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:29:33 +0200 (IST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > Write a whole new module. ConfigParser is for files that look like the > above. Gotcha. One problem: two configuration modules might cause the classic "which should I use?" confusion. > > I find the above style much easier for *humans*, than an XML file, to > specify options. XML is good for computers; not so good for humans. > Of course: what human could delimit his text with and ? oh-no-another-c.l.py-bot-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 12:38:46 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:38:46 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > I disagree. I don't think a Python-level function is going to have a very > > good idea of what to do > > > Much better than the Python interpreter... If your function receives two instances (A and B), what are you going to do? How can you know what their policy is for cleaning up in the face of a cycle? I maintain that you would call the equivalent of my proposed __clean__. There isn't much else you'd be able to do, unless you had a completely closed system, you expected cycles between specific types of objects, and you knew a way to clean them up. Even then, you would still be calling something like __clean__ to let the objects do whatever they needed.
I'm suggesting that __clean__ should be formalized (as part of tp_clean). Throwing the handling "up to Python" isn't going to do much for you. Seriously... I'm all for coding more stuff in Python rather than C, but this just doesn't feel right. Getting the objects GC'd is a language feature, and a specific pattern/method/recommendation is best formulated as an interpreter mechanism. > > > Throwing it out to Python won't help > > > what happens when an unexpected cycle arrives? > > Don't delete it. > It's as simple as that, since it's a bug. The point behind this stuff is to get rid of it, rather than let it linger on. If the objects have finalizers (which is how we get to this step!), then it typically means there is a resource they must release. Getting the object cleaned and dealloc'd becomes quite important. Cheers, -g p.s. did you send in a patch for the instance_contains() thing yet? -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 4 12:43:12 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:43:12 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > Write a whole new module. ConfigParser is for files that look like the > > above. > > Gotcha. > > One problem: two configuration modules might cause the classic "which > should I use?" confusion. Nah. They wouldn't *both* be called ConfigParser. And besides, I see the XML format more as a persistence mechanism rather than a configuration mechanism. I'd call the module something like "XMLPersist". > > > > I find the above style much easier for *humans*, than an XML file, to > > specify options. XML is good for computers; not so good for humans. > > > > Of course: what human could delimit his text with and ? Feh. As a communication mechanism, dropping in that stuff... it's easy. ButI wouldnotwant ... bleck.
I wouldn't want to use XML for configuration stuff. It just gets ugly. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson at nevex.com Sat Mar 4 17:46:24 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 11:46:24 -0500 (EST) Subject: [Python-Dev] HTMLgen-style interface to SQL? Message-ID: [short form] I'm looking for an object-oriented toolkit that will do for SQL what Perl's CGI.pm module, or Python's HTMLgen, does for HTML. Pointers, examples, or expressions of interest would be welcome. [long form] Lincoln Stein's CGI.pm module for Perl allows me to build HTML in an object-oriented way, instead of getting caught in the Turing tarpit of string substitution and printf. DOM does the same (in a variety of languages) for XML. Right now, if I want to interact with an SQL database from Perl or Python, I have to embed SQL strings in my programs. I would like to have a DOM-like ability to build and manipulate queries as objects, then call a method that translates the query structure into SQL to send to the database. Alternatively, if there is an XML DTD for SQL (how's that for a chain of TLAs?), and some tool to convert the XML/SQL to pure SQL, so that I could build my query using DOM, that would be cool too. RSVP, Greg Wilson gvwilson at nevex.com From moshez at math.huji.ac.il Sat Mar 4 19:02:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 20:02:54 +0200 (IST) Subject: [Python-Dev] Re: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: On Sat, 4 Mar 2000, Guido van Rossum wrote: > Before we all start writing nannies and checkers, how about a standard > API design first? I thoroughly agree -- we should have a standard API.
I tried to write selfnanny so it could be callable from any API possible (e.g., it can take either a file, a string, an ast or a tuple representation). > I will want to call various nannies from a "Check" > command that I plan to add to IDLE. Very cool: what I imagine is a sort of modular PyLint. > I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. Mine definitely isn't: it's designed to run both like a script and like a module. One outstanding bug: no docos. To be supplied upon request <0.5 wink>. I just wanted to float it out and see if people think that this particular nanny is worth while. > Since parsing is expensive, we probably want to share the parse tree. Yes. Probably as an AST, and transform to tuples/lists inside the checkers. > Ideas? Here's a strawman API: There's a package called Nanny Every module in that package should have a function called check_ast. Its argument is an AST object, and its output should be a list of three-tuples: (line-number, error-message, None) or (line-number, error-message, (column-begin, column-end)) (each tuple can be a different form). Problems? (I'm CCing to python-dev. Please follow up to that discussion to python-dev only, as I don't believe it belongs in patches) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gvwilson at nevex.com Sat Mar 4 19:26:20 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 13:26:20 -0500 (EST) Subject: [Python-Dev] Re: selfnanny.py / nanny architecture In-Reply-To: Message-ID: > > Guido van Rossum wrote: > > Before we all start writing nannies and checkers, how about a standard > > API design first? > Moshe Zadka wrote: > Here's a strawman API: > There's a package called Nanny > Every module in that package should have a function called check_ast.
> It's argument is an AST object, and it's output should be a list > of three-tuples: (line-number, error-message, None) or > (line-number, error-message, (column-begin, column-end)) (each tuple can > be a different form). Greg Wilson wrote: The SUIF (Stanford University Intermediate Format) group has been working on an extensible compiler framework for about ten years now. The framework is based on an extensible AST spec; anyone can plug in a new analysis or optimization algorithm by writing one or more modules that read and write decorated ASTs. (See http://suif.stanford.edu for more information.) Based on their experience, I'd suggest that every nanny take an AST as an argument, and add complaints in place as decorations to the nodes. A terminal nanny could then collect these and display them to the user. I think this architecture will make it simpler to write meta-nannies. I'd further suggest that the AST be something that can be manipulated through DOM, since (a) it's designed for tree-crunching, (b) it's already documented reasonably well, (c) it'll save us re-inventing a wheel, and (d) generating human-readable output in a variety of customizable formats ought to be simple (well, simpler than the alternatives). Greg From jeremy at cnri.reston.va.us Sun Mar 5 03:10:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 4 Mar 2000 21:10:28 -0500 (EST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: Message-ID: <14529.49684.219826.466310@bitdiddle.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Sat, 4 Mar 2000, Greg Stein wrote: >> Write a whole new module. ConfigParser is for files that look >> like the above. MZ> Gotcha. MZ> One problem: two configurations modules might cause the classic MZ> "which should I use?" confusion. I don't think this is a hard decision to make. ConfigParser is good for simple config files that are going to be maintained by humans with a text editor. 
An XML-based configuration file is probably the right solution when humans aren't going to maintain the config files by hand. Perhaps XML will eventually be the right solution in both cases, but only if XML editors are widely available. >> I find the above style much easier for *humans*, than an >> XML file, to specify options. XML is good for computers; not so >> good for humans. MZ> Of course: what human could delimit his text with and MZ> ? Could? I'm sure there are more ways on Linux and Windows to mark up text than are dreamt of in your philosophy, Moshe . The question is what is easiest to read and understand? Jeremy From tim_one at email.msn.com Sun Mar 5 03:22:16 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 21:22:16 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: <000201bf8649$a17383e0$f42d153f@tim> [Guido van Rossum] > Before we all start writing nannies and checkers, how about a standard > API design first? I will want to call various nannies from a "Check" > command that I plan to add to IDLE. I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. I like Moshe's suggestion fine, except with an abstract base class named Nanny with a virtual method named check_ast. Nannies should (of course) derive from that. > Since parsing is expensive, we probably want to share the parse tree. What parse tree? Python's parser module produces an AST not nearly "A enough" for reasonably productive nanny writing. GregS & BillT have improved on that, but it's not in the std distrib. Other "problems" include the lack of original source lines in the trees, and lack of column-number info. Note that by the time Python has produced a parse tree, all evidence of the very thing tabnanny is looking for has been removed. That's why she used the tokenize module to begin with. 
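For flavor, a stripped-down sketch of that tokenize-driven style of checking -- this is not checkappend's actual state machine, just the general idea of detecting multi-argument list.append calls from the token stream (written against today's tokenize API):

```python
import io
import tokenize

OPEN, CLOSE = "([{", ")]}"

def find_multiarg_appends(source):
    """Return line numbers where something.append(...) gets more than one argument."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    hits = []
    for i, tok in enumerate(tokens):
        # Look for ".append(" in the token stream.
        if (tok.type == tokenize.NAME and tok.string == "append"
                and i >= 1 and tokens[i - 1].string == "."
                and i + 1 < len(tokens) and tokens[i + 1].string == "("):
            depth = 0
            for t in tokens[i + 1:]:
                if t.string and t.string in OPEN:
                    depth += 1
                elif t.string and t.string in CLOSE:
                    depth -= 1
                    if depth == 0:
                        break
                elif t.string == "," and depth == 1:
                    # A comma at the top level of the call: multiple arguments.
                    hits.append(tok.start[0])
                    break
    return hits
```

A call like x.append(1, 2) is flagged, while the repaired x.append((1, 2)) form passes, since its comma sits one paren level deeper.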
God knows tokenize is too funky to use, too, when life gets harder (check out checkappend.py's tokeneater state machine for a preliminary taste of that). So the *only* solution is to adopt Christian's Stackless so I can rewrite tokenize as a coroutine like God intended . Seriously, I don't know of anything that produces a reasonably usable (for nannies) parse tree now, except via modifying a Python grammar for use with John Aycock's SPARK; the latter also comes with very pleasant & powerful tree pattern-matching abilities. But it's probably too slow for everyday "just folks" use. Grabbing the GregS/BillT enhancement is probably the most practical thing we could build on right now (but tabnanny will have to remain a special case). unsure-about-the-state-of-simpleparse-on-mxtexttools-for-this-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 04:24:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 22:24:18 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BE1B69.E0B88B41@lemburg.com> Message-ID: <000301bf8652$4aadaf00$f42d153f@tim> Just noting that two instances of this were found in Zope. [/F] > append = list.append > for x in something: > append(...) [Tim] > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. From fdrake at acm.org Sun Mar 5 04:55:27 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Sat, 4 Mar 2000 22:55:27 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: <14529.55983.263225.691427@weyr.cnri.reston.va.us> Jeremy Hylton writes: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. I did it that way since the .ini format allows comments after values (the ';' comments after a '=' are .ini style; '#' comments are a ConfigParser thing), but there's no equivalent concept for RFC822 parsing, other than '(...)' in addresses. The code was trying to allow what was expected from the .ini crowd without breaking the "native" use of ConfigParser. > General question for python-dev readers: What are your experiences > with ConfigParser? I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. I also found it > clumsy to have to specify section and option arguments. I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? I agree that the API to ConfigParser sucks, and I think also that the use of it as a general solution is a big mistake. It's a messy bit of code that doesn't need to be, supports a really nasty mix of syntaxes, and can easily bite users who think they're getting something .ini-like (the magic names and interpolation are a bad idea!). While it suited the original application well enough, something with .ini syntax and interpolation from a subclass would have been *much* better.
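A rough sketch of the kind of design being suggested here: a tiny base class that implements bare .ini syntax and exposes one overridable hook where a subclass can layer on interpolation or type coercion. All names are invented for illustration; this is not a real module.

```python
# Illustrative only: a minimal .ini reader with a single extension hook.

class IniParser:
    def __init__(self):
        self.sections = {}

    def transform(self, section, option, value):
        # Subclasses override this hook (interpolation, coercion, ...).
        return value

    def read_string(self, text):
        current = None
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith((";", "#")):
                continue                      # blank line or comment
            if line.startswith("[") and line.endswith("]"):
                current = line[1:-1]          # [section] header
                self.sections.setdefault(current, {})
            elif "=" in line and current is not None:
                option, _, value = line.partition("=")
                self.sections[current][option.strip()] = self.transform(
                    current, option.strip(), value.strip())

    def get(self, section, option):
        return self.sections[section][option]

class IntIniParser(IniParser):
    """Example subclass: coerce all-digit values to integers."""
    def transform(self, section, option, value):
        return int(value) if value.isdigit() else value

demo = IntIniParser()
demo.read_string("[server]\nhost = localhost\nport = 8080\n; a comment\n")
```

The base class stays dumb about value semantics; all the "magic" lives in the subclass, which is the separation being argued for.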
I think we should create a new module, inilib, that implements exactly .ini syntax in a base class that can be intelligently extended. ConfigParser should be deprecated. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Sun Mar 5 05:11:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:11:12 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <000601bf8658$d81d34e0$f42d153f@tim> [Guido] > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? I give up on the latter, so it really is just one. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup ... > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. If this is something that happens easily, and current behavior is harmful, don't you think someone would have complained about it by now? That is, __del__ *is* "called again at this point" now, and has been for years & years. And if it happens easily, it *is* happening now, and in an unknown amount of existing code. (BTW, I doubt it happens at all -- people tend to write very simple __del__ methods, so far as I've ever seen) > Now, it is true that the current docs for __del__ imply that > resurrection is possible. "imply" is too weak. The Reference Manual's "3.3.1 Basic customization" flat-out says it's possible ("though not recommended"). The precise meaning of the word "may" in the following sentence is open to debate, though. 
> The intention of that note was to warn __del__ writers that in the case > of accidental resurrection Sorry, but I can't buy this: saying that *accidents* are "not recommended" is just too much of a stretch . > __del__ might be called again. That's a plausible reading of the following "may", but not the only one. I believe it's the one you intended, but it's not the meaning I took prior to this. > The intention certainly wasn't to allow or encourage intentional resurrection. Well, I think it plainly says it's supported ("though not recommended"). I used it intentionally at KSR, and even recommended it on c.l.py in the dim past (in one of those "dark & useless" threads ). > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. Why would anyone tell you about something that *works*?! You rarely hear the good stuff, you know. I gave the typical pattern in the preceding msg. To flesh out the motivation more, you have some external resource that's very expensive to set up (in KSR's case, it was an IPC connection to a remote machine). Rights to use that resource are handed out in the form of an object. When a client is done using the resource, they *should* explicitly use the object's .release() method, but you can't rely on that. So the object's __del__ method looks like (for example): def __del__(self): # Code not shown to figure out whether to disconnect: the downside to # disconnecting is that it can cost a bundle to create a new connection. # If the whole app is shutting down, then of course we want to disconnect. # Or if a timestamp trace shows that we haven't been making good use of # all the open connections lately, we may want to disconnect too. 
if decided_to_disconnect: self.external_resource.disconnect() else: # keep the connection alive for reuse global_available_connection_objects.append(self) This is simple & effective, and it relies on both intentional resurrection and __del__ getting called repeatedly. I don't claim there's no other way to write it, just that there's *been* no problem doing this for a millennium . Note that MAL spontaneously sketched similar examples, although I can't say whether he's actually done stuff like this. Going back up a level, in another msg you finally admitted that you want "__del__ called only once" for the same reason Java wants it: because gc has no idea what to do when faced with finalizers in a trash cycle, and settles for an unprincipled scheme whose primary virtue is that "it doesn't blow up" -- and "__del__ called only once" happens to be convenient for that scheme. But toss such cycles back to the user to deal with at the Python level, and all those problems go away (along with the artificial need to change __del__). The user can break the cycles in an order that makes sense to the app (or they can let 'em leak! up to them). >>> print gc.get_cycle.__doc__ Return a list of objects comprising a single garbage cycle; [] if none. At least one of the objects has a finalizer, so Python can't determine the intended order of destruction. If you don't break the cycle, Python will neither run any finalizers for the contained objects nor reclaim their memory. If you do break the cycle, and dispose of the list, Python will follow its normal reference-counting rules for running finalizers and reclaiming memory. That this "won't blow up" either is just the least of its virtues . you-break-it-you-buy-it-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 05:56:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: Message-ID: <000001bf865f$3acb99a0$432d153f@tim> [Tim sez "toss insane cycles back on the user"] [Greg Stein] > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do. You've already assumed that Python coders know exactly what to do, else they couldn't have coded the new __clean__ method your proposal relies on. I'm taking what strikes me as the best part of Scheme's Guardian idea: don't assume *anything* about what users "should" do to clean up their trash. Leave it up to them: their problem, their solution. I think finalizers in trash cycles should be so rare in well-written code that it's just not worth adding much of anything in the implementation to cater to it. > IMO, this kind of semantics belong down in the interpreter with a > specific, documented algorithm. Throwing it out to Python won't help > -- that function will still have to use a "standard pattern" for getting > the cyclical objects to toss themselves. They can use any pattern they want, and if the pattern doesn't *need* to be coded in C as part of the implementation, it shouldn't be. > I think that standard pattern should be a language definition. I distrust our ability to foresee everything users may need over the next 10 years: how can we know today that the first std pattern you dreamed up off the top of your head is the best approach to an unbounded number of problems we haven't yet seen a one of ? > Without a standard pattern, then you're saying the application will know > what to do, but that is kind of weird -- what happens when an unexpected > cycle arrives? With the hypothetical gc.get_cycle() function I mentioned before, they should inspect objects in the list they get back, and if they find they don't know what to do with them, they can still do anything they want. 
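[The calling pattern for the proposed gc.get_cycle() might look like the sketch below. gc.get_cycle() was never implemented; the fake_get_cycle() stand-in and the Resource class are invented solely to illustrate the "inspect, break in an app-chosen order, drop the list" pattern being described:

```python
# gc.get_cycle() is a *proposed* API -- it does not exist in any Python
# release.  To show the calling pattern, fake it with a toy stand-in
# that hands back one trash cycle of objects carrying finalizers.

class Resource:
    def __init__(self, name):
        self.name = name
        self.peer = None          # will point into the cycle
    def release(self):            # stands in for a __del__-style finalizer
        self.peer = None

def fake_get_cycle():
    a, b = Resource("a"), Resource("b")
    a.peer, b.peer = b, a         # a <-> b: a cycle with finalizers
    return [a, b]

# The caller pattern: inspect the objects in the cycle, break it in an
# order the *application* understands, then dispose of the list.
cycle = fake_get_cycle()
for obj in cycle:
    if isinstance(obj, Resource):
        obj.release()             # app-chosen break order
    else:
        raise RuntimeError("unexpected object in trash cycle: %r" % obj)
broken = all(obj.peer is None for obj in cycle)
print(broken)                     # prints: True
```

Once the cycle is broken, ordinary reference counting can reclaim the objects; if the caller instead doesn't know what to do, it can raise, log, or just drop the list and let the cycle become trash again.]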
Examples include raising an exception, dialing my home pager at 3am to insist I come in to look at it, or simply let the list go away (at which point the objects in the list will again become a trash cycle containing a finalizer). If several distinct third-party modules get into this act, I *can* see where it could become a mess. That's why Scheme "guardians" is plural: a given module could register its "problem objects" in advance with a specific guardian of its own, and query only that guardian later for things ready to die. This probably can't be implemented in Python, though, without support for weak references (or lots of brittle assumptions about specific refcount values). agreeably-disagreeing-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 05:56:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:58 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000101bf865f$3cb0d460$432d153f@tim> [Tim] > ...If a trash cycle contains a finalizer (my, but that has to be rare. > in practice, in well-designed code!), [Moshe Zadka] > This shows something Tim himself has often said -- he never programmed a > GUI. It's very hard to build a GUI (especially with Tkinter) which is > cycle-less, but the classes implementing the GUI often have __del__'s > to break system-allocated resources. > > So, it's not as rare as we would like to believe, which is the reason > I haven't given this answer. I wrote Cyclops.py when trying to track down leaks in IDLE. The extraordinary thing we discovered is that "even real gc" would not have reclaimed the cycles. They were legitimately reachable, because, indeed, "everything points to everything else". Guido fixed almost all of them by explicitly calling new "close" methods. I believe IDLE has no __del__ methods at all now. Tkinter.py currently contains two. 
so-they-contained-__del__-but-weren't-trash-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 07:05:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 01:05:24 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BCD71C.3592E6A@lemburg.com> Message-ID: <000601bf8668$cbbdd640$432d153f@tim> [M.-A. Lemburg] > ... > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) Given .title(), is .capitalize() of use for Unicode strings? Or is it just a temptation to do something senseless in the Unicode world? If it doesn't make sense, leave it out (this *seems* like compulsion to implement all current string methods in *some* way for Unicode, whether or not they make sense). From moshez at math.huji.ac.il Sun Mar 5 07:16:22 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 08:16:22 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000201bf8649$a17383e0$f42d153f@tim> Message-ID: On Sat, 4 Mar 2000, Tim Peters wrote: > I like Moshe's suggestion fine, except with an abstract base class named > Nanny with a virtual method named check_ast. Nannies should (of course) > derive from that. Why? The C++ you're programming damaged your common sense cycles? > > Since parsing is expensive, we probably want to share the parse tree. > > What parse tree? Python's parser module produces an AST not nearly "A > enough" for reasonably productive nanny writing. As a note, selfnanny uses the parser module AST. > GregS & BillT have > improved on that, but it's not in the std distrib. Other "problems" include > the lack of original source lines in the trees, The parser module has source lines. > and lack of column-number info. Yes, that sucks. > Note that by the time Python has produced a parse tree, all evidence of the > very thing tabnanny is looking for has been removed. 
That's why she used > the tokenize module to begin with. Well, it's one of the few nannies which would be in that position. > God knows tokenize is too funky to use too when life gets harder (check out > checkappend.py's tokeneater state machine for a preliminary taste of that). Why doesn't checkappend.py uses the parser module? > Grabbing the GregS/BillT enhancement is probably the most > practical thing we could build on right now You got some pointers? > (but tabnanny will have to remain a special case). tim-will-always-be-a-special-case-in-our-hearts-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sun Mar 5 08:01:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 02:01:12 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Message-ID: <000901bf8670$97d8f320$432d153f@tim> [Tim] >> [make Nanny a base class] [Moshe Zadka] > Why? Because it's an obvious application for OO design. A common base class formalizes the interface and can provide useful utilities for subclasses. > The C++ you're programming damaged your common sense cycles? Yes, very, but that isn't relevant here . It's good Python sense too. >> [parser module produces trees far too concrete for comfort] > As a note, selfnanny uses the parser module AST. Understood, but selfnanny has a relatively trivial task. Hassling with tuples nested dozens deep for even relatively simple stmts is both a PITA and a time sink. >> [parser doesn't give source lines] > The parser module has source lines. No, it does not (it only returns terminals, as isolated strings). The tokenize module does deliver original source lines in their entirety (as well as terminals, as isolated strings; and column numbers). >> and lack of column-number info. > Yes, that sucks. > ... > Why doesn't checkappend.py uses the parser module? 
Because it wanted to display the actual source line containing an offending "append" (which, again, the parse module does not supply). Besides, it was a trivial variation on tabnanny.py, of which I have approximately 300 copies on my disk . >> Grabbing the GregS/BillT enhancement is probably the most >> practical thing we could build on right now > You got some pointers? Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab transformer.py from the zip file. The latter supplies a very useful post-processing pass over the parse module's output, squashing it *way* down. From moshez at math.huji.ac.il Sun Mar 5 08:08:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 09:08:41 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> Message-ID: On Sun, 5 Mar 2000, Tim Peters wrote: > [Tim] > >> [make Nanny a base class] > > [Moshe Zadka] > > Why? > > Because it's an obvious application for OO design. A common base class > formalizes the interface and can provide useful utilities for subclasses. The interface is just one function. You're welcome to have a do-nothing nanny that people *can* derive from: I see no point in making them derive from a base class. > > As a note, selfnanny uses the parser module AST. > > Understood, but selfnanny has a relatively trivial task. That it does, and it was painful. > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). Sorry, misunderstanding: it seemed obvious to me you wanted line numbers. For lines, use the linecache module... > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. I'll have a look. Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From effbot at telia.com Sun Mar 5 10:24:37 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sun, 5 Mar 2000 10:24:37 +0100 Subject: [Python-Dev] return statements in lambda Message-ID: <006f01bf8686$391ced80$34aab5d4@hagrid> from "Python for Lisp Programmers": http://www.norvig.com/python-lisp.html > Don't forget return. Writing def twice(x): x+x is tempting > and doesn't signal a warning or exception, but you probably > meant to have a return in there. This is particularly irksome > because in a lambda you are prohibited from writing return, > but the semantics is to do the return. maybe adding an (optional but encouraged) "return" to lambda would be an improvement? lambda x: x + 10 vs. lambda x: return x + 10 or is this just more confusing... opinions? From guido at python.org Sun Mar 5 13:04:56 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:04:56 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Your message of "Sat, 04 Mar 2000 22:55:27 EST." <14529.55983.263225.691427@weyr.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> Message-ID: <200003051204.HAA05367@eric.cnri.reston.va.us> [Fred] > I agree that the API to ConfigParser sucks, and I think also that > the use of it as a general solution is a big mistake. It's a messy > bit of code that doesn't need to be, supports a really nasty mix of > syntaxes, and can easily bite users who think they're getting > something .ini-like (the magic names and interpolation is a bad > idea!). While it suited the original application well enough, > something with .ini syntax and interpolation from a subclass would > have been *much* better.
> I think we should create a new module, inilib, that implements > exactly .ini syntax in a base class that can be intelligently > extended. ConfigParser should be deprecated. Amen. Some thoughts: - You could put it all in ConfigParser.py but with new classnames. (Not sure though, since the ConfigParser class, which is really a kind of weird variant, will be assumed to be the main class because its name is that of the module.) - Variants on the syntax could be given through some kind of option system rather than through subclassing -- they should be combinable independently. Some possible options (maybe I'm going overboard here) could be: - comment characters: ('#', ';', both, others?) - comments after variables allowed? on sections? - variable characters: (':', '=', both, others?) - quoting of values with "..." allowed? - backslashes in "..." allowed? - does backslash-newline mean a continuation? - case sensitivity for section names (default on) - case sensitivity for option names (default off) - variables allowed before first section name? - first section name? (default "main") - character set allowed in section names - character set allowed in variable names - %(...) substitution? (Well maybe the whole substitution thing should really be done through a subclass -- it's too weird for normal use.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Mar 5 13:17:31 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:17:31 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Sun, 05 Mar 2000 01:05:24 EST." <000601bf8668$cbbdd640$432d153f@tim> References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <200003051217.HAA05395@eric.cnri.reston.va.us> > [M.-A. Lemburg] > > ...
> > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) [Tim] > Given .title(), is .capitalize() of use for Unicode strings?  Or is it just > a temptation to do something senseless in the Unicode world?  If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). The intention of this is to make code that does something using strings do exactly the same thing if those strings happen to be Unicode strings with the same values. The capitalize method returns self[0].upper() + self[1:] -- that may not make sense for e.g. Japanese, but it certainly does for Russian or Greek. It also does this in JPython. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Mar 5 13:24:41 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:24:41 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Your message of "Sun, 05 Mar 2000 02:01:12 EST." <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <200003051224.HAA05410@eric.cnri.reston.va.us> > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings).  The > tokenize module does deliver original source lines in their entirety (as > well as terminals, as isolated strings; and column numbers). Moshe meant line numbers -- it has those. > > Why doesn't checkappend.py uses the parser module? > > Because it wanted to display the actual source line containing an offending > "append" (which, again, the parse module does not supply).  Besides, it was > a trivial variation on tabnanny.py, of which I have approximately 300 copies > on my disk . Of course another argument for making things more OO.
(The code used in tabnanny.py to process files and directories recursively from sys.argv is replicated a thousand times in various scripts of mine -- Tim took it from my now-defunct takpolice.py. This should be in the std library somehow...) > >> Grabbing the GregS/BillT enhancement is probably the most > >> practical thing we could build on right now > > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file.  The latter supplies a very useful > post-processing pass over the parse module's output, squashing it *way* > down. Those of you who have seen the compiler-sig should know that Jeremy made an improvement which will find its way into p2c. It's currently on display in the Python CVS tree in the nondist branch: see http://www.python.org/pipermail/compiler-sig/2000-February/000011.html and the ensuing thread for more details. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Mar 5 14:46:13 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 08:46:13 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Fri, 03 Mar 2000 22:26:54 EST." <000401bf8589$7d1364e0$c6a0143f@tim> References: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: <200003051346.IAA05539@eric.cnri.reston.va.us> I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning, and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either). There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?).
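[A toy model of proposal (1) may make the contrast concrete. It assumes __cleanup__ works roughly as Marc-Andre sketched it -- the collector calls __cleanup__ on each instance in an unreachable cycle before clobbering anything; the Node class and the collect() driver below are invented for illustration, not part of any proposal text:

```python
# Toy model of the __cleanup__ proposal: before the collector clobbers
# an unreachable cycle, it calls __cleanup__ on each instance that
# defines one.  The names Node and collect() are illustrative only.

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.cleaned = False
    def __cleanup__(self):
        self.next = None          # break the reference that forms the cycle
        self.cleaned = True

def collect(cycle):
    # What the collector would do: a cleanup pass first...
    for obj in cycle:
        cleanup = getattr(obj, "__cleanup__", None)
        if cleanup is not None:
            cleanup()
    # ...then ordinary refcounting reclaims the now-acyclic objects.

a, b = Node("a"), Node("b")
a.next, b.next = b, a             # a <-> b cycle
collect([a, b])
print(a.cleaned and b.cleaned)    # prints: True
```

Under proposal (2), by contrast, no collect()-side protocol exists: the whole list of cycle members is handed back to the application, which does the equivalent breaking itself.]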
Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called). Of course, it's also possible that a trash cycle is created by code outside the responsibility of the finalizer. But still, I have a hard time understanding how Tim's version would be used. Greg or Marc-Andre's version I understand. What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.) If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( ) > Exactly. The *programmer* may know the right thing to do, but the Python > implementation can't possibly know. Facing both facts squarely constrains > the possibilities to the only ones that are all of understandable, > predictable and useful. Cycles with finalizers must be a Magic-Free Zone > else you lose at least one of those three: even Guido's kung fu isn't > strong enough to outguess this. > > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] > > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>.
My current tendency is to go with the basic __cleanup__ and nothing more, calling each instance's __cleanup__ before clobbering dictionaries and lists -- which should break all cycles safely. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). Not quite. Cycle detection is presumably only called every once in a while on memory allocation, and memory *allocation* (as opposed to deallocation) is allowed to fail. Of course, this will probably run into various coding bugs where allocation failure isn't dealt with properly, because in practice this happens so rarely... > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- > anything). That was the idea with calling the finalizer too: it would be called between INCREF/DECREF, so the object would be considered alive for the duration of the finalizer call. Here's another way of looking at my error: for dicts and lists, I would call a special *clear* function; but for instances, I would call *dealloc*, however intending it to perform a *clear*. I wish we didn't have to special-case finalizers on class instances (since each dealloc function is potentially a combination of a finalizer and a deallocation routine), but the truth is that they *are* special -- __del__ has no responsibility for deallocating memory, only for deallocating external resources (such as temp files).
And even if we introduced a tp_clean protocol that would clear dicts and lists and call __cleanup__ for instances, we'd still want to call it first for instances, because an instance depends on its __dict__ for its __cleanup__ to succeed (but the __dict__ doesn't depend on the instance for its cleanup). Greg's 3-phase tp_clean protocol seems indeed overly elaborate but I guess it deals with such dependencies in the most general fashion. > I'd focus on the cycles themselves, not on the types of objects > involved. I'm not pretending to address the "order of finalization > at shutdown" question, though (although I'd agree they're deeply > related: how do you follow a topological sort when there *isn't* > one? well, you don't, because you can't). In theory, you just delete the last root (a C global pointing to sys.modules) and you run the garbage collector. It might be more complicated in practice to track down all roots. Another practical consideration is that now there are cycles of the form function <=> module, which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call). Back to the drawing board... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Sun Mar 5 17:42:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Sun, 5 Mar 2000 10:42:30 -0600 (CST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> References: <000401bf8589$7d1364e0$c6a0143f@tim> <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <14530.36471.11654.666900@beluga.mojam.com> Guido> What keeps nagging me though is what to do when there's a Guido> finalizer but no cleanup method. I guess the trash cycle remains Guido> alive. Is this acceptable? (I guess so, because we've given the Guido> programmer a way to resolve the trash: provide a cleanup method.) That assumes the programmer even knows there's a cycle, right? I'd like to see this scheme help provide debugging assistance. If a cycle is discovered but the programmer hasn't declared a cleanup method for the object it wants to clean up, a default cleanup method is called if it exists (e.g. sys.default_cleanup), which would serve mostly as an alert (print magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of death, ...) as opposed to actually breaking any cycles. Presumably the programmer would define sys.default_cleanup during development and leave it undefined during production. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From paul at prescod.net Sat Mar 4 02:04:43 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 03 Mar 2000 17:04:43 -0800 Subject: [Python-Dev] breaking list.append() References: <38BC86E1.53F69776@prescod.net> <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: <38C0612B.7C92F8C4@prescod.net> Guido van Rossum wrote: > > .. > Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. My concern is when I want to build an application with a module that only works with Python 1.5.2 and another one that only works with Python 1.6. If we can avoid that situation by making 1.6 compatible with 1.5.2, we should.
By the time 1.7 comes around I will accept that everyone has had enough time to update their modules. Remember that many module authors are just part time volunteers. They may only use Python every few months when they get a spare weekend! I really hope that Andrew is wrong when he predicts that there may be lots of different places where Python 1.6 breaks code! I'm in favor of being a total jerk when it comes to Py3K but Python has been pretty conservative thus far. Could someone remind me in one sentence what the downside is for treating this as a warning condition as Java does with its deprecated features? Then the CP4E people don't get into bad habits and those same CP4E people trying to use older modules don't run into frustrating runtime errors. Do it for the CP4E people! (how's that for rhetoric) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "We still do not know why mathematics is true and whether it is certain. But we know what we do not know in an immeasurably richer way than we did. And learning this has been a remarkable achievement, among the greatest and least known of the modern era." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From jeremy at cnri.reston.va.us Sun Mar 5 18:46:14 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sun, 5 Mar 2000 12:46:14 -0500 (EST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <14530.40294.593407.777859@bitdiddle.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: >>> Grabbing the GregS/BillT enhancement is probably the most >>> practical thing we could build on right now >> You got some pointers? TP> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and TP> grab transformer.py from the zip file.
The latter supplies a TP> very useful post-processing pass over the parse module's output, TP> squashing it *way* down. The compiler tools in python/nondist/src/Compiler include Bill & Greg's transformer code, a class-based AST (each node is a subclass of the generic node), and a visitor framework for walking the AST. The APIs and organization are in a bit of flux; Mark Hammond suggested some reorganization that I've not finished yet. I may finish it up this evening. The transformer module does a good job of including line numbers, but I've occasionally run into a node that didn't have a lineno attribute when I expected it would. I haven't taken the time to figure out if my expectation was unreasonable or if the transformer should be fixed. The compiler-sig might be a good place to discuss this further. A warning framework was one of my original goals for the SIG. I imagine we could convince Guido to move warnings + compiler tools into the standard library if they end up being useful. Jeremy From mal at lemburg.com Sun Mar 5 20:57:32 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 20:57:32 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <38C2BC2C.FFEB72C3@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense).
.capitalize() only touches the first char of the string - not sure whether it makes sense in both worlds ;-) Anyhow, the difference is there but subtle: string.capitalize() will use C's toupper() which is locale dependent, while unicode.capitalize() uses Unicode's toTitleCase() for the first character. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Mar 5 21:15:47 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 21:15:47 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000601bf8658$d81d34e0$f42d153f@tim> Message-ID: <38C2C073.CD51688@lemburg.com> Tim Peters wrote: > > [Guido] > > Would there really be someone out there who uses *intentional* > > resurrection? I severely doubt it. I've never heard of this. > > Why would anyone tell you about something that *works*?! You rarely hear > the good stuff, you know. I gave the typical pattern in the preceding msg. > To flesh out the motivation more, you have some external resource that's > very expensive to set up (in KSR's case, it was an IPC connection to a > remote machine). Rights to use that resource are handed out in the form of > an object. When a client is done using the resource, they *should* > explicitly use the object's .release() method, but you can't rely on that. > So the object's __del__ method looks like (for example):
>
>     def __del__(self):
>         # Code not shown to figure out whether to disconnect: the downside to
>         # disconnecting is that it can cost a bundle to create a new connection.
>         # If the whole app is shutting down, then of course we want to disconnect.
>         # Or if a timestamp trace shows that we haven't been making good use of
>         # all the open connections lately, we may want to disconnect too.
>         if decided_to_disconnect:
>             self.external_resource.disconnect()
>         else:
>             # keep the connection alive for reuse
>             global_available_connection_objects.append(self)
>
> This is simple & effective, and it relies on both intentional resurrection > and __del__ getting called repeatedly. I don't claim there's no other way > to write it, just that there's *been* no problem doing this for a millennium > <wink>. > > Note that MAL spontaneously sketched similar examples, although I can't say > whether he's actually done stuff like this. Not exactly this, but similar things in the weak reference implementation of mxProxy. The idea came from a different area: the C implementation of Python uses free lists a lot and these are basically implementations of the same idiom: save an allocated resource for reviving it at some later point. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nascheme at enme.ucalgary.ca Mon Mar 6 01:27:54 2000 From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca) Date: Sun, 5 Mar 2000 17:27:54 -0700 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim>; from tim_one@email.msn.com on Fri, Mar 03, 2000 at 08:38:43PM -0500 References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: <20000305172754.A14998@acs.ucalgary.ca> On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. I like Marc's suggestion. Here is my proposal: Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary.
When collecting cycles, call tp_clear for instances first. It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods. Neil From tim_one at email.msn.com Mon Mar 6 08:13:21 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:13:21 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38C0612B.7C92F8C4@prescod.net> Message-ID: <000401bf873b$745f8320$ea2d153f@tim> [Paul Prescod] > ... > Could someone remind in one sentence what the downside is for treating > this as a warning condition as Java does with its deprecated features? Simply the lack of anything to build on: Python has no sort of runtime warning system now, and nobody has volunteered to create one. If you do <wink>, remember that stdout & stderr may go to the bit bucket in a GUI app. The bit about dropping the "L" suffix on longs seems unwarnable-about in any case (short of warning every time anyone uses long()). remember-that-you-asked-for-the-problems-not-for-solutions-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 08:33:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:33:49 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <38C2C073.CD51688@lemburg.com> Message-ID: <000701bf873e$5032eca0$ea2d153f@tim> [M.-A. Lemburg, on the resurrection/multiple-__del__ "idiom"] > ... > The idea came from a different area: the C implementation > of Python uses free lists a lot and these are basically > implementations of the same idiom: save an allocated > resource for reviving it at some later point. Excellent analogy! Thanks. Now that you phrased it in this clarifying way, I recall that very much the same point was raised in the papers that resulted in the creation of guardians in Scheme. I don't know that anyone is actually using Python __del__ this way today (I am not), but you reminded me why I thought it was natural at one time <wink>.
generally-__del__-aversive-now-except-in-c++-where-destructors-are-guaranteed-to-be-called-when-you-expect-them-to-be-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 09:12:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 03:12:06 -0500 Subject: [Python-Dev] return statements in lambda In-Reply-To: <006f01bf8686$391ced80$34aab5d4@hagrid> Message-ID: <000901bf8743$a9f61aa0$ea2d153f@tim> [/F] > maybe adding an (optional but encouraged) "return" > to lambda would be an improvement? > > lambda x: x + 10 > > vs. > > lambda x: return x + 10 > > or is this just more confusing... opinions? It was an odd complaint to begin with, since Lisp-heads aren't used to using "return" anyway. More of a symptom of taking a shallow syntactic approach to a new (to them) language. For non-Lisp heads, I think it's more confusing in the end, blurring the distinction between stmts and expressions ("the body of a lambda must be an expression" ... "ok, i lied, unless it's a 'return' stmt"). If Guido had it to do over again, I vote he rejects the original patch <wink>. Short of that, it would have been better if the lambda arglist required parens, and if the body were required to be a single return stmt (that would sure end the "lambda x: print x" FAQ -- few would *expect* "return print x" to work!). hindsight-is-great-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 10:09:45 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 04:09:45 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <000b01bf874b$b6fe9da0$ea2d153f@tim> [Guido] > I'm beginning to believe that handing cycles with finalizers to the > user is better than calling __del__ with a different meaning, You won't be sorry: Python has the chance to be the first language that's both useful and sane here!
> and I tentatively withdraw my proposal to change the rules for when > __del__ is called (even when __init__ fails; I haven't had any complaints > about that either). Well, everyone liked the parenthetical half of that proposal, although Jack's example did point out a real surprise with it. > There seem to be two competing suggestions for solutions: (1) call > some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the > object; (2) Tim's proposal of an interface to ask the garbage > collector for a trash cycle with a finalizer (or for an object with a > finalizer in a trash cycle?). Or a maximal strongly-connected component, or *something* -- unsure. > Somehow Tim's version looks less helpful to me, because it *seems* > that whoever gets to handle the cycle (the main code of the program?) > isn't necessarily responsible for creating it (some library you didn't > even know was used under the covers of some other library you called). Yes, to me too. This is the Scheme "guardian" idea in a crippled form (Scheme supports as many distinct guardians as the programmer cares to create), and even in its full-blown form it supplies "a perfectly general mechanism with no policy whatsoever". Greg convinced me (although I haven't admitted this yet <wink>) that "no policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't be pushing the guardian idea any more (although see immediately below for an immediate backstep on that <wink>). > ... > What keeps nagging me though is what to do when there's a finalizer > but no cleanup method. I guess the trash cycle remains alive. Is > this acceptable? (I guess so, because we've given the programmer a > way to resolve the trash: provide a cleanup method.) BDW considers it better to leak than to risk doing a wrong thing, and I agree wholeheartedly with that. GC is one place you want to have a "100% language".
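[Editor's note: the cleanup-method escape hatch under discussion (Marc-Andre's __cleanup__ / Greg's tp_clean) is easy to prototype in pure Python. Everything below is hypothetical — a sketch of the proposal's dispatch logic, not an API that existed; `clear_instance` stands in for what tp_clear would do for instances.]

```python
def clear_instance(obj):
    """Sketch of the proposed tp_clear for instances: prefer an explicit
    __cleanup__ hook; failing that, drop container attributes so the
    collector can break the cycle itself."""
    cleanup = getattr(obj, "__cleanup__", None)
    if cleanup is not None:
        cleanup()
        return
    # no hook: delete container-valued attributes from the instance dict
    for name, value in list(vars(obj).items()):
        if isinstance(value, (list, tuple, dict, set)):
            delattr(obj, name)

class WithHook:
    def __init__(self):
        self.broken = False
    def __cleanup__(self):
        self.broken = True  # the class decides how to break its cycles

class Plain:
    def __init__(self):
        self.children = []  # a container attribute the fallback would drop

w, p = WithHook(), Plain()
clear_instance(w)  # calls w.__cleanup__()
clear_instance(p)  # deletes p.children
```

The point of the proposal is visible in the two branches: a class that defines the hook keeps full control, while a class that doesn't still gets its cycles broken, at the cost of losing attributes without warning.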
This is where something like a guardian can remain useful: while leaking is OK because you've given them an easy & principled alternative, leaking without giving them a clear way to *know* about it is not OK. If gc pushes the leaked stuff off to the side, the gc module should (say) supply an entry point that returns all the leaked stuff in a list. Then users can *know* they're leaking, know how badly they're leaking, and examine exactly the objects that are leaking. Then they've got the info they need to repair their program (or at least track down the 3rd-party module that's leaking). As with a guardian, they *could* also build a reclamation scheme on top of it, but that would no longer be the main (or even an encouraged) thrust. > If we detect individual cycles (the current algorithm doesn't do that > yet, though it seems easy enough to do another scan), could we > special-case cycles with only one finalizer and no cleaner-upper? > (I'm tempted to call the finalizer because it seems little harm can be > done -- but then of course there's the problem of the finalizer being > called again when the refcount really goes to zero. :-( ) "Better safe than sorry" is my immediate view on this -- you can't know that the finalizer won't resurrect the cycle, and "finalizer called iff refcount hits 0" is a wonderfully simple & predictable rule. That's worth a lot to preserve, unless & until it proves to be a disaster in practice. As to the details of cleanup, I haven't succeeded in making the time to understand all the proposals. But I've done my primary job here if I've harassed everyone into not repeating the same mistakes all previous languages have made <0.9 wink>. > ... 
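[Editor's note: the entry point Tim asks for above — push the leaked cycles off to the side and hand users a list of exactly the objects that leaked — is essentially what later shipped as gc.garbage, which historically collected uncollectable cycles involving finalizers. In today's CPython such cycles are collected anyway, but the DEBUG_SAVEALL flag still demonstrates the mechanism: everything the collector finds unreachable is parked in gc.garbage for inspection instead of being freed. A small sketch:]

```python
import gc

class Node:
    """Two of these form a reference cycle via .peer."""

gc.disable()                     # keep the cycle alive until we ask
a, b = Node(), Node()
a.peer, b.peer = b, a
del a, b                         # the cycle is now unreachable

gc.set_debug(gc.DEBUG_SAVEALL)   # park collected garbage in gc.garbage
gc.collect()
leaked = [o for o in gc.garbage if isinstance(o, Node)]
gc.set_debug(0)
gc.enable()
```

This gives exactly the debugging story Tim describes: the program can *know* it is leaking, see how badly, and examine the leaked objects themselves.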
> I wish we didn't have to special-case finalizers on class instances > (since each dealloc function is potentially a combination of a > finalizer and a deallocation routine), but the truth is that they > *are* special -- __del__ has no responsibility for deallocating > memory, only for deallocating external resources (such as temp files). And the problem is that __del__ can do anything whatsoever than can be expressed in Python, so there's not a chance in hell of outguessing it. > ... > Another practical consideration is that now there are cycles of the form > > <=> > > which suggests that we should make function objects traceable. Also, > modules can cross-reference, so module objects should be made > traceable. I don't think that this will grow the sets of traced > objects by too much (since the dicts involved are already traced, and > a typical program has way fewer functions and modules than it has > class instances). On the other hand, we may also have to trace > (un)bound method objects, and these may be tricky because they are > allocated and deallocated at high rates (once per typical method > call). This relates to what I was trying to get at with my response to your gc implementation sketch: mark-&-sweep needs to chase *everything*, so the set of chased types is maximal from the start. Adding chased types to the "indirectly infer what's unreachable via accounting for internal refcounts within the transitive closure" scheme can end up touching nearly as much as a full M-&-S pass per invocation. I don't know where the break-even point is, but the more stuff you chase in the latter scheme the less often you want to run it. About high rates, so long as a doubly-linked list allows efficient removal of stuff that dies via refcount exhaustion, you won't actually *chase* many bound method objects (i.e., they'll usually go away by themselves). 
Note in passing that bound method objects often showed up in cycles in IDLE, although you usually managed to break those in other ways. > Back to the drawing board... Good! That means you're making real progress <wink>. glad-someone-is-ly y'rs - tim From mal at lemburg.com Mon Mar 6 11:01:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 11:01:31 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> <20000305172754.A14998@acs.ucalgary.ca> Message-ID: <38C381FB.E222D6E4@lemburg.com> nascheme at enme.ucalgary.ca wrote: > > On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > > So here's what I'd consider doing: explicit is better than implicit, and in > > the face of ambiguity refuse the temptation to guess. > > I like Marc's suggestion. Here is my proposal: > > Allow classes to have a new method, __cleanup__ or whatever you > want to call it. When tp_clear is called for an instance, it > checks for this method. If it exists, call it, otherwise delete > the container objects from the instance's dictionary. When > collecting cycles, call tp_clear for instances first. > > It's simple and allows the programmer to cleanly break cycles if > they insist on creating them and using __del__ methods. Right :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Mar 6 12:57:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 12:57:29 +0100 Subject: [Python-Dev] Unicode character property methods Message-ID: <38C39D29.A29CE67F@lemburg.com> As you may have noticed, the Unicode objects provide new methods .islower(), .isupper() and .istitle(). Finn Bock mentioned that Java also provides .isdigit() and .isspace().
Question: should Unicode also provide these character property methods: .isdigit(), .isnumeric(), .isdecimal() and .isspace() ? Plus maybe .digit(), .numeric() and .decimal() for the corresponding decoding ? Similar APIs are already available through the unicodedata module, but could easily be moved to the Unicode object (they cause the builtin interpreter to grow a bit in size due to the new mapping tables). BTW, string.atoi et al. are currently not mapped to string methods... should they be ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Mar 6 14:29:04 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 08:29:04 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 12:57:29 +0100." <38C39D29.A29CE67F@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> Message-ID: <200003061329.IAA09529@eric.cnri.reston.va.us> > As you may have noticed, the Unicode objects provide > new methods .islower(), .isupper() and .istitle(). Finn Bock > mentioned that Java also provides .isdigit() and .isspace(). > > Question: should Unicode also provide these character > property methods: .isdigit(), .isnumeric(), .isdecimal() > and .isspace() ? Plus maybe .digit(), .numeric() and > .decimal() for the corresponding decoding ? What would be the difference between isdigit, isnumeric, isdecimal? I'd say don't do more than Java. I don't understand what the "corresponding decoding" refers to. What would "3".decimal() return? > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? They are mapped to int() c.s. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 6 16:09:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 10:09:55 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <14531.51779.650532.881626@weyr.cnri.reston.va.us> Guido van Rossum writes: > - You could put it all in ConfigParser.py but with new classnames. > (Not sure though, since the ConfigParser class, which is really a > kind of weird variant, will be assumed to be the main class because > its name is that of the module.) The ConfigParser class could be clearly marked as deprecated both in the source/docstring and in the documentation. But the class itself should not be used in any way. > - Variants on the syntax could be given through some kind of option > system rather than through subclassing -- they should be combinable > independently. Some possible options (maybe I'm going overboard here) > could be: Yes, you are going overboard. It should contain exactly what's right for .ini files, and that's it. There are really three aspects to the beast: reading, using, and writing. I think there should be a class which does the right thing for using the information in the file, and reading & writing can be handled through functions or helper classes. That separates the parsing issues from the use issues, and alternate syntaxes will be easy enough to implement by subclassing the helper or writing a new function. An "editable" version that allows loading & saving without throwing away comments, ordering, etc.
would require a largely separate implementation of all three aspects (or at least the reader and writer). > (Well maybe the whole substitution thing should really be done through > a subclass -- it's too weird for normal use.) That and the ad hoc syntax are my biggest beefs with ConfigParser. But it can easily be added by a subclass as long as the method to override is clearly specified in the documentation (it should only require one!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 6 18:47:44 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 12:47:44 -0500 (EST) Subject: [Python-Dev] PyBufferProcs Message-ID: <14531.61248.941076.803617@weyr.cnri.reston.va.us> While working on the documentation, I've noticed a naming inconsistency regarding PyBufferProcs; its peers are all named Py*Methods (PySequenceMethods, PyNumberMethods, etc.). I'd like to propose that a synonym, PyBufferMethods, be made for PyBufferProcs, and use that in the core implementations and the documentation. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 6 20:28:12 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 6 Mar 2000 14:28:12 -0500 (EST) Subject: [Python-Dev] example checkers based on compiler package Message-ID: <14532.1740.90292.440395@goon.cnri.reston.va.us> There was some discussion on python-dev over the weekend about generating warnings, and Moshe Zadke posted a selfnanny that warned about methods that didn't have self as the first argument. I think these kinds of warnings are useful, and I'd like to see a more general framework for them built around the Python abstract syntax originally from P2C. Ideally, they would be available as command line tools and integrated into GUIs like IDLE in some useful way.
I've included a couple of quick examples I coded up last night based on the compiler package (recently re-factored) that is resident in python/nondist/src/Compiler. The analysis on the one that checks for name errors is a bit of a mess, but the overall structure seems right. I'm hoping to collect a few more examples of checkers and generalize from them to develop a framework for checking for errors and reporting them. Jeremy

------------ checkself.py ------------
"""Check for methods that do not have self as the first argument"""

from compiler import parseFile, walk, ast, misc

class Warning:
    def __init__(self, filename, klass, method, lineno, msg):
        self.filename = filename
        self.klass = klass
        self.method = method
        self.lineno = lineno
        self.msg = msg

    _template = "%(filename)s:%(lineno)s %(klass)s.%(method)s: %(msg)s"

    def __str__(self):
        return self._template % self.__dict__

class NoArgsWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno):
        self.super_init(filename, klass, method, lineno, "no arguments")

class NotSelfWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno, argname):
        self.super_init(filename, klass, method, lineno,
                        "self slot is named %s" % argname)

class CheckSelf:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = misc.Stack()

    def inClass(self):
        if self.scope:
            return isinstance(self.scope.top(), ast.Class)
        return 0

    def visitClass(self, klass):
        self.scope.push(klass)
        self.visit(klass.code)
        self.scope.pop()
        return 1

    def visitFunction(self, func):
        if self.inClass():
            classname = self.scope.top().name
            if len(func.argnames) == 0:
                w = NoArgsWarning(self.filename, classname, func.name,
                                  func.lineno)
                self.warnings.append(w)
            elif func.argnames[0] != "self":
                w = NotSelfWarning(self.filename, classname, func.name,
                                   func.lineno, func.argnames[0])
                self.warnings.append(w)
        self.scope.push(func)
        self.visit(func.code)
        self.scope.pop()
        return 1

def check(filename):
    global p, check
    p = parseFile(filename)
    check = CheckSelf(filename)
    walk(p, check)
    for w in check.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badself.py ------------
def foo():
    return 12

class Foo:
    def __init__():
        pass

    def foo(self, foo):
        pass

    def bar(this, that):
        def baz(this=that):
            return this
        return baz

def bar():
    class Quux:
        def __init__(self):
            self.sum = 1
        def quam(x, y):
            self.sum = self.sum + (x * y)
    return Quux()

------------ checknames.py ------------
"""Check for NameErrors"""

from compiler import parseFile, walk
from compiler.misc import Stack, Set

import __builtin__
from UserDict import UserDict

class Warning:
    def __init__(self, filename, funcname, lineno):
        self.filename = filename
        self.funcname = funcname
        self.lineno = lineno

    def __str__(self):
        return self._template % self.__dict__

class UndefinedLocal(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, funcname, lineno, name):
        self.super_init(filename, funcname, lineno)
        self.name = name

    _template = "%(filename)s:%(lineno)s %(funcname)s undefined local %(name)s"

class NameError(UndefinedLocal):
    _template = "%(filename)s:%(lineno)s %(funcname)s undefined name %(name)s"

class NameSet(UserDict):
    """Track names and the line numbers where they are referenced"""
    def __init__(self):
        self.data = self.names = {}

    def add(self, name, lineno):
        l = self.names.get(name, [])
        l.append(lineno)
        self.names[name] = l

class CheckNames:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = Stack()
        self.gUse = NameSet()
        self.gDef = NameSet()
        # _locals is the stack of local namespaces
        # locals is the top of the stack
        self._locals = Stack()
        self.lUse = None
        self.lDef = None
        self.lGlobals = None  # var declared global
        # holds scope,def,use,global triples for later analysis
        self.todo = []

    def enterNamespace(self, node):
##        print node.name
        self.scope.push(node)
        self.lUse = use = NameSet()
        self.lDef = _def = NameSet()
        self.lGlobals = gbl = NameSet()
        self._locals.push((use, _def, gbl))

    def exitNamespace(self):
##        print
        self.todo.append((self.scope.top(), self.lDef, self.lUse,
                          self.lGlobals))
        self.scope.pop()
        self._locals.pop()
        if self._locals:
            self.lUse, self.lDef, self.lGlobals = self._locals.top()
        else:
            self.lUse = self.lDef = self.lGlobals = None

    def warn(self, warning, funcname, lineno, *args):
        args = (self.filename, funcname, lineno) + args
        self.warnings.append(apply(warning, args))

    def defName(self, name, lineno, local=1):
##        print "defName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gDef.add(name, lineno)
        elif local == 0:
            self.gDef.add(name, lineno)
            self.lGlobals.add(name, lineno)
        else:
            self.lDef.add(name, lineno)

    def useName(self, name, lineno, local=1):
##        print "useName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gUse.add(name, lineno)
        elif local == 0:
            self.gUse.add(name, lineno)
            self.lUse.add(name, lineno)
        else:
            self.lUse.add(name, lineno)

    def check(self):
        for s, d, u, g in self.todo:
            self._check(s, d, u, g, self.gDef)
        # XXX then check the globals

    def _check(self, scope, _def, use, gbl, globals):
        # check for NameError
        # a name is defined iff it is in def.keys()
        # a name is global iff it is in gdefs.keys()
        gdefs = UserDict()
        gdefs.update(globals)
        gdefs.update(__builtin__.__dict__)
        defs = UserDict()
        defs.update(gdefs)
        defs.update(_def)
        errors = Set()
        for name in use.keys():
            if not defs.has_key(name):
                firstuse = use[name][0]
                self.warn(NameError, scope.name, firstuse, name)
                errors.add(name)
        # check for UndefinedLocalNameError
        # order == use & def sorted by lineno
        # elements are lineno, flag, name
        # flag = 0 if use, flag = 1 if def
        order = []
        for name, lines in use.items():
            if gdefs.has_key(name) and not _def.has_key(name):
                # this is a global ref, we can skip it
                continue
            for lineno in lines:
                order.append(lineno, 0, name)
        for name, lines in _def.items():
            for lineno in lines:
                order.append(lineno, 1, name)
        order.sort()
        # ready contains names that have been defined or warned about
        ready = Set()
        for lineno, flag, name in order:
            if flag == 0:  # use
                if not ready.has_elt(name) and not errors.has_elt(name):
                    self.warn(UndefinedLocal, scope.name, lineno, name)
                    ready.add(name)  # don't warn again
            else:
                ready.add(name)

    # below are visitor methods

    def visitFunction(self, node, noname=0):
        for expr in node.defaults:
            self.visit(expr)
        if not noname:
            self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        for name in node.argnames:
            self.defName(name, node.lineno)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitLambda(self, node):
        return self.visitFunction(node, noname=1)

    def visitClass(self, node):
        for expr in node.bases:
            self.visit(expr)
        self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitName(self, node):
        self.useName(node.name, node.lineno)

    def visitGlobal(self, node):
        for name in node.names:
            self.defName(name, node.lineno, local=0)

    def visitImport(self, node):
        for name in node.names:
            self.defName(name, node.lineno)

    visitFrom = visitImport

    def visitAssName(self, node):
        self.defName(node.name, node.lineno)

def check(filename):
    global p, checker
    p = parseFile(filename)
    checker = CheckNames(filename)
    walk(p, checker)
    checker.check()
    for w in checker.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badnames.py ------------
# XXX can we detect race conditions on accesses to global variables?
# probably can (conservatively) by noting variables _created_ by
# global decls in funcs

import string
import time

def foo(x):
    return x + y

def foo2(x):
    return x + z

a = 4

def foo3(x):
    a, b = x, a

def bar(x):
    z = x
    global z

def bar2(x):
    f = string.strip
    a = f(x)
    import string
    return string.lower(a)

def baz(x, y):
    return x + y + z

def outer(x):
    def inner(y):
        return x + y
    return inner

From gstein at lyra.org Mon Mar 6 22:09:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 6 Mar 2000 13:09:33 -0800 (PST) Subject: [Python-Dev] PyBufferProcs In-Reply-To: <14531.61248.941076.803617@weyr.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Fred L. Drake, Jr. wrote: > While working on the documentation, I've noticed a naming > inconsistency regarding PyBufferProcs; its peers are all named > Py*Methods (PySequenceMethods, PyNumberMethods, etc.). > I'd like to propose that a synonym, PyBufferMethods, be made for > PyBufferProcs, and use that in the core implementations and the > documentation. +0 Although.. I might say that it should be renamed, and a synonym (#define or typedef?) be provided for the old name. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Mar 6 23:04:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 23:04:14 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> Message-ID: <38C42B5E.42801755@lemburg.com> Guido van Rossum wrote: > > > As you may have noticed, the Unicode objects provide > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > mentioned that Java also provides .isdigit() and .isspace().
> > What would be the difference between isdigit, isnumeric, isdecimal? > I'd say don't do more than Java. I don't understand what the > "corresponding decoding" refers to. What would "3".decimal() return? These originate in the Unicode database; see ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html Here are the descriptions:

"""
6   Decimal digit value   normative
    This is a numeric field. If the character has the decimal digit
    property, as specified in Chapter 4 of the Unicode Standard, the
    value of that digit is represented with an integer value in this
    field.

7   Digit value   normative
    This is a numeric field. If the character represents a digit, not
    necessarily a decimal digit, the value is here. This covers digits
    which do not form decimal radix forms, such as the compatibility
    superscript digits.

8   Numeric value   normative
    This is a numeric field. If the character has the numeric property,
    as specified in Chapter 4 of the Unicode Standard, the value of that
    character is represented with an integer or rational number in this
    field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR
    FRACTION ONE FIFTH. Also included are numerical values for
    compatibility characters such as circled numbers.
"""

u"3".decimal() would return 3; u"\u2155".numeric() would return 0.2.

Some more examples from the unicodedata module (which makes all fields of the database available in Python):

>>> unicodedata.decimal(u"3")
3
>>> unicodedata.decimal(u"?")
2
>>> unicodedata.digit(u"?")
2
>>> unicodedata.numeric(u"?")
2.0
>>> unicodedata.numeric(u"\u2155")
0.2
>>> unicodedata.numeric(u'\u215b')
0.125

> > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? > > They are mapped to int() c.s. Hmm, I just noticed that int() et friends don't like Unicode...
shouldn't they use the "t" parser marker instead of requiring a string or tp_int compatible type ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 00:12:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 18:12:33 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 23:04:14 +0100." <38C42B5E.42801755@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> Message-ID: <200003062312.SAA11697@eric.cnri.reston.va.us> [MAL] > > > As you may have noticed, the Unicode objects provide > > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > > mentioned that Java also provides .isdigit() and .isspace(). > > > > > > Question: should Unicode also provide these character > > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > > and .isspace() ? Plus maybe .digit(), .numeric() and > > > .decimal() for the corresponding decoding ? [Guido] > > What would be the difference between isdigit, isnumeric, isdecimal? > > I'd say don't do more than Java. I don't understand what the > > "corresponding decoding" refers to. What would "3".decimal() return? [MAL] > These originate in the Unicode database; see > > ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html > > Here are the descriptions: > > """ > 6 > Decimal digit value > normative > This is a numeric field. If the > character has the decimal digit > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that digit is represented > with an integer value in this field > 7 > Digit value > normative > This is a numeric field. If the > character represents a digit, not > necessarily a decimal digit, the > value is here. 
This covers digits > which do not form decimal radix > forms, such as the compatibility > superscript digits > 8 > Numeric value > normative > This is a numeric field. If the > character has the numeric > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that character is > represented with an integer or > rational number in this field. This > includes fractions as, e.g., "1/5" for > U+2155 VULGAR FRACTION > ONE FIFTH Also included are > numerical values for compatibility > characters such as circled > numbers. > > u"3".decimal() would return 3; u"\u2155".numeric() would return 0.2. > > Some more examples from the unicodedata module (which makes > all fields of the database available in Python): > > >>> unicodedata.decimal(u"3") > 3 > >>> unicodedata.decimal(u"?") > 2 > >>> unicodedata.digit(u"?") > 2 > >>> unicodedata.numeric(u"?") > 2.0 > >>> unicodedata.numeric(u"\u2155") > 0.2 > >>> unicodedata.numeric(u'\u215b') > 0.125 Hm, very Unicode centric. Probably best left out of the general string methods. Isspace() seems useful, and an isdigit() that is only true for ASCII '0' - '9' also makes sense. What about "123".isdigit()? What does Java say? Or do these only apply to single chars there? I think "123".isdigit() should be true if "abc".islower() is true. > > > Similar APIs are already available through the unicodedata > > > module, but could easily be moved to the Unicode object > > > (they cause the builtin interpreter to grow a bit in size > > > due to the new mapping tables). > > > > > > BTW, string.atoi et al. are currently not mapped to > > > string methods... should they be ? > > > > They are mapped to int() c.s. > > Hmm, I just noticed that int() et friends don't like > Unicode... shouldn't they use the "t" parser marker > instead of requiring a string or tp_int compatible > type ? Good catch. Go ahead.
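The database lookups quoted in this exchange are easy to check against the unicodedata module directly. A minimal sketch in modern (Python 3) spelling; note that the "?" characters in the quoted session are mis-encoded non-ASCII digits, so U+0662 ARABIC-INDIC DIGIT TWO is used below only as a plausible stand-in, not as the character MAL actually typed:

```python
import unicodedata

# the unambiguous examples from the message
assert unicodedata.decimal("3") == 3
assert unicodedata.numeric("\u2155") == 0.2    # VULGAR FRACTION ONE FIFTH
assert unicodedata.numeric("\u215b") == 0.125  # VULGAR FRACTION ONE EIGHTH

# a character carrying all three properties (stand-in for the
# garbled "?" in the quoted session): U+0662 ARABIC-INDIC DIGIT TWO
assert unicodedata.decimal("\u0662") == 2
assert unicodedata.digit("\u0662") == 2
assert unicodedata.numeric("\u0662") == 2.0

# a digit that is *not* a decimal digit: U+00B2 SUPERSCRIPT TWO
assert unicodedata.digit("\u00b2") == 2
```

This is exactly the decimal/digit/numeric distinction from fields 6, 7, and 8 of the Unicode database quoted above.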
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Tue Mar 7 06:25:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 7 Mar 2000 07:25:43 +0200 (IST) Subject: [Python-Dev] Re: example checkers based on compiler package In-Reply-To: <14532.1740.90292.440395@goon.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Jeremy Hylton wrote: > I think these kinds of warnings are useful, and I'd like to see a more > general framework for them built around the Python abstract syntax originally > from P2C. Ideally, they would be available as command line tools and > integrated into GUIs like IDLE in some useful way. Yes! Guido already suggested we have a standard API to them. One thing I suggested was that the abstract API include not only the input (one form or another of an AST), but the output: so IDE's wouldn't have to parse strings, but get a warning class. Something like this: An output of a warning can be a subclass of GeneralWarning, and should implement the following methods:

1. line-no() -- returns an integer
2. columns() -- returns either a pair of integers, or None
3. message() -- returns a string containing a message
4. __str__() -- comes for free if inheriting GeneralWarning, and formats the warning message.

> I've included a couple of quick examples I coded up last night based > on the compiler package (recently re-factored) that is resident in > python/nondist/src/Compiler. The analysis on the one that checks for > name errors is a bit of a mess, but the overall structure seems right. One thing I had trouble with is that in my implementation of selfnanny, I used Python's stack for recursion while you used an explicit stack. It's probably because of the visitor pattern, which is just another argument for co-routines and generators. > I'm hoping to collect a few more examples of checkers and generalize > from them to develop a framework for checking for errors and reporting > them. Cool!
Brainstorming: what kind of warnings would people find useful? In selfnanny, I wanted to include checking for assignment to self, and checking for "possible use before definition of local variables" sounds good. Another check could be a CP4E "checking that no two identifiers differ only by case". I might code up a few if I have the time... What I'd really want (but it sounds really hard) is a framework for partial ASTs: warning people as they write code. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mwh21 at cam.ac.uk Tue Mar 7 09:31:23 2000 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 07 Mar 2000 08:31:23 +0000 Subject: [Python-Dev] Re: [Compiler-sig] Re: example checkers based on compiler package In-Reply-To: Moshe Zadka's message of "Tue, 7 Mar 2000 07:25:43 +0200 (IST)" References: Message-ID: Moshe Zadka writes: > On Mon, 6 Mar 2000, Jeremy Hylton wrote: > > > I think these kinds of warnings are useful, and I'd like to see a more > > general framework for them built around the Python abstract syntax originally > > from P2C. Ideally, they would be available as command line tools and > > integrated into GUIs like IDLE in some useful way. > > Yes! Guido already suggested we have a standard API to them. One thing > I suggested was that the abstract API include not only the input (one form > or another of an AST), but the output: so IDE's wouldn't have to parse > strings, but get a warning class. That would be seriously cool. > Something like this: > > An output of a warning can be a subclass of GeneralWarning, and should > implement the following methods: > > 1. line-no() -- returns an integer > 2. columns() -- returns either a pair of integers, or None > 3. message() -- returns a string containing a message > 4. __str__() -- comes for free if inheriting GeneralWarning, > and formats the warning message. Wouldn't it make sense to include function/class name here too? A checker is likely to know, and it would save reparsing to find it out.
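Moshe's proposed interface, extended with the function/class-name accessor Michael asks for, might look like this minimal sketch (method names adapted to legal Python identifiers; every name here is a hypothetical illustration, not code from the thread):

```python
class GeneralWarning:
    """Base class for checker warnings (hypothetical sketch)."""

    def __init__(self, lineno, message, columns=None, funcname=None):
        self._lineno = lineno
        self._message = message
        self._columns = columns    # (start, end) pair, or None
        self._funcname = funcname  # enclosing function/class, per MWH

    def line_no(self):
        return self._lineno

    def columns(self):
        return self._columns

    def message(self):
        return self._message

    def __str__(self):
        # "comes for free" for subclasses
        where = "line %d" % self._lineno
        if self._funcname:
            where += " in %s" % self._funcname
        return "%s: %s" % (where, self._message)


class UndefinedLocalWarning(GeneralWarning):
    pass
```

A checker would then yield instances like `UndefinedLocalWarning(12, "local 'a' used before definition", funcname="foo3")`, and an IDE could call `line_no()` and `columns()` instead of parsing strings.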
[little snip] > > I'm hoping to collect a few more examples of checkers and generalize > > from them to develop a framework for checking for errors and reporting > > them. > > Cool! > Brainstorming: what kind of warnings would people find useful? In > selfnanny, I wanted to include checking for assignment to self, and > checking for "possible use before definition of local variables" sounds > good. Another check could be a CP4E "checking that no two identifiers > differ only by case". I might code up a few if I have the time... Is there stuff in the current Compiler code to do control flow analysis? You'd need that to check for use before definition in meaningful cases, and also if you ever want to do any optimisation... > What I'd really want (but it sounds really hard) is a framework for > partial ASTs: warning people as they write code. I agree (on both points). Cheers, M. -- very few people approach me in real life and insist on proving they are drooling idiots. -- Erik Naggum, comp.lang.lisp From mal at lemburg.com Tue Mar 7 10:14:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:14:25 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> <200003062312.SAA11697@eric.cnri.reston.va.us> Message-ID: <38C4C871.F47E17A3@lemburg.com> Guido van Rossum wrote: > [MAL about adding .isdecimal(), .isdigit() and .isnumeric()] > > Some more examples from the unicodedata module (which makes > > all fields of the database available in Python): > > > > >>> unicodedata.decimal(u"3") > > 3 > > >>> unicodedata.decimal(u"?") > > 2 > > >>> unicodedata.digit(u"?") > > 2 > > >>> unicodedata.numeric(u"?") > > 2.0 > > >>> unicodedata.numeric(u"\u2155") > > 0.2 > > >>> unicodedata.numeric(u'\u215b') > > 0.125 > > Hm, very Unicode centric. Probably best left out of the general > string methods.
Isspace() seems useful, and an isdigit() that is only > true for ASCII '0' - '9' also makes sense. Well, how about having all three on Unicode objects and only .isdigit() on string objects ? > What about "123".isdigit()? What does Java say? Or do these only > apply to single chars there? I think "123".isdigit() should be true > if "abc".islower() is true. In the current uPython implementation u"123".isdigit() is true; same for the other two methods. > > > > Similar APIs are already available through the unicodedata > > > > module, but could easily be moved to the Unicode object > > > > (they cause the builtin interpreter to grow a bit in size > > > > due to the new mapping tables). > > > > > > > > BTW, string.atoi et al. are currently not mapped to > > > > string methods... should they be ? > > > > > > They are mapped to int() c.s. > > > > Hmm, I just noticed that int() et friends don't like > > Unicode... shouldn't they use the "t" parser marker > > instead of requiring a string or tp_int compatible > > type ? > > Good catch. Go ahead. Done. float(), int() and long() now accept charbuf compatible objects as argument. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 10:23:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:23:35 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects Message-ID: <38C4CA97.5D0AA9D@lemburg.com> Before starting to code away, I would like to know which of the new Unicode methods should also be available on string objects. 
Here are the currently available methods:

Unicode objects    string objects
------------------------------------
capitalize         capitalize
center
count              count
encode
endswith           endswith
expandtabs
find               find
index              index
isdecimal
isdigit
islower
isnumeric
isspace
istitle
isupper
join               join
ljust
lower              lower
lstrip             lstrip
replace            replace
rfind              rfind
rindex             rindex
rjust
rstrip             rstrip
split              split
splitlines
startswith         startswith
strip              strip
swapcase           swapcase
title              title
translate          translate (*)
upper              upper
zfill

(*) The two have slightly different implementations, e.g. deletions are handled differently.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Mar 7 12:54:56 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 7 Mar 2000 12:54:56 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> Message-ID: <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> > Unicode objects string objects > expandtabs yes. I'm pretty sure there's "expandtabs" code in the strop module. maybe barry missed it? > center > ljust > rjust probably. the implementation is trivial, and ljust/rjust are somewhat useful, so you might as well add them all (just cut and paste from the unicode class). what about rguido and lguido, btw? > zfill no. From guido at python.org Tue Mar 7 14:52:00 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 08:52:00 -0500 Subject: [Python-Dev] finalization again Message-ID: <200003071352.IAA13571@eric.cnri.reston.va.us> Warning: long message. If you're not interested in reading all this, please skip to "Conclusion" at the end. At Tim's recommendation I had a look at what section 12.6 of the Java language spec says about finalizers. The stuff there is sure seductive for language designers...
Have a look at the diagram at http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its (seeming) complexity, it helped me understand some of the issues of finalization better. Rather than the complex 8-state state machine that it appears to be, think of it as a simple 3x3 table. The three rows represent the categories reachable, finalizer-reachable (abbreviated in the diagram as f-reachable), and unreachable. These categories correspond directly to categories of objects that the Schemenauer-Tiedemann cycle-reclamation scheme deals with: after moving all the reachable objects to the second list (first the roots and then the objects reachable from the roots), the first list is left with the unreachable and finalizer-reachable objects. If we want to distinguish between unreachable and finalizer-reachable at this point, a straightforward application of the same algorithm will work well: Create a third list (this will contain the finalizer-reachable objects). Start by filling it with all the objects from the first list (which contains the potential garbage at this point) that have a finalizer. We can look for objects that have __del__ or __clean__ or for which tp_clean(CARE_EXEC)==true; it doesn't matter here. (*) Then walk through the third list, following each object's references, and move all referenced objects that are still in the first list to the third list. Now, we have:

List 1: truly unreachable objects. These have no finalizers and can be discarded right away.

List 2: truly reachable objects. (Roots and objects reachable from roots.) Leave them alone.

List 3: finalizer-reachable objects. This contains objects that are unreachable but have a finalizer, and objects that are only reachable through those.

We now have to decide on a policy for invoking finalizers. Java suggests the following: Remember the "roots" of the third list -- the nodes that were moved there directly from the first list because they have a finalizer.
These objects are marked *finalizable* (a category corresponding to the second *column* of the Java diagram). The Java spec allows the Java garbage collector to call all of these finalizers in any order -- even simultaneously in separate threads. Java never allows an object to go back from the finalizable to the unfinalized state (there are no arrows pointing left in the diagram). The first finalizer that is called could make its object reachable again (up arrow), thereby possibly making other finalizable objects reachable too. But this does not cancel their scheduled finalization! The conclusion is that Java can sometimes call finalization on unreachable objects -- but only if those objects have gone through a phase in their life where they were unreachable or at least finalizer-unreachable. I agree that this is the best that Java can do: if there are cycles containing multiple objects with finalizers, there is no way (short of asking the programmer(s)) to decide which object to finalize first. We could pick one at random, run its finalizer, and start garbage collection all over -- if the finalizer doesn't resurrect anything, this will give us the same set of unreachable objects, from which we could pick the next finalizable object, and so on. That looks very inefficient, might not terminate (the same object could repeatedly show up as the candidate for finalization), and it's still arbitrary: the programmer(s) still can't predict which finalizer in a cycle with multiple finalizers will be called first. Assuming the recommended characteristics of finalizers (brief and robust), it won't make much difference if we call all finalizers (of the now-finalizeable objects) "without looking back". Sure, some objects may find themselves in a perfectly reachable position with their finalizer called -- but they did go through a "near-death experience". I don't find this objectionable, and I don't see how Java could possibly do better for cycles with multiple finalizers. 
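The three-list construction described above can be sketched in a few lines of Python. This is purely an illustration of the algorithm: `refs` and `has_finalizer` are hypothetical stand-ins for the tp_traverse-style information the real C-level collector would use, and the objects are just hashable tokens:

```python
def partition(objects, roots, refs, has_finalizer):
    """Split `objects` into (unreachable, reachable, finalizer_reachable).

    objects: all candidate objects; roots: externally referenced ones;
    refs[obj]: list of objects that obj points to.
    """
    # List 2: everything reachable from the roots.
    reachable = set()
    stack = [o for o in objects if o in roots]
    while stack:
        o = stack.pop()
        if o not in reachable:
            reachable.add(o)
            stack.extend(refs.get(o, []))

    # Seed list 3 with unreachable objects that have a finalizer...
    fin_reachable = set(o for o in objects
                        if o not in reachable and has_finalizer(o))
    # ...then add everything reachable only through them.
    stack = list(fin_reachable)
    while stack:
        for r in refs.get(stack.pop(), []):
            if r not in reachable and r not in fin_reachable:
                fin_reachable.add(r)
                stack.append(r)

    # List 1: truly unreachable, no finalizers -- safe to discard.
    unreachable = [o for o in objects
                   if o not in reachable and o not in fin_reachable]
    return unreachable, reachable, fin_reachable
```

With a trash cycle a <-> b where only a has a finalizer, both a and b land in the finalizer-reachable list; a is the "root" of that list whose finalizer would be scheduled.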
Now let's look again at the rule that an object's finalizer will be called at most once automatically by the garbage collector. The transitions between the columns of the Java diagram enforce this: the columns are labeled from left to right with unfinalized, finalizable, and finalized, and there are no transition arrows pointing left. (In my description above, already finalized objects are considered not to have a finalizer.) I think this rule makes a lot of sense given Java's multi-threaded garbage collection: the invocation of finalizers could run concurrently with another garbage collection, and we don't want this to find some of the same finalizable objects and call their finalizers again! We could mark them with a "finalization in progress" flag only while their finalizer is running, but in a cycle with multiple finalizers it seems we should keep this flag set until *all* finalizers for objects in the cycle have run. But we don't actually know exactly what the cycles are: all we know is "these objects are involved in trash cycles". More detailed knowledge would require yet another sweep, plus a more hairy two-dimensional data structure (a list of separate cycles). And for what? As soon as we run finalizers from two separate cycles, those cycles could be merged again (e.g. the first finalizer could resurrect its cycle, and the second one could link to it). Now we have a pool of objects that are marked "finalization in progress" until all their finalizations terminate. For an incremental concurrent garbage collector, this seems a pain, since it may continue to find new finalizable objects and add them to the pile. Java takes the logical conclusion: the "finalization in progress" flag is never cleared -- and renamed to "finalized".

Conclusion
----------

Are the Java rules complex? Yes. Are there better rules possible? I'm not so sure, given the requirement of allowing concurrent incremental garbage collection algorithms that haven't even been invented yet.
(Plus the implied requirement that finalizers in trash cycles should be invoked.) Are the Java rules difficult for the user? Only for users who think they can trick finalizers into doing things for them that they were not designed to do. I would think the following guidelines should do nicely for the rest of us:

1. Avoid finalizers if you can; use them only to release *external* (e.g. OS) resources.

2. Write your finalizer to be as robust as you can, with as little use of other objects as you can.

3. You only get one chance. Use it.

Unlike Scheme guardians or the proposed __cleanup__ mechanism, you don't have to know whether your object is involved in a cycle -- your finalizer will still be called.

I am reconsidering using the __del__ method as the finalizer. As a compromise to those who want their __del__ to run whenever the reference count reaches zero, the finalized flag can be cleared explicitly. I am considering the following implementation: after retrieving the __del__ method, but before calling it, self.__del__ is set to None (better, self.__dict__['__del__'] = None, to avoid confusing __setattr__ hooks). The object can remove self.__del__ to clear the finalized flag. I think I'll use the same mechanism to prevent __del__ from being called upon a failed initialization.

Final note: the semantics "__del__ is called whenever the reference count reaches zero" cannot be defended in the light of a migration to different forms of garbage collection (e.g. JPython). There may not be a reference count.

--Guido van Rossum (home page: http://www.python.org/~guido/)

____ (*) Footnote: there's one complication: to ask a Python class instance if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If the object's class has a __getattr__ hook, this can invoke arbitrary Python code -- even if the answer to the question is "no"! This can make the object reachable again (in the Java diagram, arrows pointing up or up and right).
We could either use instance_getattr1(), which avoids the __getattr__ hook, or mark all class instances as finalizable until proven innocent.

From gward at cnri.reston.va.us Tue Mar 7 15:04:30 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 7 Mar 2000 09:04:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us>; from guido@python.org on Sun, Mar 05, 2000 at 07:04:56AM -0500 References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <20000307090430.A16948@cnri.reston.va.us> On 05 March 2000, Guido van Rossum said:

> - Variants on the syntax could be given through some kind of option
>   system rather than through subclassing -- they should be combinable
>   independently. Some possible options (maybe I'm going overboard here)
>   could be:
>
> - comment characters: ('#', ';', both, others?)
> - comments after variables allowed? on sections?
> - variable characters: (':', '=', both, others?)
> - quoting of values with "..." allowed?
> - backslashes in "..." allowed?
> - does backslash-newline mean a continuation?
> - case sensitivity for section names (default on)
> - case sensitivity for option names (default off)
> - variables allowed before first section name?
> - first section name? (default "main")
> - character set allowed in section names
> - character set allowed in variable names
> - %(...) substitution?

I agree with Fred that this level of flexibility is probably overkill for a config file parser; you don't want every application author who uses the module to have to explain his particular variant of the syntax. However, if you're interested in a class that *does* provide some of the above flexibility, I have written such a beast.
It's currently used to parse the Distutils MANIFEST.in file, and I've considered using it for the mythical Distutils config files. (And it also gets heavy use in my day job.) It's really a class for reading a file in preparation for "text processing the Unix way", though: it doesn't say anything about syntax, it just worries about blank lines, comments, continuations, and a few other things. Here's the class docstring:

class TextFile:

    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your comment
       character), skip blank lines, join adjacent lines by escaping the
       newline (ie. backslash at end of line), strip leading and/or
       trailing whitespace, and collapse internal whitespace.  All of
       these are optional and independently controllable.

       Provides a 'warn()' method so you can generate warning messages
       that report physical line number, even if the logical line in
       question spans multiple physical lines.  Also provides
       'unreadline()' for implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile (filename=None, file=None, **options)

       It bombs (RuntimeError) if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It is
       recommended that you supply at least 'filename', so that TextFile
       can include it in warning messages.  If 'file' is not supplied,
       TextFile creates its own using the 'open()' builtin.

       The options are all boolean, and affect the value returned by
       'readline()':

         strip_comments [default: true]
           strip from "#" to end-of-line, as well as any whitespace
           leading up to the "#" -- unless it is escaped by a backslash

         lstrip_ws [default: false]
           strip leading whitespace from each line before returning it

         rstrip_ws [default: true]
           strip trailing whitespace (including line terminator!) from
           each line before returning it

         skip_blanks [default: true]
           skip lines that are empty *after* stripping comments and
           whitespace.  (If both lstrip_ws and rstrip_ws are true, then
           some lines may consist of solely whitespace: these will *not*
           be skipped, even if 'skip_blanks' is true.)

         join_lines [default: false]
           if a backslash is the last non-newline character on a line
           after stripping comments and whitespace, join the following
           line to it to form one "logical line"; if N consecutive lines
           end with a backslash, then N+1 physical lines will be joined
           to form one logical line.

         collapse_ws [default: false]
           after stripping comments and whitespace and joining physical
           lines into logical lines, all internal whitespace (strings of
           whitespace surrounded by non-whitespace characters, and not at
           the beginning or end of the logical line) will be collapsed to
           a single space.

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin
       file object's 'readline()' method!  In particular, 'readline()'
       returns None for end-of-file: an empty string might just be a
       blank line (or an all-whitespace line), if 'rstrip_ws' is true but
       'skip_blanks' is not."""

Interested in having something like this in the core? Adding more options is possible, but the code is already on the hairy side to support all of these. And I'm not a big fan of the subtle difference in semantics with file objects, but honestly couldn't think of a better way at the time. If you're interested, you can download it from http://www.mems-exchange.org/exchange/software/python/text_file/ or just use the version in the Distutils CVS tree. Greg From mal at lemburg.com Tue Mar 7 15:38:09 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 07 Mar 2000 15:38:09 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> Message-ID: <38C51451.D38B21FE@lemburg.com> Fredrik Lundh wrote: > > > Unicode objects string objects > > expandtabs > > yes. > > I'm pretty sure there's "expandtabs" code in the > strop module. maybe barry missed it? > > > center > > ljust > > rjust > > probably. > > the implementation is trivial, and ljust/rjust are > somewhat useful, so you might as well add them > all (just cut and paste from the unicode class). > > what about rguido and lguido, btw? Ooops, forgot those, thanks :-) > > zfill > > no. Why not ? Since the string implementation had all of the above marked as TBD, I added all four. What about the other new methods (.isXXX() and .splitlines()) ? .isXXX() are mostly needed due to the extended character properties in Unicode. They would be new to the string object world. .splitlines() is Unicode aware and also treats CR/LF combinations across platforms:

S.splitlines([maxsplit]) -> list of strings

Return a list of the lines in S, breaking at line boundaries. If maxsplit is given, at most maxsplit splits are done. Line breaks are not included in the resulting list.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 16:38:18 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 10:38:18 -0500 Subject: [Python-Dev] Adding Unicode methods to string objects In-Reply-To: Your message of "Tue, 07 Mar 2000 15:38:09 +0100." <38C51451.D38B21FE@lemburg.com> References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <200003071538.KAA13977@eric.cnri.reston.va.us>
Zfill is (or ought to be) deprecated. It stems from times before we had things like "%08d" % x and no longer serves a useful purpose. I doubt anyone would miss it. (Of course, now /F will claim that PIL will break in 27 places because of this. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Tue Mar 7 18:07:40 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 7 Mar 2000 12:07:40 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003071352.IAA13571@eric.cnri.reston.va.us> Message-ID: <000701bf8857$a56ed660$a72d153f@tim> [Guido] > ... > Conclusion > ---------- > > Are the Java rules complex? Yes. Are there better rules possible? I'm > not so sure, given the requirement of allowing concurrent incremental > garbage collection algorithms that haven't even been invented > yet. Guy Steele worked his ass off on Java's rules. He had as much real-world experience with implementing GC as anyone, via his long & deep Lisp implementation background (both SW & HW), and indeed invented several key techniques in high-performance GC. But he had no background in GC with user-defined finalizers -- and it shows! > (Plus the implied requirement that finalizers in trash cycles > should be invoked.) Are the Java rules difficult for the user? Only > for users who think they can trick finalizers into doing things for > them that they were not designed to do. This is so implementation-centric it's hard to know what to say <0.5 wink>. The Java rules weren't designed to do much of anything except guarantee that Java (1) would eventually reclaim all unreachable objects, and (2) wouldn't expose dangling pointers to user finalizers, or chase any itself. Whatever *useful* finalizer semantics may remain are those that just happened to survive. > ... > Unlike Scheme guardians or the proposed __cleanup__ mechanism, you > don't have to know whether your object is involved in a cycle -- your > finalizer will still be called. 
This is like saying a user doesn't have to know whether the new drug prescribed for them by their doctor has potentially fatal side effects -- they'll be forced to take it regardless . > ... > Final note: the semantics "__del__ is called whenever the reference > count reaches zero" cannot be defended in the light of a migration to > different forms of garbage collection (e.g. JPython). There may not > be a reference count. 1. I don't know why JPython doesn't execute __del__ methods at all now, but have to suspect that the Java rules imply an implementation so grossly inefficient in the presence of __del__ that Barry simply doesn't want to endure the speed complaints. The Java spec itself urges implementations to special-case the snot out of classes that don't override the default do-nothing finalizer, for "go fast" reasons too. 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete way to get across the idea of "destruction occurs in an order consistent with a topological sort of the points-to graph". The latter is explicit in the BDW collector, which has no refcounts; the topsort concept is applicable and thoroughly natural in all languages; refcounts in CPython give an exploitable hint about *when* collection will occur, but add no purely semantic constraint beyond the topsort requirement (they neatly *imply* the topsort requirement). There is no topsort in the presence of cycles, so cycles create problems in all languages. The same "throw 'em back at the user" approach makes just as much sense from the topsort view as the RC view; it doesn't rely on RC at all. stop-the-insanity-ly y'rs - tim From guido at python.org Tue Mar 7 18:33:31 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 12:33:31 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 12:07:40 EST." 
<000701bf8857$a56ed660$a72d153f@tim> References: <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <200003071733.MAA14926@eric.cnri.reston.va.us> [Tim tells Guido again that he finds the Java rules bad, slinging some mud at Guy Steele, but without explaining what the problem with them is, and then asks:] > 1. I don't know why JPython doesn't execute __del__ methods at all now, but > have to suspect that the Java rules imply an implementation so grossly > inefficient in the presence of __del__ that Barry simply doesn't want to > endure the speed complaints. The Java spec itself urges implementations to > special-case the snot out of classes that don't override the default > do-nothing finalizer, for "go fast" reasons too. Something like that, yes, although it was Jim Hugunin. I have a feeling it has to do with the dynamic nature of __del__ -- this would imply that *all* Python class instances would appear to Java to have a finalizer -- just in most cases it would do a failing lookup of __del__ and bail out quickly. Maybe some source code or class analysis looking for a __del__ could fix this, at the cost of not allowing one to patch __del__ into an existing class after instances have already been created. I don't find that breach of dynamism a big deal -- e.g. CPython keeps copies of __getattr__, __setattr__ and __delattr__ in the class for similar reasons. > 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete > way to get across the idea of "destruction occurs in an order consistent > with a topological sort of the points-to graph". The latter is explicit in > the BDW collector, which has no refcounts; the topsort concept is applicable > and thoroughly natural in all languages; refcounts in CPython give an > exploitable hint about *when* collection will occur, but add no purely > semantic constraint beyond the topsort requirement (they neatly *imply* the > topsort requirement).
There is no topsort in the presence of cycles, so > cycles create problems in all languages. The same "throw 'em back at the > user" approach makes just as much sense from the topsort view as the RC > view; it doesn't rely on RC at all. Indeed. I propose to throw it back at the user by calling __del__. The typical user defines __del__ because they want to close a file, say goodbye nicely on a socket connection, or delete a temp file. That sort of thing. This is what finalizers are *for*. As an author of this kind of finalizer, I don't see why I need to know whether I'm involved in a cycle or not. I want my finalizer called when my object goes away, and I don't want my object kept alive by unreachable cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Mar 7 18:39:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:39:15 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <38C53EC3.5292ECF@lemburg.com> I've ported most of the Unicode methods to strings now. Here's the new table:

Unicode objects     string objects
------------------------------------------------------------
capitalize          capitalize
center              center
count               count
encode
endswith            endswith
expandtabs          expandtabs
find                find
index               index
isdecimal
isdigit             isdigit
islower             islower
isnumeric
isspace             isspace
istitle             istitle
isupper             isupper
join                join
ljust               ljust
lower               lower
lstrip              lstrip
replace             replace
rfind               rfind
rindex              rindex
rjust               rjust
rstrip              rstrip
split               split
splitlines          splitlines
startswith          startswith
strip               strip
swapcase            swapcase
title               title
translate           translate
upper               upper
zfill               zfill

I don't think that .isdecimal() and .isnumeric() are needed for strings since most of the added mappings refer to Unicode code points.
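The behavior of the methods listed above can be exercised directly; here is a minimal sketch in modern Python, where str eventually grew nearly all of these methods (and where zfill survived after all):

```python
s = "spam\r\nham\ngreen eggs"

# splitlines breaks at line boundaries, handling CR/LF across platforms;
# the line breaks themselves are not included in the result
assert s.splitlines() == ["spam", "ham", "green eggs"]

# zfill pads with leading zeros, much like "%08d" % x does for ints
assert "42".zfill(8) == "00000042"
assert "%08d" % 42 == "00000042"

# the .isXXX() predicates reflect extended Unicode character properties
assert "123".isdigit() and not "12.3".isdigit()
# U+2155 VULGAR FRACTION ONE FIFTH is numeric but not decimal
assert "\u2155".isnumeric() and not "\u2155".isdecimal()
```

The isdecimal/isnumeric distinction only makes sense for characters with Unicode numeric-value properties, which is why the table above lists them for Unicode objects only.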
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 18:42:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:42:53 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> <200003071538.KAA13977@eric.cnri.reston.va.us> Message-ID: <38C53F9D.44C3A0F3@lemburg.com> Guido van Rossum wrote: > > > > > zfill > > > > > > no. > > > > Why not ? > > Zfill is (or ought to be) deprecated. It stems from times before we > had things like "%08d" % x and no longer serves a useful purpose. > I doubt anyone would miss it. > > (Of course, now /F will claim that PIL will break in 27 places because > of this. :-) Ok, I'll remove it from both implementations again... (there was some email overlap). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Tue Mar 7 20:24:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 7 Mar 2000 14:24:39 -0500 (EST) Subject: [Python-Dev] finalization again References: <200003071352.IAA13571@eric.cnri.reston.va.us> <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <14533.22391.447739.901802@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> 1. I don't know why JPython doesn't execute __del__ methods at TP> all now, but have to suspect that the Java rules imply an TP> implementation so grossly inefficient in the presence of TP> __del__ that Barry simply doesn't want to endure the speed TP> complaints. Actually, it was JimH that discovered this performance gotcha. 
The problem is that if you want to support __del__, you've got to take the finalize() hit for every instance (i.e. PyInstance object) and it's just not worth it. I just realized that it would be relatively trivial to add a subclass of PyInstance differing only in that it has a finalize() method which would invoke __del__(). Now when the class gets defined, the __del__() would be mined and cached and we'd look at that cache when creating an instance. If there's a function there, we create a PyFinalizableInstance, otherwise we create a PyInstance. The cache means you couldn't dynamically add a __del__ later, but I don't think that's a big deal. It wouldn't be hard to look up the __del__ every time, but that'd be a hit for every instance creation (as opposed to class creation), so again, it's probably not worth it. I just did a quick and dirty hack and it seems at first blush to work. I'm sure there's something I'm missing :). For those of you who don't care about JPython, you can skip the rest. Okay, first the Python script to exercise this, then the PyFinalizableInstance.java file, and then the diffs to PyClass.java. JPython-devers, is it worth adding this?

-------------------- snip snip -------------------- del.py
class B:
    def __del__(self):
        print 'In my __del__'

b = B()
del b

from java.lang import System
System.gc()

-------------------- snip snip -------------------- PyFinalizableInstance.java
// Copyright © Corporation for National Research Initiatives

// These are just like normal instances, except that their classes included
// a definition for __del__(), i.e. Python's finalizer.  These two instance
// types have to be separated due to Java performance issues.

package org.python.core;

public class PyFinalizableInstance extends PyInstance {
    public PyFinalizableInstance(PyClass iclass) {
        super(iclass);
    }

    // __del__ method is invoked upon object finalization.
    protected void finalize() {
        __class__.__del__.__call__(this);
    }
}

-------------------- snip snip --------------------

Index: PyClass.java
===================================================================
RCS file: /projects/cvsroot/jpython/dist/org/python/core/PyClass.java,v
retrieving revision 2.8
diff -c -r2.8 PyClass.java
*** PyClass.java    1999/10/04 20:44:28    2.8
--- PyClass.java    2000/03/07 19:02:29
***************
*** 21,27 ****
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
--- 21,27 ----
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__, __del__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
***************
*** 111,116 ****
--- 111,117 ----
          __setattr__ = lookup("__setattr__", false);
          __delattr__ = lookup("__delattr__", false);
          __tojava__ = lookup("__tojava__", false);
+         __del__ = lookup("__del__", false);
      }

      protected void findModule(PyObject dict) {
***************
*** 182,188 ****
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst = new PyInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }
--- 183,194 ----
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst;
!         if (__del__ == null)
!             inst = new PyInstance(this);
!         else
!             // the class defined an __del__ method
!             inst = new PyFinalizableInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }

From bwarsaw at cnri.reston.va.us Tue Mar 7 20:35:44 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Tue, 7 Mar 2000 14:35:44 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf8857$a56ed660$a72d153f@tim> <200003071733.MAA14926@eric.cnri.reston.va.us> Message-ID: <14533.23056.517661.633574@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Maybe some source code or class analysis looking for a GvR> __del__ could fix this, at the cost of not allowing one to GvR> patch __del__ into an existing class after instances have GvR> already been created. I don't find that breach of dynamicism GvR> a big deal -- e.g. CPython keeps copies of __getattr__, GvR> __setattr__ and __delattr__ in the class for similar reasons. For those of you who enter the "Being Guido van Rossum" door like I just did, please keep in mind that it dumps you out not on the NJ Turnpike, but in the little ditch back behind CNRI. Stop by and say hi after you brush yourself off. -Barry From Tim_Peters at Dragonsys.com Tue Mar 7 23:30:16 2000 From: Tim_Peters at Dragonsys.com (Tim_Peters at Dragonsys.com) Date: Tue, 7 Mar 2000 17:30:16 -0500 Subject: [Python-Dev] finalization again Message-ID: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> [Guido] > Tim tells Guido again that he finds the Java rules bad, slinging some > mud at Guy Steele, but without explaining what the problem with them > is ... Slinging mud? Let's back off here. You've read the Java spec and were impressed. That's fine -- it is impressive . But go on from there and see where it leads in practice. That Java's GC model did a masterful job but includes a finalization model users dislike is really just conventional wisdom in the Java world. My sketch of Guy Steele's involvement was an attempt to explain why both halves of that are valid. I didn't think "explaining the problem" was necessary, as it's been covered in depth multiple times in c.l.py threads, by Java programmers as well as by me. 
Searching the web for articles about this turns up many; the first one I hit is typical: http://www.quoininc.com/quoininc/Design_Java0197.html eventually concludes Consequently we recommend that [Java] programmers support but do not rely on finalization. That is, place all finalization semantics in finalize() methods, but call those methods explicitly and in the order required. The points below provide more detail. That's par for the Java course: advice to write finalizers to survive being called multiple times, call them explicitly, and do all you can to ensure that the "by magic" call is a nop. The lack of ordering rules in the language forces people to "do it by hand" (as the Java spec acknowledges: "It is straightforward to implement a Java class that will cause a set of finalizer-like methods to be invoked in a specified order for a set of objects when all the objects become unreachable. Defining such a class is left as an exercise for the reader." But from what I've seen, that exercise is beyond the imagination of most Java programmers! The perceived need for ordering is not.). It's fine that you want to restrict finalizers to "simple" cases; it's not so fine if the language can't ensure that simple cases are the only ones the user can write, & can neither detect & complain at runtime about cases it didn't intend to support. The Java spec is unhelpful here too: Therefore, we recommend that the design of finalize methods be kept simple and that they be programmed defensively, so that they will work in all cases. Mom and apple pie, but what does it mean, exactly? The spec realizes that you're going to be tempted to try things that won't work, but can't really explain what those are in terms simpler than the full set of implementation consequences. As a result, users hate it -- but don't take my word for that! 
If you look & don't find that Java's finalization rules are widely viewed as "a problem to be wormed around" by serious Java programmers, fine -- then you've got a much better search engine than mine . As for why I claim following topsort rules is very likely to work out better, they follow from the nature of the problem, and can be explained as such, independent of implementation details. See the Boehm reference for more about topsort. will-personally-use-python-regardless-ly y'rs - tim From guido at python.org Wed Mar 8 01:50:38 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 19:50:38 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 17:30:16 EST." <8525689B.007AB2BA.00@notes-mta.dragonsys.com> References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> Message-ID: <200003080050.TAA19264@eric.cnri.reston.va.us> > [Guido] > > Tim tells Guido again that he finds the Java rules bad, slinging some > > mud at Guy Steele, but without explaining what the problem with them > > is ... > > Slinging mud? Let's back off here. You've read the Java spec and were > impressed. That's fine -- it is impressive . But go on from > there and see where it leads in practice. That Java's GC model did a > masterful job but includes a finalization model users dislike is really > just conventional wisdom in the Java world. My sketch of Guy Steele's > involvement was an attempt to explain why both halves of that are valid. Granted. I can read Java code and sometimes I write some, but I'm not a Java programmer by any measure, and I wasn't aware that finalize() has a general bad rep. > I didn't think "explaining the problem" was necessary, as it's been > covered in depth multiple times in c.l.py threads, by Java programmers > as well as by me. 
Searching the web for articles about this turns up > many; the first one I hit is typical: > > http://www.quoininc.com/quoininc/Design_Java0197.html > > eventually concludes > > Consequently we recommend that [Java] programmers support but do > not rely on finalization. That is, place all finalization semantics > in finalize() methods, but call those methods explicitly and in the > order required. The points below provide more detail. > > That's par for the Java course: advice to write finalizers to survive > being called multiple times, call them explicitly, and do all you can > to ensure that the "by magic" call is a nop. It seems the authors make one big mistake: they recommend calling finalize() explicitly. This may be par for the Java course: the quality of the materials is often poor, and that has to be taken into account when certain features have gotten a bad rep. (These authors also go on at length about the problems of GC in a real-time situation -- attempts to use Java in situations for which it is inappropriate are also par for the course, inspired by all the hype.) Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that you should never call finalize() explicitly (except that you should always call super.finalize() in your finalize() method). (Bruce goes on at length explaining that there aren't a lot of things you should use finalize() for -- except to observe the garbage collector. :-) > The lack of ordering > rules in the language forces people to "do it by hand" (as the Java > spec acknowledges: "It is straightforward to implement a Java class > that will cause a set of finalizer-like methods to be invoked in a > specified order for a set of objects when all the objects become > unreachable. Defining such a class is left as an exercise for the > reader." But from what I've seen, that exercise is beyond the > imagination of most Java programmers! The perceived need for ordering > is not.).
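The "exercise for the reader" quoted above is indeed straightforward to sketch. A hypothetical Python version (the class name and API here are invented for illustration) that runs registered finalizer-like callables in a specified order:

```python
class OrderedFinalizers:
    """Invoke registered finalizer-like callables in a specified order:
    last registered, first run (the usual destruction order)."""

    def __init__(self):
        self._pending = []

    def register(self, func, *args):
        self._pending.append((func, args))

    def finalize(self):
        # Run in reverse registration order; popping as we go makes
        # repeated finalize() calls harmless.
        while self._pending:
            func, args = self._pending.pop()
            func(*args)

order = []
f = OrderedFinalizers()
f.register(order.append, "opened first, closed last")
f.register(order.append, "opened last, closed first")
f.finalize()
f.finalize()  # second call is a nop
print(order)  # ['opened last, closed first', 'opened first, closed last']
```

The point of contention is not whether such a class can be written, but that the language's "by magic" finalization gives no ordering at all, so anyone who needs ordering must fall back on explicit machinery like this.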
True, but note that Python won't have the ordering problem, at least not as long as we stick to reference counting as the primary means of GC. The ordering problem in Python will only happen when there are cycles, and there you really can't blame the poor GC design! > It's fine that you want to restrict finalizers to "simple" cases; it's > not so fine if the language can't ensure that simple cases are the only > ones the user can write, & can neither detect & complain at runtime > about cases it didn't intend to support. The Java spec is unhelpful > here too: > > Therefore, we recommend that the design of finalize methods be kept > simple and that they be programmed defensively, so that they will > work in all cases. > > Mom and apple pie, but what does it mean, exactly? The spec realizes > that you're going to be tempted to try things that won't work, but > can't really explain what those are in terms simpler than the full set > of implementation consequences. As a result, users hate it -- but > don't take my word for that! If you look & don't find that Java's > finalization rules are widely viewed as "a problem to be wormed around" > by serious Java programmers, fine -- then you've got a much better > search engine than mine . Hm. Of course programmers hate finalizers. They hate GC as well. But they hate even more not to have it (witness the relentless complaints about Python's "lack of GC" -- and Java's GC is often touted as one of the reasons for its superiority over C++). I think this stuff is just hard! (Otherwise why would we be here having this argument?) > As for why I claim following topsort rules is very likely to work out > better, they follow from the nature of the problem, and can be > explained as such, independent of implementation details. See the > Boehm reference for more about topsort. Maybe we have a disconnect? We *are* using topsort -- for non-cyclical data structures. Reference counting ensures that. Nothing in my design changes that.
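The non-cyclic case can be seen directly; a small sketch (CPython-specific: it relies on refcounts dropping to zero immediately, exactly the "exploitable hint" under discussion):

```python
order = []

class Node:
    def __init__(self, name, child=None):
        self.name = name
        self.child = child

    def __del__(self):
        order.append(self.name)

b = Node("b")
a = Node("a", child=b)
del a, b  # a dies first; only then does its reference to b disappear

# Destruction order is consistent with a topological sort of the
# points-to graph: the referrer is finalized before the referent.
print(order)  # ['a', 'b'] in CPython
```

With a cycle (a.child = b; b.parent = a) no such order exists, which is the whole argument.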
The issue at hand is what to do with *cyclical* data structures, where topsort doesn't help. Boehm, on http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, says: "Cycles involving one or more finalizable objects are never finalized." The question remains, what to do with trash cycles? I find having a separate __cleanup__ protocol cumbersome. I think that the "finalizer only called once by magic" rule is reasonable. I believe that the ordering problems will be much less than in Java, because we use topsort whenever we can. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 8 07:25:56 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 01:25:56 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <001401bf88c7$29f2a320$452d153f@tim> [Guido] > Granted. I can read Java code and sometimes I write some, but I'm not > a Java programmer by any measure, and I wasn't aware that finalize() > has a general bad rep. It does, albeit often for bad reasons. 1. C++ programmers seeking to emulate techniques based on C++'s rigid specification of the order and timing of destruction of autos. 2. People pushing the limits (as in the URL I happened to post). 3. People trying to do anything . Java's finalization semantics are very weak, and s-l-o-w too (under most current implementations). Now I haven't used Java for real in about two years, and avoided finalizers completely when I did use it. I can't recall any essential use of __del__ I make in Python code, either. So what Python does here makes no personal difference to me. However, I frequently respond to complaints & questions on c.l.py, and don't want to get stuck trying to justify Java's uniquely baroque rules outside of comp.lang.java <0.9 wink>. 
>> [Tim, passes on the first relevant URL he finds: >> http://www.quoininc.com/quoininc/Design_Java0197.html] > It seems the authors make one big mistake: they recommend calling > finalize() explicitly. This may be par for the Java course: the > quality of the materials is often poor, and that has to be taken into > account when certain features have gotten a bad rep. Well, in "The Java Programming Language", Gosling recommends that you: a) Add a method called close(), that tolerates being called multiple times. b) Write a finalize() method whose body calls close(). People tended to do that at first, but used a bunch of names other than "close" too. I guess people eventually got weary of having two methods that did the same thing, so decided to just use the single name Java guaranteed would make sense. > (These authors also go on at length about the problems of GC in a real- > time situation -- attempts to use Java in situations for which it is > inappropriate are also par for the course, inspired by all the hype.) I could have picked any number of other URLs, but don't regret picking this one: you can't judge a ship in smooth waters, and people will push *all* features beyond their original intents. Doing so exposes weaknesses. Besides, Sun won't come out & say Java is unsuitable for real-time, no matter how obvious it is . > Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that > you should never call finalize() explicitly (except that you should > always call super.finalize() in your finalize() method). You'll find lots of conflicting advice here, be it about Java or C++. Java may be unique, though, in the universality of the conclusion Bruce draws here: > (Bruce goes on at length explaining that there aren't a lot of things > you should use finalize() for -- except to observe the garbage collector. :-) Frankly, I think Java would be better off without finalizers.
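Gosling's close()/finalize() recipe translates directly to __del__. A hedged sketch (the Resource class is invented here for illustration; the pattern, not the class, is the point):

```python
class Resource:
    """close() tolerates multiple calls; __del__ is only a safety net."""

    def __init__(self):
        self.closed = False

    def close(self):
        if self.closed:   # survive being called more than once
            return
        self.closed = True
        # ... release the external resource here ...

    def __del__(self):
        # The "by magic" call amounts to a nop whenever the user
        # already called close() explicitly, as recommended.
        self.close()

r = Resource()
r.close()
r.close()          # second explicit call is harmless
assert r.closed
```

All the real finalization semantics live in close(), called explicitly and in the order the program requires; the finalizer exists only to catch the cases the programmer forgot.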
Python could do fine without __del__ too -- if you and I were the only users <0.6 wink>. [on Java's lack of ordering promises] > True, but note that Python won't have the ordering problem, at least > not as long as we stick to reference counting as the primary means of > GC. The ordering problem in Python will only happen when there are > cycles, and there you really can't blame the poor GC design! I cannot. Nor do I intend to. The cyclic ordering problem isn't GC's fault, it's the program's; but GC's *response* to it is entirely GC's responsibility. >> ... The Java spec is unhelpful here too: >> >> Therefore, we recommend that the design of finalize methods be kept >> simple and that they be programmed defensively, so that they will >> work in all cases. >> >> Mom and apple pie, but what does it mean, exactly? The spec realizes >> that you're going to be tempted to try things that won't work, but >> can't really explain what those are in terms simpler than the full set >> of implementation consequences. As a result, users hate it -- but >> don't take my word for that! If you look & don't find that Java's >> finalization rules are widely viewed as "a problem to be wormed around" >> by serious Java programmers, fine -- then you've got a much better >> search engine than mine . > Hm. Of course programmers hate finalizers. Oh no! C++ programmers *love* destructors! I mean it, they're absolutely gaga over them. I haven't detected signs that CPython programmers hate __del__ either, except at shutdown time. Regardless of language, they love them when they're predictable and work as expected, they hate them when they're unpredictable and confusing. C++ auto destructors are extremely predictable (e.g., after "{SomeClass a, b; ...}", b is destructed before a, and both destructions are guaranteed before leaving the block they're declared in, regardless of whether via return, exception, goto or falling off the end). 
CPython's __del__ is largely predictable (modulo shutdown, cycles, and sometimes exceptions). The unhappiness in the Java world comes from Java finalizers' unpredictability and consequent all-around uselessness in messy real life. > They hate GC as well. Yes, when it's unpredictable and confusing . > But they hate even more not to have it (witness the relentless > complaints about Python's "lack of GC" -- and Java's GC is often > touted as one of the reasons for its superiority over C++). Back when JimF & I were looking at gc, we may have talked each other into really believing that paying careful attention to RC issues leads to cleaner and more robust designs. In fact, I still believe that, and have never clamored for "real gc" in Python. Jim now may even be opposed to "real gc". But Jim and I and you all think a lot about the art of programming, and most users just don't have time or inclination for that -- the slowly changing nature of c.l.py is also clear evidence of this. I'm afraid this makes growing "real GC" a genuine necessity for Python's continued growth. It's not a *bad* thing in any case. Think of it as a marketing requirement <0.7 wink>. > I think this stuff is just hard! (Otherwise why would we be here > having this argument?) Honest to Guido, I think it's because you're sorely tempted to go down an un-Pythonic path here, and I'm fighting that. I said early on there are no thoroughly good answers (yes, it's hard), but that's nothing new for Python! We're having this argument solely because you're confusing Python with some other language . [a 2nd or 3rd plug for taking topsort seriously] > Maybe we have a disconnect? Not in the technical analysis, but in what conclusions to take from it. > We *are* using topsort -- for non-cyclical data structures. Reference > counting ensure that. Nothing in my design changes that. And it's great! 
Everyone understands the RC rules pretty quickly, lots of people like them a whole lot, and if it weren't for cyclic trash everything would be peachy. > The issue at hand is what to do with *cyclical* data structures, where > topsort doesn't help. Boehm, on > http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, > says: "Cycles involving one or more finalizable objects are never > finalized." This is like some weird echo chamber, where the third time I shout something the first one comes back without any distortion at all . Yes, Boehm's first rule is "Do No Harm". It's a great rule. Python follows the same rule all over the place; e.g., when you see x = "4" + 2 you can't possibly know what was intended, so you refuse to guess: you would rather *kill* the program than make a blind guess! I see cycles with finalizers as much the same: it's plain wrong to guess when you can't possibly know what was intended. Because topsort is the only principled way to decide order of finalization, and they've *created* a situation where a topsort doesn't exist, what they're handing you is no less ambiguous than in trying to add a string to an int. This isn't the time to abandon topsort as inconvenient, it's the time to defend it as inviolate principle! The only thoroughly rational response is "you know, this doesn't make sense -- since I can't know what you want here, I refuse to pretend that I can". Since that's "the right" response everywhere else in Python, what the heck is so special about this case? It's like you decided Python *had* to allow adding strings to ints, and now we're going to argue about whether Perl, Awk or Tcl makes the best unprincipled guess . > The question remains, what to do with trash cycles? A trash cycle without a finalizer isn't a problem, right? In that case, topsort rules have no visible consequence so it doesn't matter in what order you merely reclaim the memory.
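Why a cycle defeats both refcounting and topsort can be shown in a few lines. This sketch uses modern CPython's gc module and its DEBUG_SAVEALL flag (a debugging aid, not the semantics under discussion) to make the collected trash inspectable rather than silently reclaimed:

```python
import gc

class C:
    pass

gc.set_debug(gc.DEBUG_SAVEALL)  # keep collected objects in gc.garbage

a, b = C(), C()
a.ref, b.ref = b, a   # a points to b and b points to a: no topsort exists
del a, b              # refcounts never reach zero on their own

gc.collect()
gc.set_debug(0)

# The cycle's objects were found unreachable and are held for inspection,
# which is the "let it leak, but visibly" option described above.
assert any(isinstance(obj, C) for obj in gc.garbage)
```

Historically this is roughly where CPython landed for cycles whose objects had __del__ methods: leave them in gc.garbage for the programmer to examine and break by hand (until PEP 442 changed the rules much later).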
If it has an object with a finalizer, though, at the very worst you can let it leak, and make the collection of leaked objects available for inspection. Even that much is a *huge* "improvement" over what they have today: most cycles won't have a finalizer and so will get reclaimed, and for the rest they'll finally have a simple way to identify exactly where the problem is, and a simple criterion for predicting when it will happen. If that's not "good enough", then without abandoning principle the user needs to have some way to reduce such a cycle *to* a topsort case themself. > I find having a separate __cleanup__ protocol cumbersome. Same here, but if you're not comfortable leaking, and you agree Python is not in the business of guesing in inherently ambiguous situations, maybe that's what it takes! MAL and GregS both gravitated to this kind of thing at once, and that's at least suggestive; and MAL has actually been using his approach. It's explicit, and that's Pythonic on the face of it. > I think that the "finalizer only called once by magic" rule is reasonable. If it weren't for its specific use in emulating Java's scheme, would you still be in favor of that? It's a little suspicious that it never came up before . > I believe that the ordering problems will be much less than in Java, because > we use topsort whenever we can. No argument here, except that I believe there's never sufficient reason to abandon topsort ordering. Note that BDW's adamant refusal to yield on this hasn't stopped "why doesn't Python use BDW?" from becoming a FAQ . a-case-where-i-expect-adhering-to-principle-is-more-pragmatic- in-the-end-ly y'rs - tim From tim_one at email.msn.com Wed Mar 8 08:48:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 02:48:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Message-ID: <001801bf88d2$af0037c0$452d153f@tim> Mike has a darned good point here. 
Anyone have a darned good answer ? -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org] On Behalf Of Mike Fletcher Sent: Tuesday, March 07, 2000 2:08 PM To: Python Listserv (E-mail) Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage all over the hard-disk, traffic rerouted through the bit-bucket, you aren't getting to work anytime soon Mrs. Programmer) and wondering why we have a FAQ instead of having the win32pipe stuff rolled into the os module to fix it. Is there some incompatibility? Is there a licensing problem? Ideas? Mike __________________________________ Mike C. Fletcher Designer, VR Plumber http://members.home.com/mcfletch -- http://www.python.org/mailman/listinfo/python-list From mal at lemburg.com Wed Mar 8 09:36:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:36:57 +0100 Subject: [Python-Dev] finalization again References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <38C61129.2F8C9E95@lemburg.com> > [Guido] > The question remains, what to do with trash cycles? I find having a > separate __cleanup__ protocol cumbersome. I think that the "finalizer > only called once by magic" rule is reasonable. I believe that the > ordering problems will be much less than in Java, because we use > topsort whenever we can. Note that the __cleanup__ protocol is intended to break cycles *before* calling the garbage collector. After those cycles are broken, ordering is not a problem anymore and because __cleanup__ can do its task on a per-object basis all magic is left in the hands of the programmer. The __cleanup__ protocol as I use it is designed to be called in situations where the system knows that all references into a cycle are about to be dropped (I typically use small cyclish object systems in my application, e.g. 
ones that create and reference namespaces which include a reference to the hosting object itself). In my application that is done by using mxProxies at places where I know these cyclic object subsystems are being referenced. In Python the same could be done whenever the interpreter knows that a certain object is about to be deleted, e.g. during shutdown (important for embedding Python in other applications such as Apache) or some other major subsystem finalization, e.g. unload of a module or killing of a thread (yes, I know these are no-nos, but they could be useful, esp. the thread kill operation in multi-threaded servers).

After __cleanup__ has done its thing, the finalizer can either choose to leave all remaining cycles in memory (and leak) or apply its own magic to complete the task. In any case, __del__ should be called when the refcount reaches 0. (I find it somewhat strange that people are arguing to keep external resources alive even though there is a chance of freeing them.)

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Mar 8 09:46:14 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 08 Mar 2000 09:46:14 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <001801bf88d2$af0037c0$452d153f@tim>
Message-ID: <38C61356.E0598DBF@lemburg.com>

Tim Peters wrote:
>
> Mike has a darned good point here. Anyone have a darned good answer ?
>
> -----Original Message-----
> From: python-list-admin at python.org [mailto:python-list-admin at python.org]
> On Behalf Of Mike Fletcher
> Sent: Tuesday, March 07, 2000 2:08 PM
> To: Python Listserv (E-mail)
> Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be
> adopted?
> > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > getting to work anytime soon Mrs. Programmer) and wondering why we have a > FAQ instead of having the win32pipe stuff rolled into the os module to fix > it. Is there some incompatibility? Is there a licensing problem? > > Ideas? I'd suggest moving the popen from the C modules into os.py as Python API and then applying all necessary magic to either use the win32pipe implementation (if available) or the native C one from the posix module in os.py. Unless, of course, the win32 stuff (or some of it) makes it into the core. I'm mostly interested in this for my platform.py module... BTW, is there any interest of moving it into the core ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 13:10:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 07:10:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 09:46:14 +0100." <38C61356.E0598DBF@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> Message-ID: <200003081210.HAA19931@eric.cnri.reston.va.us> > Tim Peters wrote: > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > adopted? > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > it. Is there some incompatibility? 
Is there a licensing problem? MAL: > I'd suggest moving the popen from the C modules into os.py > as Python API and then applying all necessary magic to either > use the win32pipe implementation (if available) or the native > C one from the posix module in os.py. > > Unless, of course, the win32 stuff (or some of it) makes it into > the core. No concrete plans -- except that I think the registry access is supposed to go in. Haven't seen the code on patches at python.org yet though. > I'm mostly interested in this for my platform.py module... > BTW, is there any interest of moving it into the core ? "it" == platform.py? Little interest from me personally; I suppose it could go in Tools/scripts/... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 8 15:06:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:06:53 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Wed, 08 Mar 2000 01:25:56 EST." <001401bf88c7$29f2a320$452d153f@tim> References: <001401bf88c7$29f2a320$452d153f@tim> Message-ID: <200003081406.JAA20033@eric.cnri.reston.va.us> > A trash cycle without a finalizer isn't a problem, right? In that case, > topsort rules have no visible consquence so it doesn't matter in what order > you merely reclaim the memory. When we have a pile of garbage, we don't know whether it's all connected or whether it's lots of little cycles. So if we find [objects with -- I'm going to omit this] finalizers, we have to put those on a third list and put everything reachable from them on that list as well (the algorithm I described before). What's left on the first list then consists of finalizer-free garbage. We dispose of this garbage by clearing dicts and lists. Hopefully this makes the refcount of some of the finalizers go to zero -- those are finalized in the normal way. And now we have to deal with the inevitable: finalizers that are part of cycles. 
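[The partition Guido describes above -- objects with finalizers plus everything reachable from them on one list, finalizer-free trash on the other -- can be sketched like this. This is a toy model with hypothetical `edges` and `has_finalizer` inputs, not the collector's real data structures:]

```python
# Toy model of the partition: given the set of garbage objects and
# their reference edges, separate finalizer-free garbage (safe to
# clear immediately) from anything reachable from a finalizer.
def partition(garbage, edges, has_finalizer):
    reachable_from_finalizer = set()
    stack = [o for o in garbage if has_finalizer(o)]
    while stack:
        obj = stack.pop()
        if obj not in reachable_from_finalizer:
            reachable_from_finalizer.add(obj)
            # Follow references, but stay inside the garbage set.
            stack.extend(o for o in edges[obj] if o in garbage)
    safe = [o for o in garbage if o not in reachable_from_finalizer]
    return safe, reachable_from_finalizer

# Example: 'f' has a finalizer and references 'x'; 'y' is a self-cycle.
edges = {'f': ['x'], 'x': [], 'y': ['y']}
safe, tricky = partition({'f', 'x', 'y'}, edges, lambda o: o == 'f')
print(sorted(safe), sorted(tricky))   # ['y'] ['f', 'x']
```

The "safe" list is what Guido disposes of by clearing dicts and lists; the rest is the hard case discussed next.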
It makes sense to reduce the graph of objects to a graph of finalizers only. Example:

    A <=> b -> C <=> d

A and C have finalizers. C is part of a cycle (C-d) that contains no other finalizers, but C is also reachable from A. A is part of a cycle (A-b) that keeps it alive. The interesting thing here is that if we only look at the finalizers, there are no cycles! If we reduce the graph to only finalizers (setting aside for now the problem of how to do that -- we may need to allocate more memory to hold the reduced graph), we get:

    A -> C

We can now finalize A (even though its refcount is nonzero!). And that's really all we can do! A could break its own cycle, thereby disposing of itself and b. It could also break C's cycle, disposing of C and d. It could do nothing. Or it could resurrect A, thereby resurrecting all of A, b, C, and d. This leads to (there's that weird echo again :-) Boehm's solution: Call A's finalizer and leave the rest to the next time the garbage collection runs.

Note that we're now calling finalizers on objects with a non-zero refcount. At some point (probably as a result of finalizing A) its refcount will go to zero. We should not finalize it again -- this would serve no purpose. Possible solution:

    INCREF(A);
    A->__del__();
    if (A->ob_refcnt == 1)
        A->__class__ = NULL; /* Make A finalizer-less */
    DECREF(A);

This avoids finalizing twice if the first finalization broke all cycles in which A is involved. But if it doesn't, A is still cyclical garbage with a finalizer! Even if it didn't resurrect itself. Instead of the code fragment above, we could mark A as "just finalized" and when it shows up at the head of the tree (of finalizers in cyclical trash) again on the next garbage collection, discard it without calling the finalizer again (because this clearly means that it didn't resurrect itself -- at least not for a very long time).
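[The reduction above can be sketched on Guido's own example graph. This is a toy model using strings for nodes; the real collector would operate on objects:]

```python
# Guido's example:  A <=> b -> C <=> d, where A and C have finalizers.
edges = {
    'A': ['b'], 'b': ['A', 'C'],   # A <=> b, and b -> C
    'C': ['d'], 'd': ['C'],        # C <=> d
}
finalizers = ['A', 'C']

def reachable(start):
    """Every node reachable by following edges out of start."""
    seen, stack = set(), list(edges[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return seen

# Reduced graph: an edge F -> G whenever finalizer G is reachable from F.
reduced = {f: [g for g in finalizers if g != f and g in reachable(f)]
           for f in finalizers}
print(reduced)   # {'A': ['C'], 'C': []} -- no cycle among the finalizers
```

The reduced graph is acyclic even though the object graph is full of cycles, which is exactly why A can be finalized first.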
I would be happier if we could still have a rule that says that a finalizer is called only once by magic -- even if we have two forms of magic: refcount zero or root of the tree. Tim: I don't know if you object to this rule as a matter of principle (for the sake of finalizers that resurrect the object) or if your objection is really against the unordered calling of finalizers legitimized by Java's rules. I hope the latter, since I think that this rule (__del__ called only once by magic) by itself is easy to understand and easy to deal with, and I believe it may be necessary to guarantee progress for the garbage collector.

The problem is that the collector can't easily tell whether A has resurrected itself. Sure, if the refcount is 1 after the finalizer ran, I know it didn't resurrect itself. But even if it's higher than before, that doesn't mean it's resurrected: it could have linked to itself. Without doing a full collection I can't tell the difference. If I wait until a full collection happens again naturally, and look at the "just finalized" flag, I can't tell the difference between the case whereby the object resurrected itself but died again before the next collection, and the case where it was dead already. So I don't know how many times it was expecting the "last rites" to be performed, and the object can't know whether to expect them again or not. This seems worse than the only-once rule to me.

Even if someone once found a good use for resurrecting inside __del__, against all recommendations, I don't mind breaking their code, if it's for a good cause. The Java rules aren't a good cause. But top-sorted finalizer calls seem a worthy cause.

So now we get to discuss what to do with multi-finalizer cycles, like:

    A <=> b <=> C

Here the reduced graph is:

    A <=> C

About this case you say:
Even that much is a *huge* "improvement" over what they have > today: most cycles won't have a finalizer and so will get reclaimed, and > for the rest they'll finally have a simple way to identify exactly where the > problem is, and a simple criterion for predicting when it will happen. If > that's not "good enough", then without abandoning principle the user needs > to have some way to reduce such a cycle *to* a topsort case themself. > > > I find having a separate __cleanup__ protocol cumbersome. > > Same here, but if you're not comfortable leaking, and you agree Python is > not in the business of guesing in inherently ambiguous situations, maybe > that's what it takes! MAL and GregS both gravitated to this kind of thing > at once, and that's at least suggestive; and MAL has actually been using his > approach. It's explicit, and that's Pythonic on the face of it. > > > I think that the "finalizer only called once by magic" rule is reasonable. > > If it weren't for its specific use in emulating Java's scheme, would you > still be in favor of that? It's a little suspicious that it never came up > before . Suspicious or not, it still comes up. I still like it. I still think that playing games with resurrection is evil. (Maybe my spiritual beliefs shine through here -- I'm a convinced atheist. :-) Anyway, once-only rule aside, we still need a protocol to deal with cyclical dependencies between finalizers. The __cleanup__ approach is one solution, but it also has a problem: we have a set of finalizers. Whose __cleanup__ do we call? Any? All? Suggestions? Note that I'd like some implementation freedom: I may not want to bother with the graph reduction algorithm at first (which seems very hairy) so I'd like to have the right to use the __cleanup__ API as soon as I see finalizers in cyclical trash. 
I don't mind disposing of finalizer-free cycles first, but once I have more than one finalizer left in the remaining cycles, I'd like the right not to reduce the graph for topsort reasons -- that algorithm seems hard.

So we're back to the __cleanup__ design. Strawman proposal: for all finalizers in a trash cycle, call their __cleanup__ method, in arbitrary order. After all __cleanup__ calls are done, if the objects haven't all disposed of themselves, they are all garbage-collected without calling __del__. (This seems to require another garbage collection cycle -- so perhaps there should also be a once-only rule for __cleanup__?)

Separate question: what if there is no __cleanup__? This should probably be reported: "You have cycles with finalizers, buddy! What do you want to do about them?" This same warning could be given when there is a __cleanup__ but it doesn't break all cycles.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at lemburg.com Wed Mar 8 14:34:06 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 08 Mar 2000 14:34:06 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us>
Message-ID: <38C656CE.B0ACFF35@lemburg.com>

Guido van Rossum wrote:
> > Tim Peters wrote:
> > >
> > > Mike has a darned good point here. Anyone have a darned good answer ?
> > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be
> > > adopted?
> > >
> > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage
> > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't
> > > getting to work anytime soon Mrs. Programmer) and wondering why we have a
> > > FAQ instead of having the win32pipe stuff rolled into the os module to fix
> > > it. Is there some incompatibility? Is there a licensing problem?
> > MAL: > > I'd suggest moving the popen from the C modules into os.py > > as Python API and then applying all necessary magic to either > > use the win32pipe implementation (if available) or the native > > C one from the posix module in os.py. > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > the core. > > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches at python.org yet > though. Ok, what about the optional "use win32pipe if available" idea then ? > > I'm mostly interested in this for my platform.py module... > > BTW, is there any interest of moving it into the core ? > > "it" == platform.py? Right. > Little interest from me personally; I suppose it > could go in Tools/scripts/... Hmm, it wouldn't help much in there I guess... after all, it defines APIs which are to be queried by other scripts. The default action to print the platform information to stdout is just a useful addition. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 15:33:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:33:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 14:34:06 +0100." <38C656CE.B0ACFF35@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> Message-ID: <200003081433.JAA20177@eric.cnri.reston.va.us> > > MAL: > > > I'd suggest moving the popen from the C modules into os.py > > > as Python API and then applying all necessary magic to either > > > use the win32pipe implementation (if available) or the native > > > C one from the posix module in os.py. 
> > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > the core. [Guido] > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches at python.org yet > > though. > > Ok, what about the optional "use win32pipe if available" idea then ? Sorry, I meant please send me the patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Mar 8 15:59:46 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 8 Mar 2000 09:59:46 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <14534.27362.139106.701784@weyr.cnri.reston.va.us> Guido van Rossum writes: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... I think platform.py is pretty nifty, but I'm not entirely sure how it's expected to be used. Perhaps Marc-Andre could explain further the motivation behind the module? My biggest requirement is that it be accompanied by documentation. The coolness factor and shared use of hackerly knowledge would probably get *me* to put it in, but there are a lot of things about which I'll disagree with Guido just to hear his (well-considered) thoughts on the matter. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Wed Mar 8 18:37:43 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:37:43 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 ... code for thought. 
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> <200003081433.JAA20177@eric.cnri.reston.va.us>
Message-ID: <38C68FE7.63943C5C@lemburg.com>

Guido van Rossum wrote:
> > > MAL:
> > > > I'd suggest moving the popen from the C modules into os.py
> > > > as Python API and then applying all necessary magic to either
> > > > use the win32pipe implementation (if available) or the native
> > > > C one from the posix module in os.py.
> > > >
> > > > Unless, of course, the win32 stuff (or some of it) makes it into
> > > > the core.
> [Guido]
> > > No concrete plans -- except that I think the registry access is
> > > supposed to go in. Haven't seen the code on patches at python.org yet
> > > though.
> >
> > Ok, what about the optional "use win32pipe if available" idea then ?
>
> Sorry, I meant please send me the patch!

Here's the popen() interface I use in platform.py. It should serve well as a basis for an os.popen patch... (don't have time to do it myself right now):

class _popen:

    """ Fairly portable (alternative) popen implementation.

        This is mostly needed in case os.popen() is not available, or
        doesn't work as advertised, e.g. in Win9X GUI programs like
        PythonWin or IDLE.

        XXX Writing to the pipe is currently not supported.

    """
    tmpfile = ''
    pipe = None
    bufsize = None
    mode = 'r'

    def __init__(self, cmd, mode='r', bufsize=None):
        if mode != 'r':
            raise ValueError, 'popen()-emulation only supports read mode'
        import tempfile
        self.tmpfile = tmpfile = tempfile.mktemp()
        os.system(cmd + ' > %s' % tmpfile)
        self.pipe = open(tmpfile, 'rb')
        self.bufsize = bufsize
        self.mode = mode

    def read(self):
        return self.pipe.read()

    def readlines(self):
        if self.bufsize is not None:
            return self.pipe.readlines()

    def close(self, remove=os.unlink, error=os.error):
        if self.pipe:
            rc = self.pipe.close()
        else:
            rc = 255
        if self.tmpfile:
            try:
                remove(self.tmpfile)
            except error:
                pass
        return rc

    # Alias
    __del__ = close

def popen(cmd, mode='r', bufsize=None):

    """ Portable popen() interface.
    """
    # Find a working popen implementation preferring win32pipe.popen
    # over os.popen over _popen
    popen = None
    if os.environ.get('OS', '') == 'Windows_NT':
        # On NT win32pipe should work; on Win9x it hangs due to bugs
        # in the MS C lib (see MS KnowledgeBase article Q150956)
        try:
            import win32pipe
        except ImportError:
            pass
        else:
            popen = win32pipe.popen
    if popen is None:
        if hasattr(os, 'popen'):
            popen = os.popen
            # Check whether it works... it doesn't in GUI programs
            # on Windows platforms
            if sys.platform == 'win32':  # XXX Others too ?
                try:
                    popen('')
                except os.error:
                    popen = _popen
        else:
            popen = _popen
    if bufsize is None:
        return popen(cmd, mode)
    else:
        return popen(cmd, mode, bufsize)

if __name__ == '__main__':
    print """
I confirm that, to the best of my knowledge and belief, this
contribution is free of any claims of third parties under
copyright, patent or other rights or interests ("claims").
To the extent that I have any such claims, I hereby grant to CNRI a nonexclusive, irrevocable, royalty-free, worldwide license to reproduce, distribute, perform and/or display publicly, prepare derivative versions, and otherwise use this contribution as part of the Python software and its related documentation, or any derivative versions thereof, at no cost to CNRI or its licensed users, and to authorize others to do so. I acknowledge that CNRI may, at its sole discretion, decide whether or not to incorporate this contribution in the Python software and its related documentation. I further grant CNRI permission to use my name and other identifying information provided to CNRI by me for use in connection with the Python software and its related documentation. """ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 8 18:44:59 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:44:59 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <14534.27362.139106.701784@weyr.cnri.reston.va.us> Message-ID: <38C6919B.EA3EE2E7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Guido van Rossum writes: > > "it" == platform.py? Little interest from me personally; I suppose it > > could go in Tools/scripts/... > > I think platform.py is pretty nifty, but I'm not entirely sure how > it's expected to be used. Perhaps Marc-Andre could explain further > the motivation behind the module? It was first intended to provide a way to format a platform identifying file name for the mxCGIPython project and then quickly moved on to provide many different APIs to query platform specific information. 
architecture(executable='/usr/local/bin/python', bits='', linkage='') :

    Queries the given executable (defaults to the Python interpreter
    binary) for various architecture information.

    Returns a tuple (bits, linkage) which contains information about
    the bit architecture and the linkage format used for the
    executable. Both values are returned as strings.

    Values that cannot be determined are returned as given by the
    parameter presets. If bits is given as '', the sizeof(long) is
    used as an indicator for the supported pointer size.

    The function relies on the system's "file" command to do the
    actual work. This is available on most if not all Unix platforms.
    On some non-Unix platforms, and then only if the executable points
    to the Python interpreter, defaults from _default_architecture are
    used.

dist(distname='', version='', id='') :

    Tries to determine the name of the OS distribution.

    The function first looks for a distribution release file in /etc
    and then reverts to _dist_try_harder() in case no suitable files
    are found.

    Returns a tuple (distname, version, id) which defaults to the args
    given as parameters.

java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')) :

    Version interface for JPython.

    Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being
    a tuple (vm_name, vm_release, vm_vendor) and osinfo being a tuple
    (os_name, os_version, os_arch).

    Values which cannot be determined are set to the defaults given as
    parameters (which all default to '').

libc_ver(executable='/usr/local/bin/python', lib='', version='') :

    Tries to determine the libc version against which the file
    executable (defaults to the Python interpreter) is linked.

    Returns a tuple of strings (lib, version) which default to the
    given parameters in case the lookup fails.

    Note that the function has intimate knowledge of how different
    libc versions add symbols to the executable and is probably only
    usable for executables compiled using gcc.

    The file is read and scanned in chunks of chunksize bytes.
mac_ver(release='', versioninfo=('', '', ''), machine='') :

    Get MacOS version information and return it as tuple (release,
    versioninfo, machine) with versioninfo being a tuple (version,
    dev_stage, non_release_version).

    Entries which cannot be determined are set to ''. All tuple
    entries are strings.

    Thanks to Mark R. Levinson for mailing documentation links and
    code examples for this function. Documentation for the gestalt()
    API is available online at: http://www.rgaros.nl/gestalt/

machine() :

    Returns the machine type, e.g. 'i386'.

    An empty string is returned if the value cannot be determined.

node() :

    Returns the computer's network name (may not be fully qualified !)

    An empty string is returned if the value cannot be determined.

platform(aliased=0, terse=0) :

    Returns a single string identifying the underlying platform with
    as much useful information as possible (but no more :).

    The output is intended to be human readable rather than machine
    parseable. It may look different on different platforms and this
    is intended.

    If "aliased" is true, the function will use aliases for various
    platforms that report system names which differ from their common
    names, e.g. SunOS will be reported as Solaris. The system_alias()
    function is used to implement this.

    Setting terse to true causes the function to return only the
    absolute minimum information needed to identify the platform.

processor() :

    Returns the (true) processor name, e.g. 'amdk6'.

    An empty string is returned if the value cannot be determined.
    Note that many platforms do not provide this information or simply
    return the same value as for machine(), e.g. NetBSD does this.

release() :

    Returns the system's release, e.g. '2.2.0' or 'NT'.

    An empty string is returned if the value cannot be determined.

system() :

    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'.

    An empty string is returned if the value cannot be determined.
system_alias(system, release, version) :

    Returns (system, release, version) aliased to common marketing
    names used for some systems.

    It also does some reordering of the information in some cases
    where it would otherwise cause confusion.

uname() :

    Fairly portable uname interface. Returns a tuple of strings
    (system, node, release, version, machine, processor) identifying
    the underlying platform.

    Note that unlike the os.uname function, this also returns possible
    processor information as an additional tuple entry.

    Entries which cannot be determined are set to ''.

version() :

    Returns the system's release version, e.g. '#3 on degas'.

    An empty string is returned if the value cannot be determined.

win32_ver(release='', version='', csd='', ptype='') :

    Get additional version information from the Windows Registry and
    return a tuple (version, csd, ptype) referring to version number,
    CSD level and OS type (multi/single processor).

    As a hint: ptype returns 'Uniprocessor Free' on single processor
    NT machines and 'Multiprocessor Free' on multi processor machines.
    The 'Free' refers to the OS version being free of debugging code.
    It could also state 'Checked' which means the OS version uses
    debugging code, i.e. code that checks arguments, ranges, etc.
    (Thomas Heller).

    Note: this function only works if Mark Hammond's win32 package is
    installed and obviously only runs on Win32 compatible platforms.

    XXX Is there any way to find out the processor type on WinXX ?
    XXX Is win32 available on Windows CE ?

    Adapted from code posted by Karl Putland to comp.lang.python.

> My biggest requirement is that it be accompanied by documentation.
> The coolness factor and shared use of hackerly knowledge would
> probably get *me* to put it in, but there are a lot of things about
> which I'll disagree with Guido just to hear his (well-considered)
> thoughts on the matter. ;)

The module is doc-string documented (see above). This should serve well as a basis for the LaTeX docs.
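[An editorial aside: platform.py did eventually land in the standard library, as the `platform` module in Python 2.3, so the APIs documented above can be exercised directly today. Actual values vary by host:]

```python
# Exercising a few of the APIs documented above via the stdlib
# `platform` module (the descendant of MAL's platform.py).
import platform

print(platform.system())           # e.g. 'Linux', 'Windows', 'Darwin'
print(platform.machine())          # e.g. 'x86_64'
print(platform.platform(terse=1))  # minimal human-readable platform string

bits, linkage = platform.architecture()
print(bits)                        # e.g. '64bit'
```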
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From DavidA at ActiveState.com Wed Mar 8 19:36:01 2000
From: DavidA at ActiveState.com (David Ascher)
Date: Wed, 8 Mar 2000 10:36:01 -0800
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us>
Message-ID:

> "it" == platform.py? Little interest from me personally; I suppose it
> could go in Tools/scripts/...

FWIW, I think it belongs in the standard path. It allows one to do the equivalent of

    if os.platform == '...'

but in a much more useful way.

--david

From mhammond at skippinet.com.au Wed Mar 8 22:36:12 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu, 9 Mar 2000 08:36:12 +1100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us>
Message-ID:

> No concrete plans -- except that I think the registry access is
> supposed to go in. Haven't seen the code on patches at python.org yet
> though.

FYI, that is off with Trent who is supposed to be testing it on the Alpha.

Re win32pipe - I responded to that post suggesting that we do with os.pipe and win32pipe what was done with os.path.abspath/win32api - optionally try to import the win32-specific module and use it.

My only "concern" is that this then becomes more code for Guido to maintain in the core, even though Guido has expressed a desire to get out of the installers business. Assuming the longer-term plan is for other people to put together installation packages, and that these people are free to redistribute win32api/win32pipe, I'm wondering if it is worth bothering with?

Mark.
From trentm at ActiveState.com Wed Mar 8 15:42:06 2000
From: trentm at ActiveState.com (Trent Mick)
Date: Wed, 8 Mar 2000 14:42:06 -0000
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C6919B.EA3EE2E7@lemburg.com>
Message-ID:

MAL:
> architecture(executable='/usr/local/bin/python', bits='',
> linkage='') :
>
> Values that cannot be determined are returned as given by the
> parameter presets. If bits is given as '', the sizeof(long) is
> used as indicator for the supported pointer size.

Just a heads up, using sizeof(long) will not work on the forthcoming Win64 (LLP64 data model) to determine the supported pointer size. You would want to use the 'P' struct format specifier instead, I think (I am speaking in relative ignorance). However, the docs say that a PyInt is used to store a 'P'-specified value, which, as a C long, will not hold a pointer on LLP64. Hmmmm. The keyword perhaps is "forthcoming".

This is the code in question in platform.py:

    # Use the sizeof(long) as default number of bits if nothing
    # else is given as default.
    if not bits:
        import struct
        bits = str(struct.calcsize('l')*8) + 'bit'

Guido:
> > No concrete plans -- except that I think the registry access is
> > supposed to go in. Haven't seen the code on patches at python.org yet
> > though.

Mark Hammond:
> FYI, that is off with Trent who is supposed to be testing it on the Alpha.

My Alpha is in pieces right now! I will get to it soon. I will try it on Win64 as well, if I can.

Trent

Trent Mick trentm at activestate.com

From guido at python.org Thu Mar 9 03:59:51 2000
From: guido at python.org (Guido van Rossum)
Date: Wed, 08 Mar 2000 21:59:51 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: Your message of "Thu, 09 Mar 2000 08:36:12 +1100."
References: Message-ID: <200003090259.VAA20928@eric.cnri.reston.va.us> > My only "concern" is that this then becomes more code for Guido to maintain > in the core, even though Guido has expressed a desire to get out of the > installers business. Theoretically, it shouldn't need much maintenance. I'm more concerned that it will have different semantics than on Unix so that in practice you'd need to know about the platform anyway (apart from the fact that the installed commands are different, of course). > Assuming the longer term plan is for other people to put together > installation packages, and that these people are free to redistribute > win32api/win32pipe, Im wondering if it is worth bothering with? So that everybody could use os.popen() regardless of whether they're on Windows or Unix. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Mar 9 04:31:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 14:31:21 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003090259.VAA20928@eric.cnri.reston.va.us> Message-ID: [Me] > > Assuming the longer term plan is for other people to put together > > installation packages, and that these people are free to redistribute > > win32api/win32pipe, Im wondering if it is worth bothering with? [Guido] > So that everybody could use os.popen() regardless of whether they're > on Windows or Unix. Sure. But what I meant was "should win32pipe code move into the core, or should os.pipe() just auto-detect and redirect to win32pipe if installed?" 
I was suggesting that over the longer term, it may be reasonable to assume that win32pipe _will_ be installed, as everyone who releases installers for Python should include it :-) It could also be written in such a way that it prints a warning message when win32pipe doesn't exist, so in 99% of cases, it will answer the FAQ before they have had a chance to ask it :-)

It also should be noted that the win32pipe support for popen on Windows 95/98 includes a small, dedicated .exe - this just adds to the maintenance burden.

But it doesn't worry me at all what happens - I was just trying to save you work. Anyone is free to take win32pipe and move the relevant code into the core anytime they like, with my and Bill's blessing. It quite suits me that people have to download win32all to get this working, so I doubt I will get around to it any time soon :-)

Mark.

From tim_one at email.msn.com Thu Mar 9 04:52:58 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Wed, 8 Mar 2000 22:52:58 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: Message-ID: <000401bf897a$f5a7e620$0d2d153f@tim>

I had another take on all this, which I'll now share since nobody seems inclined to fold in the Win32 popen: perhaps os.popen should not be supported at all under Windows!

The current function is a mystery wrapped in an enigma -- sometimes it works, sometimes it doesn't, and I've never been able to outguess which one will obtain (there's more to it than just whether a console window is attached). If it's not reliable (it's not), and we can't document the conditions under which it can be used safely (I can't), Python shouldn't expose it.

Failing that, the os.popen docs should caution it's "use at your own risk" under Windows, and that this is directly inherited from MS's popen implementation.
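The optional-import fallback Mark describes above (the same trick used for os.path.abspath/win32api) might look like the following sketch. This is a hypothetical fragment, not code from win32all or the core; `win32pipe` is only importable on Windows with the win32 extensions installed:

```python
import os

# Prefer the win32pipe implementation when the win32 extensions are
# installed (Windows only); otherwise fall back to the libc-based
# os.popen that ships with the core.
try:
    import win32pipe
    popen = win32pipe.popen
except ImportError:
    popen = os.popen
```

Callers would then use `popen(cmd)` without caring which implementation they got.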
From tim_one at email.msn.com Thu Mar 9 10:40:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 04:40:26 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003081406.JAA20033@eric.cnri.reston.va.us> Message-ID: <000701bf89ab$80cb8e20$0d2d153f@tim> [Guido, with some implementation details and nice examples] Normally I'd eat this up -- today I'm gasping for air trying to stay afloat. I'll have to settle for sketching the high-level approach I've had in the back of my mind. I start with the pile of incestuous stuff Toby/Neil discovered have no external references. It consists of dead cycles, and perhaps also non-cycles reachable only from dead cycles. 1. The "points to" relation on this pile defines a graph G. 2. From any graph G, we can derive a related graph G' consisting of the maximal strongly connected components (SCCs) of G. Each (super)node of G' is an SCC of G, where (super)node A' of G' points to (super)node B' of G' iff there exists a node A in A' that points to (wrt G) some node B in B'. It's not obvious, but the SCCs can be found in linear time (via Tarjan's algorithm, which is simple but subtle; Cyclops.py uses a much dumber brute-force approach, which is nevertheless perfectly adequate in the absence of massively large cycles -- premature optimization is the root etc <0.5 wink>). 3. G' is necessarily a DAG. For if distinct A' and B' are both reachable from each other in G', then every pair of A in A' and B in B' are reachable from each other in G, contradicting that A' and B' are distinct maximal SCCs (that is, the union of A' and B' is also an SCC). 4. The point to all this: Every DAG can be topsorted. Start with the nodes of G' without predecessors. There must be at least one, because G' is a DAG. 5. For every node A' in G' without predecessors (wrt G'), it either does or does not contain an object with a potentially dangerous finalizer. If it does not, let's call it a safe node. 
If there are no safe nodes without predecessors, GC is stuck, and for good reason: every object in the whole pile is reachable from an object with a finalizer, which could change the topology in near-arbitrary ways. The unsafe nodes without predecessors (and again, by #4, there must be at least one) are the heart of the problem, and this scheme identifies them precisely. 6. Else there is a safe node A'. For each A in A', reclaim it, following the normal refcount rules (or in an implementation w/o RC, by following a topsort of "points to" in the original G). This *may* cause reclamation of an object X with a finalizer outside of A'. But doing so cannot cause resurrection of anything in A' (X is reachable from A' else cleaning up A' couldn't have affected X, and if anything in A' were also reachable from X, X would have been in A' to begin with (SCC!), contradicting that A' is safe). So the objects in A' can get reclaimed without difficulty. 7. The simplest thing to do now is just stop: rebuild it from scratch the next time the scheme is invoked. If it was *possible* to make progress without guessing, we did; and if it was impossible, we identified the precise SCC(s) that stopped us. Anything beyond that is optimization <0.6 wink>. Seems the most valuable optimization would be to keep track of whether an object with a finalizer gets reclaimed in step 6 (so long as that doesn't happen, the mutations that can occur to the structure of G' seem nicely behaved enough that it should be possible to loop back to step #5 without crushing pain). On to Guido's msg: [Guido] > When we have a pile of garbage, we don't know whether it's all > connected or whether it's lots of little cycles. So if we find > [objects with -- I'm going to omit this] finalizers, we have to put > those on a third list and put everything reachable from them on that > list as well (the algorithm I described before). SCC determination gives precise answers to all that. 
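Steps 1-4 above can be made concrete. The following is a hypothetical helper (not the Cyclops.py code) implementing Tarjan's linear-time SCC algorithm on a graph given as an adjacency dict:

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: the maximal SCCs of a directed graph.

    `graph` maps each node to a list of successors.  Runs in
    O(nodes + edges) time; the SCCs are emitted in reverse topological
    order of the condensation G', so sources of G' come out last.
    """
    index = {}              # discovery order of each visited node
    lowlink = {}            # smallest discovery index reachable
    stack, on_stack = [], set()
    sccs = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:      # v roots an SCC: pop it off
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in list(graph):
        if v not in index:
            visit(v)
    return sccs

# Guido's example A <=> b -> C <=> d condenses to two supernodes,
# {A, b} -> {C, d}; {C, d} is emitted first (reverse topsort).
g = {'A': ['b'], 'b': ['A', 'C'], 'C': ['d'], 'd': ['C']}
sccs = strongly_connected_components(g)
```

The reverse-topological output order is convenient here: a simple reversal gives the topsort of G' that steps 4-6 walk.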
> What's left on the first list then consists of finalizer-free garbage.
> We dispose of this garbage by clearing dicts and lists. Hopefully
> this makes the refcount of some of the finalizers go to zero -- those
> are finalized in the normal way.

In Python it's even possible for a finalizer to *install* a __del__ method that didn't previously exist, into the class of one of the objects on your "first list". The scheme above is meant to be bulletproof in the face of abuses even I can't conceive of.

More mundanely, clearing an item on your first list can cause a chain of events that runs a finalizer, which in turn can resurrect one of the objects on your first list (and so it should *not* get reclaimed). Without doing the SCC bit, I don't think you can out-think that (the reasoning above showed that the finalizer can't resurrect something in the *same* SCC as the object that started it all, but that argument cannot be extended to objects in other safe SCCs: they're vulnerable).

> And now we have to deal with the inevitable: finalizers that are part
> of cycles. It makes sense to reduce the graph of objects to a graph
> of finalizers only. Example:
>
> A <=> b -> C <=> d
>
> A and C have finalizers. C is part of a cycle (C-d) that contains no
> other finalizers, but C is also reachable from A. A is part of a
> cycle (A-b) that keeps it alive. The interesting thing here is that
> if we only look at the finalizers, there are no cycles!

The scheme above derives G': A' -> C' where A' consists of the A<=>b cycle and C' the C<=>d cycle. That there are no cycles in G' isn't surprising, it's just the natural consequence of doing the natural analysis. The scheme above refuses to do anything here, because the only node in G' without a predecessor (namely A') isn't "safe".
> If we reduce the graph to only finalizers (setting aside for now the
> problem of how to do that -- we may need to allocate more memory to
> hold the reduced graph), we get:
>
> A -> C

You should really have self-loops on both A and C, right? (because A is reachable from itself via chasing pointers; ditto for C)

> We can now finalize A (even though its refcount is nonzero!). And
> that's really all we can do! A could break its own cycle, thereby
> disposing of itself and b. It could also break C's cycle, disposing
> of C and d. It could do nothing. Or it could resurrect A, thereby
> resurrecting all of A, b, C, and d.
>
> This leads to (there's that weird echo again :-) Boehm's solution:
> Call A's finalizer and leave the rest to the next time the garbage
> collection runs.

This time the echo came back distorted:

[Boehm] Cycles involving one or more finalizable objects are never finalized.

A<=>b is "a cycle involving one or more finalizable objects", so he won't touch it. The scheme at the top doesn't either. If you handed him your *derived* graph (but also without the self-loops), he would; me too. KISS!

> Note that we're now calling finalizers on objects with a non-zero
> refcount.

I don't know why you want to do this. As the next several paragraphs confirm, it creates real headaches for the implementation, and I'm unclear on what it buys in return. Is "we'll do something by magic for cycles with no more than one finalizer" a major gain for the user over "we'll do something by magic for cycles with no finalizer"? 0, 1 and infinity *are* the only interesting numbers, but the difference between 0 and 1 *here* doesn't seem to me worth signing up for any pain at all.

> At some point (probably as a result of finalizing A) its
> refcount will go to zero. We should not finalize it again -- this
> would serve no purpose.
I don't believe BDW (or the scheme at the top) has this problem (simply because the only way to run a finalizer in a cycle under them is for the user to break the cycle explicitly -- so if an object's finalizer gets run, the user caused it directly, and so can never claim surprise).

> Possible solution:
>
>     INCREF(A);
>     A->__del__();
>     if (A->ob_refcnt == 1)
>         A->__class__ = NULL; /* Make a finalizer-less */
>     DECREF(A);
>
> This avoids finalizing twice if the first finalization broke all
> cycles in which A is involved. But if it doesn't, A is still cyclical
> garbage with a finalizer! Even if it didn't resurrect itself.
>
> Instead of the code fragment above, we could mark A as "just
> finalized" and when it shows up at the head of the tree (of finalizers
> in cyclical trash) again on the next garbage collection, to discard it
> without calling the finalizer again (because this clearly means that
> it didn't resurrect itself -- at least not for a very long time).

I don't think you need to do any of this -- unless you think you need to do the thing that created the need for this, which I didn't think you needed to do either.

> I would be happier if we could still have a rule that says that a
> finalizer is called only once by magic -- even if we have two forms of
> magic: refcount zero or root of the tree. Tim: I don't know if you
> object against this rule as a matter of principle (for the sake of
> finalizers that resurrect the object) or if your objection is really
> against the unordered calling of finalizers legitimized by Java's
> rules. I hope the latter, since I think that this rule (__del__
> called only once by magic) by itself is easy to understand and easy to
> deal with, and I believe it may be necessary to guarantee progress for
> the garbage collector.

My objections to Java's rules have been repeated enough. I would have no objection to "__del__ called only once" if it weren't for that Python currently does something different.
I don't know whether people rely on that now; if they do, it's a much more dangerous thing to change than adding a new keyword (the compiler gives automatic 100% coverage of the latter; but nothing mechanical can help people track down reliance-- whether deliberate or accidental --on the former).

My best *guess* is that __del__ is used rarely; e.g., there are no more than 40 instances of it in the whole CVS tree, including demo directories; and they all look benign (at least three have bodies consisting of "pass"!). The most complicated one I found in my own code is:

    def __del__(self):
        self.break_cycles()

    def break_cycles(self):
        for rule in self.rules:
            if rule is not None:
                rule.cleanse()

But none of this self-sampling is going to comfort some guy in France who has a megaline of code relying on it. Good *bet*, though.

> [and another cogent explanation of why breaking the "leave cycles with
> finalizers" alone injunction creates headaches]
> ...
> Even if someone once found a good use for resurrecting inside __del__,
> against all recommendations, I don't mind breaking their code, if it's
> for a good cause. The Java rules aren't a good cause. But top-sorted
> finalizer calls seem a worthy cause.

They do to me too, except that I say even a cycle involving but a single object (w/ finalizer) looping on itself is the user's problem.

> So now we get to discuss what to do with multi-finalizer cycles, like:
>
> A <=> b <=> C
>
> Here the reduced graph is:
>
> A <=> C

The SCC reduction is simply to A and, right, the scheme at the top punts.

> [more on the once-only rule chopped]
> ...
> Anyway, once-only rule aside, we still need a protocol to deal with
> cyclical dependencies between finalizers. The __cleanup__ approach is
> one solution, but it also has a problem: we have a set of finalizers.
> Whose __cleanup__ do we call? Any? All? Suggestions?
This is why a variant of guardians was more appealing to me at first: I could ask a guardian for the entire SCC, so I get the *context* of the problem as well as the final microscopic symptom. I see Marc-Andre already declined to get sucked into the magical part of this. Greg should speak for his scheme, and I haven't made time to understand it fully; my best guess is to call x.__cleanup__ for every object in the SCC (but there's no clear way to decide which order to call them in, and unless they're more restricted than __del__ methods they can create all the same problems __del__ methods can!).

> Note that I'd like some implementation freedom: I may not want to
> bother with the graph reduction algorithm at first (which seems very
> hairy) so I'd like to have the right to use the __cleanup__ API
> as soon as I see finalizers in cyclical trash. I don't mind disposing
> of finalizer-free cycles first, but once I have more than one
> finalizer left in the remaining cycles, I'd like the right not to
> reduce the graph for topsort reasons -- that algorithm seems hard.

I hate to be realistic, but modern GC algorithms are among the hardest you'll ever see in any field; even the outer limits of what we've talked about here is baby stuff. Sun's Java group (the one in Chelmsford, MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. Steele) working full-time for over a year on the last iteration of Java's GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- state of the art GC can be crushingly hard.

So I've got nothing against taking shortcuts at first -- there's actually no realistic alternative. I think we're overlooking the obvious one, though: if any finalizer appears in any trash cycle, tough luck. Python 3000 -- which may be a spelling of 1.7, but doesn't *need* to be a spelling of 1.6.

> So we're back to the __cleanup__ design.
Strawman proposal: for all > finalizers in a trash cycle, call their __cleanup__ method, in > arbitrary order. After all __cleanup__ calls are done, if the objects > haven't all disposed of themselves, they are all garbage-collected > without calling __del__. (This seems to require another garbage > collection cycle -- so perhaps there should also be a once-only rule > for __cleanup__?) > > Separate question: what if there is no __cleanup__? This should > probably be reported: "You have cycles with finalizers, buddy! What > do you want to do about them?" This same warning could be given when > there is a __cleanup__ but it doesn't break all cycles.

If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly 1" isn't special to me), I will consider it to be a bug. So I want a way to get it back from gc, so I can see what the heck it is, so I can fix my code (or harass whoever did it to me). __cleanup__ suffices for that, so the very act of calling it is all I'm really after ("Python invoked __cleanup__ == Tim has a bug").

But after I outgrow that, I'll certainly want the option to get another kind of complaint if __cleanup__ doesn't break the cycles, and after *that* I couldn't care less. I've given you many gracious invitations to say that you don't mind leaking in the face of a buggy program, but as you've declined so far, I take it that never hearing another gripe about leaking is a Primary Life Goal. So collection without calling __del__ is fine -- but so is collection with calling it!

If we're going to (at least implicitly) approve of this stuff, it's probably better *to* call __del__, if for no other reason than to catch your case of some poor innocent object caught in a cycle not of its making that expects its __del__ to abort starting World War III if it becomes unreachable.

whatever-we-don't-call-a-mistake-is-a-feature-ly y'rs - tim

From fdrake at acm.org Thu Mar 9 15:25:35 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 9 Mar 2000 09:25:35 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf897a$f5a7e620$0d2d153f@tim> References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Tim Peters writes: > Failing that, the os.popen docs should caution it's "use at your own risk" > under Windows, and that this is directly inherited from MS's popen > implementation. Tim (& others), Would this additional text be sufficient for the os.popen() documentation? \strong{Note:} This function behaves unreliably under Windows due to the native implementation of \cfunction{popen()}. If someone cares to explain what's weird about it, that might be appropriate as well, but I've never used this under Windows. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Thu Mar 9 15:42:37 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:42:37 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C7B85D.E6090670@lemburg.com> Trent Mick wrote: > > MAL: > > architecture(executable='/usr/local/bin/python', bits='', > > linkage='') : > > > > Values that cannot be determined are returned as given by the > > parameter presets. If bits is given as '', the sizeof(long) is > > used as indicator for the supported pointer size. > > Just a heads up, using sizeof(long) will not work on forthcoming WIN64 > (LLP64 data model) to determine the supported pointer size. You would want > to use the 'P' struct format specifier instead, I think (I am speaking in > relative ignorance). However, the docs say that a PyInt is used to store 'P' > specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. > The keyword perhaps is "forthcoming". 
> > This is the code in question in platform.py:
> >
> >     # Use the sizeof(long) as default number of bits if nothing
> >     # else is given as default.
> >     if not bits:
> >         import struct
> >         bits = str(struct.calcsize('l')*8) + 'bit'

Python < 1.5.2 doesn't support 'P', but anyway, I'll change those lines according to your suggestion.

Does struct.calcsize('P')*8 return 64 on 64bit-platforms as it should (probably ;) ?

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From jim at interet.com Thu Mar 9 16:45:54 2000
From: jim at interet.com (James C. Ahlstrom)
Date: Thu, 09 Mar 2000 10:45:54 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim>
Message-ID: <38C7C732.D9086C34@interet.com>

Tim Peters wrote:
>
> I had another take on all this, which I'll now share since nobody
> seems inclined to fold in the Win32 popen: perhaps os.popen should not be
> supported at all under Windows!
>
> The current function is a mystery wrapped in an enigma -- sometimes it
> works, sometimes it doesn't, and I've never been able to outguess which one
> will obtain (there's more to it than just whether a console window is
> attached). If it's not reliable (it's not), and we can't document the
> conditions under which it can be used safely (I can't), Python shouldn't
> expose it.

OK, I admit I don't understand this either, but here goes...

It looks like Python popen() uses the Windows _popen() function. The _popen() docs say that it creates a spawned copy of the command processor (shell) with the given string argument. It further states that it does NOT work in a Windows program and ONLY works when called from a Windows Console program.
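Trent's 'P' suggestion above amounts to a couple of lines. This is a sketch, not the actual platform.py patch; `struct.calcsize('P')` is sizeof(void *), which stays correct on LLP64 platforms where sizeof(long) does not:

```python
import struct

def pointer_bits(bits=''):
    # Sketch of the platform.py default: probe the pointer size via the
    # 'P' (void *) format code instead of 'l' (long), since on LLP64
    # systems such as Win64 a long is 4 bytes but a pointer is 8.
    if not bits:
        bits = str(struct.calcsize('P') * 8) + 'bit'
    return bits
```

On a 64-bit build this yields '64bit'; an explicit preset is passed through unchanged, matching the "parameter presets" behaviour MAL describes.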
From tim_one at email.msn.com Thu Mar 9 18:14:17 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 9 Mar 2000 12:14:17 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C7C732.D9086C34@interet.com>
Message-ID: <000401bf89ea$e6e54180$79a0143f@tim>

[James C. Ahlstrom]
> OK, I admit I don't understand this either, but here goes...
>
> It looks like Python popen() uses the Windows _popen() function.
> The _popen() docs say ...

Screw the docs. Pretend you're a newbie and *try* it. Here:

    import os
    p = os.popen("dir")
    while 1:
        line = p.readline()
        if not line:
            break
        print line

Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program). Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work. You have to bring up the task manager and kill it that way. I once traced this under the debugger -- it's hung inside an MS DLL. "dir" is not entirely arbitrary here: for *some* cmds it works fine, for others not. The set of which work appears to vary across Windows flavors. Sometimes you can worm around it by wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but sometimes not. After hours of poke-&-hope (in the past), as I said, I've never been able to predict which cases will work.

> ...
> It further states that it does NOT work in a Windows program and ONLY
> works when called from a Windows Console program.

The latter is a necessary condition but not sufficient; don't know what *is* sufficient, and AFAIK nobody else does either.

> From this I assume that popen() works from python.exe (it is a Console
> app) if the command can be directly executed by the shell (like "dir"),

See above for a counterexample to both. I actually have much better luck with cmds command.com *doesn't* know anything about. So this appears to vary by shell too.

> ...
> If there is something wrong with _popen() then the way to fix it is
> to avoid using it and create the pipes directly.

libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

> ...
> Of course, the strength of Python is portable code. popen() should be
> fixed the right way.

pipes too, but users get baffled by popen much more often simply because they try popen much more often.

there's-no-question-about-whether-it-works-right-it-doesn't-ly y'rs - tim

From gstein at lyra.org Thu Mar 9 18:47:23 2000
From: gstein at lyra.org (Greg Stein)
Date: Thu, 9 Mar 2000 09:47:23 -0800 (PST)
Subject: [Python-Dev] platform.py (was: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?)
In-Reply-To: <38C7B85D.E6090670@lemburg.com>
Message-ID:

On Thu, 9 Mar 2000, M.-A. Lemburg wrote:
>...
> Python < 1.5.2 doesn't support 'P', but anyway, I'll change
> those lines according to your suggestion.
>
> Does struct.calcsize('P')*8 return 64 on 64bit-platforms as
> it should (probably ;) ?

Yes. It returns sizeof(void *).

Cheers, -g

-- Greg Stein, http://www.lyra.org/

From mal at lemburg.com Thu Mar 9 15:55:36 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 09 Mar 2000 15:55:36 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us>
Message-ID: <38C7BB68.9FAE3BE9@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Tim Peters writes:
> > Failing that, the os.popen docs should caution it's "use at your own risk"
> > under Windows, and that this is directly inherited from MS's popen
> > implementation.
>
> Tim (& others),
> Would this additional text be sufficient for the os.popen()
> documentation?
> > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. > > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. Ehm, hasn't anyone looked at the code I posted yesterday ? It goes a long way to deal with these inconsistencies... even though its not perfect (yet ;). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Thu Mar 9 19:52:40 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 9 Mar 2000 13:52:40 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C7BB68.9FAE3BE9@lemburg.com> References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> <38C7BB68.9FAE3BE9@lemburg.com> Message-ID: <14535.62200.158087.102380@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Ehm, hasn't anyone looked at the code I posted yesterday ? > It goes a long way to deal with these inconsistencies... even > though its not perfect (yet ;). I probably sent that before I'd read everything, and I'm not the one to change the popen() implementation. At this point, I'm waiting for someone who understands the details to decide what happens (if anything) to the implementation before I check in any changes to the docs. My inclination is to fix popen() on Windows to do the right thing, but I don't know enough about pipes & process management on Windows to get into that fray. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives

From nascheme at enme.ucalgary.ca Thu Mar 9 20:37:31 2000
From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca)
Date: Thu, 9 Mar 2000 12:37:31 -0700
Subject: [Python-Dev] finalization again
Message-ID: <20000309123731.A3664@acs.ucalgary.ca>

[Tim, explaining something I was thinking about more clearly than I ever could]
>It's not obvious, but the SCCs can be found in linear time (via Tarjan's
>algorithm, which is simple but subtle;

Wow, it seems like it should be more expensive than that. What are the space requirements? Also, does the simple algorithm you used in Cyclops have a name?

>If there are no safe nodes without predecessors, GC is stuck,
>and for good reason: every object in the whole pile is reachable
>from an object with a finalizer, which could change the topology
>in near-arbitrary ways. The unsafe nodes without predecessors
>(and again, by #4, there must be at least one) are the heart of
>the problem, and this scheme identifies them precisely.

Exactly. What is our policy on these unsafe nodes? Guido seems to feel that it is okay for the programmer to create them and Python should have a way of collecting them. Tim seems to feel that the programmer should not create them in the first place. I agree with Tim. If topological finalization is used, it is possible for the programmer to design their classes so that this problem does not happen. This is explained on Hans Boehm's finalization web page.

If the programmer cannot or does not redesign their classes, I don't think it is unreasonable to leak memory. We can link these cycles to a global list of garbage or print a debugging message. This is a large improvement over the current situation (i.e. leaking memory with no debugging even for cycles without finalizers).

Neil

-- "If you're a great programmer, you make all the routines depend on each other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985.
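The "link these cycles to a global list of garbage" idea Neil describes is roughly what Python's gc module later shipped as gc.garbage. As a present-day illustration (not code that existed at the time of this thread), gc.DEBUG_SAVEALL parks everything the collector finds unreachable on that list for inspection instead of freeing it silently:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

gc.collect()                        # flush any pre-existing trash first
gc.set_debug(gc.DEBUG_SAVEALL)      # park unreachable objects in gc.garbage

a, b = Node(), Node()
a.ref, b.ref = b, a                 # build an a <=> b cycle
del a, b                            # drop the last external references

found = gc.collect()                # the cycle is found, not silently leaked
cycle_nodes = [o for o in gc.garbage if isinstance(o, Node)]

gc.set_debug(0)                     # restore normal collection
gc.garbage.clear()
```

A program (or debugger) can then walk the saved list to see exactly which cycles it created, which is the debugging improvement Neil is after.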
From gstein at lyra.org Thu Mar 9 20:50:29 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:50:29 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: On Thu, 9 Mar 2000 nascheme at enme.ucalgary.ca wrote: >... > If the programmer can or does not redesign their classes I don't > think it is unreasonable to leak memory. We can link these > cycles to a global list of garbage or print a debugging message. > This is a large improvement over the current situation (ie. > leaking memory with no debugging even for cycles without > finalizers). I think we throw an error (as a subclass of MemoryError). As an alternative, is it possible to move those cycles to the garbage list and then never look at them again? That would speed up future collection processing. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Thu Mar 9 20:51:46 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:51:46 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 11:50:29 PST." References: Message-ID: <200003091951.OAA26184@eric.cnri.reston.va.us> > As an alternative, is it possible to move those cycles to the garbage list > and then never look at them again? That would speed up future collection > processing. With the current approach, that's almost automatic :-) I'd rather reclaim the memory too. --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm at hypernet.com Thu Mar 9 20:54:16 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 9 Mar 2000 14:54:16 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf89ea$e6e54180$79a0143f@tim> References: <38C7C732.D9086C34@interet.com> Message-ID: <1259490837-400325@hypernet.com> [Tim re popen on Windows] ... > the debugger -- it's hung inside an MS DLL. 
"dir" is not entirely arbitrary > here: for *some* cmds it works fine, for others not. The set of which work > appears to vary across Windows flavors. Sometimes you can worm around it by > wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but > sometimes not. It doesn't work for commands builtin to whatever "shell" you're using. That's different between cmd and command, and the various flavors, versions and extensions thereof. FWIW, I gave up a long time ago. I use redirection and a tempfile. The few times I've wanted "interactive" control, I've used Win32Process, dup'ed, inherited handles... the whole 9 yards. Why? Look at all the questions about popen and child processes in general, on platforms where it *works*, (if it weren't for Donn Cave, nobody'd get it to work anywhere ). To reiterate Tim's point: *none* of the c runtime routines for process control on Windows are adequate (beyond os.system and living with a DOS box popping up). The raw Win32 CreateProcess does everything you could possibly want, but takes a week or more to understand, (if this arg is a that, then that arg is a whatsit, and the next is limited to the values X and Z unless...). your-brain-on-Windows-ly y'rs - Gordon From guido at python.org Thu Mar 9 20:55:23 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:55:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 04:40:26 EST." <000701bf89ab$80cb8e20$0d2d153f@tim> References: <000701bf89ab$80cb8e20$0d2d153f@tim> Message-ID: <200003091955.OAA26217@eric.cnri.reston.va.us> [Tim describes a more formal approach based on maximal strongly connected components (SCCs).] I like the SCC approach -- it's what I was struggling to invent but came short of discovering. However: [me] > > What's left on the first list then consists of finalizer-free garbage. > > We dispose of this garbage by clearing dicts and lists. 
Hopefully > > this makes the refcount of some of the finalizers go to zero -- those > > are finalized in the normal way. [Tim] > In Python it's even possible for a finalizer to *install* a __del__ method > that didn't previously exist, into the class of one of the objects on your > "first list". The scheme above is meant to be bulletproof in the face of > abuses even I can't conceive of . Are you *sure* your scheme deals with this? Let's look at an example. (Again, lowercase nodes have no finalizers.) Take G: a <=> b -> C This is G' (a and b are strongly connected): a' -> C' C is not reachable from any root node. We decide to clear a and b. Let's suppose we happen to clear b first. This removes the last reference to C, C's finalizer runs, and it installs a finalizer on a.__class__. So now a' has turned into A', and we're halfway committing a crime we said we would never commit (touching cyclical trash with finalizers). I propose to disregard this absurd possibility, except to the extent that Python shouldn't crash -- but we make no guarantees to the user. > More mundanely, clearing an item on your first list can cause a chain of > events that runs a finalizer, which in turn can resurrect one of the objects > on your first list (and so it should *not* get reclaimed). Without doing > the SCC bit, I don't think you can out-think that (the reasoning above > showed that the finalizer can't resurrect something in the *same* SCC as the > object that started it all, but that argument cannot be extended to objects > in other safe SCCs: they're vulnerable). I don't think so. While my poor wording ("finalizer-free garbage") didn't make this clear, my references to earlier algorithms were intended to imply that this is garbage that consists of truly unreachable objects. I have three lists: let's call them T(rash), R(oot-reachable), and F(inalizer-reachable). The Schemenauer c.s. algorithm moves all reachable nodes to R. 
I then propose to move all finalizers to F, and to run another pass of Schemenauer c.s. to also move all finalizer-reachable (but not root-reachable) nodes to F. I truly believe that (barring the absurdity of installing a new __del__) the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers: by virtue of Schemenauer c.s. (which computes a reachability closure given some roots) anything reachable from a finalizer is on F by now (if it isn't on R -- again, nothing on T is reachable from R, because R is calculated as a closure). So, unless there's still a bug in my thinking here, I think that as long as we only want to clear SCCs with 0 finalizers, T is exactly the set of nodes we're looking for. > This time the echo came back distorted : > > [Boehm] > Cycles involving one or more finalizable objects are never finalized. > > A<=>b is "a cycle involving one or more finalizable objects", so he won't > touch it. The scheme at the top doesn't either. If you handed him your > *derived* graph (but also without the self-loops), he would; me too. KISS! > > > Note that we're now calling finalizers on objects with a non-zero > > refcount. > > I don't know why you want to do this. As the next several paragraphs > confirm, it creates real headaches for the implementation, and I'm unclear > on what it buys in return. Is "we'll do something by magic for cycles with > no more than one finalizer" a major gain for the user over "we'll do > something by magic for cycles with no finalizer"? 0, 1 and infinity *are* > the only interesting numbers , but the difference between 0 and 1 > *here* doesn't seem to me worth signing up for any pain at all. I do have a reason: if a maximal SCC has only one finalizer, there can be no question about the ordering between finalizer calls. And isn't the whole point of this discussion to have predictable ordering of finalizer calls in the light of trash recycling? 
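[Editorial aside: the T/R/F split just described can be sketched abstractly. This is an illustration only -- the real collector works on refcounts inside the interpreter, and `successors`, `has_finalizer` and the explicit root set here are invented for the example.]

```python
def partition(objects, successors, roots, has_finalizer):
    """Split candidate trash into R (root-reachable), F (finalizer-
    reachable) and T (truly unreachable), per the scheme above."""
    def closure(seeds):
        # reachability closure restricted to the candidate set
        seen = set(seeds)
        todo = list(seeds)
        while todo:
            v = todo.pop()
            for w in successors(v):
                if w in objects and w not in seen:
                    seen.add(w)
                    todo.append(w)
        return seen

    R = closure(set(roots) & set(objects))
    F = closure({o for o in objects if has_finalizer(o) and o not in R})
    T = set(objects) - R - F
    return R, F, T

# Guido's example graph: a <=> b -> C, where only C has a finalizer
# and nothing is reachable from a root.
succ = {'a': ['b'], 'b': ['a', 'C'], 'C': []}
R, F, T = partition({'a', 'b', 'C'}, lambda v: succ[v],
                    roots=[], has_finalizer=lambda v: v == 'C')
```

Here T comes out as {a, b} and F as {C}: everything landing on T is reachable from neither roots nor finalizers, so clearing T can neither run nor be observed by any __del__ -- which is the claim being defended above.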
> I would have no objection to "__del__ called only once" if it weren't for > that Python currently does something different. I don't know whether people > rely on that now; if they do, it's a much more dangerous thing to change > than adding a new keyword (the compiler gives automatic 100% coverage of the > latter; but nothing mechanical can help people track down reliance-- whether > deliberate or accidental --on the former). [...] > But none of this self-sampling is going to comfort some guy in France who > has a megaline of code relying on it. Good *bet*, though . OK -- so your objection is purely about backwards compatibility. Apart from that, I strongly feel that the only-once rule is a good one. And I don't think that the compatibility issue weighs very strongly here (given all the other problems that typically exist with __del__). > I see Marc-Andre already declined to get sucked into the magical part of > this . Greg should speak for his scheme, and I haven't made time to > understand it fully; my best guess is to call x.__cleanup__ for every object > in the SCC (but there's no clear way to decide which order to call them in, > and unless they're more restricted than __del__ methods they can create all > the same problems __del__ methods can!). Yes, but at least since we're defining a new API (in a reserved portion of the method namespace) there are no previous assumptions to battle. > > Note that I'd like some implementation freedom: I may not want to > > bother with the graph reduction algorithm at first (which seems very > > hairy) so I'd like to have the right to use the __cleanup__ API > > as soon as I see finalizers in cyclical trash. I don't mind disposing > > of finalizer-free cycles first, but once I have more than one > > finalizer left in the remaining cycles, I'd like the right not to > > reduce the graph for topsort reasons -- that algorithm seems hard. 
> > I hate to be realistic , but modern GC algorithms are among the > hardest you'll ever see in any field; even the outer limits of what we've > talked about here is baby stuff. Sun's Java group (the one in Chelmsford, > MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. > Steele) working full-time for over a year on the last iteration of Java's > GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- > state of the art GC can be crushingly hard. > > So I've got nothing against taking shortcuts at first -- there's actually no > realistic alternative. I think we're overlooking the obvious one, though: > if any finalizer appears in any trash cycle, tough luck. Python 3000 -- > which may be a spelling of 1.7 , but doesn't *need* to be a spelling > of 1.6. Kind of sad though -- finally knowing about cycles and then not being able to do anything about them. > > So we're back to the __cleanup__ design. Strawman proposal: for all > > finalizers in a trash cycle, call their __cleanup__ method, in > > arbitrary order. After all __cleanup__ calls are done, if the objects > > haven't all disposed of themselves, they are all garbage-collected > > without calling __del__. (This seems to require another garbage > > collection cycle -- so perhaps there should also be a once-only rule > > for __cleanup__?) > > > > Separate question: what if there is no __cleanup__? This should > > probably be reported: "You have cycles with finalizers, buddy! What > > do you want to do about them?" This same warning could be given when > > there is a __cleanup__ but it doesn't break all cycles. > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > 1" isn't special to me), I will consider it to be a bug. So I want a way to > get it back from gc, so I can see what the heck it is, so I can fix my code > (or harass whoever did it to me). 
__cleanup__ suffices for that, so the > very act of calling it is all I'm really after ("Python invoked __cleanup__ > == Tim has a bug"). > > But after I outgrow that , I'll certainly want the option to get > another kind of complaint if __cleanup__ doesn't break the cycles, and after > *that* I couldn't care less. I've given you many gracious invitations to > say that you don't mind leaking in the face of a buggy program , but > as you've declined so far, I take it that never hearing another gripe about > leaking is a Primary Life Goal. So collection without calling __del__ is > fine -- but so is collection with calling it! If we're going to (at least > implicitly) approve of this stuff, it's probably better *to* call __del__, > if for no other reason than to catch your case of some poor innocent object > caught in a cycle not of its making that expects its __del__ to abort > starting World War III if it becomes unreachable . I suppose we can print some obnoxious message to stderr like """Your program has created cyclical trash involving one or more objects with a __del__ method; calling their __cleanup__ method didn't resolve the cycle(s). I'm going to call the __del__ method(s) but I can't guarantee that they will be called in a meaningful order, because of the cyclical dependencies.""" But I'd still like to reclaim the memory. If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. (Because of this, I will also need to trace functions, methods and modules -- these create massive cycles that currently require painful cleanup. Of course I also need to track down all the roots then... 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Thu Mar 9 20:59:48 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:59:48 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091951.OAA26184@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: > > As an alternative, is it possible to move those cycles to the garbage list > > and then never look at them again? That would speed up future collection > > processing. > > With the current approach, that's almost automatic :-) > > I'd rather reclaim the memory too. Well, yah. I would too :-) I'm at ApacheCon right now, so haven't read the thread in detail, but it seems that people saw my algorithm as a bit too complex. Bah. IMO, it's a pretty straightforward way for the interpreter to get cycles cleaned up. (whether the objects in the cycles are lists/dicts, class instances, or extension types!) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 9 21:18:06 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:18:06 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: >... > I don't think so. While my poor wording ("finalizer-free garbage") > didn't make this clear, my references to earlier algorithms were > intended to imply that this is garbage that consists of truly > unreachable objects. I have three lists: let's call them T(rash), > R(oot-reachable), and F(inalizer-reachable). The Schemenauer > c.s. algorithm moves all reachable nodes to R. I then propose to move > all finalizers to F, and to run another pass of Schemenauer c.s. to > also move all finalizer-reachable (but not root-reachable) nodes to F. >... > [Tim Peters] > > I see Marc-Andre already declined to get sucked into the magical part of > > this . 
Greg should speak for his scheme, and I haven't made time to > > understand it fully; my best guess is to call x.__cleanup__ for every object > > in the SCC (but there's no clear way to decide which order to call them in, > > and unless they're more restricted than __del__ methods they can create all > > the same problems __del__ methods can!). My scheme was to identify objects in F, but only those with a finalizer (not the closure). Then call __cleanup__ on each of them, in arbitrary order. If any are left after the sequence of __cleanup__ calls, then I call it an error. [ note that my proposal defined checking for a finalizer by calling tp_clean(TPCLEAN_CARE_CHECK); this accounts for class instances and for extension types with "heavy" processing in tp_dealloc ] The third step was to use tp_clean to try and clean all other objects in a safe fashion. Specifically: the objects have no finalizers, so there is no special care needed in finalizing, so this third step should nuke references that are stored in the object. This means object pointers are still valid (we haven't dealloc'd), but the insides have been emptied. If the third step does not remove all cycles, then one of the PyType objects did not remove all references during the tp_clean call. >... > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > > 1" isn't special to me), I will consider it to be a bug. So I want a way to > > get it back from gc, so I can see what the heck it is, so I can fix my code > > (or harass whoever did it to me). __cleanup__ suffices for that, so the > > very act of calling it is all I'm really after ("Python invoked __cleanup__ > > == Tim has a bug"). Agreed. >... > I suppose we can print some obnoxious message to stderr like A valid alternative to raising an exception, but it falls into the whole trap of "where does stderr go?" >... > But I'd still like to reclaim the memory. 
If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. If an exception is raised, the top-level server loop can catch it, log the error, and keep going. But yes: it will leak. > (Because of this, I will also need to trace functions, methods and > modules -- these create massive cycles that currently require painful > cleanup. Of course I also need to track down all the roots > then... :-) Yes. It would be nice to have these participate in the "cleanup protocol" that I've described. It should help a lot at Python finalization time, effectively moving some special casing from import.c to the objects themselves. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 21:20:23 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 15:20:23 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf89ea$e6e54180$79a0143f@tim> Message-ID: <38C80787.7791A1A6@interet.com> Tim Peters wrote: > Screw the docs. Pretend you're a newbie and *try* it. I did try it. > > import os > p = os.popen("dir") > while 1: > line = p.readline() > if not line: > break > print line > > Type that in by hand, or stick it in a file & run it from a cmdline > python.exe (which is a Windows console program). Under Win95 the process > freezes solid, and even trying to close the DOS box doesn't work. You have > to bring up the task manager and kill it that way. I once traced this under > the debugger -- it's hung inside an MS DLL. Point on the curve: This program works perfectly on my machine running NT. > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > only versions of these things that come close to working under Windows (he > wraps the native Win32 spellings of these things; MS's libc entry points > (which Python uses now) are much worse). I believe you when you say popen() is flakey. 
It is a little harder to believe it is not possible to write a _popen() replacement using pipes which works. Of course I wanted you to do it instead of me! Well, if I get any time before 1.6 comes out... JimA From gstein at lyra.org Thu Mar 9 21:31:38 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:31:38 -0800 (PST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: On Thu, 9 Mar 2000, James C. Ahlstrom wrote: >... > > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > > only versions of these things that come close to working under Windows (he > > wraps the native Win32 spellings of these things; MS's libc entry points > > (which Python uses now) are much worse). > > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. > > Of course I wanted you to do it instead of me! Well, if > I get any time before 1.6 comes out... It *has* been done. Bill Tutt did it a long time ago. That's what win32pipe is all about. -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 22:04:59 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 16:04:59 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C811FB.B6096FA4@interet.com> Greg Stein wrote: > > On Thu, 9 Mar 2000, James C. Ahlstrom wrote: > > Of course I wanted you to do it instead of me! Well, if > > I get any time before 1.6 comes out... > > It *has* been done. Bill Tutt did it a long time ago. That's what > win32pipe is all about. Thanks for the heads up! Unfortunately, win32pipe is not in the core, and probably covers more ground than just popen() and so might be a maintenance problem. And popen() is not written in it anyway. So we are Not There Yet (TM). 
Which I guess was Tim's original point. JimA From mhammond at skippinet.com.au Thu Mar 9 22:36:14 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 10 Mar 2000 08:36:14 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: > Point on the curve: This program works perfectly on my > machine running NT. And running from Python.exe. I bet you didn't try it from a GUI. The situation is worse WRT Windows 95. MS has a knowledge base article describing the bug, and telling you how to work around it by using a dedicated .EXE. So, out of the box, popen works only on NT from a console - pretty sorry state of affairs :-( > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. Which is what I believe win32pipe.popen* are. Mark. From guido at python.org Fri Mar 10 02:13:51 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 20:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe Message-ID: <200003100113.UAA27337@eric.cnri.reston.va.us> Christian Tismer just did an exhaustive search for thread unsafe use of Python operations, and found two weaknesses. One is posix.listdir(), which I had already found; the other is file.writelines(). Here's a program that demonstrates the bug; basically, while writelines is walking down the list, another thread could truncate the list, causing PyList_GetItem() to fail or a string object to be deallocated while writelines is using it. On my Solaris 7 system it typically crashes in the first or second iteration. It's easy to fix: just don't release the interpreter lock (get rid of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other threads from doing any work while this thread may be blocked for I/O. 
An alternative solution is to put Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but would require a lot of lock operations and would probably slow things down too much. Ideas? --Guido van Rossum (home page: http://www.python.org/~guido/)

import os
import sys
import thread
import random
import time
import tempfile

def good_guy(fp, list):
    t0 = time.time()
    fp.seek(0)
    fp.writelines(list)
    t1 = time.time()
    print fp.tell(), "bytes written"
    return t1-t0

def bad_guy(dt, list):
    time.sleep(random.random() * dt)
    del list[:]

def main():
    infn = "/usr/dict/words"
    if sys.argv[1:]:
        infn = sys.argv[1]
    print "reading %s..." % infn
    fp = open(infn)
    list = fp.readlines()
    fp.close()
    print "read %d lines" % len(list)
    tfn = tempfile.mktemp()
    fp = None
    try:
        fp = open(tfn, "w")
        print "calibrating..."
        dt = 0.0
        n = 3
        for i in range(n):
            dt = dt + good_guy(fp, list)
        dt = dt / n  # average time it took to write the list to disk
        print "dt =", round(dt, 3)
        i = 0
        while 1:
            i = i+1
            print "test", i
            copy = map(lambda x: x[1:], list)
            thread.start_new_thread(bad_guy, (dt, copy))
            good_guy(fp, copy)
    finally:
        if fp:
            fp.close()
        try:
            os.unlink(tfn)
        except os.error:
            pass

main()

From tim_one at email.msn.com Fri Mar 10 03:13:51 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: <200003100113.UAA27337@eric.cnri.reston.va.us> Message-ID: <000601bf8a36$46ebf880$58a2143f@tim> [Guido van Rossum] > Christian Tismer just did an exhaustive search for thread unsafe use > of Python operations, and found two weaknesses. One is > posix.listdir(), which I had already found; the other is > file.writelines(). Here's a program that demonstrates the bug; > basically, while writelines is walking down the list, another thread > could truncate the list, causing PyList_GetItem() to fail or a string > object to be deallocated while writelines is using it. 
On my Solaris > 7 system it typically crashes in the first or second iteration. > > It's easy to fix: just don't release the interpreter lock (get rid > of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other > threads from doing any work while this thread may be blocked for I/O. > > An alternative solution is to put Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but > would require a lot of lock operations and would probably slow things > down too much. > > Ideas? 2.5: 1: Before releasing the lock, make a shallow copy of the list. 1.5: As in #1, but iteratively peeling off "the next N" values, for some N balancing the number of lock operations against the memory burden (I don't care about the speed of a shallow copy here ...). 2. Pull the same trick list.sort() uses: make the list object immutable for the duration (I know you think that's a hack, and it is , but it costs virtually nothing and would raise an appropriate error when they attempted the insane mutation). I actually like #2 best now, but won't in the future, because file_writelines() should really accept an argument of any sequence type. This makes 1.5 a better long-term hack. although-adding-1.5-to-1.6-is-confusing-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:26 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <1259490837-400325@hypernet.com> Message-ID: <000901bf8a3b$ab314660$58a2143f@tim> [Gordon McM, aspires to make sense of the mess] > It doesn't work for commands builtin to whatever "shell" you're > using. That's different between cmd and command, and the > various flavors, versions and extensions thereof. It's not that simple, either; e.g., old apps invoking the 16-bit subsystem can screw up too. 
Look at Tcl's man page for "exec" and just *try* to wrap your brain around all the caveats they were left with after throwing a few thousand lines of C at this under their Windows port . > FWIW, I gave up a long time ago. I use redirection and a > tempfile. The few times I've wanted "interactive" control, I've > used Win32Process, dup'ed, inherited handles... the whole 9 > yards. Why? Look at all the questions about popen and child > processes in general, on platforms where it *works*, (if it > weren't for Donn Cave, nobody'd get it to work anywhere ). Donn is downright scary that way. I stopped using 'em too, of course. > To reiterate Tim's point: *none* of the c runtime routines for > process control on Windows are adequate (beyond os.system > and living with a DOS box popping up). No, os.system is a problem under command.com flavors of Windows too, as system spawns a new shell and command.com's exit code is *always* 0. So Python's os.system returns 0 no matter what app the user *thinks* they were running, and whether it worked or set the baby on fire. > The raw Win32 CreateProcess does everything you could possibly want, but > takes a week or more to understand, (if this arg is a that, then that arg > is a whatsit, and the next is limited to the values X and Z unless...). Except that CreateProcess doesn't handle shell metacharacters, right? Tcl is the only language I've seen that really works hard at making cmdline-style process control portable. so-all-we-need-to-do-is-a-single-createprocess-to-invoke-tcl-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <000801bf8a3b$aa0c4e60$58a2143f@tim> [Fred L. Drake, Jr.] 
> Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. Yes, that's good! If Mark/Bill's alternatives don't make it in, would also be good to point to the PythonWin extensions (although MarkH will have to give us the Official Name for that). > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. As the rest of this thread should have made abundantly clear by now <0.9 wink>, it's such a mess across various Windows flavors that nobody can explain it. From tim_one at email.msn.com Fri Mar 10 04:15:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 22:15:18 -0500 Subject: [Python-Dev] RE: finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: <000a01bf8a3e$dc8878c0$58a2143f@tim> Quickie: [Tim] >> It's not obvious, but the SCCs can be found in linear time (via Tarjan's >> algorithm, which is simple but subtle; [NeilS] > Wow, it seems like it should be more expensive than that. Oh yes! Many bright people failed to discover the trick; Tarjan didn't discover it until (IIRC) the early 70's, and it was a surprise. It's just a few lines of simple code added to an ordinary depth-first search. However, while the code is simple, a correctness proof is not. BTW, if it wasn't clear, when talking about graph algorithms "linear" is usually taken to mean "in the sum of the number of nodes and edges". Cyclops.py finds all the cycles in linear time in that sense, too (but does not find the SCCs in linear time, at least not in theory -- in practice you can't tell the difference ). > What are the space requirements? Same as depth-first search, plus a way to associate an SCC id with each node, plus a single global "id" vrbl. So it's worst-case linear (in the number of nodes) space. 
See, e.g., any of the books in Sedgewick's "Algorithms in [Language du Jour]" series for working code. > Also, does the simple algorithm you used in Cyclops have a name? Not officially, but it answers to "hey, dumb-ass!" . then-again-so-do-i-so-make-eye-contact-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 10 05:21:46 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 9 Mar 2000 23:21:46 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Okay, I had a flash of inspiration on the way home from my gig tonight. Of course, I'm also really tired so I'm sure Tim will shoot this down in his usual witty but humbling way. I just had to get this out or I wouldn't sleep tonight. What if you timestamp instances when you create them? Then when you have trash cycles with finalizers, you sort them and finalize in chronological order. The nice thing here is that the user can have complete control over finalization order by controlling object creation order. Some random thoughts: - Finalization order of cyclic finalizable trash is completely deterministic. - Given sufficient resolution of your system clock, you should never have two objects with the same timestamp. - You could reduce the memory footprint by only including a timestamp for objects whose classes have __del__'s at instance creation time. Sticking an __del__ into your class dynamically would have no effect on objects that are already created (and I wouldn't poke you with a pointy stick if even post-twiddle instances didn't get timestamped). Thus, such objects would never be finalized -- tough luck. - FIFO order /seems/ more natural to me than FILO, but then I rarely create cyclic objects, and almost never use __del__, so this whole argument has been somewhat academic to me :). 
- The rule seems easy enough to implement, describe, and understand. I think I came up with a few more points on the drive home, but my post jam, post lightbulb endorphodrenalin rush is quickly subsiding, so I leave the rest until tomorrow. its-simply-a-matter-of-time-ly y'rs, -Barry From moshez at math.huji.ac.il Fri Mar 10 06:32:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 07:32:41 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: On Thu, 9 Mar 2000, Greg Stein wrote: > > But I'd still like to reclaim the memory. If this is some > > long-running server process that is executing arbitrary Python > > commands sent to it by clients, it's not nice to leak, period. > > If an exception is raised, the top-level server loop can catch it, log the > error, and keep going. But yes: it will leak. And Tim's version stops the leaking if the server is smart enough: occasionally, it will call gc.get_dangerous_cycles(), and nuke everything it finds there. (E.g., clean up dicts and lists). Some destructor raises an exception? Ignore it (or whatever). And no willy-nilly "but I'm using a silly OS which has hardly any concept of stderr" problems! If the server wants, it can just send a message to the log. rooting-for-tim-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Fri Mar 10 09:18:29 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 03:18:29 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <000001bf8a69$37d57b40$812d153f@tim> This is getting to be fun, but afraid I can only make time for the first easy one tonight: [Tim, conjures a horrid vision of finalizers installing new __del__ methods, then sez ... ] > The scheme above is meant to be bulletproof in the face of abuses even > I can't conceive of . [Guido] > Are you *sure* your scheme deals with this? 
Never said it did -- only that it *meant* to . Ya, you got me. The things I thought I had *proved* I put in the numbered list, and in a rush put the speculative stuff in the reply body. One practical thing I think I can prove today: after finding SCCs, and identifying the safe nodes without predecessors, all such nodes S1, S2, ... can be cleaned up without fear of resurrection, or of cleaning something in Si causing anything in Sj (i!=j) to get reclaimed either (at the time I wrote it, I could only prove that cleaning *one* Si was non-problematic). Barring, of course, this "__del__ from hell" pathology. Also suspect that this claim is isomorphic to your later elaboration on why the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers. That is, exactly the same is true of "the safe (SCC super)nodes without predecessors", so I expect we've just got two ways of identifying the same set here. Perhaps yours is bigger, though (I realize that isn't clear; later).

> Let's look at an example.
> (Again, lowercase nodes have no finalizers.) Take G:
>
>     a <=> b -> C
>
> [and cleaning b can trigger C.__del__ which can create
> a.__class__.__del__ before a is decref'ed ...]
>
> ... and we're halfway committing a crime we said we would never commit
> (touching cyclical trash with finalizers).

Wholly agreed.

> I propose to disregard this absurd possibility,

How come you never propose to just shoot people <0.9 wink>?

> except to the extent that Python shouldn't crash -- but we make no
> guarantees to the user.

"Shouldn't crash" is essential, sure. Carry it another step: after C is finalized, we get back to the loop clearing b.__dict__, and the refcount on "a" falls to 0 next. So the new a.__del__ gets called.
Since b was visible to a, it's possible for a.__del__ to resurrect b, which latter is now in some bizarre (from the programmer's POV) cleared state (or even in the bit bucket, if we optimistically reclaim b's memory "early"!). I can't (well, don't want to ) believe it will be hard to stop this. It's just irksome to need to think about it at all. making-java's-gc-look-easy?-ly y'rs - tim From guido at python.org Fri Mar 10 14:46:43 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 08:46:43 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 23:21:46 EST." <14536.30810.720836.886023@anthem.cnri.reston.va.us> References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <200003101346.IAA27847@eric.cnri.reston.va.us> > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. The nice thing here is that the user can have > complete control over finalization order by controlling object > creation order. > > Some random thoughts: > > - Finalization order of cyclic finalizable trash is completely > deterministic. > > - Given sufficient resolution of your system clock, you should never > have two objects with the same timestamp. Forget the clock -- just use a counter that is incremented on each allocation. > - You could reduce the memory footprint by only including a timestamp > for objects whose classes have __del__'s at instance creation time. > Sticking an __del__ into your class dynamically would have no effect > on objects that are already created (and I wouldn't poke you with a > pointy stick if even post-twiddle instances didn't get > timestamped). Thus, such objects would never be finalized -- tough > luck. 
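Barry's timestamp idea from earlier in the thread (with the allocation-counter refinement Guido suggests in his reply, further down) amounts to something like the following sketch. All names here are hypothetical illustrations, not anything Python actually had:

```python
import itertools

# Hypothetical sketch of the scheme: stamp each finalizable instance
# with a monotonically increasing allocation number (a counter rather
# than a clock, so resolution is never an issue).
_alloc_counter = itertools.count()

class Finalizable:
    def __init__(self, name):
        self.name = name
        self._stamp = next(_alloc_counter)

def finalization_order(trash_cycle):
    # The FIFO rule: finalize in creation order, oldest object first.
    return [obj.name for obj in sorted(trash_cycle, key=lambda o: o._stamp)]

a = Finalizable("a")
b = Finalizable("b")
c = Finalizable("c")
print(finalization_order([c, a, b]))  # ['a', 'b', 'c']
```

As Guido's reply points out, operations like reparenting break the correspondence between creation order and the order the application actually needs, which is the weak spot of any scheme along these lines.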
> > - FIFO order /seems/ more natural to me than FILO, but then I rarely > create cyclic objects, and almost never use __del__, so this whole > argument has been somewhat academic to me :). Ai, there's the rub. Suppose I have a tree with parent and child links. And suppose I have a rule that children need to be finalized before their parents (maybe they represent a Unix directory tree, where you must rm the files before you can rmdir the directory). This suggests that we should choose LIFO: you must create the parents first (you have to create a directory before you can create files in it). However, now we add operations to move nodes around in the tree. Suddenly you can have a child that is older than its parent! Conclusion: the creation time is useless; the application logic and actual link relationships are needed. > - The rule seems easy enough to implement, describe, and understand. > > I think I came up with a few more points on the drive home, but my > post jam, post lightbulb endorphodrenalin rush is quickly subsiding, > so I leave the rest until tomorrow. > > its-simply-a-matter-of-time-ly y'rs, > -Barry Time flies like an arrow -- fruit flies like a banana. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 10 16:06:48 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 10:06:48 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: Your message of "Thu, 09 Mar 2000 21:13:51 EST." <000601bf8a36$46ebf880$58a2143f@tim> References: <000601bf8a36$46ebf880$58a2143f@tim> Message-ID: <200003101506.KAA28358@eric.cnri.reston.va.us> OK, here's a patch for writelines() that supports arbitrary sequences and fixes the lock problem using Tim's solution #1.5 (slicing 1000 items at a time). It contains a fast path for when the argument is a list, using PyList_GetSlice; otherwise it uses PyObject_GetItem and a fixed list. Please have a good look at this; I've only tested it lightly. 
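The strategy of the C patch below -- slurp a fixed-size chunk, type-check it, write it, come back for more -- can be restated as a rough Python sketch. This is an illustration only, not the actual implementation: it uses slicing for brevity where the patch indexes item-by-item for non-list sequences, and of course the interpreter-lock handling has no Python-level counterpart.

```python
import io

CHUNKSIZE = 1000  # mirrors the constant in the patch

def writelines_chunked(f, seq):
    # Take CHUNKSIZE items at a time, check that each is a string,
    # then write the chunk (the C version releases the interpreter
    # lock for the write phase of each chunk).
    index = 0
    while True:
        chunk = list(seq[index:index + CHUNKSIZE])
        if not chunk:
            break
        for line in chunk:
            if not isinstance(line, str):
                raise TypeError("writelines() requires sequences of strings")
        for line in chunk:
            f.write(line)
        if len(chunk) < CHUNKSIZE:
            break
        index += CHUNKSIZE

buf = io.StringIO()
writelines_chunked(buf, ["spam\n", "eggs\n"] * 1500)
print(buf.getvalue().count("spam\n"))  # 1500
```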
--Guido van Rossum (home page: http://www.python.org/~guido/)

Index: fileobject.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/fileobject.c,v
retrieving revision 2.70
diff -c -r2.70 fileobject.c
*** fileobject.c	2000/02/29 13:59:28	2.70
--- fileobject.c	2000/03/10 14:55:47
***************
*** 884,923 ****
  	PyFileObject *f;
  	PyObject *args;
  {
! 	int i, n;
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PyList_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires list of strings");
  		return NULL;
  	}
! 	n = PyList_Size(args);
! 	f->f_softspace = 0;
! 	Py_BEGIN_ALLOW_THREADS
! 	errno = 0;
! 	for (i = 0; i < n; i++) {
! 		PyObject *line = PyList_GetItem(args, i);
! 		int len;
! 		int nwritten;
! 		if (!PyString_Check(line)) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetString(PyExc_TypeError,
! 				   "writelines() requires list of strings");
  			return NULL;
  		}
! 		len = PyString_Size(line);
! 		nwritten = fwrite(PyString_AsString(line), 1, len, f->f_fp);
! 		if (nwritten != len) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetFromErrno(PyExc_IOError);
! 			clearerr(f->f_fp);
! 			return NULL;
  		}
  	}
! 	Py_END_ALLOW_THREADS
  	Py_INCREF(Py_None);
! 	return Py_None;
  }

  static PyMethodDef file_methods[] = {
--- 884,975 ----
  	PyFileObject *f;
  	PyObject *args;
  {
! #define CHUNKSIZE 1000
! 	PyObject *list, *line;
! 	PyObject *result;
! 	int i, j, index, len, nwritten, islist;
!
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PySequence_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires sequence of strings");
  		return NULL;
  	}
! 	islist = PyList_Check(args);
!
! 	/* Strategy: slurp CHUNKSIZE lines into a private list,
! 	   checking that they are all strings, then write that list
! 	   without holding the interpreter lock, then come back for more. */
! 	index = 0;
! 	if (islist)
! 		list = NULL;
! 	else {
! 		list = PyList_New(CHUNKSIZE);
! 		if (list == NULL)
  			return NULL;
+ 	}
+ 	result = NULL;
+
+ 	for (;;) {
+ 		if (islist) {
+ 			Py_XDECREF(list);
+ 			list = PyList_GetSlice(args, index, index+CHUNKSIZE);
+ 			if (list == NULL)
+ 				return NULL;
+ 			j = PyList_GET_SIZE(list);
  		}
! 		else {
! 			for (j = 0; j < CHUNKSIZE; j++) {
! 				line = PySequence_GetItem(args, index+j);
! 				if (line == NULL) {
! 					if (PyErr_ExceptionMatches(PyExc_IndexError)) {
! 						PyErr_Clear();
! 						break;
! 					}
! 					/* Some other error occurred.
! 					   Note that we may lose some output. */
! 					goto error;
! 				}
! 				if (!PyString_Check(line)) {
! 					PyErr_SetString(PyExc_TypeError,
! 					"writelines() requires sequences of strings");
! 					goto error;
! 				}
! 				PyList_SetItem(list, j, line);
! 			}
! 		}
! 		if (j == 0)
! 			break;
!
! 		Py_BEGIN_ALLOW_THREADS
! 		f->f_softspace = 0;
! 		errno = 0;
! 		for (i = 0; i < j; i++) {
! 			line = PyList_GET_ITEM(list, i);
! 			len = PyString_GET_SIZE(line);
! 			nwritten = fwrite(PyString_AS_STRING(line),
! 					  1, len, f->f_fp);
! 			if (nwritten != len) {
! 				Py_BLOCK_THREADS
! 				PyErr_SetFromErrno(PyExc_IOError);
! 				clearerr(f->f_fp);
! 				Py_DECREF(list);
! 				return NULL;
! 			}
  		}
+ 		Py_END_ALLOW_THREADS
+
+ 		if (j < CHUNKSIZE)
+ 			break;
+ 		index += CHUNKSIZE;
  	}
! 	Py_INCREF(Py_None);
! 	result = Py_None;
!   error:
! 	Py_XDECREF(list);
! 	return result;
  }

  static PyMethodDef file_methods[] = {

From skip at mojam.com Fri Mar 10 16:28:13 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 09:28:13 -0600 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler Message-ID: <200003101528.JAA15951@beluga.mojam.com> Consider the following snippet of code from MySQLdb.py:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError:
        self._query(query % escape_dict(args, qc))

It's not quite right. There are at least four reasons I can think of why the % operator might raise a TypeError:

1. query has not enough format specifiers
2. query has too many format specifiers
3. argument type mismatch between individual format specifier and corresponding argument
4.
query expects dict-style interpolation

The except clause only handles the last case. That leaves the other three cases mishandled. The above construct pretends that all TypeErrors possible are handled by calling escape_dict() instead of escape_row(). I stumbled on case 2 yesterday and got a fairly useless error message when the code in the except clause also bombed. Took me a few minutes of head scratching to see that I had an extra %s in my format string. A note to Andy Dustman, MySQLdb's author, yielded the following modified version:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        if m.args[0] == "not enough arguments for format string":
            raise
        if m.args[0] == "not all arguments converted":
            raise
        self._query(query % escape_dict(args, qc))

This will do the trick for me for the time being. Note, however, that the only way for Andy to decide which of the cases occurred (case 3 still isn't handled above, but should occur very rarely in MySQLdb since it only uses the more accommodating %s as a format specifier) is to compare the string value of the message to see which of the four cases was raised. This strong coupling via the error message text between the exception being raised (in C code, in this case) and the place where it's caught seems bad to me and encourages authors to either not recover from errors or to recover from them in the crudest fashion. If Guido decides to tweak the TypeError message in any fashion, perhaps to include the count of arguments in the format string and argument tuple, this code will break. It makes me wonder if there's not a better mechanism waiting to be discovered. Would it be possible to publish an interface of some sort via the exceptions module that would allow symbolic names or dictionary references to be used to decide which case is being handled? I envision something like the following in exceptions.py:

    UNKNOWN_ERROR_CATEGORY = 0
    TYP_SHORT_FORMAT = 1
    TYP_LONG_FORMAT = 2
    ...
    IND_BAD_RANGE = 1

    message_map = {
        # leave
        (TypeError, ("not enough arguments for format string",)): TYP_SHORT_FORMAT,
        (TypeError, ("not all arguments converted",)): TYP_LONG_FORMAT,
        ...
        (IndexError, ("list index out of range",)): IND_BAD_RANGE,
        ...
        }

This would isolate the raw text of exception strings to just a single place (well, just one place on the exception handling side of things). It would be used something like

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        from exceptions import *
        exc_case = message_map.get((TypeError, m.args),
                                   UNKNOWN_ERROR_CATEGORY)
        if exc_case in [UNKNOWN_ERROR_CATEGORY, TYP_SHORT_FORMAT,
                        TYP_LONG_FORMAT]:
            raise
        self._query(query % escape_dict(args, qc))

This could be added to exceptions.py without breaking existing code. Does this (or something like it) seem like a reasonable enhancement for Py2K? If we can narrow things down to an implementable solution I'll create a patch. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Fri Mar 10 17:17:56 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 11:17:56 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: Your message of "Fri, 10 Mar 2000 09:28:13 CST." <200003101528.JAA15951@beluga.mojam.com> References: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <200003101617.LAA28722@eric.cnri.reston.va.us>

> Consider the following snippet of code from MySQLdb.py:

Skip, I'm not familiar with MySQLdb.py, and I have no idea what your example is about. From the rest of the message I feel it's not about MySQLdb at all, but about string formatting, but the point escapes me because you never quite show what's in the format string and what error that gives. Could you give some examples based on first principles? A simple interactive session showing the various errors would be helpful...
--Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Fri Mar 10 20:05:04 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 10 Mar 2000 14:05:04 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>; from guido@python.org on Fri, Mar 10, 2000 at 11:17:56AM -0500 References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <20000310140503.A8619@cnri.reston.va.us> On 10 March 2000, Guido van Rossum said:

> Skip, I'm not familiar with MySQLdb.py, and I have no idea what your
> example is about. From the rest of the message I feel it's not about
> MySQLdb at all, but about string formatting, but the point escapes me
> because you never quite show what's in the format string and what
> error that gives. Could you give some examples based on first
> principles? A simple interactive session showing the various errors
> would be helpful...

I think Skip's point was just this: "TypeError" isn't expressive enough. If you catch TypeError on a statement with multiple possible type errors, you don't know which one you caught. Same holds for any exception type, really: a given statement could blow up with ValueError for any number of reasons. Etc., etc. One possible solution, and I think this is what Skip was getting at, is to add an "error code" to the exception object that identifies the error more reliably than examining the error message. It's just the errno/strerror dichotomy: strerror is for users, errno is for code. I think Skip is just saying that Python exception objects need an errno (although it doesn't have to be a number). It would probably only make sense to define error codes for exceptions that can be raised by Python itself, though.
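Greg's errno/strerror split could look something like the following sketch. Everything here is hypothetical (no such class existed in the stdlib); the re-raising wrapper stands in for what would be C-level code raising coded errors directly, and it still matches on message text internally, which a real implementation would avoid:

```python
class FormatError(TypeError):
    # Hypothetical: a TypeError that also carries a stable error code,
    # so handlers compare codes instead of message strings. Handlers
    # that catch plain TypeError keep working, since it's a subclass.
    NOT_ENOUGH_ARGS = "not_enough_args"
    NOT_ALL_CONVERTED = "not_all_converted"

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code  # errno-like, for programs; str(self) is the strerror

def render(fmt, values):
    # Toy stand-in for a '%' operator that raises coded errors.
    try:
        return fmt % values
    except TypeError as e:
        msg = str(e)
        if "not all arguments converted" in msg:
            raise FormatError(FormatError.NOT_ALL_CONVERTED, msg) from None
        if "not enough arguments" in msg:
            raise FormatError(FormatError.NOT_ENOUGH_ARGS, msg) from None
        raise

try:
    render("%s", ("a", "b"))
except FormatError as e:
    print(e.code)  # not_all_converted
```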
Greg From skip at mojam.com Fri Mar 10 21:17:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 14:17:30 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <14537.22618.656740.296408@beluga.mojam.com>

Guido> Skip, I'm not familiar with MySQLdb.py, and I have no idea what
Guido> your example is about. From the rest of the message I feel it's
Guido> not about MySQLdb at all, but about string formatting,

My apologies. You're correct, it's really not about MySQLdb. It's about handling multiple cases raised by the same exception. First, a more concrete example that just uses simple string formats:

    code                    exception
    "%s" % ("a", "b")       TypeError: 'not all arguments converted'
    "%s %s" % "a"           TypeError: 'not enough arguments for format string'
    "%(a)s" % ("a",)        TypeError: 'format requires a mapping'
    "%d" % {"a": 1}         TypeError: 'illegal argument type for built-in operation'

Let's presume hypothetically that it's possible to recover from some subset of the TypeErrors that are raised, but not all of them. Now, also presume that the format strings and the tuple, string or dict literals I've given above can be stored in variables (which they can). If we wrap the code in a try/except statement, we can catch the TypeError exception and try to do something sensible. This is precisely the trick that Andy Dustman uses in MySQLdb: first try expanding the format string using a tuple as the RH operand, then try with a dict if that fails. Unfortunately, as you can see from the above examples, there are four cases that need to be handled. To distinguish them currently, you have to compare the message you get with the exception to string literals that are generally defined in C code in the interpreter.
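Skip's table can be reproduced interactively. Under a modern CPython the message texts have drifted a little from the wording quoted above -- which is itself a demonstration of how fragile matching on them is. (The fourth case's wording varies enough by version that it is left out of this sketch.)

```python
# Reproduce the first three TypeError cases from the table above.
cases = [
    ("one %s, two values", lambda: "%s" % ("a", "b")),
    ("two %s, one value",  lambda: "%s %s" % "a"),
    ("%(a)s with a tuple", lambda: "%(a)s" % ("a",)),
]

for label, thunk in cases:
    try:
        thunk()
    except TypeError as e:
        print("%s -> TypeError: %s" % (label, e))
```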
Here's what Andy's original code looked like stripped of the MySQLdb-ese:

    try:
        x = format % tuple_generating_function(...)
    except TypeError:
        x = format % dict_generating_function(...)

That doesn't handle the first two cases above. You have to inspect the message that raise sends out:

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        if m.args[0] == "not all arguments converted":
            raise
        if m.args[0] == "not enough arguments for format string":
            raise
        x = format % dict_generating_function(...)

This comparison of except arguments with hard-coded strings (especially ones the programmer has no direct control over) seems fragile to me. If you decide to reword the error message strings, you break someone's code. In my previous message I suggested collecting this fragility in the exceptions module where it can be better isolated. My solution is a bit cumbersome, but could probably be cleaned up somewhat, but basically looks like

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        import exceptions
        msg_case = exceptions.message_map.get((TypeError, m.args),
                                              exceptions.UNKNOWN_ERROR_CATEGORY)
        # punt on the cases we can't recover from
        if msg_case == exceptions.TYP_SHORT_FORMAT: raise
        if msg_case == exceptions.TYP_LONG_FORMAT: raise
        if msg_case == exceptions.UNKNOWN_ERROR_CATEGORY: raise
        # handle the one we can
        x = format % dict_generating_function(...)

In private email that crossed my original message, Andy suggested defining more standard exceptions, e.g.:

    class FormatError(TypeError): pass
    class TooManyElements(FormatError): pass
    class TooFewElements(FormatError): pass

then raising the appropriate error based on the circumstance. Code that catches TypeError exceptions would still work. So there are two possible changes on the table:

1. define more standard exceptions so you can distinguish classes of errors on a more fine-grained basis using just the first argument of the except clause.

2.
provide some machinery in exceptions.py to allow programmers a measure of uncoupling from using hard-coded strings to distinguish cases. Skip From skip at mojam.com Fri Mar 10 21:21:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 14:21:11 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <20000310140503.A8619@cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> <20000310140503.A8619@cnri.reston.va.us> Message-ID: <14537.22839.664131.373727@beluga.mojam.com>

Greg> One possible solution, and I think this is what Skip was getting
Greg> at, is to add an "error code" to the exception object that
Greg> identifies the error more reliably than examining the error
Greg> message. It's just the errno/strerror dichotomy: strerror is for
Greg> users, errno is for code. I think Skip is just saying that
Greg> Python exception objects need an errno (although it doesn't have
Greg> to be a number). It would probably only make sense to define
Greg> error codes for exceptions that can be raised by Python itself,
Greg> though.

I'm actually allowing the string to be used as the error code. If you raise TypeError with "not all arguments converted" as the argument, then that string literal will appear in the definition of exceptions.message_map as part of a key. The programmer would only refer to the args attribute of the object being raised.
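For what it's worth, Skip's message_map lookup works mechanically when transliterated to modern syntax -- with the caveat, central to this thread, that the keys below have to use CPython 3's message texts, which differ from the 1.5-era strings quoted in the proposal:

```python
UNKNOWN_ERROR_CATEGORY = 0
TYP_SHORT_FORMAT = 1
TYP_LONG_FORMAT = 2

# Keys are (exception class, exc.args). The strings are CPython 3's
# wording, not the 1.5-era wording used in the proposal above -- the
# very drift the proposal worries about.
message_map = {
    (TypeError, ("not enough arguments for format string",)): TYP_SHORT_FORMAT,
    (TypeError, ("not all arguments converted during string formatting",)): TYP_LONG_FORMAT,
}

def classify(exc):
    return message_map.get((type(exc), exc.args), UNKNOWN_ERROR_CATEGORY)

try:
    "%s %s" % ("a",)
except TypeError as e:
    print(classify(e))  # 1
```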
either-or-makes-no-real-difference-to-me-ly y'rs, Skip From bwarsaw at cnri.reston.va.us Fri Mar 10 21:56:45 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 15:56:45 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> <200003101346.IAA27847@eric.cnri.reston.va.us> Message-ID: <14537.24973.579056.533282@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: >> Given sufficient resolution of your system >> clock, you should never have two objects with the same >> timestamp. GvR> Forget the clock -- just use a counter that is incremented on GvR> each allocation. Good idea. GvR> Suppose I have a tree with parent and child links. And GvR> suppose I have a rule that children need to be finalized GvR> before their parents (maybe they represent a Unix directory GvR> tree, where you must rm the files before you can rmdir the GvR> directory). This suggests that we should choose LIFO: you GvR> must create the parents first (you have to create a directory GvR> before you can create files in it). However, now we add GvR> operations to move nodes around in the tree. Suddenly you GvR> can have a child that is older than its parent! Conclusion: GvR> the creation time is useless; the application logic and GvR> actual link relationships are needed. One potential way to solve this is to provide an interface for refreshing the counter; for discussion purposes, I'll call this sys.gcrefresh(obj). Throws a TypeError if obj isn't a finalizable instance. Otherwise, it sets the "timestamp" to the current counter value and increments the counter. Thus, in your example, when the child node is reparented, you sys.gcrefresh(child) and now the parent is automatically older. Of course, what if the child has its own children? 
You've now got an age graph like this

    parent > child < grandchild

with the wrong age relationship between the parent and grandchild. So when you refresh, you've got to walk down the containment tree making sure your grandkids are "younger" than yourself. E.g.:

    class Node:
        ...
        def __del__(self):
            ...
        def reparent(self, node):
            self.parent = node
            self.refresh()
        def refresh(self):
            sys.gcrefresh(self)
            for c in self.children:
                c.refresh()

The point to all this is that it gives explicit control of the finalizable cycle reclamation order to the user, via a fairly easy to understand, and manipulate mechanism. twas-only-a-flesh-wound-but-waiting-for-the-next-stroke-ly y'rs, -Barry From jim at interet.com Fri Mar 10 22:14:45 2000 From: jim at interet.com (James C. Ahlstrom) Date: Fri, 10 Mar 2000 16:14:45 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000801bf8a3b$aa0c4e60$58a2143f@tim> Message-ID: <38C965C4.B164C2D5@interet.com> Tim Peters wrote:
>
> [Fred L. Drake, Jr.]
> > Tim (& others),
> > Would this additional text be sufficient for the os.popen()
> > documentation?
> >
> > \strong{Note:} This function behaves unreliably under Windows
> > due to the native implementation of \cfunction{popen()}.
>
> Yes, that's good! If Mark/Bill's alternatives don't make it in, would also
> be good to point to the PythonWin extensions (although MarkH will have to
> give us the Official Name for that).

Well, it looks like this thread has fizzled out. But what did we decide? Changing the docs to say popen() "doesn't work reliably" is a little weak. Maybe removing popen() is better, and demanding that Windows users use win32pipe. I played around with a patch to posixmodule.c which eliminates _popen() and implements os.popen() using CreatePipe(). It sort of works on NT and fails on 95. Anyway, I am stuck on how to make a Python file object from a pipe handle.
Would it be a good idea to extract the Wisdom from win32pipe and re-implement os.popen() either in C or by using win32pipe directly? Using C is simple and to the point. I feel Tim's original complaint that popen() is a Problem still hasn't been fixed. JimA From moshez at math.huji.ac.il Fri Mar 10 22:29:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 23:29:05 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: On Fri, 10 Mar 2000 bwarsaw at cnri.reston.va.us wrote:

> One potential way to solve this is to provide an interface for
> refreshing the counter; for discussion purposes, I'll call this
> sys.gcrefresh(obj).

Barry, there are other problems with your scheme, but I won't even try to point those out: having to call a function whose purpose can only be described in terms of a concrete implementation of a garbage collection scheme is simply unacceptable. I can almost see you shouting "Come back here, I'll bite your legs off" .

> The point to all this is that it gives explicit control of the
> finalizable cycle reclamation order to the user, via a fairly easy to
> understand, and manipulate mechanism.

Oh? This sounds like the most horrendous mechanism alive.... you-probably-jammed-a-*little*-too-loud-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From bwarsaw at cnri.reston.va.us Fri Mar 10 23:15:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 17:15:27 -0500 (EST) Subject: [Python-Dev] finalization again References: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: <14537.29695.532507.197580@anthem.cnri.reston.va.us> Just throwing out ideas.
From DavidA at ActiveState.com Fri Mar 10 23:20:45 2000 From: DavidA at ActiveState.com (David Ascher) Date: Fri, 10 Mar 2000 14:20:45 -0800 Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: Moshe, some _arguments_ backing your feelings might give them more weight... As they stand, they're just insults, and if I were Barry I'd ignore them. --david ascher Moshe Zadka: > Barry, there are other problems with your scheme, but I won't even try to > point those out: having to call a function whose purpose can only be > described in terms of a concrete implementation of a garbage collection > scheme is simply unacceptable. I can almost see you shouting "Come back > here, I'll bite your legs off" . > [...] > Oh? This sounds like the most horrendus mechanism alive.... From skip at mojam.com Fri Mar 10 23:40:02 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 16:40:02 -0600 Subject: [Python-Dev] on the suitability of ideas tossed out to python-dev Message-ID: <200003102240.QAA07881@beluga.mojam.com> Folks, let's not forget that python-dev is a place where oftentimes half-baked ideas will get advanced. I came up with an idea about decoupling error handling from exception message strings. I don't expect my idea to be adopted as is. Similarly, Barry's ideas about object timestamps were admittedly conceived late at night in the thrill following an apparently good gig. (I like the idea that every object has a modtime, but for other reasons than Barry suggested.) My feeling is that bad ideas will get winnowed out or drastically modified quickly enough anyway. Think of these early ideas as little more than brainstorms. 
A lot of times if I have an idea, I feel I need to put it down on my virtual whiteboard quickly, because a) I often don't have a lot of time to pursue stuff (do it now or it won't get done), b) because bad ideas can be the catalyst for better ideas, and c) if I don't do it immediately, I'll probably forget the idea altogether, thus missing the opportunity for reason b altogether. Try and collect a bunch of ideas before shooting any down and see what falls out. The best ideas will survive. When people start proving things and using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-) Have a relaxing, thought provoking weekend. I'm going to go see a movie this evening with my wife and youngest son, appropriately enough titled, "My Dog Skip". Enough Pythoneering for one day... bow-wow-ly y'rs, Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Sat Mar 11 01:20:01 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 19:20:01 -0500 Subject: [Python-Dev] Unicode patches checked in Message-ID: <200003110020.TAA17777@eric.cnri.reston.va.us> I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! 
I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which the Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Sat Mar 11 03:03:47 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:47 -0500 Subject: [Python-Dev] Finalization in Eiffel Message-ID: <000701bf8afe$0a0fd800$a42d153f@tim> Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took extreme care in its design, and has written extensively and articulately about the design -- agree with him or not, he's always worth reading! I used Eiffel briefly a few years ago, just out of curiosity. I didn't recall even bumping into a notion of destructors. Turns out it does have them, but they're appallingly (whether relative to Eiffel's usual clarity, or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified. 
An Eiffel class can register a destructor by inheriting from the system MEMORY class and overriding the latter's "dispose()". This appears to be viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The Language" say much about its semantics. Within dispose, you're explicitly discouraged from invoking methods on *any* other object, and resurrection is right out the window. But the language doesn't appear to check for any of that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from people who should know suggest that all but one Eiffel implementation pay no attention at all to reachability during gc, and that none support resurrection. If you need ordering during finalization, the advice is to write that part in C/C++. Violations of the vague rules appear to lead to random system damage(!). Looking at various Eiffel pkgs on the web, the sole use of dispose was in one-line bodies that released external resources (like memory & db connections) via calling an external C/C++ function. jealous-&-appalled-at-the-same-time-ly y'rs - tim From tim_one at email.msn.com Sat Mar 11 03:03:50 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:50 -0500 Subject: [Python-Dev] Conventional wisdom on finalization Message-ID: <000801bf8afe$0b3df7c0$a42d153f@tim> David Chase maintains a well-regarded GC FAQ, at http://www.iecc.com/gclist/GC-faq.html Interested folks should look it up. A couple highlights: On cycles with finalizers: In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. 
People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common). On Java's "finalizer called only once" rule: if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.) Until today, I had no idea I was so resolutely conventional . seems-we're-trying-to-do-more-than-anyone-other-than-us-expects-ly y'rs - tim From shichang at icubed.com Fri Mar 10 23:33:11 2000 From: shichang at icubed.com (Shichang Zhao) Date: Fri, 10 Mar 2000 22:33:11 -0000 Subject: [Python-Dev] RE: Unicode patches checked in Message-ID: <01BF8AE0.9E911980.shichang@icubed.com> I would love to test the Python 1.6 (Unicode support) in Chinese language aspect, but I don't know where I can get a copy of OS that supports Chinese. Anyone can point me a direction? -----Original Message----- From: Guido van Rossum [SMTP:guido at python.org] Sent: Saturday, March 11, 2000 12:20 AM To: Python mailing list; python-announce at python.org; python-dev at python.org; i18n-sig at python.org; string-sig at python.org Cc: Marc-Andre Lemburg Subject: Unicode patches checked in I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. 
We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression engine which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose!
--Guido van Rossum (home page: http://www.python.org/~guido/) -- http://www.python.org/mailman/listinfo/python-list From moshez at math.huji.ac.il Sat Mar 11 10:10:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 11:10:12 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy Message-ID: The following "problem" is easy to fix. However, what I wanted to know is if people (Skip and Guido most importantly) think it is a problem: >>> "a" in u"bbba" 1 >>> u"a" in "bbba" Traceback (innermost last): File "", line 1, in ? TypeError: string member test needs char left operand Suggested fix: in stringobject.c, explicitly allow a unicode char left operand. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mal at lemburg.com Sat Mar 11 11:24:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 11:24:26 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: Message-ID: <38CA1EDA.423F8A2C@lemburg.com> Moshe Zadka wrote: > > The following "problem" is easy to fix. However, what I wanted to know is > if people (Skip and Guido most importantly) think it is a problem: > > >>> "a" in u"bbba" > 1 > >>> u"a" in "bbba" > Traceback (innermost last): > File "", line 1, in ? > TypeError: string member test needs char left operand > > Suggested fix: in stringobject.c, explicitly allow a unicode char left > operand. Hmm, this must have been introduced by your contains code... it did work before. The normal action taken by the Unicode and the string code in these mixed type situations is to first convert everything to Unicode and then retry the operation. Strings are interpreted as UTF-8 during this conversion. To simplify this task, I added method APIs to the Unicode object which do the conversion for you (they apply all the necessary coercion business to all arguments).
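The coercion rule MAL describes (convert both operands to Unicode, decoding 8-bit strings as UTF-8, then do the one-character membership test) can be sketched in Python. This is only a sketch in modern Python 3 spelling, with `bytes` standing in for the old 8-bit string type; `unicode_contains` is a hypothetical name, not the C-level API under discussion:

```python
def unicode_contains(container, element):
    """Sketch of the mixed-type 'in' rule: coerce, then test."""
    # Coerce both operands to the text type; 8-bit strings are
    # decoded as UTF-8, per the rule described above.
    if isinstance(container, bytes):
        container = container.decode("utf-8")
    if isinstance(element, bytes):
        element = element.decode("utf-8")
    # The 1.6-era membership test required a single-character
    # left operand.
    if len(element) != 1:
        raise TypeError("string member test needs char left operand")
    return element in container
```

With this rule all four string/Unicode combinations behave the same way, which is what the C-level special-casing discussed later in the thread arranges.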
I guess adding another PyUnicode_Contains() wouldn't hurt :-) Perhaps I should also add a tp_contains slot to the Unicode object which then uses the above API as well. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at math.huji.ac.il Sat Mar 11 12:05:48 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 13:05:48 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: <38CA1EDA.423F8A2C@lemburg.com> Message-ID: On Sat, 11 Mar 2000, M.-A. Lemburg wrote: > Hmm, this must have been introduced by your contains code... > it did work before. Nope: the string "in" semantics were forever special-cased. Guido beat me soundly for trying to change the semantics... > The normal action taken by the Unicode and the string > code in these mixed type situations is to first > convert everything to Unicode and then retry the operation. > Strings are interpreted as UTF-8 during this conversion. Hmmm....PySequence_Contains doesn't do any conversion of the arguments. Should it? (Again, it didn't before). If it does, then the order of testing for seq_contains and seq_getitem and conversions > Perhaps I should also add a tp_contains slot to the > Unicode object which then uses the above API as well. But that wouldn't help at all for u"a" in "abbbb" PySequence_Contains only dispatches on the container argument :-( (BTW: I discovered it while contemplating adding a seq_contains (not tp_contains) to unicode objects to optimize the searching for a bit.) PS: MAL: thanks for a great birthday present! I'm enjoying the unicode patch a lot. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From guido at python.org Sat Mar 11 13:16:06 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Mar 2000 07:16:06 -0500 Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: Your message of "Sat, 11 Mar 2000 13:05:48 +0200." References: Message-ID: <200003111216.HAA12651@eric.cnri.reston.va.us> [Moshe discovers that u"a" in "bbba" raises TypeError] [Marc-Andre] > > Hmm, this must have been introduced by your contains code... > > it did work before. > > Nope: the string "in" semantics were forever special-cased. Guido beat me > soundly for trying to change the semantics... But I believe that Marc-Andre added a special case for Unicode in PySequence_Contains. I looked for evidence, but the last snapshot that I actually saved and built before Moshe's code was checked in is from 2/18 and it isn't in there. Yet I believe Marc-Andre. The special case needs to be added back to string_contains in stringobject.c. > > The normal action taken by the Unicode and the string > > code in these mixed type situations is to first > > convert everything to Unicode and then retry the operation. > > Strings are interpreted as UTF-8 during this conversion. > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > Should it? (Again, it didn't before). If it does, then the order of > testing for seq_contains and seq_getitem and conversions Or it could be done this way. > > Perhaps I should also add a tp_contains slot to the > > Unicode object which then uses the above API as well. Yes. > But that wouldn't help at all for > > u"a" in "abbbb" It could if PySequence_Contains would first look for a string and a unicode argument (in either order) and in that case convert the string to unicode. > PySequence_Contains only dispatches on the container argument :-( > > (BTW: I discovered it while contemplating adding a seq_contains (not > tp_contains) to unicode objects to optimize the searching for a bit.)
You may beat Marc-Andre to it, but I'll have to let him look at the code anyway -- I'm not sufficiently familiar with the Unicode stuff myself yet. BTW, I added a tag "pre-unicode" to the CVS tree for the revisions before the Unicode changes were made. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat Mar 11 14:32:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:32:57 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> Message-ID: <38CA4B08.7B13438D@lemburg.com> Guido van Rossum wrote: > > [Moshe discovers that u"a" in "bbba" raises TypeError] > > [Marc-Andre] > > > Hmm, this must have been introduced by your contains code... > > > it did work before. > > > > Nope: the string "in" semantics were forever special-cased. Guido beat me > > soundly for trying to change the semantics... > > But I believe that Marc-Andre added a special case for Unicode in > PySequence_Contains. I looked for evidence, but the last snapshot that > I actually saved and built before Moshe's code was checked in is from > 2/18 and it isn't in there. Yet I believe Marc-Andre. The special > case needs to be added back to string_contains in stringobject.c. Moshe was right: I had probably not checked the code because the obvious combinations worked out of the box... the only combination which doesn't work is "unicode in string". I'll fix it next week. BTW, there's a good chance that the string/Unicode integration is not complete yet: just keep looking for them. > > > The normal action taken by the Unicode and the string > > > code in these mixed type situations is to first > > > convert everything to Unicode and then retry the operation. > > > Strings are interpreted as UTF-8 during this conversion. > > > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > > Should it? (Again, it didn't before).
If it does, then the order of > > testing for seq_contains and seq_getitem and conversions > > Or it could be done this way. > > > > Perhaps I should also add a tp_contains slot to the > > > Unicode object which then uses the above API as well. > > Yes. > > > But that wouldn't help at all for > > > > u"a" in "abbbb" > > It could if PySequence_Contains would first look for a string and a > unicode argument (in either order) and in that case convert the string > to unicode. I think the right way to do this is to add a special case to seq_contains in the string implementation. That's how most other auto-coercions work too. Instead of raising an error, the implementation would then delegate the work to PyUnicode_Contains(). > > PySequence_Contains only dispatches on the container argument :-( > > > > (BTW: I discovered it while contemplating adding a seq_contains (not > > tp_contains) to unicode objects to optimize the searching for a bit.) > > You may beat Marc-Andre to it, but I'll have to let him look at the > code anyway -- I'm not sufficiently familiar with the Unicode stuff > myself yet. I'll add that one too. BTW, Happy Birthday, Moshe :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Mar 11 14:57:34 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:57:34 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> <38CA4B08.7B13438D@lemburg.com> Message-ID: <38CA50CE.BEEFAB5E@lemburg.com> I couldn't resist :-) Here's the patch... BTW, how should we proceed with future patches ? Should I wrap them together about once a week, or send them as soon as they are done ?
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 10 23:33:05 2000 +++ Python+Unicode/Include/unicodeobject.h Sat Mar 11 14:45:59 2000 @@ -683,6 +683,17 @@ PyObject *args /* Argument tuple or dictionary */ ); +/* Checks whether element is contained in container and return 1/0 + accordingly. + + element has to coerce to an one element Unicode string. -1 is + returned in case of an error. */ + +extern DL_IMPORT(int) PyUnicode_Contains( + PyObject *container, /* Container string */ + PyObject *element /* Element string */ + ); + /* === Characters Type APIs =============================================== */ /* These should not be used directly. 
Use the Py_UNICODE_IS* and diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Sat Mar 11 00:23:20 2000 +++ Python+Unicode/Lib/test/test_unicode.py Sat Mar 11 14:52:29 2000 @@ -219,6 +219,19 @@ test('translate', u"abababc", u'iiic', {ord('a'):None, ord('b'):ord('i')}) test('translate', u"abababc", u'iiix', {ord('a'):None, ord('b'):ord('i'), ord('c'):u'x'}) +# Contains: +print 'Testing Unicode contains method...', +assert ('a' in 'abdb') == 1 +assert ('a' in 'bdab') == 1 +assert ('a' in 'bdaba') == 1 +assert ('a' in 'bdba') == 1 +assert ('a' in u'bdba') == 1 +assert (u'a' in u'bdba') == 1 +assert (u'a' in u'bdb') == 0 +assert (u'a' in 'bdb') == 0 +assert (u'a' in 'bdba') == 1 +print 'done.' + # Formatting: print 'Testing Unicode formatting strings...', assert u"%s, %s" % (u"abc", "abc") == u'abc, abc' diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Sat Mar 11 14:53:37 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. 
File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Sat Mar 11 10:55:09 2000 +++ Python+Unicode/Objects/stringobject.c Sat Mar 11 14:47:45 2000 @@ -389,7 +389,9 @@ { register char *s, *end; register char c; - if (!PyString_Check(el) || PyString_Size(el) != 1) { + if (!PyString_Check(el)) + return PyUnicode_Contains(a, el); + if (PyString_Size(el) != 1) { PyErr_SetString(PyExc_TypeError, "string member test needs char left operand"); return -1; diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Fri Mar 10 23:53:23 2000 +++ Python+Unicode/Objects/unicodeobject.c Sat Mar 11 14:48:52 2000 @@ -2737,6 +2737,49 @@ return -1; } +int PyUnicode_Contains(PyObject *container, + PyObject *element) +{ + PyUnicodeObject *u = NULL, *v = NULL; + int result; + register const Py_UNICODE 
*p, *e; + register Py_UNICODE ch; + + /* Coerce the two arguments */ + u = (PyUnicodeObject *)PyUnicode_FromObject(container); + if (u == NULL) + goto onError; + v = (PyUnicodeObject *)PyUnicode_FromObject(element); + if (v == NULL) + goto onError; + + /* Check v in u */ + if (PyUnicode_GET_SIZE(v) != 1) { + PyErr_SetString(PyExc_TypeError, + "string member test needs char left operand"); + goto onError; + } + ch = *PyUnicode_AS_UNICODE(v); + p = PyUnicode_AS_UNICODE(u); + e = p + PyUnicode_GET_SIZE(u); + result = 0; + while (p < e) { + if (*p++ == ch) { + result = 1; + break; + } + } + + Py_DECREF(u); + Py_DECREF(v); + return result; + +onError: + Py_XDECREF(u); + Py_XDECREF(v); + return -1; +} + /* Concat to string or Unicode object giving a new Unicode object. */ PyObject *PyUnicode_Concat(PyObject *left, @@ -3817,6 +3860,7 @@ (intintargfunc) unicode_slice, /* sq_slice */ 0, /* sq_ass_item */ 0, /* sq_ass_slice */ + (objobjproc)PyUnicode_Contains, /*sq_contains*/ }; static int From tim_one at email.msn.com Sat Mar 11 21:10:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:10:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <000e01bf8b95$d52939e0$c72d153f@tim> [Barry A. Warsaw, jamming after hours] > ... > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. Well, I strongly agree that would be better than finalizing them in increasing order of storage address . > ... > - FIFO order /seems/ more natural to me than FILO, Forget cycles for a moment, and consider just programs that manipulate *immutable* containers (the simplest kind to think about): at the time you create an immutable container, everything *contained* must already be in existence, so every pointer goes from a newer object (container) to an older one (containee). 
This is the "deep" reason for why, e.g., you can't build a cycle out of pure tuples in Python (if every pointer goes new->old, you can't get a loop, else each node in the loop would be (transitively) older than itself!). Then, since a finalizer can see objects pointed *to*, a finalizer can see only older objects. Since it's desirable that a finalizer see only wholly intact (unfinalized) objects, it is in fact the oldest object ("first in") that needs to be cleaned up last ("last out"). So, under the assumption of immutability, FILO is sufficient, but FIFO dangerous. So your muse inflamed you with an interesting tune, but you fingered the riff backwards . One problem is that it all goes out the window as soon as mutation is allowed. It's *still* desirable that a finalizer see only unfinalized objects, but in the presence of mutation that no longer bears any relationship to relative creation time. Another problem is in Guido's directory example, which we can twist to view as an "immutable container" problem that builds its image of the directory bottom-up, and where a finalizer on each node tries to remove the file (or delete the directory, whichever the node represents). In this case the physical remove/delete/unlink operations have to follow a *postorder* traversal of the container tree, so that "finalizer sees only unfinalized objects" is the opposite of what the app needs! The lesson to take from that is that the implementation can't possibly guess what ordering an app may need in a fancy finalizer. 
At best it can promise to follow a "natural" ordering based on the points-to relationship, and while "finalizer sees only unfinalized objects" is at least clear, it's quite possibly unhelpful (in Guido's particular case, it *can* be exploited, though, by adding a postorder remove/delete/unlink method to nodes, and explicitly calling it from __del__ -- "the rules" guarantee that the root of the tree will get finalized first, and the code can rely on that in its own explicit postorder traversal). > but then I rarely create cyclic objects, and almost never use __del__, > so this whole argument has been somewhat academic to me :). Well, not a one of us creates cycles often in CPython today, simply because we don't want to track down leaks <0.5 wink>. It seems that nobody here uses __del__ much, either; indeed, my primary use of __del__ is simply to call an explicit break_cycles() function from the header node of a graph! The need for that goes away as soon as Python reclaims cycles by itself, and I may never use __del__ at all then in the vast bulk of my code. It's because we've seen no evidence here (and also that I've seen none elsewhere either) that *anyone* is keen on mixing cycles with finalizers that I've been so persistent in saying "screw it -- let it leak, but let the user get at it if they insist on doing it". Seems we're trying to provide slick support for something nobody wants to do. If it happens by accident anyway, well, people sometimes divide by 0 by accident too <0.0 wink>: give them a way to know about it, but don't move heaven & earth trying to treat it like a normal case. 
if-it-were-easy-to-implement-i-wouldn't-care-ly y'rs - tim From moshez at math.huji.ac.il Sat Mar 11 21:35:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 22:35:43 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <000e01bf8b95$d52939e0$c72d153f@tim> Message-ID: In a continuation (yes, a dangerous word in these parts) of the timbot's looks at the way other languages handle finalization, let me add something from the Sather manual I'm now reading (when I'm done with it, you'll see me begging for iterators here, and having some weird ideas in the types-sig): =============================== Finalization will only occur once, even if new references are created to the object during finalization. Because few guarantees can be made about the environment in which finalization occurs, finalization is considered dangerous and should only be used in the rare cases that conventional coding will not suffice. =============================== (Sather is garbage-collected, BTW) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sat Mar 11 21:51:47 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:47 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <001001bf8b9b$9e09d720$c72d153f@tim> [Skip Montanaro, with an expression that may raise TypeError for any of several distinct reasons, and wants to figure out which one after the fact] The existing exception machinery is sufficiently powerful for building a solution, so nothing new is needed in the language. What you really need here is an exhaustive list of all exceptions the language can raise, and when, and why, and a formally supported "detail" field (whether numeric id or string or whatever) that you can rely on to tell them apart at runtime. 
There are at least a thousand cases that need to be so documented and formalized. That's why not a one of them is now <0.9 wink>. If P3K is a rewrite from scratch, a rational scheme could be built in from the start. Else it would seem to require a volunteer with even less of a life than us . From tim_one at email.msn.com Sat Mar 11 21:51:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:49 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C965C4.B164C2D5@interet.com> Message-ID: <001101bf8b9b$9f37f6e0$c72d153f@tim> [James C. Ahlstrom] > Well, it looks like this thread has fizzled out. But what did we > decide? Far as I could tell, nothing specific. > ... > I feel Tim's original complaint that popen() is a Problem > still hasn't been fixed. I was passing it on from MikeF's c.l.py posting. This isn't a new problem, of course, it just drags on year after year -- which is the heart of MikeF's gripe. People have code that *does* work, but for whatever reasons it never gets moved to the core. In the meantime, the Library Ref implies the broken code that is in the core does work. One or the other has to change, and it looks most likely to me that Fred will change the docs for 1.6. While not ideal, that would be a huge improvement over the status quo. luckily-few-people-expect-windows-to-work-anyway<0.9-wink>-ly y'rs - tim From mhammond at skippinet.com.au Mon Mar 13 04:50:35 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 13 Mar 2000 14:50:35 +1100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. 
Message-ID: Hi, After applying the Unicode changes string.replace() seems to have changed its behaviour: Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") 'foobar' >>> But since the Unicode update: Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") Traceback (innermost last): File "", line 1, in ? File "L:\src\python-cvs\lib\string.py", line 407, in replace return s.replace(old, new, maxsplit) ValueError: empty replacement string >>> The offending check is stringmodule.c, line 1578: if (repl_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty replacement string"); return NULL; } Changing the check to "< 0" fixes the immediate problem, but it is unclear why the check was added at all, so I didnt bother submitting a patch... Mark. From mal at lemburg.com Mon Mar 13 10:13:50 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 13 Mar 2000 10:13:50 +0100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. References: Message-ID: <38CCB14D.C07ACC26@lemburg.com> Mark Hammond wrote: > > Hi, > After applying the Unicode changes string.replace() seems to have changed > its behaviour: > > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > 'foobar' > >>> > > But since the Unicode update: > > Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > Traceback (innermost last): > File "", line 1, in ? 
> File "L:\src\python-cvs\lib\string.py", line 407, in replace > return s.replace(old, new, maxsplit) > ValueError: empty replacement string > >>> > > The offending check is stringmodule.c, line 1578: > if (repl_len <= 0) { > PyErr_SetString(PyExc_ValueError, "empty replacement string"); > return NULL; > } > > Changing the check to "< 0" fixes the immediate problem, but it is unclear > why the check was added at all, so I didnt bother submitting a patch... Dang. Must have been my mistake -- it should read: if (sub_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } Thanks for reporting this... I'll include the fix in the next patch set. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Mon Mar 13 16:43:09 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 13 Mar 2000 10:43:09 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <001101bf8b9b$9f37f6e0$c72d153f@tim> References: <38C965C4.B164C2D5@interet.com> <001101bf8b9b$9f37f6e0$c72d153f@tim> Message-ID: <14541.3213.590243.359394@weyr.cnri.reston.va.us> Tim Peters writes: > code that is in the core does work. One or the other has to change, and it > looks most likely to me that Fred will change the docs for 1.6. While not > ideal, that would be a huge improvement over the status quo. Actually, I just checked in my proposed change for the 1.5.2 doc update that I'm releasing soon. I'd like to remove it for 1.6, if the appropriate implementation is moved into the core. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From gvwilson at nevex.com Mon Mar 13 22:10:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:10:52 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request Message-ID: Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python: i = 0 while i &lt; 10: print i &amp; 1 i = i + 1 which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on. Greg From skip at mojam.com Mon Mar 13 22:23:17 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 13 Mar 2000 15:23:17 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23621.89087.357783@beluga.mojam.com> Greg> Once 1.6 is out the door, would people be willing to consider Greg> extending Python's token set to make HTML/XML-ish spellings using Greg> entity references legal? This would make the following 100% legal Greg> Python: Greg> i = 0 Greg> while i &lt; 10: Greg> print i &amp; 1 Greg> i = i + 1 What makes it difficult to pump your Python code through cgi.escape when embedding it? There doesn't seem to be an inverse function to cgi.escape (at least not in the cgi module), but I suspect it could rather easily be written. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From akuchlin at mems-exchange.org Mon Mar 13 22:23:29 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Mon, 13 Mar 2000 16:23:29 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23633.873411.86833@amarok.cnri.reston.va.us> gvwilson at nevex.com writes: >Once 1.6 is out the door, would people be willing to consider extending >Python's token set to make HTML/XML-ish spellings using entity references >legal? This would make the following 100% legal Python: > >i = 0 >while i &lt; 10: > print i &amp; 1 > i = i + 1 I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere? What about other places when Python and XML syntax collide, as in this contrived example: <![CDATA[ # Python code starts here if a[index[1]]>b: print ... ]]> Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code. IMHO whatever tool is outputting the XML should handle escaping wacky characters in the Python code, which will be undone by the parser when the XML gets parsed. Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange. -- A.M. Kuchling http://starship.python.net/crew/amk/ Art history is the nightmare from which art is struggling to awake. -- Robert Fulford From gvwilson at nevex.com Mon Mar 13 22:58:27 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:58:27 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <14541.23633.873411.86833@amarok.cnri.reston.va.us> Message-ID: > >Greg Wilson wrote: > >...would people be willing to consider extending > >Python's token set to make HTML/XML-ish spellings using entity references > >legal? > > > >i = 0 > >while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > Skip Montanaro wrote: > What makes it difficult to pump your Python code through cgi.escape when > embedding it? Most non-programmers use WYSIWYG editors, and many of these are moving toward XML-compliant formats.
Parsing the standard character entities seemed like a good first step toward catering to this (large) audience. > Andrew Kuchling wrote: > I don't think that would be sufficient. What about user-defined > entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) > Would Python have to also parse a DTD from somewhere? Longer term, I believe that someone is going to come out with a programming language that (finally) leaves the flat-ASCII world behind, and lets people use the structuring mechanisms (e.g. XML) that we have developed for everyone else's data. I think it would be to Python's advantage to be first, and if I'm wrong, there's little harm done. User-defined entities, DTD's, and the like are probably part of that, but I don't think I know enough to know what to ask for. Escaping the standard entities seems like an easy start. > Andrew Kuchling also wrote: > What about other places where Python and XML syntax collide, as in this > contrived example: > > # Python code starts here > if a[index[1]]>b: > print ... > > Oops! The ]]> looks like the end of the CDATA section, but it's legal > Python code. Yup; that's one of the reasons I'd like to be able to write:

# Python code starts here
if a[index[1]]&gt;b:
    print ...

> Users certainly won't be writing this XML by hand; writing 'if (i &lt; > 10)' is very strange. I'd expect my editor to put '&lt;' in the file when I press the '<' key, and to display '<' on the screen when viewing the file. thanks, Greg From beazley at rustler.cs.uchicago.edu Mon Mar 13 23:35:24 2000 From: beazley at rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 16:35:24 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <200003132235.QAA08031@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal?
This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. > Sure, and while we're at it, maybe we can add support for C trigraph sequences as well. Maybe I'm missing the point, but why can't you just use a filter (cgi.escape() or something comparable)? I for one, am *NOT* in favor of complicating the Python parser in this most bogus manner. Furthermore, with respect to the editor argument, I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about. Therefore, I don't think that the Python parser should be modified in any way to account for XML tags, entities, or other extraneous markup that's not part of the core language. I know that I, for one, would be extremely pissed if I fired up emacs and had to maintain someone else's code that had all of this garbage in it. Just my 0.02. -- Dave From gvwilson at nevex.com Mon Mar 13 23:48:33 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 17:48:33 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: > David M. Beazley wrote: > ...and while we're at it, maybe we can add support for C trigraph > sequences as well. I don't know of any mass-market editors that generate C trigraphs. > ...I can't think of a single reason why any sane programmer would be > writing programs in Microsoft Word or whatever it is that you're > talking about. 'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program...
I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up. Thanks, Greg From effbot at telia.com Tue Mar 14 00:16:41 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 00:16:41 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <00ca01bf8d42$6a154500$34aab5d4@hagrid> Greg wrote: > > ...I can't think of a single reason why any sane programmer would be > > writing programs in Microsoft Word or whatever it is that you're > > talking about. > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. I don't get it. the XML specification contains a lot of stuff, and I completely fail to see how adding support for a very small part of XML would make it possible to use XML editors to write Python code. what am I missing? From DavidA at ActiveState.com Tue Mar 14 00:15:25 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:15:25 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave-the-diplomat's response. 
While storage of Python files in XML documents is a good thing, it's hard to see why XML should be viewed as the only storage format for Python files. I think a much richer XML schema could be useful in some distant future: ... What might be more useful in the short term IMO is to define a _standard_ mechanism for Python-in-XML encoding/decoding, so that all code which encodes Python in XML is done the same way, and so that XML editors can figure out once and for all how to decode Python-in-CDATA. Strawman Encoding # 1: replace < with &lt; and > with &gt; when not in strings, and vice versa on the decoding side. Strawman Encoding # 2: - do Strawman 1, AND - replace space-determined indentation with { and } tokens or other INDENT and DEDENT markers using some rare Unicode characters to work around inevitable bugs in whitespace handling of XML processors. --david From gvwilson at nevex.com Tue Mar 14 00:14:43 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 18:14:43 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > David Ascher wrote: > But the scheme you put forth causes major problems for current Python > users who *are* using glass TTYs, so I don't think it'll fly for very > basic political reasons nicely illustrated by Dave's response. Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-) Greg From beazley at rustler.cs.uchicago.edu Tue Mar 14 00:12:55 2000 From: beazley at rustler.cs.uchicago.edu (David M.
Beazley) Date: Mon, 13 Mar 2000 17:12:55 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: <200003132312.RAA08107@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... Look, I'm all for CP4E and making programming more accessible to the masses, but as a professional programmer, I frankly do not care what non-programmers think about the tools that I (and most of the programming world) use to write software. Furthermore, if all of your non-programmer friends don't want to care about the underlying details, they certainly won't care how programs are represented---including a nice and *simple* text representation without markup, entities, and other syntax that is not an essential part of the language. However, as a professional, I most certainly DO care about how programs are represented--specifically, I want to be able to move them around between machines. Edit them with essentially any editor, transform them as I see fit, and be able to easily read them and have a sense of what is going on. Markup is just going to make this a huge pain in the butt. No, I'm not for this idea one bit. Sorry. > I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. What gives you the idea that Python is behind? What is it playing catch up to? 
-- Dave From DavidA at ActiveState.com Tue Mar 14 00:36:54 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:36:54 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) > > Greg If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing. Forcing one into the other is denying the history and current practices of both domains and user populations. You cannot ignore the fact that "I can read anyone's Python" is a key selling point of Python among its current practitioners, or that its cleanliness and lack of magic characters ($ is usually invoked, but < is just as magic/ugly) are part of its appeal/success. No XML editor is going to edit all XML documents without custom editors anyway! I certainly don't expect to be drawing SVG diagrams with a keyboard! That's what schemas and custom editors are for. Define a schema for 'encoded Python' (well, first, find a schema notation that will survive), write a plugin to your favorite XML editor, and then your (theoretical? =) users can use the same 'editor' to edit PythonXML or any other XML. Most XML probably won't be edited with a keyboard but with a pointing device or a speech recognizer anyway... 
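Ascher's Strawman Encoding #1 -- entity-escape < and > when not in strings, and reverse on decode -- is small enough to sketch. This is a deliberately naive illustration: it only recognizes simple single- and double-quoted strings, and ignores triple quotes, backslash escapes and comments:

```python
ENT = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

def encode(src):
    # Strawman #1: escape &, <, > outside string literals only
    out = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c in "'\"":
            j = src.find(c, i + 1)      # naive: no \' escapes, no triple quotes
            if j < 0:
                j = n - 1
            out.append(src[i:j + 1])    # string literal copied verbatim
            i = j + 1
        else:
            out.append(ENT.get(c, c))
            i += 1
    return "".join(out)

def decode(text):
    # inverse for the non-string parts; "&amp;" last so escaped entities
    # round-trip (a full inverse would skip string literals too, mirroring encode)
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")

src = 'if a[index[1]]>b: print "<done>"'
assert encode(src) == 'if a[index[1]]&gt;b: print "<done>"'
assert decode(encode(src)) == src
```

Note how the encoded form of Andrew's contrived example no longer contains the CDATA-terminating ]]> sequence.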
IMO, you're being seduced by the apparent closeness between XML and Python-in-ASCII. It's only superficial... Think of Python-in-ASCII as a rendering of Python-in-XML, Dave will think of Python-in-XML as a rendering of Python-in-ASCII, and everyone will be happy (as long as everyone agrees on the one-to-one transformation). --david From paul at prescod.net Tue Mar 14 00:43:48 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:43:48 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7D34.6569C1AA@prescod.net> You should use your entities in the XML files, and then whatever application actually launches Python (PMZ, your make engine, XMetaL) could decode the data and launch Python. This is already how it works in XMetaL. I've just reinstalled recently so I don't have my macro file. Therefore, please excuse the Javascript (not Python) example. This is in "journalist.mcr" in the "Macros" folder of XMetaL. This already works fine for Python. You change lang="Python" and thanks to the benevolence of Bill Gates and the hard work of Mark Hammond, you can use Python for XMetaL macros. It doesn't work perfectly: exceptions crash XMetaL, last I tried. As long as you don't make mistakes, everything works nicely. :) You can write XMetaL macros in Python and the whole thing is stored as XML. Still, XMetaL is not very friendly as a Python editor. It doesn't have nice whitespace handling! -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built.
- Immanuel Kant From paul at prescod.net Tue Mar 14 00:59:23 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:59:23 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD80DB.39150F33@prescod.net> gvwilson at nevex.com wrote: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. Your goal is worth pursuing but I agree with the others that the syntax change is not the right way. It _is_ possible to teach XMetaL to edit Python programs -- structurally -- just as it does XML. What you do is hook into the macro engine (which already supports Python) and use the Python tokenizer to build a parse tree. You copy that into a DOM using the same elements and attributes you would use if you were doing some kind of batch conversion. Then on "save" you reverse the process. Implementation time: ~3 days. The XMetaL competitor, Documentor has an API specifically designed to make this sort of thing easy. Making either of them into a friendly programmer's editor is a much larger task. I think this is where the majority of the R&D should occur, not at the syntax level. If one invents a fundamentally better way of working with the structures behind Python code, then it would be relatively easy to write code that maps that to today's Python syntax. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. 
- Immanuel Kant From moshez at math.huji.ac.il Tue Mar 14 02:14:09 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 14 Mar 2000 03:14:09 +0200 (IST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal? This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so the Python that comes out of the XML parser is quite all right. Why change Python to do an XML parser job? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Tue Mar 14 02:18:45 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 14 Mar 2000 12:18:45 +1100 Subject: [Python-Dev] unicode objects and C++ Message-ID: I struck a bit of a snag with the Unicode support when trying to use the most recent update in a C++ source file. The problem turned out to be that unicodeobject.h did a #include "wchar.h", but did it while an 'extern "C"' block was open. This upset the MSVC6 wchar.h, as it has special C++ support. Attached below is a patch I made to unicodeobject.h that solved my problem and allowed my compilations to succeed.
Theoretically the same problem could exist for wctype.h, and probably lots of other headers, but this is the immediate problem :-) An alternative patch would be to #include "wchar.h" in PC\config.h outside of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for multiple includes, so the unicodeobject.h include of that file will succeed, but not have the side-effect it has now. Im not sure what the preferred solution is - quite possibly the PC\config.h change, but Ive included the unicodeobject.h patch anyway :-) Mark.

*** unicodeobject.h	2000/03/13 23:22:24	2.2
--- unicodeobject.h	2000/03/14 01:06:57
***************
*** 85,91 ****
--- 85,101 ----
  #endif

  #ifdef HAVE_WCHAR_H
+
+ #ifdef __cplusplus
+ } /* Close the 'extern "C"' before bringing in system headers */
+ #endif
+
  # include "wchar.h"
+
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+
  #endif

  #ifdef HAVE_USABLE_WCHAR_T

From mal at lemburg.com Tue Mar 14 00:31:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 00:31:30 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7A52.5709DF5F@lemburg.com> gvwilson at nevex.com wrote: > > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) This should be easy to implement provided a hook for compile() is added to e.g. the sys-module which then gets used instead of calling the byte code compiler directly...
Then you could redirect the compile() arguments to whatever codec you wish (e.g. a SGML entity codec) and the builtin compiler would only see the output of that codec. Well, just a thought... I don't think encoding programs would make life as a programmer easier, but instead harder. It adds one more level of confusion on top of it all. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 14 10:45:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 10:45:49 +0100 Subject: [Python-Dev] unicode objects and C++ References: Message-ID: <38CE0A4D.1209B830@lemburg.com> Mark Hammond wrote: > > I struck a bit of a snag with the Unicode support when trying to use the > most recent update in a C++ source file. > > The problem turned out to be that unicodeobject.h did a #include "wchar.h", > but did it while an 'extern "C"' block was open. This upset the MSVC6 > wchar.h, as it has special C++ support. Thanks for reporting this. > Attached below is a patch I made to unicodeobject.h that solved my problem > and allowed my compilations to succeed. Theoretically the same problem > could exist for wctype.h, and probably lots of other headers, but this is > the immediate problem :-) > > An alternative patch would be to #include "wchar.h" in PC\config.h outside > of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for > multiple includes, so the unicodeobject.h include of that file will succeed, > but not have the side-effect it has now. > > Im not sure what the preferred solution is - quite possibly the PC\config.h > change, but Ive included the unicodeobject.h patch anyway :-) > > Mark.
>
> *** unicodeobject.h	2000/03/13 23:22:24	2.2
> --- unicodeobject.h	2000/03/14 01:06:57
> ***************
> *** 85,91 ****
> --- 85,101 ----
>   #endif
>
>   #ifdef HAVE_WCHAR_H
> +
> + #ifdef __cplusplus
> + } /* Close the 'extern "C"' before bringing in system headers */
> + #endif
> +
>   # include "wchar.h"
> +
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
> +
>   #endif
>
>   #ifdef HAVE_USABLE_WCHAR_T

I've included this patch (should solve the problem for all included system header files, since it wraps only the Unicode APIs in extern "C"):

--- /home/lemburg/clients/cnri/CVS-Python/Include/unicodeobject.h	Fri Mar 10 23:33:05 2000
+++ unicodeobject.h	Tue Mar 14 10:38:08 2000
@@ -1,10 +1,7 @@
 #ifndef Py_UNICODEOBJECT_H
 #define Py_UNICODEOBJECT_H

-#ifdef __cplusplus
-extern "C" {
-#endif

 /* Unicode implementation based on original code by Fredrik Lundh,
    modified by Marc-Andre Lemburg (mal at lemburg.com) according to the
@@ -167,10 +165,14 @@
 typedef unsigned short Py_UNICODE;
 #define Py_UNICODE_MATCH(string, offset, substring)\
     (!memcmp((string)->str + (offset), (substring)->str,\
              (substring)->length*sizeof(Py_UNICODE)))

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* --- Unicode Type ------------------------------------------------------- */

 typedef struct {
     PyObject_HEAD
     int length;			/* Length of raw Unicode data in buffer */

I'll post a complete Unicode update patch by the end of the week for inclusion in CVS. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Tue Mar 14 12:19:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:19:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Tue, 14 Mar 2000, Moshe Zadka wrote: > On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > > legal?
This would make the following 100% legal Python: > > > > i = 0 > > while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > > Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so > the Python that comes out of the XML parser is quite all right. Why change > Python to do an XML parser job? I totally agree. To me, this is the key issue: it is NOT the responsibility of the programming language to accommodate any particular encoding format. While we're at it, why don't we change Python to accept quoted-printable source code? Or base64-encoded source code? XML already defines a perfectly reasonable mechanism for escaping a plain stream of text -- adding this processing to Python adds nothing but confusion. The possible useful benefit from adding the proposed "feature" is exactly zero. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From ping at lfw.org Tue Mar 14 12:21:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:21:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000, David Ascher wrote: > > If you propose a transformation between Python Syntax and XML, then you > potentially have something which all parties can agree to as being a good > thing. Indeed. I know that i wouldn't have any use for it at the moment, but i can see the potential for usefulness of a structured representation for Python source code (like an AST in XML) which could be directly edited in an XML editor, and processed (by an XSL stylesheet?) to produce actual runnable Python. But attempting to mix the two doesn't get you anywhere. -- ?!ng "This code is better than any code that doesn't work has any right to be."
-- Roger Gregory, on Xanadu From effbot at telia.com Tue Mar 14 16:41:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 16:41:01 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <002201bf8dcb$ba9a11c0$34aab5d4@hagrid> Greg: > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) well, I would find it easier to support a more aggressive proposal: make sure Python 1.7 can deal with source code written in Unicode, using any supported encoding. with that in place, you can plug in your favourite unicode encoding via the Unicode framework. From effbot at telia.com Tue Mar 14 23:21:38 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 23:21:38 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> Message-ID: <000901bf8e03$abf88420$34aab5d4@hagrid> > I've just checked in a massive patch from Marc-Andre Lemburg which > adds Unicode support to Python. massive, indeed. didn't notice this before, but I just realized that after the latest round of patches, the python15.dll is now 700k larger than it was for 1.5.2 (more than twice the size). my original unicode DLL was 13k. hmm... From akuchlin at mems-exchange.org Tue Mar 14 23:19:44 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Tue, 14 Mar 2000 17:19:44 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <000901bf8e03$abf88420$34aab5d4@hagrid> References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> Message-ID: <14542.47872.184978.985612@amarok.cnri.reston.va.us> Fredrik Lundh writes: >didn't notice this before, but I just realized that after the >latest round of patches, the python15.dll is now 700k larger >than it was for 1.5.2 (more than twice the size). Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source code, and produces a 632168-byte .o file on my Sparc. (Will some compiler systems choke on a file that large? Could we read database info from a file instead, or mmap it into memory?) -- A.M. Kuchling http://starship.python.net/crew/amk/ "Are you OK, dressed like that? You don't seem to notice the cold." "I haven't come ten thousand miles to discuss the weather, Mr Moberly." -- Moberly and the Doctor, in "The Seeds of Doom" From mal at lemburg.com Wed Mar 15 09:32:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 09:32:29 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> Message-ID: <38CF4A9D.13A0080@lemburg.com> "Andrew M. Kuchling" wrote: > > Fredrik Lundh writes: > >didn't notice this before, but I just realized that after the > >latest round of patches, the python15.dll is now 700k larger > >than it was for 1.5.2 (more than twice the size). > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > code, and produces a 632168-byte .o file on my Sparc. (Will some > compiler systems choke on a file that large? Could we read database > info from a file instead, or mmap it into memory?) That is due to the unicodedata module being compiled into the DLL statically.
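Kuchling's parenthetical alternative -- shipping the tables as a data file and mmap'ing them on demand, so the pages are shared between processes and only paged in when touched -- might look roughly like this sketch. The record layout here is made up for illustration; the real unicodedata tables are static C arrays, not this file format:

```python
import mmap
import os
import struct
import tempfile

# hypothetical fixed-size record: codepoint, category code, digit value
RECORD = struct.Struct("<IBB")

def write_db(path, records):
    # build the database file once, at release time
    f = open(path, "wb")
    for rec in records:
        f.write(RECORD.pack(*rec))
    f.close()

def open_db(path):
    # map the file read-only; the OS shares these pages between processes
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def lookup(db, index):
    # fixed-size records make random access a single slice of the mapping
    off = index * RECORD.size
    return RECORD.unpack(db[off:off + RECORD.size])

path = os.path.join(tempfile.mkdtemp(), "unicodedata.db")
write_db(path, [(0x41, 1, 0), (0x42, 1, 0)])
db = open_db(path)
assert lookup(db, 1) == (0x42, 1, 0)
```

The trade-off against a shared module is mostly packaging: a data file needs to be found at runtime, whereas a shared library reuses the existing module-loading machinery.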
On Unix you can build it shared too -- there are no direct references to it in the implementation. I suppose that on Windows the same should be done... the question really is whether this is intended or not -- moving the module into a DLL is at least technically no problem (someone would have to supply a patch for the MSVC project files though). Note that unicodedata is only needed by programs which do a lot of Unicode manipulations and in the future probably by some codecs too. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 15 11:42:26 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 15 Mar 2000 11:42:26 +0100 (MET) Subject: [Python-Dev] Unicode in Python and Tcl/Tk compared (was Unicode patches checked in...) In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at "Mar 15, 2000 9:32:29 am" Message-ID: Hi! > > Fredrik Lundh writes: > > >didn't notice this before, but I just realized that after the > > >latest round of patches, the python15.dll is now 700k larger > > >than it was for 1.5.2 (more than twice the size). > > > "Andrew M. Kuchling" wrote: > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > > code, and produces a 632168-byte .o file on my Sparc. (Will some > > compiler systems choke on a file that large? Could we read database > > info from a file instead, or mmap it into memory?) > M.-A. Lemburg wrote: > That is due to the unicodedata module being compiled > into the DLL statically. On Unix you can build it shared too > -- there are no direct references to it in the implementation. > I suppose that on Windows the same should be done... the > question really is whether this is intended or not -- moving > the module into a DLL is at least technically no problem > (someone would have to supply a patch for the MSVC project > files though).
> > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Now as the unicode patches were checked in and as Fredrik Lundh noticed a considerable increase of the size of the python-DLL, which was obviously mostly caused by those tables, I had some fear that a Python/Tcl/Tk based application could eat up much more memory, if we update from Python1.5.2 and Tcl/Tk 8.0.5 to Python 1.6 and Tcl/Tk 8.3.0. As some of you certainly know, some kind of unicode support has also been added to Tcl/Tk since 8.1. So I did some research and would like to share what I have found out so far: Here are the compared sizes of the tcl/tk shared libs on Linux:

old:                  | new:                  | bloat increase in %:
----------------------+-----------------------+---------------------
libtcl8.0.so  533414  | libtcl8.3.so  610241  | 14.4 %
libtk8.0.so   714908  | libtk8.3.so   811916  | 13.6 %

The addition of unicode wasn't the only change to TclTk. So this seems reasonable. Unfortunately there is no python shared library, so a direct comparison of increased memory consumption is impossible. Nevertheless I've the following figures (stripped binary sizes of the Python interpreter):

1.5.2         382616
CVS_10-02-00  393668  (a month before unicode)
CVS_12-03-00  507448  (just after unicode)

That is an increase of "only" 111 kBytes. Not so bad but nevertheless a "bloat increase" of 32.6 %. And additionally there is now

unicodedata.so    634940
_codecsmodule.so   38955

which (I guess) will also be loaded if the application starts using some of the new features. Since I didn't take care of unicode in the past, I feel unable to compare the implementations of unicode in both systems and what impact they will have on the real memory performance and even more important on the functionality of the combined use of both packages together with Tkinter.
Tcl/Tk keeps around a sub-directory called 'encoding', which --I guess-- contains information somehow similar or related to that in 'unicodedata.so', but separated into several files? So below I included a shortened excerpts from the 200k+ tcl8.3.0/changes and the tk8.3.0/changes files about unicode. May be someone else more involved with unicode can shed some light on this topic? Do we need some changes to Tkinter.py or _tkinter or both? ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- [...] ======== Changes for 8.1 go below this line ======== 6/18/97 (new feature) Tcl now supports international character sets: - All C APIs now accept UTF-8 strings instead of iso8859-1 strings, wherever you see "char *", unless explicitly noted otherwise. - All Tcl strings represented in UTF-8, which is a convenient multi-byte encoding of Unicode. Variable names, procedure names, and all other values in Tcl may include arbitrary Unicode characters. For example, the Tcl command "string length" returns how many Unicode characters are in the argument string. - For Java compatibility, embedded null bytes in C strings are represented as \xC080 in UTF-8 strings, but the null byte at the end of a UTF-8 string remains \0. Thus Tcl strings once again do not contain null bytes, except for termination bytes. - For Java compatibility, "\uXXXX" is used in Tcl to enter a Unicode character. "\u0000" through "\uffff" are acceptable Unicode characters. - "\xXX" is used to enter a small Unicode character (between 0 and 255) in Tcl. - Tcl automatically translates between UTF-8 and the normal encoding for the platform during interactions with the system. - The fconfigure command now supports a -encoding option for specifying the encoding of an open file or socket. Tcl will automatically translate between the specified encoding and UTF-8 during I/O. 
See the directory library/encoding to find out what encodings are supported (eventually there will be an "encoding" command that makes this information more accessible). - There are several new C APIs that support UTF-8 and various encodings. See Utf.3 for procedures that translate between Unicode and UTF-8 and manipulate UTF-8 strings. See Encoding.3 for procedures that create new encodings and translate between encodings. See ToUpper.3 for procedures that perform case conversions on UTF-8 strings. [...] 1/16/98 (new feature) Tk now supports international characters sets: - Font display mechanism overhauled to display Unicode strings containing full set of international characters. You do not need Unicode fonts on your system in order to use tk or see international characters. For those familiar with the Japanese or Chinese patches, there is no "-kanjifont" option. Characters from any available fonts will automatically be used if the widget's originally selected font is not capable of displaying a given character. - Textual widgets are international aware. For instance, cursor positioning commands would now move the cursor forwards/back by 1 international character, not by 1 byte. - Input Method Editors (IMEs) work on Mac and Windows. Unix is still in progress. [...] 10/15/98 (bug fix) Changed regexp and string commands to properly handle case folding according to the Unicode character tables. (stanton) 10/21/98 (new feature) Added an "encoding" command to facilitate translations of strings between different character encodings. See the encoding.n manual entry for more details. (stanton) 11/3/98 (bug fix) The regular expression character classification syntax now includes Unicode characters in the supported classes. (stanton) [...] 11/17/98 (bug fix) "scan" now correctly handles Unicode characters. (stanton) [...] 11/19/98 (bug fix) Fixed menus and titles so they properly display Unicode characters under Windows. [Bug: 819] (stanton) [...] 
4/2/99 (new apis) Made various Unicode utility functions public. Tcl_UtfToUniCharDString, Tcl_UniCharToUtfDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsDigit, Tcl_UniCharIsLower, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar, Tcl_WinUtfToTChar, Tcl_WinTCharToUtf (stanton) [...] 4/5/99 (bug fix) Fixed handling of Unicode in text searches. The -count option was returning byte counts instead of character counts. [...] 5/18/99 (bug fix) Fixed clipboard code so it handles Unicode data properly on Windows NT and 95. [Bug: 1791] (stanton) [...] 6/3/99 (bug fix) Fixed selection code to handle Unicode data in COMPOUND_TEXT and STRING selections. [Bug: 1791] (stanton) [...] 6/7/99 (new feature) Optimized string index, length, range, and append commands. Added a new Unicode object type. (hershey) [...] 6/14/99 (new feature) Merged string and Unicode object types. Added new public Tcl API functions: Tcl_NewUnicodeObj, Tcl_SetUnicodeObj, Tcl_GetUnicode, Tcl_GetUniChar, Tcl_GetCharLength, Tcl_GetRange, Tcl_AppendUnicodeToObj. (hershey) [...] 6/23/99 (new feature) Updated Unicode character tables to reflect Unicode 2.1 data. (stanton) [...] --- Released 8.3.0, February 10, 2000 --- See ChangeLog for details --- ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- Sorry if this was boring old stuff for some of you. Best Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From marangoz at python.inrialpes.fr Wed Mar 15 12:40:21 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 12:40:21 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 09:32:29 AM Message-ID: <200003151140.MAA30301@python.inrialpes.fr> M.-A. 
Lemburg wrote: > > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Perhaps it would make sense to move the Unicode database to the Python side (write it in Python)? Or init the database dynamically in the unicodedata module on import? It's quite big, so if it's possible to avoid the static declaration (and if the unicodedata module is enabled by default), I'd vote for a dynamic initialization of the database from reference (Python ?) file(s). M-A, is something in this spirit doable? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 13:57:04 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 13:57:04 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> Message-ID: <38CF88A0.CF876A74@tismer.com> "M.-A. Lemburg" wrote: ... > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Would it be possible to make the Unicode support configurable? My problem is that patches in the CVS are of different kinds. Some are error corrections and enhancements which I would definitely like to use. Others are brand new features like the Unicode support. Absolutely great stuff! But this will most probably change a number of times again, and I think it is a bad idea to include it in my Stackless distribution. I'd appreciate it very much if I could use the same CVS tree for testing new stuff, and to build my distribution, with new features switched off. Please :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jim at digicool.com Wed Mar 15 14:35:48 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 08:35:48 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) Message-ID: <38CF91B4.A36C8C5@digicool.com> Here's my $0.02. I agree with the sentiments that use of finalizers should be discouraged. They are extremely helpful in cases like tempfile.TemporaryFileWrapper, so I think that they should be supported. I do think that the language should not promise a high level of service. Some observations: - I spent a little bit of time on the ANSI Smalltalk committee, where I naively advocated adding finalizers to the language. I was resoundingly told no. :) - Most of the Python objects I deal with these days are persistent. Their lifetimes are a lot more complicated than those of most Python objects. They get created once, but they get loaded into and out of memory many times. In fact, they can be in memory many times simultaneously. :) A couple of years ago I realized that it only made sense to call __init__ when an object was first created, not when it is subsequently (re)loaded into memory. This led to a change in Python pickling semantics and the deprecation of the loathsome __getinitargs__ protocol. :) For me, a similar case can be made against the use of __del__ for persistent objects. For persistent objects, a __del__ method should only be used for cleaning up the most volatile of resources. A persistent object's __del__ should not perform any semantically meaningful operations because __del__ has no semantic meaning. - Zope has a few uses of __del__. These are all for non-persistent objects. Interestingly, in grepping for __del__, I found a lot of cases where __del__ was used and then commented out.
Finalizers seem to be the sort of thing that people want initially and then get over. I'm inclined to essentially keep the current rules and simply not promise that __del__ will be able to run correctly. That is, Python should call __del__ and ignore exceptions raised (or provide some *optional* logging or other debugging facility). There is no reason for __del__ to fail unless it depends on cyclicly-related objects, which should be viewed as a design mistake. OTOH, __del__ should never fail because module globals go away. IMO, the current circular references involving module globals are unnecessary, but that's a different topic. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Wed Mar 15 16:00:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 16:00:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> Message-ID: <38CFA57E.21A3B3EF@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > ... > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Would it be possible to make the Unicode support configurable? This is currently not planned as the Unicode integration touches many different parts of the interpreter to enhance string/Unicode integration... sorry. 
Also, I'm not sure whether adding #ifdefs throughout the code would increase its elegance ;-) > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. Why not? All you have to do is rebuild the distribution every time you push a new version -- just like I did for the Unicode version before the CVS checkin was done. > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 15:57:13 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 15:57:13 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151140.MAA30301@python.inrialpes.fr> Message-ID: <38CFA4C9.E6B8EB5D@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Perhaps it would make sense to move the Unicode database to the > Python side (write it in Python)? Or init the database dynamically > in the unicodedata module on import? It's quite big, so if it's > possible to avoid the static declaration (and if the unicodedata module > is enabled by default), I'd vote for a dynamic initialization of the > database from reference (Python ?) file(s). The unicodedatabase module contains the Unicode database as static C data - this makes it shareable among (Python) processes.
Python modules don't provide this feature: instead a dictionary would have to be built on import, which would increase the heap size considerably. Those dicts would *not* be shareable. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Wed Mar 15 16:20:06 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 16:20:06 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> Message-ID: <38CFAA26.2B2F0D01@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: ... > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > Why not ? All you have to do is rebuild the distribution > every time you push a new version -- just like I did > for the Unicode version before the CVS checkin was done. But how can I then publish my source code, when I always pull Unicode into it? I don't like to be exposed to side effects like 700kb of code bloat, just by chance, since it is in the dist right now (and will vanish again). I don't say there must be #ifdefs all and everywhere, but can I build without *using* Unicode? I don't want to introduce something new to my users that they didn't ask for. And I don't want to take care of their installations. Finally, I will for sure not replace a 500k DLL by a 1.2M monster, so this is definitely not what I want at the moment. How do I build a dist that doesn't need to change a lot of stuff in the user's installation? Note that Stackless Python is a drop-in replacement, not a Python distribution. Or should it be?
ciao - chris (who really wants to get SLP 1.1 out) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From effbot at telia.com Wed Mar 15 17:04:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 17:04:54 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <014001bf8e98$35644480$34aab5d4@hagrid> CT: > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? somewhere in this thread, Guido wrote: > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > before the Unicode changes were made. maybe you could base SLP on that one? From marangoz at python.inrialpes.fr Wed Mar 15 17:27:36 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 17:27:36 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 03:57:13 PM Message-ID: <200003151627.RAA32543@python.inrialpes.fr> > [me] > > > > Perhaps it would make sense to move the Unicode database on the > > Python side (write it in Python)? Or init the database dynamically > > in the unicodedata module on import? It's quite big, so if it's > > possible to avoid the static declaration (and if the unicodata module > > is enabled by default), I'd vote for a dynamic initialization of the > > database from reference (Python ?) file(s). 
[Marc-Andre] > > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. The static data is shared if the module is a shared object (.so). If unicodedata is not a .so, then you'll have a separate copy of the database in each process. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I haven't mentioned dicts, have I? I suggested that the entries in the C version of the database be rewritten in Python (or a text file). The unicodedata module would, in its init function, allocate memory for the database and would populate it before returning "import okay" to Python -- this is one way to init the db dynamically, among others. As to sharing the database among different processes, this is a classic IPC problem, which has nothing to do with the static C declaration of the db. Or, hmmm, one of us is royally confused. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 17:22:42 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 17:22:42 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <38CFB8D2.537FCAD9@tismer.com> Fredrik Lundh wrote: > > CT: > > How do I build a dist that doesn't need to change a lot of > > stuff in the user's installation? > > somewhere in this thread, Guido wrote: > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > before the Unicode changes were made.
> > maybe you could base SLP on that one? I have no idea how this works. Would this mean that I cannot get patches which come after unicode? Meanwhile, I've looked into the sources. It is easy for me to get rid of the problem by supplying my own unicodedata.c, where I replace all functions by some unimplemented exception. Furthermore, I wondered about the data format. Is the unicode database used in your package as well? Otherwise, I see only references from unicodedata.c, and that means the data structure can be massively enhanced. At the moment, that baby is 64k entries long, with four bytes and an optional string. This is a big waste. The strings are almost all some distinct prefixes, together with a list of hex smallwords. This is done as strings; probably this makes 80 percent of the space. The only function that uses the "decomposition" field (namely the string) is unicodedata_decomposition. It does nothing more than wrap it into a PyObject. We can do a little better here. I guess I can bring it down to a third of this space without much effort, just by using
- binary encoding for the tags as enumeration
- binary encoding of the hexed entries
- omission of the spaces
Instead of 64k of structures which contain pointers anyway, I can use a 64k pointer array with offsets into one packed table. The unicodedata access functions would change *slightly*, just building some hex strings and so on. I guess this is not a time critical section? Should I try this evening? :-) cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Wed Mar 15 17:04:43 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 15 Mar 2000 17:04:43 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <38CFB49B.885B8B16@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > Christian Tismer wrote: > ... > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > Why not ? All you have to do is rebuild the distribution > > every time you push a new version -- just like I did > > for the Unicode version before the CVS checkin was done. > > But how can I then publish my source code, when I always > pull Unicode into it. I don't like to be exposed to > side effects like 700kb code bloat, just by chance, since it > is in the dist right now (and will vanish again). All you have to do is build the unicodedata module shared and not statically bound into python.dll. This one module causes most of the code bloat... > I don't say there must be #ifdefs all and everywhere, but > can I build without *using* Unicode? I don't want to > introduce something new to my users what they didn't ask for. > And I don't want to take care about their installations. > Finally I will for sure not replace a 500k DLL by a 1.2M > monster, so this is definately not what I want at the moment. > > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? I don't think that the Unicode stuff will disable the running environment... (haven't tried this though). 
The unicodedata module is not used by the interpreter and the rest is imported on-the-fly, not during init time, so at least in theory, not using Unicode will result in Python not looking for e.g. the encodings package. > Note that Stackless Python is a drop-in replacement, > not a Python distribution. Or should it be? Probably... I think it's simply easier to install and probably also easier to maintain because it doesn't cause dependencies on other "default" installations. The user will then explicitly know that she is installing something a little different from the default distribution... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:26:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:26:15 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> Message-ID: <38CFC7B7.A1ABD51C@lemburg.com> Christian Tismer wrote: > > Fredrik Lundh wrote: > > > > CT: > > > How do I build a dist that doesn't need to change a lot of > > > stuff in the user's installation? > > > > somewhere in this thread, Guido wrote: > > > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > > before the Unicode changes were made. > > > > maybe you could base SLP on that one? > > I have no idea how this works. Would this mean that I cannot > get patctes which come after unicode? > > Meanwhile, I've looked into the sources. 
It is easy for me > to get rid of the problem by supplying my own unicodedata.c, > where I replace all functions by some unimplemented exception. No need (see my other posting): simply disable the module altogether... this shouldn't hurt any part of the interpreter as the module is a user-land only module. > Furthermore, I wondered about the data format. Is the unicode > database used in your package as well? Otherwise, I see > only references from unicodedata.c, and that means the data > structure can be massively enhanced. > At the moment, that baby is 64k entries long, with four bytes > and an optional string. > This is a big waste. The strings are almost all some distinct > prefixes, together with a list of hex smallwords. This > is done as strings, probably this makes 80 percent of the space. I have made no attempt to optimize the structure... (due to lack of time mostly) the current implementation is really not much different from a rewrite of the UnicodeData.txt file available at the unicode.org site. If you want to, I can mail you the marshalled Python dict version of that database to play with. > The only function that uses the "decomposition" field (namely > the string) is unicodedata_decomposition. It does nothing > more than to wrap it into a PyObject. > We can do a little better here. I guess I can bring it down > to a third of this space without much effort, just by using > - binary encoding for the tags as enumeration > - binary encoding of the hexed entries > - omission of the spaces > Instead of a 64 k of structures which contain pointers anyway, > I can use a 64k pointer array with offsets into one packed > table. > > The unicodedata access functions would change *slightly*, > just building some hex strings and so on. I guess this > is not a time critical section? It may be if these functions are used in codecs, so you should pay attention to speed too... > Should I try this evening? :-) Sure :-) go ahead...
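[For readers following the thread: a rough sketch of what Christian's packing scheme might look like, in Python rather than C. This is an invented layout, not the actual unicodedata implementation -- each code point gets one offset into a single packed table, the decomposition tag becomes a small enumerated integer, and the hex words are stored as two-byte values instead of text:]

```python
# Hypothetical packing of decomposition records: per-code-point offsets into
# one packed byte table, tags as small integers, code points as 2-byte values.
TAGS = ["", "<compat>", "<font>", "<noBreak>", "<super>", "<fraction>"]

def pack(decomps):
    """decomps maps code point -> decomposition string as found in
    UnicodeData.txt, e.g. '<fraction> 0031 2044 0032'."""
    offsets, blob = {}, bytearray()
    for cp, text in decomps.items():
        parts = text.split()
        tag = TAGS.index(parts[0]) if parts[0].startswith("<") else 0
        codes = [int(p, 16) for p in (parts[1:] if tag else parts)]
        offsets[cp] = len(blob)
        blob.append(tag)             # enumerated tag, one byte
        blob.append(len(codes))      # number of code points that follow
        for c in codes:
            blob += c.to_bytes(2, "big")   # two bytes per BMP code point
    return offsets, bytes(blob)

def unpack(offsets, blob, cp):
    """Rebuild the original decomposition string, as an accessor would."""
    i = offsets[cp]
    tag, n = blob[i], blob[i + 1]
    codes = ["%04X" % int.from_bytes(blob[i + 2 + 2 * j:i + 4 + 2 * j], "big")
             for j in range(n)]
    return " ".join(([TAGS[tag]] if tag else []) + codes)

offsets, blob = pack({0x00C4: "0041 0308", 0x00BD: "<fraction> 0031 2044 0032"})
print(unpack(offsets, blob, 0x00C4))   # -> 0041 0308
```

The decomposition strings for U+00C4 and U+00BD above are real UnicodeData.txt entries; everything else (TAGS, the record layout) is made up for illustration.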
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:39:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:39:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: <38CFCAC2.7690DF55@lemburg.com> Vladimir Marangozov wrote: > > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Uhm, comparing the two versions Python 1.5 and the current CVS Python I get these figures on Linux:

Executing : ./python -i -c '1/0'
Python 1.5: 1208kB / 728 kB (resident/shared)
Python CVS: 1280kB / 808 kB ("/")

Not much of a change if you ask me, and the CVS version has the unicodedata module linked statically... so there's got to be some sharing and load-on-demand going on behind the scenes: this is what I was referring to when I mentioned static C data. The OS can deal with these sharing techniques and delayed loads much better than anything we could implement on top of it in C or Python. But perhaps this is Linux-specific...
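[For anyone who wants to reproduce this kind of measurement today: the resident/shared numbers can be read from Linux's /proc filesystem. A small hedged helper -- the statm field layout is Linux-specific, and the function simply returns None where a procfs is unavailable:]

```python
import os

def resident_shared_kb(pid="self"):
    """Return (resident_kB, shared_kB) for a process from /proc/<pid>/statm,
    or None on platforms without a Linux-style procfs.
    statm fields are: size resident shared text lib data dt (in pages)."""
    try:
        with open("/proc/%s/statm" % pid) as f:
            fields = f.read().split()
    except OSError:
        return None
    page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
    return (int(fields[1]) * page_kb, int(fields[2]) * page_kb)

print(resident_shared_kb())   # (resident_kB, shared_kB), or None
```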
> > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file). > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. I'm leaving this as an exercise to the interested reader ;-) Really, if you have better ideas for the unicodedata module, please go ahead. > As to sharing the database among different processes, this is a classic > IPC problem, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused. Could you check this on other platforms? Perhaps Linux is doing more than other OSes are in this field. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Mar 15 19:23:59 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 19:23:59 +0100 Subject: [Python-Dev] first public SRE snapshot now available! References: <200003151627.RAA32543@python.inrialpes.fr> <38CFCAC2.7690DF55@lemburg.com> Message-ID: <01f901bf8eab$a353e780$34aab5d4@hagrid> I just uploaded the first public SRE snapshot to: http://w1.132.telia.com/~u13208596/sre.htm
-- this kit contains Windows binaries only (make sure you have built the interpreter from a recent CVS version)
-- the engine fully supports unicode target strings. (not sure about the pattern compiler, though...)
-- it's probably buggy as hell.
for things I'm working on at this very moment, see: http://w1.132.telia.com/~u13208596/sre/status.htm I hope to get around to fixing the core dump (it crashes halfway through sre_fulltest.py for no apparent reason) and the backreferencing problem later today. stay tuned. PS. note that "public" doesn't really mean "suitable for the c.l.python crowd", or "suitable for production use". in other words, let's keep this one on this list for now. thanks! From tismer at tismer.com Wed Mar 15 19:15:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 19:15:27 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> <38CFC7B7.A1ABD51C@lemburg.com> Message-ID: <38CFD33F.3C02BF43@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: [the old data compression guy has been reanimated] > If you want to, I can mail you the marshalled Python dict version of > that database to play with. ... > > Should I try this evening? :-) > > Sure :-) go ahead... Thank you. Meanwhile I've heard that there is some well-known bot working on that under the hood, with a much better approach than mine. So I'll take your advice, and continue to write silly stackless enhancements. They say this is my destiny :-) ciao - continuous -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From DavidA at ActiveState.com Wed Mar 15 19:21:40 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 15 Mar 2000 10:21:40 -0800 Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> Message-ID: > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I know it's complicating things, but wouldn't an mmap'ed buffer allow inter-process sharing while keeping DLL size down and everything on-disk until needed? Yes, I know, mmap calls aren't uniform across platforms and isn't supported on all platforms -- I still think that it's silly not to use it on those platforms where it is available, and I'd like to see mmap unification move forward, so this is as good a motivation as any to bite the bullet. Just a thought, --david From jim at digicool.com Wed Mar 15 19:24:53 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 13:24:53 -0500 Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) Message-ID: <38CFD575.A0536439@digicool.com> I find asyncore to be quite useful, however, it is currently geared to having a single main loop. It uses a global socket map that all asyncore dispatchers register with. I have an application in which I want to have multiple socket maps. I propose that we start moving toward a model in which selection of a socket map and control of the asyncore loop is a bit more explicit. If no one objects, I'll work up some initial patches. Who should I submit these to? Sam? 
Should the medusa public CVS form the basis? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jcw at equi4.com Wed Mar 15 20:39:37 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Wed, 15 Mar 2000 20:39:37 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38CFE6F9.3E8E9385@equi4.com> David Ascher wrote: [shareable unicodedatabase] > I know it's complicating things, but wouldn't an mmap'ed buffer allow > inter-process sharing while keeping DLL size down and everything > on-disk until needed? AFAIK, on platforms which support mmap, static data already gets mmap'ed in by the OS (just like all code), so this might have little effect. I'm more concerned by the distribution size increase. -jcw From bwarsaw at cnri.reston.va.us Wed Mar 15 19:41:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 15 Mar 2000 13:41:00 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <14543.55612.969101.206695@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> somewhere in this thread, Guido wrote: >> BTW, I added a tag "pre-unicode" to the CVS tree to the >> revisions before the Unicode changes were made. FL> maybe you could base SLP on that one? /F's got it exactly right. 
Check out a new directory using a stable tag (maybe you want to base your changes on pre-unicode tag, or python 1.52?). Patch in that subtree and then eventually you'll have to merge your changes into the head of the branch. -Barry From rushing at nightmare.com Thu Mar 16 02:52:22 2000 From: rushing at nightmare.com (Sam Rushing) Date: Wed, 15 Mar 2000 17:52:22 -0800 (PST) Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) In-Reply-To: <38CFD575.A0536439@digicool.com> References: <38CFD575.A0536439@digicool.com> Message-ID: <14544.15958.546712.466506@seattle.nightmare.com> Jim Fulton writes: > I find asyncore to be quite useful, however, it is currently > geared to having a single main loop. It uses a global socket > map that all asyncore dispatchers register with. > > I have an application in which I want to have multiple > socket maps. But still only a single event loop, yes? Why do you need multiple maps? For a priority system of some kind? > I propose that we start moving toward a model in which selection of > a socket map and control of the asyncore loop is a bit more > explicit. > > If no one objects, I'll work up some initial patches. If it can be done in a backward-compatible fashion, that sounds fine; but it sounds tricky. Even the simple {:object...} change broke so many things that we're still using the old stuff at eGroups. > Who should I submit these to? Sam? > Should the medusa public CVS form the basis? Yup, yup. -Sam From tim_one at email.msn.com Thu Mar 16 08:06:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 16 Mar 2000 02:06:23 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <38CF91B4.A36C8C5@digicool.com> Message-ID: <000201bf8f16$237e5e80$662d153f@tim> [Jim Fulton] > ... > There is no reason for __del__ to fail unless it depends on > cyclicly-related objects, which should be viewed as a design > mistake. > > OTOH, __del__ should never fail because module globals go away. 
> IMO, the current circular references involving module globals are > unnecessary, but that's a different topic. ;) IOW, you view "the current circular references involving module globals" as "a design mistake" . And perhaps they are! I wouldn't call it a different topic, though: so long as people are *viewing* shutdown __del__ problems as just another instance of finalizers in cyclic trash, it makes the latter *seem* inescapably "normal", and so something that has to be catered to. If you have a way to take the shutdown problems out of the discussion, it would help clarify both topics, at the very least by deconflating them. it's-a-mailing-list-so-no-need-to-stay-on-topic-ly y'rs - tim From gstein at lyra.org Thu Mar 16 13:01:36 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:01:36 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF88A0.CF876A74@tismer.com> Message-ID: On Wed, 15 Mar 2000, Christian Tismer wrote: >... > Would it be possible to make the Unicode support configurable? This might be interesting from the standpoint of those guys who are doing the tiny Python interpreter thingy for embedded systems. > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definately like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. > > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) But! I find this reason completely off the mark. In essence, you're arguing that we should not put *any* new feature into the CVS repository because it might mess up what *you* are doing. Sorry, but that just irks me. If you want a stable Python, then don't use the CVS version. 
Or base it off a specific tag in CVS. Or something. Just don't ask for development to be stopped. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 16 13:08:43 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:08:43 -0800 (PST) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: On Wed, 15 Mar 2000, Vladimir Marangozov wrote: > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a seperate copy of the > database in each process. Nope. A shared module means that multiple executables can share the code. Whether the const data resides in an executable or a .so, the OS will map it into readonly memory and share it across all procsses. > > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file) > The unicodedata module would, in it's init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. 
This would place all that data into the per-process heap. Definitely not shared, and definitely a big hit for each Python process. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused . This isn't IPC. It is sharing of some constant data. The most effective way to manage this is through const C data. The OS will properly manage it. And sorry, David, but mmap'ing a file will simply add complexity. As jcw mentioned, the OS is pretty much doing this anyhow when it deals with a const data segment in your executable. I don't believe this is Linux specific. This kind of stuff has been done for a *long* time on the platforms, too. Side note: the most effective way of exposing this const data up to Python (without shoving it onto the heap) is through buffers created via: PyBuffer_FromMemory(ptr, size) This allows the data to reside in const, shared memory while it is also exposed up to Python. Cheers, -g -- Greg Stein, http://www.lyra.org/ From marangoz at python.inrialpes.fr Thu Mar 16 13:39:42 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Thu, 16 Mar 2000 13:39:42 +0100 (CET) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: from "Greg Stein" at Mar 16, 2000 04:08:43 AM Message-ID: <200003161239.NAA01671@python.inrialpes.fr> Greg Stein wrote: > > [me] > > The static data is shared if the module is a shared object (.so). > > If unicodedata is not a .so, then you'll have a seperate copy of the > > database in each process. > > Nope. A shared module means that multiple executables can share the code. > Whether the const data resides in an executable or a .so, the OS will map > it into readonly memory and share it across all procsses. I must have been drunk yesterday. You're right. > I don't believe this is Linux specific. 
This kind of stuff has been done > for a *long* time on the platforms, too. Yes. > > Side note: the most effective way of exposing this const data up to Python > (without shoving it onto the heap) is through buffers created via: > PyBuffer_FromMemory(ptr, size) > This allows the data to reside in const, shared memory while it is also > exposed up to Python. And to avoid the size increase of the Python library, perhaps unicodedata needs to be uncommented by default in Setup.in (for the release, not now). As M-A pointed out, the module isn't necessary for the normal operation of the interpreter. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Thu Mar 16 13:56:21 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:56:21 -0800 (PST) Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <000201bf8f16$237e5e80$662d153f@tim> Message-ID: On Thu, 16 Mar 2000, Tim Peters wrote: >... > IOW, you view "the current circular references involving module globals" as > "a design mistake" . And perhaps they are! I wouldn't call it a > different topic, though: so long as people are *viewing* shutdown __del__ > problems as just another instance of finalizers in cyclic trash, it makes > the latter *seem* inescapably "normal", and so something that has to be > catered to. If you have a way to take the shutdown problems out of the > discussion, it would help clarify both topics, at the very least by > deconflating them. Bah. Module globals are easy. My tp_clean suggestion handles them quite easily at shutdown. No more special-code in import.c.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Thu Mar 16 13:53:46 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 13:53:46 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38D0D95A.B13EC17E@tismer.com> Greg Stein wrote: > > On Wed, 15 Mar 2000, Christian Tismer wrote: > >... > > Would it be possible to make the Unicode support configurable? > > This might be interesting from the standpoint of those guys who are doing > the tiny Python interpreter thingy for embedded systems. > > > My problem is that patches in the CVS are of different kinds. > > Some are error corrections and enhancements which I would > > definately like to use. > > Others are brand new features like the Unicode support. > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > > > I'd appreciate it very much if I could use the same CVS tree > > for testing new stuff, and to build my distribution, with > > new features switched off. Please :-) > > But! I find this reason completely off the mark. In essence, you're > arguing that we should not put *any* new feature into the CVS repository > because it might mess up what *you* are doing. No, this is your interpretation, and a reduction which I can't follow. There are inprovements and features in the CVS version which I need. I prefer to build against it, instead of the old 1.5.2. What's wrong with that? I want to find a way that gives me the least trouble in doing so. > Sorry, but that just irks me. If you want a stable Python, then don't use > the CVS version. Or base it off a specific tag in CVS. Or something. Just > don't ask for development to be stopped. No, I ask for development to be stopped. Code freeze until Y3k :-) Why are you trying to put such a nonsense into my mouth? You know that I know that you know better. 
ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tismer at tismer.com Thu Mar 16 14:25:48 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 14:25:48 +0100 Subject: [Python-Dev] const data (was: Unicode patches checked in) References: <200003161239.NAA01671@python.inrialpes.fr> Message-ID: <38D0E0DC.B997F836@tismer.com> Vladimir Marangozov wrote: > > Greg Stein wrote: > > Side note: the most effective way of exposing this const data up to Python > > (without shoving it onto the heap) is through buffers created via: > > PyBuffer_FromMemory(ptr, size) > > This allows the data to reside in const, shared memory while it is also > > exposed up to Python. > > And to avoid the size increase of the Python library, perhaps unicodedata > needs to be uncommented by default in Setup.in (for the release, not now). > As M-A pointed out, the module isn't isn't necessary for the normal > operation of the interpreter. Sounds like a familiar idea. :-) BTW., yesterday evening I wrote an analysis script, to see how far this data is compactable without going into real compression, just redundancy folding and byte/short indexing was used. If I'm not wrong, this reduces the size of the database to less than 25kb. That small amount of extra data would make the uncommenting feature quite unimportant, except for the issue of building tiny Pythons. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein at lyra.org Thu Mar 16 14:06:46 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 05:06:46 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38D0D95A.B13EC17E@tismer.com> Message-ID: On Thu, 16 Mar 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Sorry, but that just irks me. If you want a stable Python, then don't use > > the CVS version. Or base it off a specific tag in CVS. Or something. Just > > don't ask for development to be stopped. > > No, I ask for development to be stopped. Code freeze until Y3k :-) > Why are you trying to put such a nonsense into my mouth? > You know that I know that you know better. Simply because that is what it sounds like on this side of my monitor :-) I'm seeing your request as asking for people to make special considerations in their patches for your custom distribution. While I don't have a problem with making Python more flexible to distro maintainers, it seemed like you were approaching it from the "wrong" angle. Like I said, making Unicode optional for the embedded space makes sense; making it optional so it doesn't bloat your distro didn't :-) Not a big deal... it is mostly a perception on my part. I also tend to dislike things that hold development back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri Mar 17 19:53:39 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 19:53:39 +0100 Subject: [Python-Dev] Unicode Update 2000-03-17 Message-ID: <38D27F33.4055A942@lemburg.com> Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. 
The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 17 15:24:30 2000 +++ Python+Unicode/Include/unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,8 +1,5 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* @@ -109,8 +106,9 @@ /* --- Internal Unicode Operations ---------------------------------------- */ /* If you want Python to use the compiler's wctype.h functions instead - of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS. - This reduces the interpreter's code size. */ + of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS or + configure Python using --with-ctype-functions. This reduces the + interpreter's code size. */ #if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS) @@ -169,6 +167,10 @@ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { @@ -647,7 +649,7 @@ int direction /* Find direction: +1 forward, -1 backward */ ); -/* Count the number of occurances of substr in str[start:end]. 
*/ +/* Count the number of occurrences of substr in str[start:end]. */ extern DL_IMPORT(int) PyUnicode_Count( PyObject *str, /* String */ @@ -656,7 +658,7 @@ int end /* Stop index */ ); -/* Replace at most maxcount occurances of substr in str with replstr +/* Replace at most maxcount occurrences of substr in str with replstr and return the resulting Unicode object. */ extern DL_IMPORT(PyObject *) PyUnicode_Replace( diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Sat Mar 11 00:20:43 2000 +++ Python+Unicode/Lib/codecs.py Mon Mar 13 14:33:54 2000 @@ -55,7 +55,7 @@ """ def encode(self,input,errors='strict'): - """ Encodes the object intput and returns a tuple (output + """ Encodes the object input and returns a tuple (output object, length consumed). errors defines the error handling to apply. 
It defaults to diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/encodings/__init__.py Python+Unicode/Lib/encodings/__init__.py --- CVS-Python/Lib/encodings/__init__.py Sat Mar 11 00:17:18 2000 +++ Python+Unicode/Lib/encodings/__init__.py Mon Mar 13 14:30:33 2000 @@ -30,13 +30,13 @@ import string,codecs,aliases _cache = {} -_unkown = '--unkown--' +_unknown = '--unknown--' def search_function(encoding): # Cache lookup - entry = _cache.get(encoding,_unkown) - if entry is not _unkown: + entry = _cache.get(encoding,_unknown) + if entry is not _unknown: return entry # Import the module diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_string.py Python+Unicode/Lib/test/test_string.py --- CVS-Python/Lib/test/test_string.py Sat Mar 11 10:52:43 2000 +++ Python+Unicode/Lib/test/test_string.py Mon Mar 13 10:12:46 2000 @@ -143,6 +143,7 @@ test('translate', 'xyz', 'xyz', table) test('replace', 'one!two!three!', 'one at two!three!', '!', '@', 1) +test('replace', 'one!two!three!', 'onetwothree', '!', '') test('replace', 'one!two!three!', 'one at two@three!', '!', '@', 2) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 3) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Fri Mar 17 15:24:31 2000 +++ 
Python+Unicode/Lib/test/test_unicode.py Mon Mar 13 10:13:05 2000 @@ -108,6 +108,7 @@ test('translate', u'xyz', u'xyz', table) test('replace', u'one!two!three!', u'one at two!three!', u'!', u'@', 1) +test('replace', u'one!two!three!', u'onetwothree', '!', '') test('replace', u'one!two!three!', u'one at two@three!', u'!', u'@', 2) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 3) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 17 16:55:11 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Modules/stropmodule.c Python+Unicode/Modules/stropmodule.c --- CVS-Python/Modules/stropmodule.c Wed Mar 1 10:22:53 2000 +++ Python+Unicode/Modules/stropmodule.c Mon Mar 13 14:33:23 2000 @@ -1054,7 +1054,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/stringobject.c Mon Mar 13 14:33:24 2000 @@ -1395,7 +1395,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. 
@@ -1578,7 +1578,7 @@ return NULL; if (sub_len <= 0) { - PyErr_SetString(PyExc_ValueError, "empty replacement string"); + PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } new_s = mymemreplace(str,len,sub,sub_len,repl,repl_len,count,&out_len); Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/unicodeobject.c Wed Mar 15 10:49:19 2000 @@ -83,7 +83,7 @@ all objects on the free list having a size less than this limit. This reduces malloc() overhead for small Unicode objects. - At worse this will result in MAX_UNICODE_FREELIST_SIZE * + At worst this will result in MAX_UNICODE_FREELIST_SIZE * (sizeof(PyUnicodeObject) + STAYALIVE_SIZE_LIMIT + malloc()-overhead) bytes of unused garbage. 
@@ -180,7 +180,7 @@ unicode_freelist = *(PyUnicodeObject **)unicode_freelist; unicode_freelist_size--; unicode->ob_type = &PyUnicode_Type; - _Py_NewReference(unicode); + _Py_NewReference((PyObject *)unicode); if (unicode->str) { if (unicode->length < length && _PyUnicode_Resize(unicode, length)) { @@ -199,16 +199,19 @@ unicode->str = PyMem_NEW(Py_UNICODE, length + 1); } - if (!unicode->str) { - PyMem_DEL(unicode); - PyErr_NoMemory(); - return NULL; - } + if (!unicode->str) + goto onError; unicode->str[length] = 0; unicode->length = length; unicode->hash = -1; unicode->utf8str = NULL; return unicode; + + onError: + _Py_ForgetReference((PyObject *)unicode); + PyMem_DEL(unicode); + PyErr_NoMemory(); + return NULL; } static @@ -224,7 +227,6 @@ *(PyUnicodeObject **)unicode = unicode_freelist; unicode_freelist = unicode; unicode_freelist_size++; - _Py_ForgetReference(unicode); } else { free(unicode->str); @@ -489,7 +491,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-8 decoding error; unkown error handling code: %s", + "UTF-8 decoding error; unknown error handling code: %s", errors); return -1; } @@ -611,7 +613,7 @@ else { PyErr_Format(PyExc_ValueError, "UTF-8 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -733,7 +735,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-16 decoding error; unkown error handling code: %s", + "UTF-16 decoding error; unknown error handling code: %s", errors); return -1; } @@ -921,7 +923,7 @@ else { PyErr_Format(PyExc_ValueError, "Unicode-Escape decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1051,6 +1053,10 @@ */ +static const Py_UNICODE *findchar(const Py_UNICODE *s, + int size, + Py_UNICODE ch); + static PyObject *unicodeescape_string(const Py_UNICODE *s, int size, @@ -1069,9 +1075,6 @@ p = q = PyString_AS_STRING(repr); if (quotes) { - static const Py_UNICODE *findchar(const Py_UNICODE *s, - int size, 
- Py_UNICODE ch); *p++ = 'u'; *p++ = (findchar(s, size, '\'') && !findchar(s, size, '"')) ? '"' : '\''; @@ -1298,7 +1301,7 @@ else { PyErr_Format(PyExc_ValueError, "Latin-1 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1369,7 +1372,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1431,7 +1434,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1502,7 +1505,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1618,7 +1621,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1750,7 +1753,7 @@ else { PyErr_Format(PyExc_ValueError, "translate error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/codecs.c Python+Unicode/Python/codecs.c --- CVS-Python/Python/codecs.c Fri Mar 10 23:57:27 2000 +++ Python+Unicode/Python/codecs.c Wed Mar 15 11:27:54 2000 @@ -93,9 +93,14 @@ PyObject *_PyCodec_Lookup(const char *encoding) { - PyObject *result, *args = NULL, *v; + PyObject *result, *args = NULL, *v = NULL; int i, len; + if (_PyCodec_SearchCache == NULL || _PyCodec_SearchPath == NULL) { + PyErr_SetString(PyExc_SystemError, + "codec module not properly initialized"); + goto onError; + } if (!import_encodings_called) import_encodings(); @@ -109,6 +114,7 @@ 
result = PyDict_GetItem(_PyCodec_SearchCache, v); if (result != NULL) { Py_INCREF(result); + Py_DECREF(v); return result; } @@ -121,6 +127,7 @@ if (args == NULL) goto onError; PyTuple_SET_ITEM(args,0,v); + v = NULL; for (i = 0; i < len; i++) { PyObject *func; @@ -146,7 +153,7 @@ if (i == len) { /* XXX Perhaps we should cache misses too ? */ PyErr_SetString(PyExc_LookupError, - "unkown encoding"); + "unknown encoding"); goto onError; } @@ -156,6 +163,7 @@ return result; onError: + Py_XDECREF(v); Py_XDECREF(args); return NULL; } @@ -378,5 +386,7 @@ void _PyCodecRegistry_Fini() { Py_XDECREF(_PyCodec_SearchPath); + _PyCodec_SearchPath = NULL; Py_XDECREF(_PyCodec_SearchCache); + _PyCodec_SearchCache = NULL; } From bwarsaw at cnri.reston.va.us Fri Mar 17 20:16:02 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 14:16:02 -0500 (EST) Subject: [Python-Dev] Unicode Update 2000-03-17 References: <38D27F33.4055A942@lemburg.com> Message-ID: <14546.33906.771022.916209@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> The patch is against the current CVS version. I would M> appreciate if someone with CVS checkin permissions could check M> the changes in. Hi MAL, I just tried to apply your patch against the tree, however patch complains that the Lib/codecs.py patch is reversed. I haven't looked closely at it, but do you have any ideas? Or why don't you just send me Lib/codecs.py and I'll drop it in place. Everything else patched cleanly. -Barry From ping at lfw.org Fri Mar 17 15:06:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:13 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? Message-ID: I wondered to myself today while reading through the Python tutorial whether it would be a good idea to have a separate boolean type in Py3K. Would this help catch common mistakes? 
I won't presume to truly understand the new-to-Python experience, but
one might *guess* that

    >>> 5 > 3
    true

would make a little more sense to a beginner than

    >>> 5 > 3
    1

Of course this means introducing "true" and "false" as keywords (or
built-in values like None -- perhaps they should be spelled True and
False?) and completely changing the way a lot of code runs by
introducing a bunch of type checking, so it may be too radical a
change, but --

And i don't know if it's already been discussed a lot, but --

I thought it wouldn't hurt just to raise the question.

-- ?!ng

From ping at lfw.org Fri Mar 17 15:06:55 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 08:06:55 -0600 (CST)
Subject: [Python-Dev] Should None be a keyword?
Message-ID: 

Related to my last message: should None become a keyword in Py3K?

-- ?!ng

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:49:24 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Fri, 17 Mar 2000 15:49:24 -0500 (EST)
Subject: [Python-Dev] Boolean type for Py3K?
References: 
Message-ID: <14546.39508.312796.221069@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

    KY> I wondered to myself today while reading through the Python
    KY> tutorial whether it would be a good idea to have a separate
    KY> boolean type in Py3K.  Would this help catch common mistakes?

Almost a year ago, I mused about a boolean type in c.l.py, and came up
with this prototype in Python.
-------------------- snip snip --------------------
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __str__(self):
        return self.__flag and 'true' or 'false'

    def __repr__(self):
        return self.__str__()

    def __nonzero__(self):
        return self.__flag == 1

    def __cmp__(self, other):
        if (self.__flag and other) or (not self.__flag and not other):
            return 0
        else:
            return 1

    def __rcmp__(self, other):
        return -self.__cmp__(other)

true = Boolean(1)
false = Boolean()
-------------------- snip snip --------------------

I think it makes sense to augment Python's current truth rules with a
built-in boolean type and True and False values.  But unless it's tied
in more deeply (e.g. comparisons return one of these instead of
integers -- and what are the implications of that?) then it's pretty
much just syntactic sugar <0.75 lick>.

-Barry

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:50:00 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Fri, 17 Mar 2000 15:50:00 -0500 (EST)
Subject: [Python-Dev] Should None be a keyword?
References: 
Message-ID: <14546.39544.673335.378797@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

    KY> Related to my last message: should None become a keyword in
    KY> Py3K?

Why?  Just to reserve it?

-Barry

From moshez at math.huji.ac.il Fri Mar 17 21:52:29 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Fri, 17 Mar 2000 22:52:29 +0200 (IST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us>
Message-ID: 

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:

> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.

Cool prototype! However, I think I have a problem with the proposed
semantics:

>     def __cmp__(self, other):
>         if (self.__flag and other) or (not self.__flag and not other):
>             return 0
>         else:
>             return 1

This means:

    true == 1
    true == 2

But 1 != 2

I have some difficulty with == not being an equivalence relation...
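Moshe's objection can be checked mechanically. Below is a minimal sketch of the prototype's comparison rule, ported to Python 3 so it runs (`__eq__` stands in for the original `__cmp__`/`__rcmp__`, `__bool__` for `__nonzero__`); the port is illustrative, not code from the thread:

```python
class Boolean:
    """Sketch of the prototype above, ported to Python 3:
    __eq__ stands in for __cmp__/__rcmp__, __bool__ for __nonzero__."""
    def __init__(self, flag=0):
        self._flag = not not flag

    def __repr__(self):
        return self._flag and 'true' or 'false'

    def __bool__(self):
        return self._flag

    def __eq__(self, other):
        # the original __cmp__ compared by truth value only
        return bool(self._flag) == bool(other)

true = Boolean(1)

# Moshe's point, as assertions: == is no longer an equivalence relation
assert true == 1
assert true == 2   # true equals two different integers...
assert 1 != 2      # ...which are not equal to each other
```

Because equality is defined by truth value alone, `true` compares equal to every nonzero number, which is exactly the loss of transitivity Moshe describes.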
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values.

Right on! Except for the built-in... why not have it like exceptions.py,
Python code necessary for the interpreter? Languages which compile
themselves are not unheard of

> But unless it's tied in more deeply (e.g. comparisons return one of
> these instead of integers -- and what are the implications of that?)

Breaking loads of horrible code. Unacceptable for the 1.x series, but
perfectly fine in Py3K

-- 
Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From effbot at telia.com Fri Mar 17 22:12:15 2000
From: effbot at telia.com (Fredrik Lundh)
Date: Fri, 17 Mar 2000 22:12:15 +0100
Subject: [Python-Dev] Should None be a keyword?
References: <14546.39544.673335.378797@anthem.cnri.reston.va.us>
Message-ID: <004e01bf9055$79012000$34aab5d4@hagrid>

Barry A. Warsaw wrote:
> >>>>> "KY" == Ka-Ping Yee writes:
>
>     KY> Related to my last message: should None become a keyword in
>     KY> Py3K?
>
> Why?  Just to reserve it?

to avoid stuff like:

    def foo():
        result = None
        # two screenfuls of code
        None, a, b = mytuple # perlish unpacking

which gives an interesting error on the first line, instead
of a syntax error on the last.

From guido at python.org Fri Mar 17 22:20:05 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Mar 2000 16:20:05 -0500
Subject: [Python-Dev] Should None be a keyword?
In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:55 CST."
References: 
Message-ID: <200003172120.QAA09045@eric.cnri.reston.va.us>

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Mar 17 22:20:36 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Mar 2000 16:20:36 -0500
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:13 CST."
References: 
Message-ID: <200003172120.QAA09115@eric.cnri.reston.va.us>

Yes.  True and False make sense.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pf at artcom-gmbh.de Fri Mar 17 22:17:06 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Fri, 17 Mar 2000 22:17:06 +0100 (MET)
Subject: [Python-Dev] Should None be a keyword?
In-Reply-To: <14546.39544.673335.378797@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 17, 2000 3:50: 0 pm"
Message-ID: 

> >>>>> "KY" == Ka-Ping Yee writes:
>
>     KY> Related to my last message: should None become a keyword in
>     KY> Py3K?

Barry A. Warsaw schrieb:
> Why?  Just to reserve it?

This is related to the general type checking discussion.  IMO the
suggested

    >>> 1 > 0
    True

wouldn't buy us much, as long as the following behaviour stays in Py3K:

    >>> a = '2' ; b = 3
    >>> a < b
    0
    >>> a > b
    1

This is irritating to newcomers (at least judging from my rather short
experience as a member of python-help)!  And it is especially
irritating, since you can't do

    >>> c = a + b
    Traceback (innermost last):
      File "", line 1, in ?
    TypeError: illegal argument type for built-in operation

IMO this difference is far more difficult to catch for newcomers than
the far more often discussed 5/3 == 1 behaviour.

Have a nice weekend and don't forget to hunt for remaining bugs in
Fred's upcoming 1.5.2p2 docs ;-),

Peter.
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From ping at lfw.org Fri Mar 17 16:53:38 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 09:53:38 -0600 (CST)
Subject: [Python-Dev] list.shift()
Message-ID: 

Has list.shift() been proposed?

    # pretend lists are implemented in Python and 'self' is a list
    def shift(self):
        item = self[0]
        del self[:1]
        return item

This would make queues read nicely... use "append" and "pop" for
a stack, "append" and "shift" for a queue.
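As an aside, both readings can already be sketched with plain lists; `pop()` with an index plays the role of the proposed `shift()` (illustration only, not code from the thread):

```python
# Stack: append and pop() operate at the same (right) end
stack = []
stack.append(1)
stack.append(2)
assert stack.pop() == 2      # LIFO

# Queue: append at the back, pop(0) at the front -- the role of shift()
queue = []
queue.append('a')
queue.append('b')
assert queue.pop(0) == 'a'   # FIFO

# shift() itself, as defined in the message above, in standalone form
def shift(lst):
    item = lst[0]
    del lst[:1]
    return item

q = ['x', 'y']
assert shift(q) == 'x'
assert q == ['y']
```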
(This is while on the thought-train of "making built-in types do more, rather than introducing more special types", as you'll see in my next message.) -- ?!ng From gvanrossum at beopen.com Fri Mar 17 23:00:18 2000 From: gvanrossum at beopen.com (Guido van Rossum) Date: Fri, 17 Mar 2000 17:00:18 -0500 Subject: [Python-Dev] list.shift() References: Message-ID: <38D2AAF2.CFBF3A2@beopen.com> Ka-Ping Yee wrote: > > Has list.shift() been proposed? > > # pretend lists are implemented in Python and 'self' is a list > def shift(self): > item = self[0] > del self[:1] > return item > > This would make queues read nicely... use "append" and "pop" for > a stack, "append" and "shift" for a queue. > > (This is while on the thought-train of "making built-in types do > more, rather than introducing more special types", as you'll see > in my next message.) You can do this using list.pop(0). I don't think the name "shift" is very intuitive (smells of sh and Perl :-). Do we need a new function? --Guido From ping at lfw.org Fri Mar 17 17:08:37 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:08:37 -0600 (CST) Subject: [Python-Dev] Using lists as sets Message-ID: A different way to provide sets in Python, which occurred to me on Wednesday at Guido's talk in Mountain View (hi Guido!), is to just make lists work better. Someone asked Guido a question about the ugliness of using dicts in a certain way, and it was clear that what he wanted was a real set. Guido's objection to introducing more core data types is that it makes it more difficult to choose which data type to use, and opens the possibility of using entirely the wrong one -- a very well-taken point, i thought. (That recently-mentioned study of scripting vs. 
system language performance seems relevant here: a few of the C programs submitted were much *slower* than the ones in Python or Perl just because people had to choose and implement their own data structures, and so they were able to completely shoot themselves in both feet and lose a leg or two in the process.) So... Hypothesis: The only real reason people might want a separate set type, or have to use dicts as sets, is that linear search on a list is too slow. Therefore: All we have to do is speed up "in" on lists, and now we have a set type that is nice to read and write, and already has nice spellings for set semantics like "in". Implementation possibilities: + Whip up a hash table behind the scenes if "in" gets used a lot on a particular list and all its members are hashable. This makes "in" no longer O(n), which is most of the battle. remove() can also be cheap -- though you have to do a little more bookkeeping to take care of multiple copies of elements. + Or, add a couple of methods, e.g. take() appends an item to a list if it's not there already, drop() removes all copies of an item from a list. These tip us off: the first time one of these methods gets used, we make the hash table then. I think the semantics would be pretty understandable and simple to explain, which is the main thing. Any thoughts? -- ?!ng From ping at lfw.org Fri Mar 17 17:12:22 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:12:22 -0600 (CST) Subject: [Python-Dev] list.shift() In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: On Fri, 17 Mar 2000, Guido van Rossum wrote: > You can do this using list.pop(0). I don't think the name "shift" is very > intuitive (smells of sh and Perl :-). Do we need a new function? Oh -- sorry, that's my ignorance showing. I didn't know pop() took an argument (of course it would -- duh...). No need to add anything more, then, i think. Sorry! Fred et al. 
on doc-sig: it would be really good for the tutorial to show a queue
example and a stack example in the section where list methods are
introduced.

-- ?!ng

From ping at lfw.org Fri Mar 17 17:13:44 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 10:13:44 -0600 (CST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: <200003172120.QAA09115@eric.cnri.reston.va.us>
Message-ID: 

Guido: (re None being a keyword)
> Yes.

Guido: (re booleans)
> Yes.  True and False make sense.

Astounding.  I don't think i've ever seen such quick agreement on
anything!  And twice in one day!

I think i'm going to go lie down.

:) :)

-- ?!ng

From DavidA at ActiveState.com Fri Mar 17 23:23:53 2000
From: DavidA at ActiveState.com (David Ascher)
Date: Fri, 17 Mar 2000 14:23:53 -0800
Subject: [Python-Dev] Using lists as sets
In-Reply-To: 
Message-ID: 

> I think the semantics would be pretty understandable and simple to
> explain, which is the main thing.
>
> Any thoughts?

Would

    (a,b) in Set

return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?

--david

From mal at lemburg.com Fri Mar 17 23:41:46 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 17 Mar 2000 23:41:46 +0100
Subject: [Python-Dev] Boolean type for Py3K?
References: <200003172120.QAA09115@eric.cnri.reston.va.us>
Message-ID: <38D2B4AA.2EE933BD@lemburg.com>

Guido van Rossum wrote:
>
> Yes.  True and False make sense.

mx.Tools defines these as new builtins... and they correspond to the
C level singletons Py_True and Py_False.

    # Truth constants
    True = (1==1)
    False = (1==0)

I'm not sure whether breaking the idiom of True == 1 and False == 0
(or in other words: truth values are integers) would be such a good
idea.  Nothing against adding name bindings in __builtins__ though...
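The idiom MAL wants to keep can be stated as executable assertions. A sketch (the trailing underscores only sidestep later Pythons where True and False are keywords; in the Python of this thread the names would be True and False themselves):

```python
# Truth constants in the style of the snippet above.
True_ = (1 == 1)
False_ = (1 == 0)

# the idiom MAL wants preserved: truth values behave as the integers 1 and 0
assert True_ == 1
assert False_ == 0
assert True_ + True_ == 2    # arithmetic on truth values keeps working
```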
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From ping at lfw.org Fri Mar 17 17:53:12 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 10:53:12 -0600 (CST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us>
Message-ID: 

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:
> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.
>
> -------------------- snip snip --------------------
> class Boolean:
[...]
>
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values.  But unless it's tied
> in more deeply (e.g. comparisons return one of these instead of
> integers -- and what are the implications of that?) then it's pretty
> much just syntactic sugar <0.75 lick>.

Yeah, and the whole point *is* the change in semantics, not the
syntactic sugar.  I'm hoping we can gain some safety from the type
checking... though i can't seem to think of a good example off the
top of my head.

It's easier to think of examples if things like 'if', 'and', 'or',
etc. only accept booleans as conditional arguments -- but i can't
imagine going that far, as that would just be really annoying.

Let's see.  Specifically, the following would probably return booleans:

    magnitude comparisons:       <, >, <=, >=  (and __cmp__)
    value equality comparisons:  ==, !=
    identity comparisons:        is, is not
    containment tests:           in, not in  (and __contains__)

... and booleans would be different from integers in that arithmetic
would be illegal... but that's about it.  (?)

Booleans are still storable immutable values; they could be keys to
dicts but not lists; i don't know what else.

Maybe this wouldn't actually buy us anything except for the nicer
spelling of "True" and "False", which might not be worth it.

... Hmm.
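One way to make that list concrete is a tiny prototype of a boolean that supports truth testing, equality, and dict keys but rejects arithmetic. This is a hypothetical sketch, not something proposed verbatim in the thread:

```python
class StrictBool:
    """Hypothetical boolean following the rules above: truth testing,
    equality, and dict keying work; arithmetic raises TypeError."""
    def __init__(self, flag):
        self._flag = bool(flag)

    def __bool__(self):
        return self._flag

    def __eq__(self, other):
        return isinstance(other, StrictBool) and self._flag == other._flag

    def __hash__(self):
        return hash(self._flag)

    def _no_arithmetic(self, other):
        raise TypeError("arithmetic on booleans is illegal")
    __add__ = __radd__ = __sub__ = __mul__ = _no_arithmetic

T, F = StrictBool(1), StrictBool(0)

assert T and not F               # usable in conditions
assert {T: 'yes'}[T] == 'yes'    # usable as a dict key
try:
    T + 1                        # arithmetic is rejected, as proposed
    raised = False
except TypeError:
    raised = True
assert raised
```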
Can anyone think of common cases where this could help?

-- ?!ng

From ping at lfw.org Fri Mar 17 17:59:17 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 10:59:17 -0600 (CST)
Subject: [Python-Dev] Using lists as sets
In-Reply-To: 
Message-ID: 

On Fri, 17 Mar 2000, David Ascher wrote:
> > I think the semantics would be pretty understandable and simple to
> > explain, which is the main thing.
> >
> > Any thoughts?
>
> Would
>
>     (a,b) in Set
>
> return true of (a,b) was a subset of Set, or if (a,b) was an element of Set?

This would return true if (a, b) was an element of the set --
exactly the same semantics as we currently have for lists.

Ideally it would also be kind of nice to use < > <= >= as
subset/superset operators, but that requires revising the way we do
comparisons, and you know, it might not really be used all that often
anyway.

-, |, and & could operate on lists sensibly when we use them as sets --
just define a few simple rules for ordering and you should be fine.
e.g.

    c = a - b     is equivalent to    c = a
                                      for item in b: c.drop(item)

    c = a | b     is equivalent to    c = a
                                      for item in b: c.take(item)

    c = a & b     is equivalent to    c = []
                                      for item in a:
                                          if item in b: c.take(item)

where

    c.take(item)  is equivalent to    if item not in c: c.append(item)

    c.drop(item)  is equivalent to    while item in c: c.remove(item)

The above is all just semantics, of course, to make the point that
the semantics can be simple.  The implementation could do different
things that are much faster when there's a hash table helping out.

-- ?!ng

From gvwilson at nevex.com Sat Mar 18 00:28:05 2000
From: gvwilson at nevex.com (gvwilson at nevex.com)
Date: Fri, 17 Mar 2000 18:28:05 -0500 (EST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: 
Message-ID: 

> Guido: (re None being a keyword)
> > Yes.
> Guido: (re booleans)
> > Yes.  True and False make sense.
> Ka-Ping:
> Astounding.  I don't think i've ever seen such quick agreement on
> anything!  And twice in one day!
I'm think i'm going to go lie down. No, no, keep going --- you're on a roll. Greg From ping at lfw.org Fri Mar 17 18:49:18 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 11:49:18 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > > c.take(item) is equivalent to > > if item not in c: c.append(item) > > c.drop(item) is equivalent to > > while item in c: c.remove(item) I think i've decided that i like the verb "include" much better than the rather vague word "take". Perhaps this also suggests "exclude" instead of "drop". -- ?!ng From klm at digicool.com Sat Mar 18 01:32:56 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 17 Mar 2000 19:32:56 -0500 (EST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > On Fri, 17 Mar 2000, David Ascher wrote: > > > I think the semantics would be pretty understandable and simple to > > > explain, which is the main thing. > > > > > > Any thoughts? > > > > Would > > > > (a,b) in Set > > > > return true of (a,b) was a subset of Set, or if (a,b) was an element of Set? > > This would return true if (a, b) was an element of the set -- > exactly the same semantics as we currently have for lists. I really like the idea of using dynamically-tuned lists provide set functionality! I often wind up needing something like set functionality, and implementing little convenience routines (unique, difference, etc) repeatedly. I don't mind that so much, but the frequency signifies that i, at least, would benefit from built-in support for sets... I guess the question is whether it's practical to come up with a reasonably adequate, reasonably general dynamic optimization strategy. Seems like an interesting challenge - is there prior art? As ping says, maintaining the existing list semantics handily answers challenges like david's question. 
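For reference, Ping's take/drop equivalences from the earlier message can be written out and exercised directly. Since take and drop are proposed names, not real list methods, standalone functions stand in here, and copies are taken so the operands survive (a sketch):

```python
def take(c, item):      # proposed list.take() / include()
    if item not in c:
        c.append(item)

def drop(c, item):      # proposed list.drop() / exclude()
    while item in c:
        c.remove(item)

def union(a, b):        # the proposed  c = a | b
    c = list(a)
    for item in b:
        take(c, item)
    return c

def difference(a, b):   # the proposed  c = a - b
    c = list(a)
    for item in b:
        drop(c, item)
    return c

def intersection(a, b): # the proposed  c = a & b
    c = []
    for item in a:
        if item in b:
            take(c, item)
    return c

assert union([1, 2], [2, 3]) == [1, 2, 3]
assert difference([1, 2, 2, 3], [2]) == [1, 3]
assert intersection([1, 2, 3], [2, 3, 4]) == [2, 3]
```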
New methods, like [].subset('a', 'b'), could provide the desired additional functionality - and contribute to biasing the object towards set optimization, etc. Neato! Ken klm at digicool.com From ping at lfw.org Fri Mar 17 20:02:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 13:02:13 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ken Manheimer wrote: > > I really like the idea of using dynamically-tuned lists provide set > functionality! I often wind up needing something like set functionality, > and implementing little convenience routines (unique, difference, etc) > repeatedly. I don't mind that so much, but the frequency signifies that > i, at least, would benefit from built-in support for sets... Greg asked about how to ensure that a given item only appears once in each list when used as a set, and whether i would flag the list as "i'm now operating as a set". My answer is no -- i don't want there to be any visible state on the list. (It can internally decide to optimize its behaviour for a particular purpose, but in no event should this decision ever affect the semantics of its manifested behaviour.) Externally visible state puts us back right where we started -- now the user has to decide what type of thing she wants to use, and that's more decisions and loaded guns pointing at feet that we were trying to avoid in the first place. There's something very nice about there being just two mutable container types in Python. As Guido said, the first two types you learn are lists and dicts, and it's pretty obvious which one to pick for your purposes, and you can't really go wrong. I'd like to copy my reply to Greg here because it exposes some of the philosophy i'm attempting with this proposal: You'd trust the client to use take() (or should i say include()) instead of append(). But, in the end, this wouldn't make any difference to the result of "in". 
In fact, you could do multisets since lists already have count(). What i'm trying to do is to put together a few very simple pieces to get all the behaviour necessary to work with sets, if you want it. I don't want the object itself to have any state that manifests itself as "now i'm a set", or "now i'm a list". You just pick the methods you want to use. It's just like stacks and queues. There's no state on the list that says "now i'm a stack, so read from the end" or "now i'm a queue, so read from the front". You decide where you want to read items by picking the appropriate method, and this lets you get the best of both worlds -- flexibility and simplicity. Back to Ken: > I guess the question is whether it's practical to come up with a > reasonably adequate, reasonably general dynamic optimization strategy. > Seems like an interesting challenge - is there prior art? I'd be quite happy with just turning on set optimization when include() and exclude() get used (nice and predictable). Maybe you could provide a set() built-in that would construct you a list with set optimization turned on, but i'm not too sure if we really want to expose it that way. -- ?!ng From moshez at math.huji.ac.il Sat Mar 18 06:27:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 18 Mar 2000 07:27:13 +0200 (IST) Subject: [Python-Dev] list.shift() In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > > Has list.shift() been proposed? > > # pretend lists are implemented in Python and 'self' is a list > def shift(self): > item = self[0] > del self[:1] > return item > > This would make queues read nicely... use "append" and "pop" for > a stack, "append" and "shift" for a queue. Actually, I once thought about writing a Deque in Python for a couple of hours (I later wrote it, and then threw it away because I had nothing to do with it, but that isn't my point). So I did write "shift" (though I'm certain I didn't call it that). 
It's not as easy to write a maintainable yet efficient "shift": I got
stuck with a pointer to the beginning of the "real list" which I
incremented on a "shift", and a complex heuristic for when lists de-
and re-allocate.  I think the tradeoffs are shaky enough that it is
better to write it in pure Python rather than having more functions
in C (in an old builtin type or a new one).  Anyone needing to treat
a list as a Deque would just construct one:

    l = Deque(l)

built-in-functions:-just-say-no-ly y'rs, Z.
-- 
Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From artcom0!pf at artcom-gmbh.de Fri Mar 17 23:43:35 2000
From: artcom0!pf at artcom-gmbh.de (artcom0!pf at artcom-gmbh.de)
Date: Fri, 17 Mar 2000 23:43:35 +0100 (MET)
Subject: [Python-Dev] dict.supplement() (was Re: list.shift())
In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> from Guido van Rossum at "Mar 17, 2000 5: 0:18 pm"
Message-ID: 

Ka-Ping Yee wrote:
[...]
> > # pretend lists are implemented in Python and 'self' is a list
> > def shift(self):
> >     item = self[0]
> >     del self[:1]
> >     return item
[...]

Guido van Rossum:
> You can do this using list.pop(0).  I don't think the name "shift" is
> very intuitive (smells of sh and Perl :-).  Do we need a new function?

I think no.  But what about this one?:

    # pretend self and dict are dictionaries:
    def supplement(self, dict):
        for k, v in dict.items():
            if not self.data.has_key(k):
                self.data[k] = v

Note the similarities to {}.update(dict), but update replaces existing
entries in self, which is sometimes not desired.  I know that
supplement can also be simulated with:

    tmp = dict.copy()
    tmp.update(self)
    self.data = d

But this is still a little ugly.  IMO a builtin method to supplement
(complete?) a dictionary with default values from another dictionary
would sometimes be a useful tool.
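Peter's supplement() amounts to "insert only the missing keys". With dict.setdefault() available, the same behaviour can be sketched as a plain function (an illustrative helper, not his actual class):

```python
def supplement(target, defaults):
    """Fill in missing keys of target from defaults; never overwrite."""
    for k, v in defaults.items():
        target.setdefault(k, v)
    return target

config = {'host': 'example.org'}
supplement(config, {'host': 'localhost', 'port': 8080})
# existing 'host' survives, missing 'port' is filled in
assert config == {'host': 'example.org', 'port': 8080}
```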
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From ping at lfw.org Sat Mar 18 19:48:10 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 18 Mar 2000 10:48:10 -0800 (PST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Fri, 17 Mar 2000 artcom0!pf at artcom-gmbh.de wrote: > > I think no. But what about this one?: > > # pretend self and dict are dictionaries: > def supplement(self, dict): > for k, v in dict.items(): > if not self.data.has_key(k): > self.data[k] = v I'd go for that. It would be nice to have a non-overwriting update(). The only issue is the choice of verb; "supplement" sounds pretty reasonable to me. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson From pf at artcom-gmbh.de Sat Mar 18 20:23:37 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sat, 18 Mar 2000 20:23:37 +0100 (MET) Subject: [Python-Dev] dict.supplement() In-Reply-To: from Ka-Ping Yee at "Mar 18, 2000 10:48:10 am" Message-ID: Hi! > > # pretend self and dict are dictionaries: > > def supplement(self, dict): > > for k, v in dict.items(): > > if not self.data.has_key(k): > > self.data[k] = v Ka-Ping Yee schrieb: > I'd go for that. It would be nice to have a non-overwriting update(). > The only issue is the choice of verb; "supplement" sounds pretty > reasonable to me. In German we have the verb "erg?nzen" which translates either into "supplement" or "complete" (from my dictionary). "supplement" has the disadvantage of being rather long for the name of a builtin method. Nevertheless I've used this in my class derived from UserDict.UserDict. Now let's witch topic to the recent discussion about Set type: you all certainly know, that something similar has been done before by Aaron Watters? 
see: Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson at nevex.com Mon Mar 20 15:52:12 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 20 Mar 2000 09:52:12 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets Message-ID: [After discussion with Ping, and weekend thought] I would like to vote against using lists as sets: 1. It blurs Python's categorization of containers. The rest of the world thinks of sets as unordered, associative, and binary-valued (a term I just made up to mean "containing 0 or 1 instance of X"). Lists, on the other hand, are ordered, positionally-indexed, and multi-valued. While a list is always a legal queue or stack (although lists permit state transitions that are illegal for queues or stacks), most lists are not legal sets. 2. Python has, in dictionaries, a much more logical starting point for sets. A set is exactly a dictionary whose keys matter, and whose values don't. Adding operations to dictionaries to insert keys, etc., without having to supply a value, naively appears no harder than adding operations to lists, and would probably be much easier to explain when teaching a class. 3. (Long-term speculation) Even if P3K isn't written in C++, many modules for it will be. It would therefore seem sensible to design P3K in a C++-friendly way --- in particular, to align Python's container hierarchy with that used in the Standard Template Library. Using lists as a basis for sets would give Python a very different container type hierarchy than the STL, which could make it difficult for automatic tools like SWIG to map STL-based things to Python and vice versa. Using dictionaries as a basis for sets would seem to be less problematic. 
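Greg's point 2 is easy to try out: a set is just a dictionary whose keys matter and whose values don't. A minimal sketch (this Set class is illustrative, not a concrete proposal from the thread):

```python
class Set:
    """Illustrative set-as-dictionary: the keys carry the members,
    the values are a dummy placeholder."""
    def __init__(self, items=()):
        self._d = {}
        for x in items:
            self._d[x] = 1

    def add(self, x):
        self._d[x] = 1

    def discard(self, x):
        if x in self._d:
            del self._d[x]

    def __contains__(self, x):
        return x in self._d

    def __len__(self):
        return len(self._d)

s = Set([1, 2, 2, 3])
assert 2 in s and len(s) == 3   # duplicates collapse; membership is a hash lookup
s.discard(2)
assert 2 not in s
```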
(Note that if Wadler et al's Generic Java proposal becomes part of
that language, an STL clone will almost certainly become part of that
language, and require JPython interfacing.)

On a semi-related note, can someone explain why programs are not
allowed to iterate directly through the elements of a dictionary:

    for (key, value) in dict:
        ...body...

Thanks,
Greg

"No XML entities were harmed in the production of this message."

From moshez at math.huji.ac.il Mon Mar 20 16:03:47 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Mon, 20 Mar 2000 17:03:47 +0200 (IST)
Subject: [Python-Dev] re: Using lists as sets
In-Reply-To: 
Message-ID: 

On Mon, 20 Mar 2000 gvwilson at nevex.com wrote:

> [After discussion with Ping, and weekend thought]
>
> I would like to vote against using lists as sets:

I'd like to object too, but for slightly different reasons:
20-something lines of Python can implement a set (I just checked it)
with the new __contains__.  We can just supply it in the standard
library (Set module?) and be over and done with.

Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From jcw at equi4.com Mon Mar 20 16:37:19 2000
From: jcw at equi4.com (Jean-Claude Wippler)
Date: Mon, 20 Mar 2000 16:37:19 +0100
Subject: [Python-Dev] re: Using lists as sets
References: 
Message-ID: <38D645AF.661CA335@equi4.com>

gvwilson at nevex.com wrote:
>
> [After discussion with Ping, and weekend thought]

[good stuff]

Allow me to offer yet another perspective on this.  I'll keep it short.

Python has sequences (indexable collections) and maps (associative
collections).  C++'s STL has vectors, sets, multi-sets, maps, and
multi-maps.
I find the distinction between these puzzling, and hereby offer another, somewhat relational-database minded, categorization as food for thought: - collections consist of objects, each of them with attributes - the first N attributes form the "key", the rest is the "residue" - there is also an implicit position attribute, which I'll call "#" - so an object consists of attributes: (K1,K2,...KN,#,R1,R2,...,RM) - one more bit of specification is needed: whether # is part of the key Let me mark the position between key attributes and residue with ":", so everything before the colon marks the uniquely identifying attributes. A vector (sequence) is: #:R1,R2,...,RM A set is: K1,K2,...KN: A multi-set is: K1,K2,...KN,#: A map is: K1,K2,...KN:#,R1,R2,...,RM A multi-map is: K1,K2,...KN,#:R1,R2,...,RM And a somewhat esoteric member of this classification: A singleton is: :R1,R2,...,RM I have no idea what this means for Python, but merely wanted to show how a relational, eh, "view" on all this might perhaps simplify the issues. -jcw From fdrake at acm.org Mon Mar 20 17:55:59 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 11:55:59 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: <14550.22559.550660.403909@weyr.cnri.reston.va.us> artcom0!pf at artcom-gmbh.de writes: > Note the similarities to {}.update(dict), but update replaces existing > entries in self, which is sometimes not desired. I know, that supplement > can also simulated with: Peter, I like this! > tmp = dict.copy() > tmp.update(self) > self.data = d I presume you mean "self.data = tmp"; "self.data.update(tmp)" would be just a little more robust, at the cost of an additional update. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tismer at tismer.com Mon Mar 20 18:10:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Mon, 20 Mar 2000 18:10:34 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> Message-ID: <38D65B8A.50B81D08@tismer.com> Jean-Claude Wippler wrote: [relational notation] > A vector (sequence) is: #:R1,R2,...,RM > A set is: K1,K2,...KN: > A multi-set is: K1,K2,...KN,#: > A map is: K1,K2,...KN:#,R1,R2,...,RM > A multi-map is: K1,K2,...KN,#:R1,R2,...,RM This is a nice classification! To my understanding, why not A map is: K1,K2,...KN:R1,R2,...,RM Where is a # in a map? And what do you mean by N and M? Is K1..KN one key, mae up of N sub keys, or do you mean the whole set of keys, where each one is mapped somehow. I guess not, the notation looks like I should think of tuples. No, that would imply that N and M were fixed, but they are not. But you say "- collections consist of objects, each of them with attributes". Ok, N and M seem to be individual for each object, right? But when defining a map for instance, and we're talking of the objects, then the map is the set of these objects, and I have to think of K[0]..K(N(o)):R[0]..R(M(o)) where N and M are functions of the individual object o, right? Isn't it then better to think different of these objects, saying they can produce some key object and some value object of any shape, and a position, where each of these can be missing? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jeremy at cnri.reston.va.us Mon Mar 20 18:28:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 12:28:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: Message-ID: <14550.24508.341533.908941@goon.cnri.reston.va.us> >>>>> "GVW" == gvwilson writes: GVW> On a semi-related note, can someone explain why programs are GVW> not allowed to iterate directly through the elements of a GVW> dictionary: GVW> for (key, value) in dict: ...body... Pythonic design rules #2: Explicit is better than implicit. There are at least three "natural" ways to interpret "for ... in dict:" In addition to the version that strikes you as most natural, some people also imagine that a for loop should iterate over the keys or the values. Instead of guessing, Python provides explicit methods for each possibility: items, keys, values. Yet another possibility, implemented in early versions of JPython and later removed, was to treat a dictionary exactly like a list: Call __getitem__(0), then 1, ..., until a KeyError was raised. In other words, a dictionary could behave like a list provided that it had integer keys. Jeremy From jcw at equi4.com Mon Mar 20 18:56:44 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 18:56:44 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> <38D65B8A.50B81D08@tismer.com> Message-ID: <38D6665C.ECDE09DE@equi4.com> Christian, > A map is: K1,K2,...KN:R1,R2,...,RM Yes, my list was inconsistent. > Is K1..KN one key, made up of N sub keys, or do you mean the > whole set of keys, where each one is mapped somehow. [...] > Ok, N and M seem to be individual for each object, right? [...] 
> Isn't it then better to think different of these objects, saying
> they can produce some key object and some value object of any
> shape, and a position, where each of these can be missing?

Depends on your perspective. In the relational world, the (K1,...,KN) attributes identify the object, but they are not themselves considered an object. In OO-land, (K1,...,KN) is an object, and a map takes such an object as input and delivers (R1,...,RM) as result. This tension shows the boundary of both relational and OO models, IMO. I wish it'd be possible to unify them, but I haven't figured it out.

-jcw, concept maverick / fool on the hill - pick one :)

From pf at artcom-gmbh.de Mon Mar 20 19:28:17 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Mon, 20 Mar 2000 19:28:17 +0100 (MET)
Subject: [Python-Dev] dict.supplement() (was Re: list.shift())
In-Reply-To: <14550.22559.550660.403909@weyr.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Mar 20, 2000 11:55:59 am"
Message-ID:

I wrote:
> > Note the similarities to {}.update(dict), but update replaces existing
> > entries in self, which is sometimes not desired. I know that supplement
> > can also be simulated with:

Fred L. Drake, Jr.:
> Peter,
> I like this!
>
> > tmp = dict.copy()
> > tmp.update(self)
> > self.data = d
>
> I presume you mean "self.data = tmp"; "self.data.update(tmp)" would
> be just a little more robust, at the cost of an additional update.

Ouppss... I should have tested this before posting. But currently I use the more explicit (and probably slower version) in my code:

    class ConfigDict(UserDict.UserDict):
        def supplement(self, defaults):
            for k, v in defaults.items():
                if not self.data.has_key(k):
                    self.data[k] = v

Works fine so far, although it usually requires an additional copy operation. Consider another example, where arbitrary instance attributes should be specified as keyword arguments to the constructor:

    >>> class Example:
    ...     _defaults = {'a': 1, 'b': 2}
    ...     _config = _defaults
    ...
def __init__(self, **kw): ... if kw: ... self._config = self._defaults.copy() ... self._config.update(kw) ... >>> A = Example(a=12345) >>> A._config {'b': 2, 'a': 12345} >>> B = Example(c=3) >>> B._config {'b': 2, 'c': 3, 'a': 1} If 'supplement' were a dictionary builtin method, this would become simply: kw.supplement(self._defaults) self._config = kw Unfortunately this can't be achieved using a wrapper class like UserDict, since the **kw argument is always a builtin dictionary object. Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60 From ping at lfw.org Mon Mar 20 13:36:34 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 06:36:34 -0600 (CST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Peter Funk wrote: > Consider another example, where arbitrary instance attributes should be > specified as keyword arguments to the constructor: > > >>> class Example: > ... _defaults = {'a': 1, 'b': 2} > ... _config = _defaults > ... def __init__(self, **kw): > ... if kw: > ... self._config = self._defaults.copy() > ... self._config.update(kw) Yes! I do this all the time. I wrote a user-interface module to take care of exactly this kind of hassle when creating lots of UI components. When you're making UI, you can easily drown in keyword arguments and default values if you're not careful. -- ?!ng From fdrake at acm.org Mon Mar 20 20:02:48 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 14:02:48 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <14550.22559.550660.403909@weyr.cnri.reston.va.us> Message-ID: <14550.30168.129259.356581@weyr.cnri.reston.va.us> Peter Funk writes: > Ouppss... I should have tested this before posting. 
> But currently I use the more explicit (and probably slower version) in my code:

The performance is based entirely on the size of each; in the (probably typical) case of smallish dictionaries (<50 entries), it's probably cheaper to use a temporary dict and do the update. For large dicts (on the defaults side), it may make more sense to reduce the number of objects that need to be created:

    target = ...
    has_key = target.has_key
    for key in defaults.keys():
        if not has_key(key):
            target[key] = defaults[key]

This saves the construction of len(defaults) 2-tuples.

-Fred
-- Fred L. Drake, Jr. Corporation for National Research Initiatives

From moshez at math.huji.ac.il Mon Mar 20 20:23:01 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Mon, 20 Mar 2000 21:23:01 +0200 (IST)
Subject: [Python-Dev] re: Using lists as sets
In-Reply-To: <14550.24508.341533.908941@goon.cnri.reston.va.us>
Message-ID:

On Mon, 20 Mar 2000, Jeremy Hylton wrote:
> Yet another possibility, implemented in early versions of JPython and
> later removed, was to treat a dictionary exactly like a list: Call
> __getitem__(0), then 1, ..., until a KeyError was raised. In other
> words, a dictionary could behave like a list provided that it had
> integer keys.

Two remarks: Jeremy meant "consecutive natural keys starting with 0" (yes, I've managed to learn mind-reading from the timbot), and that the following is considered a misfeature:

    import UserDict
    a = UserDict.UserDict()
    a[0] = "hello"
    a[1] = "world"
    for word in a: print word

Will print "hello", "world", and then die with a KeyError. I realize why this is happening, and realize it could only be fixed in Py3K. However, a temporary (though not 100% backwards compatible) fix is that "for" will catch LookupError, rather than IndexError. Any comments?

-- Moshe Zadka .
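[Editor's note: the two "supplement" spellings traded back and forth in this thread can be checked side by side. A minimal sketch in modern Python — the variable names are mine, not from the thread:]

```python
# The "supplement" semantics under discussion: fill in defaults without
# overwriting keys the target already has.
defaults = {"a": 1, "b": 2}
config = {"b": 20}

# 1. Fred's temporary-dict idiom: copy defaults, overlay the target.
tmp = defaults.copy()
tmp.update(config)
config = tmp

# 2. Peter's explicit loop: no temporary dict, no 2-tuples constructed.
config2 = {"b": 20}
for key in defaults:
    if key not in config2:
        config2[key] = defaults[key]

assert config == config2 == {"a": 1, "b": 20}
```

Both spellings agree: existing entries ("b": 20) survive, missing ones ("a": 1) are filled in from the defaults.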
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Mon Mar 20 20:39:31 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 20 Mar 2000 11:39:31 -0800 Subject: [Python-Dev] Unicode and Windows Message-ID: I would like to discuss Unicode on the Windows platform, and how it relates to MBCS that Windows uses. My main goal here is to ensure that Unicode on Windows can make a round-trip to and from native Unicode stores. As an example, let's take the registry - a Windows user should be able to read a Unicode value from the registry then write it back. The value written back should be _identical_ to the value read. Ditto for the file system: If the filesystem is Unicode, then I would expect the following code: for fname in os.listdir(): f = open(fname + ".tmp", "w") To create filenames on the filesystem with the exact base name even when the basename contains non-ascii characters. However, the Unicode patches do not appear to make this possible. open() uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded string to the C runtime fopen function. The end result of all this is that we end up with UTF-8 encoded names in the registry/on the file system. It does not seem possible to get a true Unicode string onto either the file system or in the registry. Unfortunately, Im not experienced enough to know the full ramifications, but it _appears_ that on Windows the default "unicode to string" translation should be done via the WideCharToMultiByte() API. This will then pass an MBCS encoded ascii string to Windows, and the "right thing" should magically happen. Unfortunately, MBCS encoding is dependant on the current locale (ie, one MBCS sequence will mean completely different things depending on the locale). 
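[Editor's note: the locale dependence Mark describes can be illustrated with a tiny sketch, using modern Python byte/codec spellings that postdate this thread — the same byte sequence means different characters under different 8-bit code pages:]

```python
# One byte, two locales: an identical 8-bit byte decodes to entirely
# different characters depending on the active code page.
data = b"\xe9"
west = data.decode("latin-1")   # Western European reading: 'é'
cyr = data.decode("cp1251")     # Cyrillic reading: 'й'
assert west == "\u00e9"
assert cyr == "\u0439"
```

This is exactly why a single fixed Unicode-to-bytes conversion cannot round-trip correctly on every Windows locale.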
I dont see a portability issue here, as the documentation could state that "Unicode->ASCII conversions use the most appropriate conversion for the platform. If the platform is not Unicode aware, then UTF-8 will be used." This issue is the final one before I release the win32reg module. It seems _critical_ to me that if Python supports Unicode and the platform supports Unicode, then Python unicode values must be capable of being passed to the platform. For the win32reg module I could quite possibly hack around the problem, but the more general problem (categorized by the open() example above) still remains... Any thoughts? Mark. From jeremy at cnri.reston.va.us Mon Mar 20 20:51:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 14:51:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: <14550.33088.110785.78631@goon.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote: >> Yet another possibility, implemented in early versions of JPython >> and later removed, was to treat a dictionary exactly like a list: >> Call __getitem__(0), then 1, ..., until a KeyError was raised. >> In other words, a dictionary could behave like a list provided >> that it had integer keys. MZ> Two remarks: Jeremy meant "consecutive natural keys starting MZ> with 0", (yes, I've managed to learn mind-reading from the MZ> timbot) I suppose I meant that (perhaps you can read my mind as well as I can); I also meant using values of Python's integer datatype :-). and that (the following is considered a misfeature): MZ> import UserDict MZ> a = UserDict.UserDict() MZ> a[0]="hello" MZ> a[1]="world" MZ> for word in a: print word MZ> Will print "hello", "world", and then die with KeyError. I MZ> realize why this is happening, and realize it could only be MZ> fixed in Py3K. 
However, a temporary (though not 100% backwards MZ> compatible) fix is that "for" will catch LookupError, rather MZ> then IndexError. I'm not sure what you mean by "fix." (Please read your mind for me .) I think by fix you mean, "allow the broken code above to execute without raising an exception." Yuck! As far as I can tell, the problem is caused by the special way that a for loop uses the __getitem__ protocol. There are two related issues that lead to confusion. In cases other than for loops, __getitem__ is invoked when the syntactic construct x[i] is used. This means either lookup in a list or in a dict depending on the type of x. If it is a list, the index must be an integer and IndexError can be raised. If it is a dict, the index can be anything (even an unhashable type; TypeError is only raised by insertion for this case) and KeyError can be raised. In a for loop, the same protocol (__getitem__) is used, but with the special convention that the object should be a sequence. Python will detect when you try to use a builtin type that is not a sequence, e.g. a dictionary. If the for loop iterates over an instance type rather than a builtin type, there is no way to check whether the __getitem__ protocol is being implemented by a sequence or a mapping. The right solution, I think, is to allow a means for stating explicitly whether a class with an __getitem__ method is a sequence or a mapping (or both?). Then UserDict can declare itself to be a mapping and using it in a for loop will raise the TypeError, "loop over non-sequence" (which has a standard meaning defined in Skip's catalog <0.8 wink>). I believe this is where types-vs.-classes meets subtyping-vs.-inheritance. I suspect that the right solution, circa Py3K, is that classes must explicitly state what types they are subtypes of or what interfaces they implement. 
Jeremy From moshez at math.huji.ac.il Mon Mar 20 21:13:20 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 22:13:20 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > I'm not sure what you mean by "fix." I mean any sane behaviour -- either failing on TypeError at the beginning, like "for" does, or executing without raising an exception. Raising an exception in the middle which is imminent is definitely (for the right values of definitely) a suprising behaviour (I know it suprised me!). >I think by fix you mean, "allow the broken code above to > execute without raising an exception." Yuck! I agree it is yucky -- it is all a weird echo of the yuckiness of the type/class dichotomy. What I suggested it a temporary patch... > As far as I can tell, the problem is caused by the special > way that a for loop uses the __getitem__ protocol. Well, my look is that it is caused by the fact __getitem__ is used both for the sequence protocol and the mapping protocol (well, I'm cheating through my teeth here, but you understand what I mean ) Agreed though, that the whole iteration protocol should be revisited -- but that is a subject for another post. > The right solution, I think, is to allow a means for stating > explicitly whether a class with an __getitem__ method is a sequence or > a mapping (or both?). And this is the fix I wanted for Py3K (details to be debated, still). See? You read my mind perfectly. > I suspect that the right solution, circa > Py3K, is that classes must explicitly state what types they are > subtypes of or what interfaces they implement. Exactly. And have subclassable built-in classes in the same fell swoop. getting-all-excited-for-py3k-ly y'rs, Z. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From ping at lfw.org Mon Mar 20 15:34:12 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Mon, 20 Mar 2000 08:34:12 -0600 (CST)
Subject: [Python-Dev] Set options
Message-ID:

I think that at this point the possibilities for doing sets come down to four options:

1. use lists
   visible changes: new methods l.include, l.exclude
   invisible changes: faster 'in'
   usage: s = [1, 2], s.include(3), s.exclude(3), if item in s, for item in s

2. use dicts
   visible changes: for/if x in dict means keys;
     accept dicts without values (e.g. {1, 2});
     new special non-printing value ": Present";
     new method d.insert(x) means d[x] = Present
   invisible changes: none
   usage: s = {1, 2}, s.insert(3), del s[3], if item in s, for item in s

3. new type
   visible changes: set() built-in with methods .insert, .remove
   invisible changes: none
   usage: s = set(1, 2), s.insert(3), s.remove(3), if item in s, for item in s

4. do nothing
   visible changes: none
   invisible changes: none
   usage: s = {1: 1, 2: 1}, s[3] = 1, del s[3], if s.has_key(item), for item in s.keys()

Let me say a couple of things about #1 and #2. I'm happy with both. I quite like the idea of using dicts this way (#2), in fact -- i think it was the first idea i remember chatting about. If i remember correctly, Guido's objection to #2 was that "in" on a dictionary would work on the keys, which isn't consistent with the fact that "in" on a list works on the values. However, this doesn't really bother me at all. It's a very simple rule, especially when you think of how people understand dictionaries. If you hand someone a *real* dictionary and ask them

    Is the word "python" in the dictionary?

they'll go look up "python" in the *keys* of the dictionary (the words), not the values (the definitions). So i'm quite all right with saying

    for x in dict:

and having that loop over the keys, or saying

    if x in dict:

and having that check whether x is a valid key.
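[Editor's note: Ka-Ping's reading of "in" and "for" on dictionaries is, as it happens, exactly the behaviour later Python versions adopted. A quick sketch in modern Python:]

```python
# Membership tests and iteration on a dict both operate on the keys.
d = {"python": "a snake", "grail": "a cup"}
assert "python" in d                      # tests the keys, not the values
assert sorted(d) == ["grail", "python"]   # for x in d yields the keys
assert "a snake" not in d                 # values are not consulted
```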
It makes perfect sense to me. My main issue with #2 was that sets would print like {"Alice": 1, "Bob": 1, "Ted": 1} and this would look weird. However, as Greg explained to me, it would be possible to introduce a default value to go with set members that just says "i'm here", such as 'Present' (read as: "Alice" is present in the set) or 'Member' or even 'None', and this value wouldn't print out -- thus s = {"Bob"} s.include("Alice") print s would produce {"Alice", "Bob"} representing a dictionary that actually contained {"Alice": Present, "Bob": Present} You'd construct set constants like this too: {2, 4, 7} Using dicts this way (rather than having a separate set type that just happened to be spelled with {}) avoids the parsing issue: no need for look-ahead; you just toss in "Present" when the text doesn't supply a colon, and move on. I'd be okay with this, though i'm not sure everyone would; and together with Guido's initial objection, that's what motivated me to propose the lists-as-sets thing: fewer changes all around, no ambiguities introduced -- just two new methods, and we're done. Hmm. I know someone who's just learning Python. I will attempt to ask some questions about what she would find natural, and see if that reveals anything interesting. -- ?!ng From bwarsaw at cnri.reston.va.us Mon Mar 20 23:01:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 20 Mar 2000 17:01:00 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets References: <14550.24508.341533.908941@goon.cnri.reston.va.us> <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: <14550.40860.72418.648591@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> As far as I can tell, the problem is caused by the special way JH> that a for loop uses the __getitem__ protocol. There are two JH> related issues that lead to confusion. 
>>>>> "MZ" == Moshe Zadka writes: MZ> Well, my look is that it is caused by the fact __getitem__ is MZ> used both for the sequence protocol and the mapping protocol Right. MZ> Agreed though, that the whole iteration protocol should be MZ> revisited -- but that is a subject for another post. Yup. JH> The right solution, I think, is to allow a means for stating JH> explicitly whether a class with an __getitem__ method is a JH> sequence or a mapping (or both?). Or should the two protocol use different method names (code breakage!). JH> I believe this is where types-vs.-classes meets JH> subtyping-vs.-inheritance. meets protocols-vs.-interfaces. From moshez at math.huji.ac.il Tue Mar 21 06:16:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:16:00 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.40860.72418.648591@anthem.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Barry A. Warsaw wrote: > MZ> Agreed though, that the whole iteration protocol should be > MZ> revisited -- but that is a subject for another post. > > Yup. (Go Stackless, go!?) > JH> I believe this is where types-vs.-classes meets > JH> subtyping-vs.-inheritance. > > meets protocols-vs.-interfaces. It took me 5 minutes of intensive thinking just to understand what Barry meant. Just wait until we introduce Sather-like "supertypes" (which are pretty Pythonic, IMHO) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 21 06:21:24 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:21:24 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > I think that at this point the possibilities for doing sets > come down to four options: > > > 1. use lists > 2. use dicts > 3. new type > 4. do nothing 5. 
new Python module with a class "Set" (The issues are similar to #3, but this has the advantage of not changing the interpreter) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Tue Mar 21 01:25:09 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 01:25:09 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D6C165.EEF58232@lemburg.com> Mark Hammond wrote: > > I would like to discuss Unicode on the Windows platform, and how it relates > to MBCS that Windows uses. > > My main goal here is to ensure that Unicode on Windows can make a round-trip > to and from native Unicode stores. As an example, let's take the registry - > a Windows user should be able to read a Unicode value from the registry then > write it back. The value written back should be _identical_ to the value > read. Ditto for the file system: If the filesystem is Unicode, then I would > expect the following code: > for fname in os.listdir(): > f = open(fname + ".tmp", "w") > > To create filenames on the filesystem with the exact base name even when the > basename contains non-ascii characters. > > However, the Unicode patches do not appear to make this possible. open() > uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically > convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded > string to the C runtime fopen function. Right. The idea with open() was to write a special version (using #ifdefs) for use on Windows platforms which does all the needed magic to convert Unicode to whatever the native format and locale is... Using parser markers for this is obviously *not* the right way to get to the core of the problem. Basically, you will have to write a helper which takes a string, Unicode or some other "t" compatible object as name object and then converts it to the system's view of things. 
I think we had a private discussion about this a few months ago: there was some way to convert Unicode to a platform independent format which then got converted to MBCS -- don't remember the details though. > The end result of all this is that we end up with UTF-8 encoded names in the > registry/on the file system. It does not seem possible to get a true > Unicode string onto either the file system or in the registry. > > Unfortunately, Im not experienced enough to know the full ramifications, but > it _appears_ that on Windows the default "unicode to string" translation > should be done via the WideCharToMultiByte() API. This will then pass an > MBCS encoded ascii string to Windows, and the "right thing" should magically > happen. Unfortunately, MBCS encoding is dependant on the current locale > (ie, one MBCS sequence will mean completely different things depending on > the locale). I dont see a portability issue here, as the documentation > could state that "Unicode->ASCII conversions use the most appropriate > conversion for the platform. If the platform is not Unicode aware, then > UTF-8 will be used." No, no, no... :-) The default should be (and is) UTF-8 on all platforms -- whether the platform supports Unicode or not. If a platform uses a different encoding, an encoder should be used which applies the needed transformation. > This issue is the final one before I release the win32reg module. It seems > _critical_ to me that if Python supports Unicode and the platform supports > Unicode, then Python unicode values must be capable of being passed to the > platform. For the win32reg module I could quite possibly hack around the > problem, but the more general problem (categorized by the open() example > above) still remains... > > Any thoughts? Can't you use the wchar_t interfaces for the task (see the unicodeobject.h file for details) ? Perhaps you can first transfer Unicode to wchar_t and then on to MBCS using a win32 API ?! 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 21 10:27:56 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 10:27:56 +0100 Subject: [Python-Dev] Set options References: Message-ID: <38D7409C.169B0C42@lemburg.com> Moshe Zadka wrote: > > On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > > > I think that at this point the possibilities for doing sets > > come down to four options: > > > > > > 1. use lists > > 2. use dicts > > 3. new type > > 4. do nothing > > 5. new Python module with a class "Set" > (The issues are similar to #3, but this has the advantage of not changing > the interpreter) Perhaps someone could take Aaron's kjbuckets and write a Python emulation for it (I think he's even already done something like this for gadfly). Then the emulation could go into the core and if people want speed they can install his extension (the emulation would have to detect this and use the real thing then). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Mar 21 12:54:30 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 21 Mar 2000 12:54:30 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Tue, 21 Mar 2000 01:25:09 +0100 , <38D6C165.EEF58232@lemburg.com> Message-ID: <20000321115430.88A11370CF2@snelboot.oratrix.nl> I guess we need another format specifier than "s" here. "s" does the conversion to standard-python-utf8 for wide strings, and we'd need another format for conversion to current-local-os-convention-8-bit-encoding-of-unicode- strings. I assume that that would also come in handy for MacOS, where we'll have the same problem (filenames are in Apple's proprietary 8bit encoding). 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com Tue Mar 21 13:14:54 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 21 Mar 2000 13:14:54 +0100
Subject: [Python-Dev] Unicode and Windows
References: <20000321115430.88A11370CF2@snelboot.oratrix.nl>
Message-ID: <38D767BE.C45F8286@lemburg.com>

Jack Jansen wrote:
>
> I guess we need another format specifier than "s" here. "s" does the
> conversion to standard-python-utf8 for wide strings,

Actually, "t" does the UTF-8 conversion... "s" will give you the raw internal UTF-16 representation in platform byte order.

> and we'd need another
> format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings.

I'd suggest adding some kind of generic

    PyOS_FilenameFromObject(PyObject *v, void *buffer, int buffer_len)

API for the conversion of strings, Unicode and text buffers to an OS-dependent filename buffer. And/or perhaps specific APIs for each OS... e.g.

    PyOS_MBCSFromObject() (only on WinXX)
    PyOS_AppleFromObject() (only on Mac ;)

> I assume that that would also come in handy for MacOS, where we'll have the
> same problem (filenames are in Apple's proprietary 8bit encoding).

Is that encoding already supported by the encodings package? If not, could you point me to a map file for the encoding?

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From fdrake at acm.org Tue Mar 21 15:56:47 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 21 Mar 2000 09:56:47 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D767BE.C45F8286@lemburg.com> References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> Message-ID: <14551.36271.33825.841965@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > And/or perhaps sepcific APIs for each OS... e.g. > > PyOS_MBCSFromObject() (only on WinXX) > PyOS_AppleFromObject() (only on Mac ;) Another approach may be to add some format modifiers: te -- text in an encoding specified by a C string (somewhat similar to O&) tE -- text, encoding specified by a Python object (probably a string passed as a parameter or stored from some other call) (I'd prefer the [eE] before the t, but the O modifiers follow, so consistency requires this ugly construct.) This brings up the issue of using a hidden conversion function which may create a new object that needs the same lifetime guarantees as the real parameters; we discussed this issue a month or two ago. Somewhere, there's a call context that includes the actual parameter tuple. PyArg_ParseTuple() could have access to a "scratch" area where it could place objects constructed during parameter parsing. This area could just be a hidden tuple. When the C call returns, the scratch area can be discarded. The difficulty is in giving PyArg_ParseTuple() access to the scratch area, but I don't know how hard that would be off the top of my head. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Tue Mar 21 18:14:07 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 21 Mar 2000 12:14:07 -0500 (EST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.44511.805860.808811@goon.cnri.reston.va.us> >>>>> "MAL" == M -A Lemburg writes: MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.) The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application. Jeremy From skip at mojam.com Tue Mar 21 18:25:57 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 11:25:57 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.45221.447838.534003@beluga.mojam.com> Marc> Perhaps someone could take Aaron's kjbuckets and write a Python Marc> emulation for it ... Any reason why kjbuckets and friends have never been placed in the core? 
If, as it seems from the discussion, a set type is a good thing to add to the core, it seems to me that Aaron's code would be a good candidate implementation/foundation. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From bwarsaw at cnri.reston.va.us Tue Mar 21 18:47:49 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 21 Mar 2000 12:47:49 -0500 (EST) Subject: [Python-Dev] Set options References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> Message-ID: <14551.46533.918688.13801@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Any reason why kjbuckets and friends have never been placed in SM> the core? If, as it seems from the discussion, a set type is SM> a good thing to add to the core, it seems to me that Aaron's SM> code would be a good candidate implementation/foundation. It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we? -Barry From mhammond at skippinet.com.au Tue Mar 21 18:48:06 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 21 Mar 2000 09:48:06 -0800 Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D6C165.EEF58232@lemburg.com> Message-ID: > > Right. The idea with open() was to write a special version (using > #ifdefs) for use on Windows platforms which does all the needed > magic to convert Unicode to whatever the native format and locale > is... That works for open() - but what about other extension modules? This seems to imply that any Python extension on Windows that wants to pass a Unicode string to an external function can not use PyArg_ParseTuple() with anything other than "O", and perform the magic themselves. This just seems a little back-to-front to me. Platforms that have _no_ native Unicode support have useful utilities for working with Unicode. 
Platforms that _do_ have native Unicode support can not make use of these utilities. Is this by design, or simply a sad side-effect of the design? So - it is trivial to use Unicode on platforms that dont support it, but quite difficult on platforms that do. > Using parser markers for this is obviously *not* the right way > to get to the core of the problem. Basically, you will have to > write a helper which takes a string, Unicode or some other > "t" compatible object as name object and then converts it to > the system's view of things. Why "obviously"? What on earth does the existing mechanism buy me on Windows, other than grief that I can not use it? > I think we had a private discussion about this a few months ago: > there was some way to convert Unicode to a platform independent > format which then got converted to MBCS -- don't remember the details > though. There is a Win32 API function for this. However, as you succinctly pointed out, not many people are going to be aware of its name, or how to use the multitude of flags offered by these conversion functions, or know how to deal with the memory management, etc. > Can't you use the wchar_t interfaces for the task (see > the unicodeobject.h file for details) ? Perhaps you can > first transfer Unicode to wchar_t and then on to MBCS > using a win32 API ?! Sure - I can. But can everyone who writes interfaces to Unicode functions? You wrote the Python Unicode support but dont know its name - pity the poor Joe Average trying to write an extension. It seems to me that, on Windows, the Python Unicode support as it stands is really internal. I can not think of a single time that an extension writer on Windows would ever want to use the "t" markers - am I missing something? I dont believe that a single Unicode-aware function in the Windows extensions (of which there are _many_) could be changed to use the "t" markers.
It still seems to me that the Unicode support works well on platforms with no Unicode support, and is fairly useless on platforms with the support. I dont believe that any extension on Windows would want to use the "t" marker - so, as Fred suggested, how about providing something for us that can help us interface to the platform's Unicode? This is getting too hard for me - I will release my windows registry module without Unicode support, and hope that in the future someone cares enough to address it, and to add a large number of LOC that will be needed simply to get Unicode talking to Unicode... Mark. From skip at mojam.com Tue Mar 21 19:04:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:04:11 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: <14551.47515.648064.969034@beluga.mojam.com> BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we? Yes (though I would quibble with your use of the word "quibbling" ;-). If new syntax is in the offing as some have proposed, why not go for a more efficient implementation at the same time? I believe Aaron has maintained that kjbuckets is generally more efficient than Python's dictionary object. Skip From mal at lemburg.com Tue Mar 21 18:44:11 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 18:44:11 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> <14551.36271.33825.841965@weyr.cnri.reston.va.us> Message-ID: <38D7B4EB.66DAEBF3@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. 
Lemburg writes:

> > And/or perhaps specific APIs for each OS... e.g.
> >
> > PyOS_MBCSFromObject() (only on WinXX)
> > PyOS_AppleFromObject() (only on Mac ;)
>
> Another approach may be to add some format modifiers:
>
> te -- text in an encoding specified by a C string (somewhat
>       similar to O&)
> tE -- text, encoding specified by a Python object (probably a
>       string passed as a parameter or stored from some other
>       call)
>
> (I'd prefer the [eE] before the t, but the O modifiers follow, so
> consistency requires this ugly construct.)
>
> This brings up the issue of using a hidden conversion function which
> may create a new object that needs the same lifetime guarantees as the
> real parameters; we discussed this issue a month or two ago.
> Somewhere, there's a call context that includes the actual parameter
> tuple. PyArg_ParseTuple() could have access to a "scratch" area where
> it could place objects constructed during parameter parsing. This
> area could just be a hidden tuple. When the C call returns, the
> scratch area can be discarded.
>
> The difficulty is in giving PyArg_ParseTuple() access to the scratch
> area, but I don't know how hard that would be off the top of my head.

Some time ago, I considered adding "U+" with builtin auto-conversion to the tuple parser... after some discussion about the error handling issues involved with this I quickly dropped that idea again and used the standard "O" approach plus a call to a helper function which then applied the conversion. (Note the "+" behind "U": this was intended to indicate that the returned object has had the refcount incremented and that the caller must take care of decrementing it again.)

The "O" + helper approach is a little clumsy, but works just fine. Plus it doesn't add any more overhead to the already convoluted PyArg_ParseTuple().

BTW, what other external char formats are we talking about ? E.g. how do you handle MBCS or DBCS under WinXX ?
Are there routines to have wchar_t buffers converted into the two ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm at hypernet.com Tue Mar 21 19:25:43 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 21 Mar 2000 13:25:43 -0500 Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <1258459347-36172889@hypernet.com> Jeremy wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Nah. Sets are pretty unambiguous. They're also easy, and boring. The interesting stuff is graphs and operations like composition, closure and transpositions. That's also where stuff gets ambiguous. E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God. I would love this stuff, and as a faithful worshipper of Our Lady of Corrugated Ironism, I could probably live with whatever rules are arrived at; but I'm afraid I would have to considerably enlarge my kill file. - Gordon From gstein at lyra.org Tue Mar 21 19:40:20 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 21 Mar 2000 10:40:20 -0800 (PST) Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> Message-ID: On Tue, 21 Mar 2000, Jeremy Hylton wrote: > >>>>> "MAL" == M -A Lemburg writes: > MAL> Perhaps someone could take Aaron's kjbuckets and write a Python > MAL> emulation for it (I think he's even already done something like > MAL> this for gadfly). Then the emulation could go into the core and > MAL> if people want speed they can install his extension (the > MAL> emulation would have to detect this and use the real thing > MAL> then). 
> > I've been waiting for Tim Peters to say something about sets, but I'll > chime in with what I recall him saying last time a discussion like > this came up on c.l.py. (I may misremember, in which case I'll at > least draw him into the discussion in order to correct me <0.5 wink>.) > > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. > A set is pretty simple to build from a list or a dictionary, so we > leave it to application writers to write the one that is appropriate > for their application. Yah... +1 on what Jeremy said. Leave them out of the distro since we can't do them Right for all people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Tue Mar 21 19:34:56 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:34:56 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > BAW> It would seem to me that distutils is a better way to go for > BAW> kjbuckets. The core already has basic sets (via dictionaries). > BAW> We're pretty much just quibbling about efficiency, API, and syntax, > BAW> aren't we? > > If new syntax is in the offing as some have proposed, FWIW, I'm against new syntax. The core-language has changed quite a lot between 1.5.2 and 1.6 -- * strings have grown methods * there are unicode strings * "in" operator overloadable The second change even includes a syntax change (u"some string") whose variants I'm still not familiar enough to comment on (ru"some\string"? ur"some\string"? Both legal?). 
I feel too many changes destabilize the language (this might seem a bit extreme, considering I pushed towards one of the changes), and we should try to improve on things other than the core -- one of these is a more hierarchical standard library, and a standard distribution mechanism, to rival CPAN -- then anyone could

	import data.sets.kjbuckets

with only a trivial

>>> import dist
>>> dist.install("data.sets.kjbuckets")

> why not go for a more efficient implementation at the same time?

Because Python dicts are "pretty efficient", and it is not a trivial question to check optimality in this area: tests can be rigged to prove almost anything with the right test-cases, and there's no promise we'll choose the "right ones".

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From moshez at math.huji.ac.il Tue Mar 21 19:38:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:38:02 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <1258459347-36172889@hypernet.com> Message-ID: On Tue, 21 Mar 2000, Gordon McMillan wrote:

> E.g., what's the right behavior when you
> invert {'a':1,'b':1}? Hint: any answer you give will be met by the
> wrath of God.

Isn't "wrath of God", translated into Python, "an exception"?

	raise ValueError("dictionary is not 1-1")

seems fine to me.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From skip at mojam.com Tue Mar 21 19:42:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:42:55 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: References: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: <14551.49839.377385.99637@beluga.mojam.com> Skip> If new syntax is in the offing as some have proposed, Moshe> FWIW, I'm against new syntax.
The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 -- I thought we were talking about Py3K, where syntax changes are somewhat more expected. Just to make things clear, the syntax change I was referring to was the value-less dict syntax that someone proposed a few days ago: myset = {"a", "b", "c"} Note that I wasn't necessarily supporting the proposal, only acknowledging that it had been made. In general, I think we need to keep straight where people feel various proposals are going to fit. When a thread goes for more than a few messages it's easy to forget. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Mar 21 14:07:51 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 21 Mar 2000 07:07:51 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: Jeremy Hylton wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. For me, anyway, this is not about trying to engineer a universally perfect solution into Python -- it's about providing some simple, basic, easy-to-understand functionality that takes care of the common case. For example, dictionaries are simple, their workings are easy enough to understand, and they aren't written to efficiently support things like inversion and composition because most of the time no one needs to do these things. The same holds true for sets. All i would want is something i can put things into, and take things out of, and ask about what's inside. Barry Warsaw wrote: > It would seem to me that distutils is a better way to go for > kjbuckets. The core already has basic sets (via dictionaries). We're > pretty much just quibbling about efficiency, API, and syntax, aren't we? 
Efficiency: Hashtables have proven quite adequate for dicts, so i think they're quite adequate for sets.

API and syntax: I believe the goal is obvious, because Python already has very nice notation ("in", "not in") -- it just doesn't work quite the way one would want. It works semantically right on lists, but they're a little slow. It doesn't work on dicts, but we can make it so.

Here is where my "explanation metric" comes into play. How much additional explaining do you have to do in each case to answer the question "what do i do when i need a set"?

1. Use lists. Explain that "include()" means "append if not already present", and "exclude()" means "remove if present". You are done.

2. Use dicts. Explain that "for x in dict" iterates over the keys, and "if x in dict" looks for a key. Explain what happens when you write "{1, 2, 3}", and the special non-printing value constant. Explain how to add elements to a set and remove elements from a set.

3. Create a new type. Explain that there exists another type "set" with methods "insert" and "remove". Explain how to construct sets. Explain how "in" and "not in" work, where this type fits in with the other types, and when to choose this type over other types.

4. Do nothing. Explain that dictionaries can be used as sets if you assign keys a dummy value, use "del" to remove keys, iterate over "dict.keys()", and use "dict.has_key()" to test membership.

This is what motivated my proposal for using lists: it requires by far the least explanation. This is no surprise because a lot of things about lists have been explained already. My preference in terms of elegance is about equal for 1, 2, 3, with 4 distinctly behind; but my subjective ranking of "explanation complexity" (as in "how to get there from here") is 1 < 4 < 3 < 2.
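The dict-as-set idiom of options 3 and 4 is small enough to sketch directly. This is an illustrative sketch only -- the class name `Set` and the method names `insert`, `remove` and `elements` echo option 3 but are invented for this example, not taken from any real or proposed module:

```python
# Illustrative dict-backed set, in the spirit of options 3 and 4 above.
# (Sketch only: "Set", "insert", "remove" and "elements" are invented
# names for this example, not an actual library API.)

class Set:
    def __init__(self, items=()):
        self.data = {}              # keys are the elements, values are dummies
        for item in items:
            self.data[item] = 1

    def insert(self, item):
        # "append if not already present" -- dict keys are unique
        self.data[item] = 1

    def remove(self, item):
        # "remove if present" -- absent elements are ignored
        if item in self.data:
            del self.data[item]

    def __contains__(self, item):
        # makes "x in s" / "x not in s" work
        return item in self.data

    def elements(self):
        return list(self.data.keys())

s = Set(["a", "b"])
s.insert("b")               # no duplicate: "b" is already a key
s.remove("c")               # no-op: "c" is not present
print("a" in s)             # -> True
print("c" not in s)         # -> True
```

The hashtable underneath gives constant-time membership tests on average, which is the efficiency point made above; everything else is just option 4's dict idiom wrapped behind option 3's names.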
-- ?!ng

From tismer at tismer.com Tue Mar 21 21:13:38 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 21 Mar 2000 21:13:38 +0100 Subject: [Python-Dev] Unicode Database Compression Message-ID: <38D7D7F2.14A2FBB5@tismer.com>

Hi,

I have spent the last four days on compressing the Unicode database. With little decoding effort, I can bring the data down to 25kb. This would still be very fast, since codes are randomly accessible, although there are some simple shifts and masks.

With a bit more effort, this can be squeezed down to 15kb by some more aggressive techniques like common prefix elimination. Speed would be *slightly* worse, since a small loop (average 8 cycles) is performed to obtain a character from a packed nybble.

This is just all the data which is in Marc's unicodedatabase.c file. I checked efficiency by creating a delimited file like the original database text file with only these columns and ran PkZip over it. The result was 40kb. This says that I found a lot of correlations which automatic compressors cannot see.

Now, before generating the final C code, I'd like to ask some questions:

What is more desirable: Low compression and blinding speed? Or high compression and less speed, since we always want to unpack a whole code page?

Then, what about the other database columns? There are a couple of extra attributes which I find coded as switch statements elsewhere. Should I try to pack these codes into my squeezy database, too?

And last: There are also two quite elaborate columns with textual descriptions of the codes (the uppercase blah version of character x). Do we want these at all? And if so, should I try to compress them as well? Should these perhaps go into a different source file as a dynamic module, since they will not be used so often?

waiting for directives - ly y'rs - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From moshez at math.huji.ac.il Wed Mar 22 06:44:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 22 Mar 2000 07:44:00 +0200 (IST) Subject: [1.x] Re: [Python-Dev] Set options In-Reply-To: <14551.49839.377385.99637@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote:

> Skip> If new syntax is in the offing as some have proposed,
>
> Moshe> FWIW, I'm against new syntax. The core-language has changed quite
> Moshe> a lot between 1.5.2 and 1.6 --
>
> I thought we were talking about Py3K

My argument was strictly a 1.x argument. I'm hoping to get sets in 1.7 or 1.8.

> In general, I think we need to keep straight where people feel various
> proposals are going to fit.

You're right. I'll start prefixing my posts accordingly.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From mal at lemburg.com Wed Mar 22 11:11:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 11:11:25 +0100 Subject: [Python-Dev] Re: Unicode Database Compression References: <38D7D7F2.14A2FBB5@tismer.com> Message-ID: <38D89C4D.370C19D@lemburg.com> Christian Tismer wrote: > > Hi, > > I have spent the last four days on compressing the > Unicode database. Cool :-) > With little decoding effort, I can bring the data down to 25kb. > This would still be very fast, since codes are randomly > accessible, although there are some simple shifts and masks. > > With a bit more effort, this can be squeezed down to 15kb > by some more aggressive techniques like common prefix > elimination. Speed would be *slightly* worse, since a small > loop (average 8 cycles) is performed to obtain a character > from a packed nybble.
> > This is just all the data which is in Marc's unicodedatabase.c > file. I checked efficiency by creating a delimited file like > the original database text file with only these columns and > ran PkZip over it. The result was 40kb. This says that I found > a lot of correlations which automatic compressors cannot see. Not bad ;-) > Now, before generating the final C code, I'd like to ask some > questions: > > What is more desirable: Low compression and blinding speed? > Or high compression and less speed, since we always want to > unpack a whole code page? I'd say high speed and less compression. The reason is that the Asian codecs will need fast access to the database. With their large mapping tables size the few more kB don't hurt, I guess. > Then, what about the other database columns? > There are a couple of extra atrributes which I find coded > as switch statements elsewhere. Should I try to pack these > codes into my squeezy database, too? You basically only need to provide the APIs (and columns) defined in the unicodedata Python API, e.g. the character description column is not needed. > And last: There are also two quite elaborated columns with > textual descriptions of the codes (the uppercase blah version > of character x). Do we want these at all? And if so, should > I try to compress them as well? Should these perhaps go > into a different source file as a dynamic module, since they > will not be used so often? I guess you are talking about the "Unicode 1.0 Name" and the "10646 comment field" -- see above, there's no need to include these descriptions in the database... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 22 12:04:32 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 22 Mar 2000 12:04:32 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D8A8C0.66123F2C@lemburg.com> Mark Hammond wrote: > > > > > Right. The idea with open() was to write a special version (using > > #ifdefs) for use on Windows platforms which does all the needed > > magic to convert Unicode to whatever the native format and locale > > is... > > That works for open() - but what about other extension modules? > > This seems to imply that any Python extension on Windows that wants to pass > a Unicode string to an external function can not use PyArg_ParseTuple() with > anything other than "O", and perform the magic themselves. > > This just seems a little back-to-front to me. Platforms that have _no_ > native Unicode support have useful utilities for working with Unicode. > Platforms that _do_ have native Unicode support can not make use of these > utilities. Is this by design, or simply a sad side-effect of the design? > > So - it is trivial to use Unicode on platforms that dont support it, but > quite difficult on platforms that do. The problem is that Windows seems to use a completely different internal Unicode format than most of the rest of the world. As I've commented on in a different post, the only way to have PyArg_ParseTuple() perform auto-conversion is by allowing it to return objects which are garbage collected by the caller. The problem with this is error handling, since PyArg_ParseTuple() will have to keep track of all objects it created until the call returns successfully. An alternative approach is sketched below. Note that *all* platforms will have to use this approach... not only Windows or other platforms with Unicode support. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. 
Basically, you will have to > write a helper which takes a string, Unicode or some other > "t" compatible object as name object and then converts it to > the system's view of things. > > Why "obviously"? What on earth does the existing mechanism buy me on > Windows, other than grief that I can not use it? Sure, you can :-) Just fetch the object, coerce it to Unicode and then encode it according to your platform needs (PyUnicode_FromObject() takes care of the coercion part for you). > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. > > There is a Win32 API function for this. However, as you succinctly pointed > out, not many people are going to be aware of its name, or how to use the > multitude of flags offered by these conversion functions, or know how to > deal with the memory management, etc. > > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! > > Sure - I can. But can everyone who writes interfaces to Unicode functions? > You wrote the Python Unicode support but dont know its name - pity the poor > Joe Average trying to write an extension. Hey, Mark... I'm not a Windows geek. How can I know which APIs are available and which of them to use ? And that's my point: add conversion APIs and codecs for the different OSes which make the extension writer's life easier. > It seems to me that, on Windows, the Python Unicode support as it stands is > really internal. I can not think of a single time that an extension writer > on Windows would ever want to use the "t" markers - am I missing something?
> I dont believe that a single Unicode-aware function in the Windows > extensions (of which there are _many_) could be changed to use the "t" > markers. "t" is intended to return a text representation of a buffer interface aware type... this happens to be UTF-8 for Unicode objects -- what other encoding would you have expected ? > It still seems to me that the Unicode support works well on platforms with > no Unicode support, and is fairly useless on platforms with the support. I > dont believe that any extension on Windows would want to use the "t" > marker - so, as Fred suggested, how about providing something for us that > can help us interface to the platform's Unicode? That's exactly what I'm talking about all the time... there currently are PyUnicode_AsWideChar() and PyUnicode_FromWideChar() to interface to the compiler's wchar_t type. I have no problem adding more of these APIs for the various OSes -- but they would have to be coded by someone with Unicode skills on each of those platforms, e.g. PyUnicode_AsMBCS() and PyUnicode_FromMBCS() on Windows. > This is getting too hard for me - I will release my windows registry module > without Unicode support, and hope that in the future someone cares enough to > address it, and to add a large number of LOC that will be needed simply to > get Unicode talking to Unicode... I think you're getting this wrong: I'm not arguing against adding better support for Windows. The only way I can think of using parser markers in this context would be by having PyArg_ParseTuple() *copy* data into a given data buffer rather than only passing a reference to it. This would enable PyArg_ParseTuple() to apply whatever conversion is needed while still keeping the temporary objects internal.
Hmm, sketching a little:

	"es#",&encoding,&buffer,&buffer_len
	-- could mean: coerce the object to Unicode, then
	   encode it using the given encoding and then
	   copy at most buffer_len bytes of data into
	   buffer and update buffer_len to the number of bytes
	   copied

This costs some cycles for copying data, but gets rid of the problems involved in cleaning up after errors. The caller will have to ensure that the buffer is large enough and that the encoding fits the application's needs. Error handling will be poor since the caller can't take any action other than to pass on the error generated by PyArg_ParseTuple(). Thoughts ?

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Mar 22 14:40:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 14:40:23 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322113129.5E67C370CF2@snelboot.oratrix.nl> Message-ID: <38D8CD47.E573A246@lemburg.com>

Jack Jansen wrote:
>
> > "es#",&encoding,&buffer,&buffer_len
> > -- could mean: coerce the object to Unicode, then
> > encode it using the given encoding and then
> > copy at most buffer_len bytes of data into
> > buffer and update buffer_len to the number of bytes
> > copied
>
> This is a possible solution, but I think I would really prefer to also have
> "eS", &encoding, &buffer_ptr
> -- coerce the object to Unicode, then encode it using the given
> encoding, malloc() a buffer to put the result in and return that.
>
> I don't mind doing something like
>
> {
> 	char *filenamebuffer = NULL;
>
> 	if ( PyArg_ParseTuple(args, "eS", &macencoding, &filenamebuffer)
> 		...
> 	open(filenamebuffer, ....);
> 	PyMem_XDEL(filenamebuffer);
> 	...
> }
>
> I think this would be much less error-prone than having fixed-length buffers
> all over the place.
PyArg_ParseTuple() should probably raise an error in case the data doesn't fit into the buffer.

> And if this is indeed going to be used mainly in open()
> calls and such the cost of the extra malloc()/free() is going to be dwarfed by
> what the underlying OS call is going to use.

Good point. You'll still need the buffer_len output parameter though -- otherwise you wouldn't be able to tell the size of the allocated buffer (the returned data may not be terminated).

How about this:

	"es#", &encoding, &buffer, &buffer_len
	-- both buffer and buffer_len are in/out parameters
	-- if **buffer is non-NULL, copy the data into it (at most
	   buffer_len bytes) and update buffer_len on output;
	   truncation produces an error
	-- if **buffer is NULL, malloc() a buffer of size buffer_len
	   and return it through *buffer; if buffer_len is -1, the
	   allocated buffer should be large enough to hold all data;
	   again, truncation is an error
	-- apply coercion and encoding as described above

(could be that I've got the '*'s wrong, but you get the picture...:)

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From jack at oratrix.nl Wed Mar 22 14:46:50 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 22 Mar 2000 14:46:50 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Wed, 22 Mar 2000 14:40:23 +0100 , <38D8CD47.E573A246@lemburg.com> Message-ID: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl>

> > [on the user-supplies-buffer interface]
> > I think this would be much less error-prone than having fixed-length buffers
> > all over the place.
>
> PyArg_ParseTuple() should probably raise an error in case the
> data doesn't fit into the buffer.

Ah, that's right, that solves most of that problem.

> > [on the malloced interface]
> Good point.
> You'll still need the buffer_len output parameter
> though -- otherwise you wouldn't be able to tell the size of the
> allocated buffer (the returned data may not be terminated).

Are you sure? I would expect the "eS" format to be used to obtain 8-bit data in some local encoding, and I would expect that all 8-bit encodings of unicode data would still allow for null-termination. Or are there 8-bit encodings out there where a zero byte is a normal occurrence and where it can't be used as a terminator?

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com Wed Mar 22 17:31:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 17:31:26 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> Message-ID: <38D8F55E.6E324281@lemburg.com>

Jack Jansen wrote:
>
> > > [on the user-supplies-buffer interface]
> > > I think this would be much less error-prone than having fixed-length buffers
> > > all over the place.
> >
> > PyArg_ParseTuple() should probably raise an error in case the
> > data doesn't fit into the buffer.
>
> Ah, that's right, that solves most of that problem.
>
> > > [on the malloced interface]
> > Good point. You'll still need the buffer_len output parameter
> > though -- otherwise you wouldn't be able to tell the size of the
> > allocated buffer (the returned data may not be terminated).
>
> Are you sure? I would expect the "eS" format to be used to obtain 8-bit data
> in some local encoding, and I would expect that all 8-bit encodings of unicode
> data would still allow for null-termination. Or are there 8-bit encodings out
> there where a zero byte is a normal occurrence and where it can't be used as a
> terminator?

Not sure whether these exist or not, but they are certainly a possibility to keep in mind.
Perhaps adding "es#" and "es" (with 0-byte check) would be ideal ?! -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 22 17:54:42 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 22 Mar 2000 17:54:42 +0100 (MET) Subject: [Python-Dev] Nitpicking on UserList implementation Message-ID: Hi! Please have a look at the following method cited from Lib/UserList.py:

    def __radd__(self, other):
        if isinstance(other, UserList):                    # <-- ?
            return self.__class__(other.data + self.data)  # <-- ?
        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(list(other) + self.data)

The reference manual says about the __r*__ methods: """These functions are only called if the left operand does not support the corresponding operation.""" So if the left operand is a UserList instance, it should always have an __add__ method, which will be called instead of the right operand's __radd__. So I think the condition 'isinstance(other, UserList)' in __radd__ above will always evaluate to False, and so the two lines marked with '# <-- ?' seem to be superfluous. But 'UserList' is so mature: Please tell me what I've overlooked before I make a fool of myself and submit a patch removing these two lines.
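Peter's dead-branch claim can be checked with a small stand-in class (Seq below is a hypothetical minimal sketch, not the real UserList): whenever the left operand of "+" defines __add__ and handles the operation, the right operand's __radd__ is never consulted.

```python
class Seq:
    # Hypothetical stand-in for UserList, just to show the dispatch rule.
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # The left operand handles "+" whenever it can...
        return Seq(self.data + list(getattr(other, "data", other)))

    def __radd__(self, other):
        # ...so this only runs when the left operand is NOT a Seq,
        # which is why an isinstance(other, Seq) branch here is dead code.
        assert not isinstance(other, Seq)
        return Seq(list(other) + self.data)

print((Seq([1, 2]) + Seq([3])).data)   # [1, 2, 3] -- left operand's __add__ wins
print(([0] + Seq([3])).data)           # [0, 3]    -- plain list defers to __radd__
```

The assertion inside __radd__ never fires for Seq-plus-Seq additions, which is exactly Peter's point about the two marked lines.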
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson at nevex.com Thu Mar 23 18:10:16 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:10:16 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods Message-ID: [The following passed the Ping test, so I'm posting it here] If None becomes a keyword, I would like to ask whether it could be used to signal that a method is a class method, as opposed to an instance method:

    class Ping:
        def __init__(self, arg):
            ...as usual...
        def method(self, arg):
            ...no change...
        def classMethod(None, arg):
            ...equivalent of C++ 'static'...

    p = Ping("thinks this is cool")    # as always
    p.method("who am I to argue?")     # as always
    Ping.classMethod("hey, cool!")     # no 'self'
    p.classMethod("hey, cool!")        # also selfless

I'd also like to ask (separately) that assignment to None be defined as a no-op, so that programmers can write: year, month, None, None, None, None, weekday, None, None = gmtime(time()) instead of having to create throw-away variables to fill in slots in tuples that they don't care about. I think both behaviors are readable; the first provides genuinely new functionality, while I often found the second handy when I was doing logic programming. Greg From jim at digicool.com Thu Mar 23 18:18:29 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:18:29 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA51E5.B39D3E7B@digicool.com> gvwilson at nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: > > def __init__(self, arg): > ...as usual... > > def method(self, arg): > ...no change...
> > def classMethod(None, arg): > ...equivalent of C++ 'static'... (snip) As a point of jargon, please let's call this thing a "static method" (or an instance function, or something) rather than a "class method". The distinction between "class methods" and "static methods" has been discussed at length in the types sig (over a year ago). If this proposal goes forward and the name "class method" is used, I'll have to argue strenuously, and I really don't want to do that. :] So, if you can live with the term "static method", you could save us a lot of trouble by just saying "static method". Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gvwilson at nevex.com Thu Mar 23 18:21:48 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:21:48 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA51E5.B39D3E7B@digicool.com> Message-ID: > As a point of jargon, please let's call this thing a "static method" > (or an instance function, or something) rather than a "class method". I'd call it a penguin if that was what it took to get something like this implemented... :-) greg From jim at digicool.com Thu Mar 23 18:28:25 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:28:25 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA5439.F5FE8FE6@digicool.com> gvwilson at nevex.com wrote: > > > As a point of jargon, please let's call this thing a "static method" > > (or an instance function, or something) rather than a "class method".
> > I'd call it a penguin if that was what it took to get something like this > implemented... :-) That's a great name. Let's go with penguin. :) Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mhammond at skippinet.com.au Thu Mar 23 18:29:53 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 23 Mar 2000 09:29:53 -0800 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: ... > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > def classMethod(None, arg): > ...equivalent of C++ 'static'... ... > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = > gmtime(time()) In the vernacular of a certain Mr Stein... +2 on both of these :-) [Although I do believe "static method" is a better name than "penguin" :-] Mark. From ping at lfw.org Thu Mar 23 18:47:47 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 23 Mar 2000 09:47:47 -0800 (PST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson at nevex.com wrote: > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: [...] Ack! I've been reduced to a class with just three methods. Oh well, i never really considered it such a bad thing to be called "simple-minded".
:) > def classMethod(None, arg): > ...equivalent of C++ 'static'... Yeah, i agree with Jim; you might as well call this a "static method" as opposed to a "class method". I like the way "None" is explicitly stated here, so there's no confusion about what the method does. (Without it, there's the question of whether the first argument will get thrown in, or what...) Hmm... i guess this also means one should ask what def function(None, arg): ... does outside a class definition. I suppose that should simply be illegal. > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. For what it's worth, i sometimes use "_" for this purpose (shades of Prolog!) but i can't make much of an argument for its readability... -- ?!ng I never dreamt that i would get to be The creature that i always meant to be But i thought, in spite of dreams, You'd be sitting somewhere here with me. From fdrake at acm.org Thu Mar 23 19:11:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 23 Mar 2000 13:11:39 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.24155.948286.451340@weyr.cnri.reston.va.us> gvwilson at nevex.com writes: > p.classMethod("hey, cool!") # also selfless This is the example that I haven't seen before (I'm not on the types-sig, so it may have been presented there), and I think this is what makes it interesting; a method in a module isn't quite sufficient here, since a subclass can override or extend the penguin this way. (Er, if we *do* go with penguin, does this mean it only works on Linux? ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From pf at artcom-gmbh.de Thu Mar 23 19:25:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 19:25:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from "gvwilson@nevex.com" at "Mar 23, 2000 12:10:16 pm" Message-ID: Hi! gvwilson at nevex.com: > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) You can already do this today with 1.5.2, if you use a 'del None' statement:

    Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> from time import time, gmtime
    >>> year, month, None, None, None, None, weekday, None, None = gmtime(time())
    >>> print year, month, None, weekday
    2000 3 0 3
    >>> del None
    >>> print year, month, None, weekday
    2000 3 None 3
    >>>

If None becomes a keyword in Py3K, this Python idiom is better written as

    year, month, None, None, None, None, ... = ...
    if sys.version[0] == '1':
        del None

or

    try:
        del None
    except SyntaxError:
        pass  # Wow, running Py3K here!

I wonder how much existing code the None --> keyword change would break. Regards, Peter From paul at prescod.net Thu Mar 23 19:47:55 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:47:55 -0800 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA66DB.635E8731@prescod.net> gvwilson at nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: +1 Idea is good, but I'm not really happy with any of the proposed terminology...Python doesn't really have static anything.
I would vote at the same time to make self a keyword and signal an error if the first argument is not one of None or self. Even now, one of my most common Python mistakes is forgetting self. I expect it happens to anyone who shifts between other languages and Python. Why does None have an uppercase "N"? Maybe the keyword version should be lower-case... -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 19:57:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 13:57:00 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.26876.514559.320219@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> If None becomes a keyword, I would like to ask whether gvwilson> it could be used to signal that a method is a class gvwilson> method, as opposed to an instance method: It still seems mildly weird that None would be a special kind of keyword, one that has a value and is used in ways that no other keyword is used. Greg gives an example, and here are a few more: def baddaboom(x, y, z=None): ... if z is None: ... try substituting `else' for `None' in these examples. ;) Putting that issue aside, Greg's suggestion for static method definitions is interesting.

    class Ping:
        # would this be a SyntaxError?
        def __init__(None, arg):
            ...
        def staticMethod(None, arg):
            ...

    p = Ping()
    Ping.staticMethod(p, 7)  # TypeError
    Ping.staticMethod(7)     # This is fine
    p.staticMethod(7)        # So's this
    Ping.staticMethod(p)     # and this !!

-Barry From paul at prescod.net Thu Mar 23 19:52:25 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:52:25 -0800 Subject: [Python-Dev] dir() Message-ID: <38DA67E9.AA593B7A@prescod.net> Can someone explain why dir(foo) does not return all of foo's methods?
I know it's documented that way, I just don't know why it *is* that way. I'm also not clear on why instances don't have auto-populated __methods__ and __members__ members. If there isn't a good reason (there probably is) then I would advocate that these functions and members should be more comprehensive. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 20:00:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 14:00:57 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.27113.546575.170565@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: | try: | del None | except SyntaxError: | pass # Wow running Py3K here! I know how to break your Py3K code: stick None=None somewhere higher up :) PF> I wonder how much existing code the None --> keyword change PF> would break. Me too. -Barry From gvwilson at nevex.com Thu Mar 23 20:01:06 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:01:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.26876.514559.320219@anthem.cnri.reston.va.us> Message-ID: > class Ping: > # would this be a SyntaxError? > def __init__(None, arg): > ... Absolutely a syntax error; ditto any of the other special names (e.g. __add__). Greg From akuchlin at mems-exchange.org Thu Mar 23 20:06:33 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 14:06:33 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.27449.69043.924322@amarok.cnri.reston.va.us> Barry A.
Warsaw writes: >>>>>> "PF" == Peter Funk writes: > PF> I wonder how much existing code the None --> keyword change > PF> would break. >Me too. I can't conceive of anyone using None as a function name or a variable name, except through a bug or thinking that 'None, useful, None = 1,2,3' works. Even though None isn't a fixed constant, it might as well be. How much C code have you seen lately that starts with int function(void *NULL) ? Being able to do "None = 2" also smacks a bit of those legendary Fortran compilers that let you accidentally change 2 into 4. +1 on this change for Py3K, and I doubt it would cause breakage even if introduced into 1.x. -- A.M. Kuchling http://starship.python.net/crew/amk/ Principally I played pedants, idiots, old fathers, and drunkards. As you see, I had a narrow escape from becoming a professor. -- Robertson Davies, "Shakespeare over the Port" From paul at prescod.net Thu Mar 23 20:02:33 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 11:02:33 -0800 Subject: [Python-Dev] Unicode character names Message-ID: <38DA6A49.A60E405B@prescod.net> Here's a feature I like from Perl's Unicode support: """ Support for interpolating named characters The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a unicode smiley face at the end. """ I get really tired of looking up the Unicode character for "ndash" or "right dagger". Does our Unicode database have enough information to make something like this possible? Obviously using the official (English) name is only really helpful for people who speak English, so we should not remove the numeric option.
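The interpolation Paul describes maps directly onto a name-to-character lookup; a minimal sketch, assuming a name database like the one the unicodedata module in today's standard library provides:

```python
import re
import unicodedata

def expand_named(text):
    # Replace \N{NAME} escapes with the character the Unicode
    # database registers under NAME (KeyError if the name is unknown).
    return re.sub(r"\\N\{([^}]+)\}",
                  lambda m: unicodedata.lookup(m.group(1)),
                  text)

print(expand_named(r"Hi! \N{WHITE SMILING FACE}"))  # "Hi! " followed by U+263A
```

The reverse direction also exists in that module: unicodedata.name("\u263a") gives back "WHITE SMILING FACE", which addresses the lookup-by-number tedium as well.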
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From tismer at tismer.com Thu Mar 23 20:27:53 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 20:27:53 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7039.B7CDC6FF@tismer.com> Mark Hammond wrote: > > ... > > If None becomes a keyword, I would like to ask whether it could be used to > > signal that a method is a class method, as opposed to an instance method: > > > > def classMethod(None, arg): > > ...equivalent of C++ 'static'... > ... > > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = > > gmtime(time()) > > In the vernacular of a certain Mr Stein... > > +2 on both of these :-) me 2, äh 1.5... The assignment no-op seems to be ok. Having None as a place holder for static methods creates the problem that we lose compatibility with ordinary functions. What I would propose instead is: make the parameter name "self" mandatory for methods, and turn everything else into a static method. This does not change function semantics, but just the way the method binding works. > [Although I do believe "static method" is a better name than "penguin" :-] pynguin -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gvwilson at nevex.com Thu Mar 23 20:33:47 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:33:47 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA7039.B7CDC6FF@tismer.com> Message-ID: Hi, Christian; thanks for your mail. > What I would propose instead is: > make the parameter name "self" mandatory for methods, and turn > everything else into a static method. In my experience, significant omissions (i.e. something being important because it is *not* there) often give beginners trouble. For example, in C++, you can't tell whether: int foo::bar(int bah) { return 0; } belongs to instances, or to the class as a whole, without referring back to the header file [1]. To quote the immortal Jeremy Hylton: Pythonic design rules #2: Explicit is better than implicit. Also, people often ask why 'self' is required as a method argument in Python, when it is not in C++ or Java; this proposal would (retroactively) answer that question... Greg [1] I know this isn't a problem in Java or Python; I'm just using it as an illustration. From skip at mojam.com Thu Mar 23 21:09:00 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 23 Mar 2000 14:09:00 -0600 (CST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> <14554.27449.69043.924322@amarok.cnri.reston.va.us> Message-ID: <14554.31196.387213.472302@beluga.mojam.com> AMK> +1 on this change for Py3K, and I doubt it would cause breakage AMK> even if introduced into 1.x. Or if it did, it's probably code that's marginally broken already... 
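The "marginally broken" code Skip means is code that rebinds a builtin name. Since None itself can no longer be rebound in current Python, the builtin len stands in here; a sketch of both the hazard and Peter's shadow-then-del trick from earlier in the thread (module scope matters: inside a function, del would leave the name unbound rather than re-expose the builtin):

```python
len = 5                    # shadow the builtin at module scope (the hazard)
assert len == 5            # every later reference in this module sees the shadow...

try:
    len("abc")             # ...so ordinary calls of len() now fail
    raise AssertionError("the shadow should not be callable")
except TypeError:
    pass                   # 'int' object is not callable

del len                    # Peter's trick: drop the module-level binding
assert len("abc") == 3     # ...and the builtin shows through again
```

Making None a keyword simply rules out the shadowing (and the del) at compile time instead of leaving it as a run-time surprise.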
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From tismer at tismer.com Thu Mar 23 21:21:09 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 21:21:09 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7CB5.87D62E14@tismer.com> Yo, gvwilson at nevex.com wrote: > > Hi, Christian; thanks for your mail. > > > What I would propose instead is: > > make the parameter name "self" mandatory for methods, and turn > > everything else into a static method. > > In my experience, significant omissions (i.e. something being important > because it is *not* there) often give beginners trouble. For example, > in C++, you can't tell whether: > > int foo::bar(int bah) > { > return 0; > } > > belongs to instances, or to the class as a whole, without referring back > to the header file [1]. To quote the immortal Jeremy Hylton: > > Pythonic design rules #2: > Explicit is better than implicit. Sure. I am explicitly *not* using self if I want no self. :-) > Also, people often ask why 'self' is required as a method argument in > Python, when it is not in C++ or Java; this proposal would (retroactively) > answer that question... You prefer to use the explicit keyword None? How would you then deal with def outside(None, blah): pass # stuff I believe one answer about the explicit "self" is that it should be simple and compatible with ordinary functions. Guido had just to add the semantics that in methods the first parameter automatically binds to the instance. The None gives me a bit of trouble, but not much. What I would like to spell is ordinary functions (as it is now) functions which are instance methods (with the immortal self) functions which are static methods ??? functions which are class methods !!! Static methods can work either with the "1st param==None" rule or with the "1st paramname!=self" rule or whatever. 
But how would you do class methods, which IMHO should have their class passed in as first parameter? Do you see a clean syntax for this? I thought of some weirdness like def meth(self, ... def static(self=None, ... # eek def classm(self=class, ... # ahem but this breaks the rule of default argument order. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 23 21:27:41 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) Subject: [Python-Dev] Unicode character names In-Reply-To: <38DA6A49.A60E405B@prescod.net> References: <38DA6A49.A60E405B@prescod.net> Message-ID: <14554.32317.730574.967165@amarok.cnri.reston.va.us> Paul Prescod writes: >The new \N escape interpolates named characters within strings. For >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >unicode smiley face at the end. Cute idea, and it certainly means you can avoid looking up Unicode numbers. (You can look up names instead. :) ) Note that this means the Unicode database is no longer optional if this is done; it has to be around at code-parsing time. Python could import it automatically, as exceptions.py is imported. Christian's work on compressing unicodedatabase.c is therefore really important. (Is Perl5.6 actually dragging around the Unicode database in the binary, or is it read out of some external file or data structure?) -- A.M. Kuchling http://starship.python.net/crew/amk/ About ten days later, it being the time of year when the National collected down and outs to walk on and understudy I arrived at the head office of the National Theatre in Aquinas Street in Waterloo. 
-- Tom Baker, in his autobiography From bwarsaw at cnri.reston.va.us Thu Mar 23 21:39:43 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 15:39:43 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33039.4390.591036@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> belongs to instances, or to the class as a whole, gvwilson> without referring back to the header file [1]. To quote gvwilson> the immortal Jeremy Hylton: Not to take anything away from Jeremy, who has contributed some wonderfully Pythonic quotes of his own, but this one is taken from Tim Peters' Zen of Python http://www.python.org/doc/Humor.html#zen timbot-is-the-only-one-who's-gonna-outlive-his-current-chip-set- around-here-ly y'rs, -Barry From jeremy at cnri.reston.va.us Thu Mar 23 21:55:25 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 23 Mar 2000 15:55:25 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33590.844200.145871@walden> >>>>> "GVW" == gvwilson writes: GVW> To quote the immortal Jeremy Hylton: GVW> Pythonic design rules #2: GVW> Explicit is better than implicit. I wish I could take credit for that :-). Tim Peters posted a list of 20 Pythonic theses to comp.lang.python under the title "The Python Way." I'll collect them all here in hopes of future readers mistaking me for Tim again . Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. 
There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those! See http://x27.deja.com/getdoc.xp?AN=485548918&CONTEXT=953844380.1254555688&hitnum=9 for the full post. to-be-immortal-i'd-need-to-be-a-bot-ly y'rs Jeremy From jeremy at alum.mit.edu Thu Mar 23 22:01:01 2000 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 23 Mar 2000 16:01:01 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.34037.232728.670271@walden> >>>>> "GVW" == gvwilson writes: GVW> I'd also like to ask (separately) that assignment to None be GVW> defined as a no-op, so that programmers can write: GVW> year, month, None, None, None, None, weekday, None, None = GVW> gmtime(time()) GVW> instead of having to create throw-away variables to fill in GVW> slots in tuples that they don't care about. I think both GVW> behaviors are readable; the first provides genuinely new GVW> functionality, while I often found the second handy when I was GVW> doing logic programming. -1 on this proposal Pythonic design rule #8: Special cases aren't special enough to break the rules. I think it's confusing to have assignment mean "discard the value" for the special case that the name is None. If Py3K makes None a keyword, then it would also be the only keyword that can be used in an assignment. Finally, we'd need to explain to the rare newbie who used None as a variable name why they assigned 12 to None but its value was unchanged when it was later referenced. (Think 'print None'.) When I need to ignore some of the return values, I use the name nil.
year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) I think that's just as clear, only a whisker less efficient, and requires no special cases. Heck, it's even less typing <0.5 wink>. Jeremy From gvwilson at nevex.com Thu Mar 23 21:59:41 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 15:59:41 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.33590.844200.145871@walden> Message-ID: > GVW> To quote the immortal Jeremy Hylton: > GVW> Pythonic design rules #2: > GVW> Explicit is better than implicit. > > I wish I could take credit for that :-). Tim Peters posted a list of > 20 Pythonic theses to comp.lang.python under the title "The Python > Way." Traceback (innermost last): File "", line 1, in ? AttributionError: insight incorrectly ascribed From paul at prescod.net Thu Mar 23 22:26:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 13:26:42 -0800 Subject: [Python-Dev] None as a keyword / class methods References: <14554.34037.232728.670271@walden> Message-ID: <38DA8C12.DFFD63D5@prescod.net> Jeremy Hylton wrote: > > ... > year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) So you're proposing nil as a new keyword? I like it. +2 -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "No, I'm not QUITE that stupid", Paul Prescod From pf at artcom-gmbh.de Thu Mar 23 22:46:49 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 22:46:49 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 23, 2000 2: 0:57 pm" Message-ID: Hi Barry! > >>>>> "PF" == Peter Funk writes: > > | try: | del None | except SyntaxError: | pass # Wow running Py3K here! Barry A. Warsaw: > I know how to break your Py3K code: stick None=None somewhere higher up :) Hmm.... I must admit that I don't understand your argument.
In Python <= 1.5.2 'del None' works fine, iff it follows any assignment to None in the same scope, regardless of whether there has been a None=None in the surrounding scope or in the same scope before this. Since something like 'del for' or 'del import' raises a SyntaxError exception in Py152, I expect 'del None' to raise the same exception in Py3K, after None has become a keyword. Right? Regards, Peter From andy at reportlab.com Thu Mar 23 22:54:23 2000 From: andy at reportlab.com (Andy Robinson) Date: Thu, 23 Mar 2000 21:54:23 GMT Subject: [Python-Dev] Unicode Character Names In-Reply-To: <20000323202533.ABDB31CEF8@dinsdale.python.org> References: <20000323202533.ABDB31CEF8@dinsdale.python.org> Message-ID: <38da90b4.756297@post.demon.co.uk> >Message: 20 >From: "Andrew M. Kuchling" >Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) >To: "python-dev at python.org" >Subject: Re: [Python-Dev] Unicode character names > >Paul Prescod writes: >>The new \N escape interpolates named characters within strings. For >>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>unicode smiley face at the end. > >Cute idea, and it certainly means you can avoid looking up Unicode >numbers. (You can look up names instead. :) ) Note that this means the >Unicode database is no longer optional if this is done; it has to be >around at code-parsing time. Python could import it automatically, as >exceptions.py is imported. Christian's work on compressing >unicodedatabase.c is therefore really important. (Is Perl5.6 actually >dragging around the Unicode database in the binary, or is it read out >of some external file or data structure?) I agree - the names are really useful. If you are doing conversion work, often you want to know what a character is, but don't have a complete Unicode font handy. Being able to get the description for a Unicode character is useful, as well as being able to use the description as a constructor for it.
Also, there are some language-specific things that might make it useful to have the full character descriptions in Christian's database. For example, we'll have an (optional, not in the standard library) Japanese module with functions like isHalfWidthKatakana(), isFullWidthKatakana() to help normalize things. Parsing the database and looking for strings in the descriptions is one way to build this - not the only one, but it might be useful. So I'd vote to put names in at first, and give us a few weeks to see how useful they are before a final decision. - Andy Robinson From paul at prescod.net Thu Mar 23 23:09:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 14:09:42 -0800 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA9626.8B62DB77@prescod.net> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) More important, though, the code is "self documenting". You never have to go from the number back to the name. > Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. I don't like the idea enough to exclude support for small machines or anything like that. We should weigh the costs of requiring the Unicode database at compile time. > (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) I have no idea.
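Andy's helpers are one place where the character descriptions pay off directly. A sketch of his "look for strings in the descriptions" idea (the function names are his hypothetical Japanese module's; the name-prefix test, via today's unicodedata module, is just one possible implementation):

```python
import unicodedata

def is_halfwidth_katakana(ch):
    # Classify a character by the prefix of its Unicode name;
    # name() returns the default for unnamed characters.
    return unicodedata.name(ch, "").startswith("HALFWIDTH KATAKANA")

def is_fullwidth_katakana(ch):
    # "HALFWIDTH KATAKANA ..." does not start with "KATAKANA",
    # so the two predicates are disjoint.
    return unicodedata.name(ch, "").startswith("KATAKANA")

print(is_halfwidth_katakana("\uff76"))  # U+FF76 HALFWIDTH KATAKANA LETTER KA
print(is_fullwidth_katakana("\u30ab"))  # U+30AB KATAKANA LETTER KA
print(is_fullwidth_katakana("a"))       # LATIN SMALL LETTER A
```

Name-prefix matching is slower than a range check on code points, but it needs no hand-maintained tables, which is exactly the normalization-helper trade-off Andy describes.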
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From pf at artcom-gmbh.de Thu Mar 23 23:12:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:12:25 +0100 (MET) Subject: [Python-Dev] Py3K: True and False builtin or keyword? Message-ID: Regarding the discussion about None becoming a keyword in Py3K: Recently the truth values True and False have been mentioned. Should they become builtin values --like None is now-- or should they become keywords? Nevertheless: for the time being I came up with the following weird idea: If you put this in front of the main module of a Python app: #!/usr/bin/env python if __name__ == "__main__": import sys if sys.version[0] <= '1': __builtins__.True = 1 __builtins__.False = 0 del sys # --- continue with your app from here: --- import foo, bar, ... .... Now you can start to use False and True in any imported module as if they were already builtins. Of course this is no surprise here and Python is really fun, Peter. From mal at lemburg.com Thu Mar 23 22:07:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 23 Mar 2000 22:07:35 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA8797.F16301E4@lemburg.com> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. Python could import it automatically, as > exceptions.py is imported.
Christian's work on compressing > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) Sorry to disappoint you guys, but the Unicode name and comments are *not* included in the unicodedatabase.c file Christian is currently working on. The reason is simple: it would add huge amounts of string data to the file. So this is a no-no for the core distribution... Still, the above is easily possible by inventing a new encoding, say unicode-with-smileys, which then reads in a file containing the Unicode names and applies the necessary magic to decode/encode data as Paul described above. Would probably make a cool fun-project for someone who wants to dive into writing codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Fri Mar 24 00:02:06 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 23 Mar 2000 18:02:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Hi Peter! >>>>> "PF" == Peter Funk writes: PF> Since something like 'del for' or 'del import' raises a PF> SyntaxError exception in Py152, I expect 'del None' to raise PF> the same exception in Py3K, after None has become a keyword. PF> Right? I misread your example the first time through, but it still doesn't quite parse on my second read. 
-------------------- snip snip -------------------- pyvers = '2k' try: del import except SyntaxError: pyvers = '3k' -------------------- snip snip -------------------- % python /tmp/foo.py File "/tmp/foo.py", line 3 del import ^ SyntaxError: invalid syntax -------------------- snip snip -------------------- See, you can't catch that SyntaxError because it doesn't happen at run-time. Maybe you meant to wrap the try suite in an exec? Here's a code sample that ought to work with 1.5.2 and the mythical Py3K-with-a-None-keyword. -------------------- snip snip -------------------- pyvers = '2k' try: exec "del None" except SyntaxError: pyvers = '3k' except NameError: pass print pyvers -------------------- snip snip -------------------- Cheers, -Barry From klm at digicool.com Fri Mar 24 00:05:08 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:05:08 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 pf at artcom-gmbh.de wrote: > Hi Barry! > > > >>>>> "PF" == Peter Funk writes: > > > > | try: > > | del None > > | except SyntaxError: > > | pass # Wow running Py3K here! > > Barry A. Warsaw: > > I know how to break your Py3K code: stick None=None some where higher > > up :) Huh. Does anyone really think we're going to catch SyntaxError at runtime, ever? Seems like the code fragment above wouldn't work in the first place. But i suppose, with most of a millennium to emerge, py3k could have more fundamental changes than i could even imagine...-) Ken klm at digicool.com From pf at artcom-gmbh.de Thu Mar 23 23:53:34 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:53:34 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Mar 23, 2000 2: 6:33 pm" Message-ID: Hi! > Barry A. 
Warsaw writes: > >>>>>> "PF" == Peter Funk writes: > > PF> I wonder, how much existing code the None --> keyword change > > PF> would break. > >Me too. Andrew M. Kuchling: > I can't conceive of anyone using None as a function name or a variable > name, except through a bug or thinking that 'None, useful, None = > 1,2,3' works. Even though None isn't a fixed constant, it might as > well be. How much C code have you seen lately that starts with int > function(void *NULL) ? I agree. urban legend: Once upon a time someone found the following neat snippet of C source hidden in some header file of a very, very large software system, after he had spent some nights trying to figure out why some simple edits he made in order to make the code more readable broke the system: #ifdef TRUE /* eat this: you arrogant Quiche Eaters */ #undef TRUE #undef FALSE #define TRUE (0) #define FALSE (1) #endif Obviously the poor guy would have found this particular small piece of evil code much earlier, if he had simply 'grep'ed for comments... there were not so many in this system. ;-) > Being able to do "None = 2" also smacks a bit of those legendary > Fortran compilers that let you accidentally change 2 into 4. +1 on > this change for Py3K, and I doubt it would cause breakage even if > introduced into 1.x. We'll see: those "Real Programmers" never die. Fortunately they prefer Perl over Python. <0.5 grin> Regards, Peter From klm at digicool.com Fri Mar 24 00:15:42 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:15:42 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Message-ID: On Thu, 23 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > See, you can't catch that SyntaxError because it doesn't happen at > run-time. Maybe you meant to wrap the try suite in an exec? Here's a Huh.
Guess i should have read barry's re-response before i posted mine: Desperately desiring to redeem myself, and contribute something to the discussion, i'll settle the class/static method naming quandary with the obvious alternative: > > p.classMethod("hey, cool!") # also selfless These should be called buddha methods - no self, samadhi, one with everything, etc. There, now i feel better. :-) Ken klm at digicool.com A Zen monk walks up to a hotdog vendor and says "make me one with everything." Ha. But that's not all. He gets the hot dog and pays with a ten. After several moments waiting, he says to the vendor, "i was expecting change", and the vendor says, "you of all people should know, change comes from inside." That's all. From bwarsaw at cnri.reston.va.us Fri Mar 24 00:19:28 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 18:19:28 -0500 (EST) Subject: [Python-Dev] Py3K: True and False builtin or keyword? References: Message-ID: <14554.42624.213027.854942@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> Now you can start to use False and True in any imported PF> module as if they were already builtins. Of course this is no PF> surprise here and Python is really fun, Peter. You /can/ do this, but that doesn't mean you /should/ :) Mucking with builtins is fun the way huffing dry erase markers is fun. Things are very pretty at first, but eventually the brain cell lossage will more than outweigh that cheap thrill. I've seen a few legitimate uses for hacking builtins. In Zope, I believe Jim hacks get_transaction() or somesuch into builtins because that way it's easy to get at without passing it through the call tree. And in Zope it makes sense since this is a fancy database application and your current transaction is a central concept. I've occasionally wrapped an existing builtin because I needed to extend its functionality while keeping its semantics and API unchanged.
An example of this was my pre-Python-1.5.2 open_ex() in Mailman's CGI driver script. Before builtin open() would print the failing file name, my open_ex() -- shown below -- would hack that into the exception object. But one of the things about Python that I /really/ like is that YOU KNOW WHERE THINGS COME FROM. If I suddenly start seeing True and False in your code, I'm going to look for function locals and args, then module globals, then from ... import *'s. If I don't see it in any of those, I'm going to put down my dry erase markers, look again, and then utter a loud "huh?" :) -Barry realopen = open def open_ex(filename, mode='r', bufsize=-1, realopen=realopen): from Mailman.Utils import reraise try: return realopen(filename, mode, bufsize) except IOError, e: strerror = e.strerror + ': ' + filename e.strerror = strerror e.filename = filename e.args = (e.args[0], strerror) reraise(e) import __builtin__ __builtin__.__dict__['open'] = open_ex From pf at artcom-gmbh.de Fri Mar 24 00:23:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 24 Mar 2000 00:23:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from Ken Manheimer at "Mar 23, 2000 6: 5: 8 pm" Message-ID: Hi! > > > | try: > > > | del None > > > | except SyntaxError: > > > | pass # Wow running Py3K here! > > > > Barry A. Warsaw: > > > I know how to break your Py3K code: stick None=None some where higher > > > up :) > Ken Manheimer: > Huh. Does anyone really think we're going to catch SyntaxError at > runtime, ever? Seems like the code fragment above wouldn't work in the > first place. Ouuppps... Unfortunately I had no chance to test this with Py3K before making a fool of myself by posting this silly example. Now I understand what Barry meant. So if None really becomes a keyword in Py3K we can be sure to catch all those imaginary 'del None' statements very quickly. 
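[The crux of the exec trick Barry showed is that a SyntaxError raised while compiling a *string* happens at run time, inside the call, and is therefore catchable — unlike a SyntaxError in the enclosing source file. A compact probe using compile() instead of exec (the helper name here is mine):]

```python
def none_is_keyword():
    # Compiling a string defers any SyntaxError to run time,
    # so it can be caught -- Barry's exec trick, via compile().
    try:
        compile("del None", "<probe>", "exec")
    except SyntaxError:
        return True    # the parser rejects 'del None': None is a keyword
    return False

print(none_is_keyword())
```

On a 1.5.2-era interpreter this returns False (the thread notes 'del None' compiles fine there); on any interpreter where None really is a keyword it returns True.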
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From billtut at microsoft.com Fri Mar 24 03:46:06 2000 From: billtut at microsoft.com (Bill Tutt) Date: Thu, 23 Mar 2000 18:46:06 -0800 Subject: [Python-Dev] Re: Unicode character names Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> MAL wrote: >Andrew M. Kuchling" wrote: >> >> Paul Prescod writes: >>>The new \N escape interpolates named characters within strings. For >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>>unicode smiley face at the end. >> >> Cute idea, and it certainly means you can avoid looking up Unicode >> numbers. (You can look up names instead. :) ) Note that this means the >> Unicode database is no longer optional if this is done; it has to be >> around at code-parsing time. Python could import it automatically, as >> exceptions.py is imported. Christian's work on compressing >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually >> dragging around the Unicode database in the binary, or is it read out >> of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... > Ok, now you're just being silly. It's possible to put the character names in a separate structure so that they don't automatically get paged in with the normal unicode character property data. If you never use it, it won't get paged in, it's that simple.... Looking up the Unicode code value from the Unicode character name smells like a good time to use gperf to generate a perfect hash function for the character names. Esp. for the Unicode 3.0 character namespace.
Then you can just store the hashkey -> Unicode character mapping, and hardly ever need to page in the actual full character name string itself. I haven't looked at what the comment field contains, so I have no idea how useful that info is. *waits while gperf crunches through the ~10,550 Unicode characters where this would be useful* Bill From akuchlin at mems-exchange.org Fri Mar 24 03:51:25 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 21:51:25 -0500 (EST) Subject: [Python-Dev] 1.6 job list Message-ID: <200003240251.VAA19921@newcnri.cnri.reston.va.us> I've written up a list of things that need to get done before 1.6 is finished. This is my vision of what needs to be done, and doesn't have an official stamp of approval from GvR or anyone else. So it's very probably wrong. http://starship.python.net/crew/amk/python/1.6-jobs.html Here's the list formatted as text. The major outstanding things at the moment seem to be sre and Distutils; once they go in, you could probably release an alpha, because the other items are relatively minor. Still to do * XXX Revamped import hooks (or is this a post-1.6 thing?) * Update the documentation to match 1.6 changes. * Document more undocumented modules * Unicode: Add Unicode support for open() on Windows * Unicode: Compress the size of unicodedatabase * Unicode: Write \N{SMILEY} codec for Unicode * Unicode: the various XXX items in Misc/unicode.txt * Add module: Distutils * Add module: Jim Ahlstrom's zipfile.py * Add module: PyExpat interface * Add module: mmapfile * Add module: sre * Drop cursesmodule and package it separately. (Any other obsolete modules that should go?) * Delete obsolete subdirectories in Demo/ directory * Refurbish Demo subdirectories to be properly documented, match modern coding style, etc. 
* Support Unicode strings in PyExpat interface * Fix ./ld_so_aix installation problem on AIX * Make test.regrtest.py more usable outside of the Python test suite * Conservative garbage collection of cycles (maybe?) * Write friendly "What's New in 1.6" document/article Done Nothing at the moment. After 1.7 * Rich comparisons * Revised coercions * Parallel for loop (for i in L; j in M: ...), * Extended slicing for all sequences. * GvR: "I've also been thinking about making classes be types (not as huge a change as you think, if you don't allow subclassing built-in types), and adding a built-in array type suitable for use by NumPy." --amk From esr at thyrsus.com Fri Mar 24 04:30:53 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 22:30:53 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 09:51:25PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <20000323223053.J28880@thyrsus.com> Andrew Kuchling : > * Drop cursesmodule and package it separately. (Any other obsolete > modules that should go?) Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel configuration system I'm writing. Why is it on the hit list? -- Eric S. Raymond Still, if you will not fight for the right when you can easily win without bloodshed, if you will not fight when your victory will be sure and not so costly, you may come to the moment when you will have to fight with all the odds against you and only a precarious chance for survival. There may be a worse case. You may have to fight when there is no chance of victory, because it is better to perish than to live as slaves. --Winston Churchill From dan at cgsoftware.com Fri Mar 24 04:52:54 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 22:52:54 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: "Eric S. 
Raymond"'s message of "Thu, 23 Mar 2000 22:30:53 -0500" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> Message-ID: <4s9x6n3d.fsf@dan.resnet.rochester.edu> "Eric S. Raymond" writes: > Andrew Kuchling : > > * Drop cursesmodule and package it separately. (Any other obsolete > > modules that should go?) > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > configuration system I'm writing. Why is it on the hit list? IIRC, it's because nobody really maintains it, and those that care about it, use a different one (either ncurses module, or a newer cursesmodule). So from what i understand, you get complaints, but no real advantage to having it there. I'm just trying to summarize, not fall on either side (some people get touchy about issues like this). --Dan From esr at thyrsus.com Fri Mar 24 05:11:37 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:11:37 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <4s9x6n3d.fsf@dan.resnet.rochester.edu>; from Daniel Berlin+list.python-dev on Thu, Mar 23, 2000 at 10:52:54PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> Message-ID: <20000323231137.U28880@thyrsus.com> Daniel Berlin+list.python-dev : > > Andrew Kuchling : > > > * Drop cursesmodule and package it separately. (Any other obsolete > > > modules that should go?) > > > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > > configuration system I'm writing. Why is it on the hit list? > > IIRC, it's because nobody really maintains it, and those that care > about it, use a different one (either ncurses module, or a newer cursesmodule). > So from what i understand, you get complaints, but no real advantage > to having it there. OK. 
Then what I guess I'd like is for a maintained equivalent of this to join the core -- the ncurses module you referred to, for choice. I'm not being random. I'm trying to replace the mess that currently constitutes the kbuild system -- but I'll need to support an equivalent of menuconfig. -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From akuchlin at mems-exchange.org Fri Mar 24 05:33:24 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 23:33:24 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <20000323231137.U28880@thyrsus.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> Message-ID: <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Eric S. Raymond writes: >OK. Then what I guess I'd like is for a maintained equivalent of this >to join the core -- the ncurses module you referred to, for choice. See the "Whither cursesmodule" thread in the python-dev archives: http://www.python.org/pipermail/python-dev/2000-February/003796.html One possibility was to blow off backward compatibility; are there any systems that only have BSD curses, not SysV curses / ncurses? Given that Pavel Curtis announced he was dropping BSD curses maintainance some years ago, I expect even the *BSDs use ncurses these days. However, Oliver Andrich doesn't seem interested in maintaining his ncurses module, and someone just started a SWIG-generated interface (http://pyncurses.sourceforge.net), so it's not obvious which one you'd use. (I *would* be willing to take over maintaining Andrich's code; maintaining the BSD curses version just seems pointless these days.) 
--amk From dan at cgsoftware.com Fri Mar 24 05:43:51 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 23:43:51 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Andrew Kuchling's message of "Thu, 23 Mar 2000 23:33:24 -0500 (EST)" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: Andrew Kuchling writes: > Eric S. Raymond writes: > >OK. Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintainance > some years ago, I expect even the *BSDs use ncurses these days. Yes, they do. ls /usr/src/lib/libncurses/ Makefile ncurses_cfg.h pathnames.h termcap.c grep 5\.0 /usr/src/contrib/ncurses/* At least, this is FreeBSD. So there is no need for BSD curses anymore, on FreeBSD's account. > --amk > From esr at thyrsus.com Fri Mar 24 05:47:56 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:47:56 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14554.61460.311650.599253@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 11:33:24PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: <20000323234756.A29775@thyrsus.com> Andrew Kuchling : > Eric S. Raymond writes: > >OK. 
Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintainance > some years ago, I expect even the *BSDs use ncurses these days. BSD curses was officially declared dead by its maintainer, Keith Bostic, in early 1995. Keith and I conspired to kill it off in favor of ncurses :-). -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation. That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From andy at reportlab.com Fri Mar 24 11:14:44 2000 From: andy at reportlab.com (Andy Robinson) Date: Fri, 24 Mar 2000 10:14:44 GMT Subject: [Python-Dev] Unicode character names In-Reply-To: <20000324024913.B8C3A1CF22@dinsdale.python.org> References: <20000324024913.B8C3A1CF22@dinsdale.python.org> Message-ID: <38db3fc6.7370137@post.demon.co.uk> On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote: >Sorry to disappoint you guys, but the Unicode name and comments >are *not* included in the unicodedatabase.c file Christian >is currently working on. The reason is simple: it would add >huge amounts of string data to the file. So this is a no-no >for the core distribution...
You're right about what is compiled into the core. I have to keep reminding myself to distinguish three places functionality can live: 1. What is compiled into the Python core 2. What is in the standard Python library relating to encodings. 3. Completely separate add-on packages, maintained outside of Python, to provide extra functionality for (e.g.) Asian encodings. It is clear that both the Unicode database, and the mapping tables and other files at unicode.org, are a great resource; but they could be placed in (2) or (3) easily, along with scripts to unpack them. It probably makes sense for the i18n-sig to kick off a separate 'CodecKit' project for now, and we can see what good emerges from it before thinking about what should go into the library. >Still, the above is easily possible by inventing a new >encoding, say unicode-with-smileys, which then reads in >a file containing the Unicode names and applies the necessary >magic to decode/encode data as Paul described above. >Would probably make a cool fun-project for someone who wants >to dive into writing codecs. Yup. Prime candidate for CodecKit. - Andy From mal at lemburg.com Fri Mar 24 09:52:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 09:52:36 +0100 Subject: [Python-Dev] Re: Unicode character names References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <38DB2CD4.CAD9F0E2@lemburg.com> Bill Tutt wrote: > > MAL wrote: > > >Andrew M. Kuchling" wrote: > >> > >> Paul Prescod writes: > >>>The new \N escape interpolates named characters within strings. For > >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >>>unicode smiley face at the end. > >> > >> Cute idea, and it certainly means you can avoid looking up Unicode > >> numbers. (You can look up names instead. :) ) Note that this means the > >> Unicode database is no longer optional if this is done; it has to be > >> around at code-parsing time. 
Python could import it automatically, as > >> exceptions.py is imported. Christian's work on compressing > >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually > >> dragging around the Unicode database in the binary, or is it read out > >> of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > > > Ok, now you're just being silly. Its possible to put the character names in > a separate structure so that they don't automatically get paged in with the > normal unicode character property data. If you never use it, it won't get > paged in, its that simple.... Sure, but it would still cause the interpreter binary or DLL to increase in size considerably... that caused some major noise a few days ago due to the fact that the unicodedata module adds some 600kB to the interpreter -- even though it would only get swapped in when needed (the interpreter itself doesn't use it). > Looking up the Unicode code value from the Unicode character name smells > like a good time to use gperf to generate a perfect hash function for the > character names. Esp. for the Unicode 3.0 character namespace. Then you can > just store the hashkey -> Unicode character mapping, and hardly ever need to > page in the actual full character name string itself. Great idea, but why not put this into separate codec module ? > I haven't looked at what the comment field contains, so I have no idea how > useful that info is. Probably not worth looking at... 
> *waits while gperf crunches through the ~10,550 Unicode characters where > this would be useful* -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 24 11:37:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 11:37:53 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> <38D8F55E.6E324281@lemburg.com> Message-ID: <38DB4581.EB5315E0@lemburg.com> Ok, I've just added two new parser markers to PyArg_ParseTuple() which will hopefully make life a little easier for extension writers. The new code will be in the next patch set which I will release early next week. Here are the docs: Internal Argument Parsing: -------------------------- These markers are used by the PyArg_ParseTuple() APIs: "U": Check for Unicode object and return a pointer to it "s": For Unicode objects: auto convert them to the <default encoding> and return a pointer to the object's buffer. "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the <Internal Format>). "es": Takes two parameters: encoding (const char **) and buffer (char **). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. On output, a buffer of the needed size is allocated and returned through *buffer as a NULL-terminated string. The encoded string may not contain embedded NULL characters. The caller is responsible for free()ing the allocated *buffer after usage.
"es#": Takes three parameters: encoding (const char **), buffer
(char **) and buffer_len (int *). The input object is first coerced to
Unicode in the usual way and then encoded into a string using the given
encoding. If *buffer is non-NULL, *buffer_len must be set to the size of
the buffer on input; output is then copied to *buffer. If *buffer is
NULL, a buffer of the needed size is allocated and output copied into
it; *buffer is then updated to point to the allocated memory area, and
the caller is responsible for free()ing *buffer after usage. In both
cases *buffer_len is updated to the number of characters written
(excluding the trailing NULL byte). The output buffer is assured to be
NULL-terminated.

Examples:

Using "es#" with auto-allocation:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;
        int buffer_len = 0;

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        free(buffer);
        return str;
    }

Using "es" with auto-allocation returning a NULL-terminated string:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;

        if (!PyArg_ParseTuple(args, "es:test_parser",
                              &encoding, &buffer))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromString(buffer);
        free(buffer);
        return str;
    }

Using "es#" with a pre-allocated buffer:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char _buffer[10];
        char *buffer = _buffer;
        int buffer_len = sizeof(_buffer);

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        return str;
    }

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From gstein at lyra.org Fri Mar 24 11:54:02 2000
From: gstein at lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 02:54:02 -0800 (PST)
Subject: [Python-Dev] Unicode and Windows
In-Reply-To: <38DB4581.EB5315E0@lemburg.com>
Message-ID: 

On Fri, 24 Mar 2000, M.-A. Lemburg wrote:
>...
> "s": For Unicode objects: auto convert them to the
> and return a pointer to the object's buffer.

Guess that I didn't notice this before, but it seems weird that "s" and
"s#" return different encodings.

Why?

> "es":
> Takes two parameters: encoding (const char **) and
> buffer (char **).
>...
> "es#":
> Takes three parameters: encoding (const char **),
> buffer (char **) and buffer_len (int *).

I see no reason to make the encoding (const char **) rather than
(const char *). We are never returning a value, so this just makes it
harder to pass the encoding into ParseTuple.

There is precedent for passing in single-ref pointers. For example:

    PyArg_ParseTuple(args, "O!", &s, PyString_Type)

I would recommend using just one pointer level for the encoding.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From mal at lemburg.com Fri Mar 24 12:29:12 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 24 Mar 2000 12:29:12 +0100
Subject: [Python-Dev] Unicode and Windows
References: 
Message-ID: <38DB5188.AA580652@lemburg.com>

Greg Stein wrote:
>
> On Fri, 24 Mar 2000, M.-A. Lemburg wrote:
> >...
> > "s": For Unicode objects: auto convert them to the
> > and return a pointer to the object's buffer.
>
> Guess that I didn't notice this before, but it seems weird that "s" and
> "s#" return different encodings.
>
> Why?

This is due to the buffer interface being used for "s#". Since "s#"
refers to the getreadbuf slot, it returns raw data.
In this case this is UTF-16 in platform dependent byte order. "s" relies on NULL-terminated strings and doesn't use the buffer interface at all. Thus "s" returns NULL-terminated UTF-8 (UTF-16 is full of NULLs). "t#" uses the getcharbuf slot and thus should return character data. UTF-8 is the right encoding here. > > "es": > > Takes two parameters: encoding (const char **) and > > buffer (char **). > >... > > "es#": > > Takes three parameters: encoding (const char **), > > buffer (char **) and buffer_len (int *). > > I see no reason to make the encoding (const char **) rather than > (const char *). We are never returning a value, so this just makes it > harder to pass the encoding into ParseTuple. > > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > > I would recommend using just one pointer level for the encoding. You have a point there... even though it breaks the notion of prepending all parameters with an '&' (ok, except the type check one). OTOH, it would allow passing the encoding right with the PyArg_ParseTuple() call which probably makes more sense in this context. I'll change it... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 24 14:13:02 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 24 Mar 2000 14:13:02 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> Message-ID: <38DB69DE.6D04B084@tismer.com> "M.-A. Lemburg" wrote: > > "Andrew M. Kuchling" wrote: > > > > Paul Prescod writes: > > >The new \N escape interpolates named characters within strings. For > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > >unicode smiley face at the end. 
> >
> > Cute idea, and it certainly means you can avoid looking up Unicode
> > numbers. (You can look up names instead. :) ) Note that this means the
> > Unicode database is no longer optional if this is done; it has to be
> > around at code-parsing time. Python could import it automatically, as
> > exceptions.py is imported. Christian's work on compressing
> > unicodedatabase.c is therefore really important. (Is Perl5.6 actually
> > dragging around the Unicode database in the binary, or is it read out
> > of some external file or data structure?)
>
> Sorry to disappoint you guys, but the Unicode name and comments
> are *not* included in the unicodedatabase.c file Christian
> is currently working on. The reason is simple: it would add
> huge amounts of string data to the file. So this is a no-no
> for the core distribution...

This is not settled, still an open question.

What I have for non-textual data:
25 kb with dumb compression
15 kb with enhanced compression

What amounts of data am I talking about?
- The whole unicode database text file has size 632 kb.
- With PkZip this goes down to 96 kb.

Now, I produced another text file with just the currently used data in
it, and this sounds so:
- the stripped unicode text file has size 216 kb.
- PkZip melts this down to 40 kb.

Please compare that to my results above: I can do at least twice as
good. I hope I can compete for the text sections as well (since this is
something where zip is *good* at), but just let me try. Let's target
60 kb for the whole crap, and I'd be very pleased.

Then, there is still the question where to put the data. Having one
file in the dll and another externally would be an option. I could also
imagine to use a binary external file all the time, with maximum
possible compression. By loading this structure, this would be
partially expanded to make it fast. An advantage is that the compressed
Unicode database could become a stand-alone product.
The size is in fact so crazy small, that I'd like to make this available to any other language. > Still, the above is easily possible by inventing a new > encoding, say unicode-with-smileys, which then reads in > a file containing the Unicode names and applies the necessary > magic to decode/encode data as Paul described above. That sounds reasonable. Compression makes sense as well here, since the expanded stuff makes quite an amount of kb, compared to what it is "worth", compared to, say, the Python dll. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Fri Mar 24 14:41:27 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 14:41:27 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> <38DB69DE.6D04B084@tismer.com> Message-ID: <38DB7087.1B105AC7@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > "Andrew M. Kuchling" wrote: > > > > > > Paul Prescod writes: > > > >The new \N escape interpolates named characters within strings. For > > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > > >unicode smiley face at the end. > > > > > > Cute idea, and it certainly means you can avoid looking up Unicode > > > numbers. (You can look up names instead. :) ) Note that this means the > > > Unicode database is no longer optional if this is done; it has to be > > > around at code-parsing time. Python could import it automatically, as > > > exceptions.py is imported. Christian's work on compressing > > > unicodedatabase.c is therefore really important. 
(Is Perl5.6 actually
> > > dragging around the Unicode database in the binary, or is it read out
> > > of some external file or data structure?)
> >
> > Sorry to disappoint you guys, but the Unicode name and comments
> > are *not* included in the unicodedatabase.c file Christian
> > is currently working on. The reason is simple: it would add
> > huge amounts of string data to the file. So this is a no-no
> > for the core distribution...
>
> This is not settled, still an open question.

Well, ok, depends on how much you can squeeze out of the text
columns ;-) I still think that it's better to leave these gimmicks out
of the core and put them into some add-on, though.

> What I have for non-textual data:
> 25 kb with dumb compression
> 15 kb with enhanced compression

Looks good :-) With these sizes I think we could even integrate the
unicodedatabase.c + API into the core interpreter and only have the
unicodedata module to access the database from within Python.

> What amounts of data am I talking about?
> - The whole unicode database text file has size 632 kb.
> - With PkZip this goes down to 96 kb.
>
> Now, I produced another text file with just the currently
> used data in it, and this sounds so:
> - the stripped unicode text file has size 216 kb.
> - PkZip melts this down to 40 kb.
>
> Please compare that to my results above: I can do at least
> twice as good. I hope I can compete for the text sections
> as well (since this is something where zip is *good* at),
> but just let me try.
> Let's target 60 kb for the whole crap, and I'd be very pleased.
>
> Then, there is still the question where to put the data.
> Having one file in the dll and another externally would
> be an option. I could also imagine to use a binary external
> file all the time, with maximum possible compression.
> By loading this structure, this would be partially expanded
> to make it fast.
> An advantage is that the compressed Unicode database
> could become a stand-alone product.
The size is in fact > so crazy small, that I'd like to make this available > to any other language. You could take the unicodedatabase.c file (+ header file) and use it everywhere... I don't think it needs to contain any Python specific code. The API names would have to follow the Python naming schemes though. > > Still, the above is easily possible by inventing a new > > encoding, say unicode-with-smileys, which then reads in > > a file containing the Unicode names and applies the necessary > > magic to decode/encode data as Paul described above. > > That sounds reasonable. Compression makes sense as well here, > since the expanded stuff makes quite an amount of kb, compared > to what it is "worth", compared to, say, the Python dll. With 25kB for the non-text columns, I'd suggest simply adding the file to the core. Text columns could then go into a separate module. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 15:14:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 09:14:51 -0500 Subject: [Python-Dev] Hi -- I'm back! Message-ID: <200003241414.JAA11740@eric.cnri.reston.va.us> I'm back from ten days on the road. I'll try to dig through the various mailing list archives over the next few days, but it would be more efficient if you are waiting for me to take action or express an opinion on a particular issue (in *any* Python-related mailing list) to mail me a summary or at least a pointer. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Mar 24 16:01:25 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 24 Mar 2000 16:01:25 +0100 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message by Ka-Ping Yee , Thu, 23 Mar 2000 09:47:47 -0800 (PST) , Message-ID: <20000324150125.7144A370CF2@snelboot.oratrix.nl> > Hmm... 
i guess this also means one should ask what > > def function(None, arg): > ... > > does outside a class definition. I suppose that should simply > be illegal. No, it forces you to call the function with keyword arguments! (initially meant jokingly, but thinking about it for a couple of seconds there might actually be cases where this is useful) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip at mojam.com Fri Mar 24 16:14:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 09:14:11 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <14555.34371.749039.946891@beluga.mojam.com> AMK> I've written up a list of things that need to get done before 1.6 AMK> is finished. This is my vision of what needs to be done, and AMK> doesn't have an official stamp of approval from GvR or anyone else. AMK> So it's very probably wrong. Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules of general usefulness (this is at least generally useful for anyone writing web spiders ;-) shouldn't live in Tools, because it's not always available and users need to do extra work to make them available. I'd be happy to write up some documentation for it and twiddle the module to include doc strings. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Fri Mar 24 16:20:03 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 10:20:03 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: References: <38DB4581.EB5315E0@lemburg.com> Message-ID: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Greg Stein writes: > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) ^^^^^^^^^^^^^^^^^ Feeling ok? I *suspect* these are reversed. :) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Mar 24 16:24:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 10:24:13 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB5188.AA580652@lemburg.com> References: <38DB5188.AA580652@lemburg.com> Message-ID: <14555.34973.303273.716146@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > You have a point there... even though it breaks the notion > of prepending all parameters with an '&' (ok, except the I've never heard of this notion; I hope I didn't just miss it in the docs! The O& also doesn't require a & in front of the name of the conversion function, you just pass the right value. So there are at least two cases where you *typically* don't use a &. (Other cases in the 1.5.2 API are probably just plain weird if they don't!) Changing it to avoid the extra machinery is the Right Thing; you get to feel good today. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Fri Mar 24 17:38:06 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 17:38:06 +0100 Subject: [Python-Dev] Unicode and Windows References: <38DB5188.AA580652@lemburg.com> <14555.34973.303273.716146@weyr.cnri.reston.va.us> Message-ID: <38DB99EE.F5949889@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > You have a point there... 
even though it breaks the notion > > of prepending all parameters with an '&' (ok, except the > > I've never heard of this notion; I hope I didn't just miss it in the > docs! If you scan the parameters list in getargs.c you'll come to this conclusion and thus my notion: I've been programming like this for years now :-) > The O& also doesn't require a & in front of the name of the > conversion function, you just pass the right value. So there are at > least two cases where you *typically* don't use a &. (Other cases in > the 1.5.2 API are probably just plain weird if they don't!) > Changing it to avoid the extra machinery is the Right Thing; you get > to feel good today. ;) Ok, feeling good now ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 21:44:02 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 15:44:02 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 09:14:11 CST." <14555.34371.749039.946891@beluga.mojam.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> Message-ID: <200003242044.PAA00677@eric.cnri.reston.va.us> > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. > > I'd be happy to write up some documentation for it and twiddle the module to > include doc strings. Deal. Soon as we get the docs we'll move it to Lib. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Mar 24 21:50:43 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 12:50:43 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Fred L. Drake, Jr. wrote: > Greg Stein writes: > > There is precedent for passing in single-ref pointers. For example: > > > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > ^^^^^^^^^^^^^^^^^ > > Feeling ok? I *suspect* these are reversed. :) I just checked the code to ensure that it took a single pointer rather than a double-pointer. I guess that I didn't verify the order :-) Concept is valid, tho... the params do not necessarily require an ampersand. oop! Actually... this does require an ampersand: PyArg_ParseTuple(args, "O!", &PyString_Type, &s) Don't want to pass the whole structure... Well, regardless: I would much prefer to see the encoding passed as a constant string, rather than having to shove the sucker into a variable first, just so that I can insert a useless address-of operator. Cheers, -g -- Greg Stein, http://www.lyra.org/ From akuchlin at mems-exchange.org Fri Mar 24 21:51:56 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 15:51:56 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242044.PAA00677@eric.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> <200003242044.PAA00677@eric.cnri.reston.va.us> Message-ID: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Guido van Rossum writes: >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules >Deal. Soon as we get the docs we'll move it to Lib. What about putting it in a package like 'www' or 'web'? Packagizing the existing library is hard because of backward compatibility, but there's no such constraint for new modules. 
-- A.M. Kuchling http://starship.python.net/crew/amk/ One need not be a chamber to be haunted; / One need not be a house; / The brain has corridors surpassing / Material place. -- Emily Dickinson, "Time and Eternity" From gstein at lyra.org Fri Mar 24 22:00:25 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:00:25 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Andrew M. Kuchling wrote: > Guido van Rossum writes: > >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > >Deal. Soon as we get the docs we'll move it to Lib. > > What about putting it in a package like 'www' or 'web'? Packagizing > the existing library is hard because of backward compatibility, but > there's no such constraint for new modules. Or in the "network" package that was suggested a month ago? And why *can't* we start on repackaging old module? I think the only reason that somebody came up with to NOT do it was "well, if we don't repackage the whole thing, then we should repackage nothing." Which, IMO, is totally bogus. We'll never get anywhere operating under that principle. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Fri Mar 24 22:00:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:00:19 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: <14555.55139.484135.602894@weyr.cnri.reston.va.us> Greg Stein writes: > Or in the "network" package that was suggested a month ago? +1 > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. 
That doesn't bother me, but I tend to be a little conservative (though usually not as conservative as Guido on such matters). I *would* like to decided theat 1.7 will be fully packagized, and not wait until 2.0. As long as 1.7 is a "testing the evolutionary path" release, I think that's the right thing to do. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:03:54 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:03:54 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead Message-ID: <200003242103.QAA03288@eric.cnri.reston.va.us> Someone noticed that socket.connect() and a few related functions (connect_ex() and bind()) take either a single (host, port) tuple or two separate arguments, but that only the tuple is documented. Similar to append(), I'd like to close this gap, and I've made the necessary changes. This will probably break lots of code. Similar to append(), I'd like people to fix their code rather than whine -- two-arg connect() has never been documented, although it's found in much code (even the socket module test code :-( ). Similar to append(), I may revert the change if it is shown to cause too much pain during beta testing... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:05:57 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:05:57 -0500 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Your message of "Fri, 24 Mar 2000 12:50:43 PST." References: Message-ID: <200003242105.QAA03543@eric.cnri.reston.va.us> > Well, regardless: I would much prefer to see the encoding passed as a > constant string, rather than having to shove the sucker into a variable > first, just so that I can insert a useless address-of operator. Of course. Use & for output args, not as a matter of principle. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:11:25 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:11:25 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:00:25 PST." References: Message-ID: <200003242111.QAA04208@eric.cnri.reston.va.us> [Greg] > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. The reason is backwards compatibility. Assume we create a package "web" and move all web related modules into it: httplib, urllib, htmllib, etc. Now for backwards compatibility, we add the web directory to sys.path, so one can write either "import web.urllib" or "import urllib". But that loads the same code twice! And in this (carefully chosen :-) example, urllib actually has some state which shouldn't be replicated. Plus, it's too much work -- I'd rather focus on getting 1.6 out of the door, and there's a lot of other stuff I need to do besides moving modules around. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:15:00 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:15:00 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 16:00:19 EST." <14555.55139.484135.602894@weyr.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> Message-ID: <200003242115.QAA04648@eric.cnri.reston.va.us> > Greg Stein writes: > > Or in the "network" package that was suggested a month ago? [Fred] > +1 Which reminds me of another reason to wait: coming up with the right package hierarchy is hard. (E.g. 
I find network too long; plus, does htmllib belong there?) > That doesn't bother me, but I tend to be a little conservative > (though usually not as conservative as Guido on such matters). I > *would* like to decided theat 1.7 will be fully packagized, and not > wait until 2.0. As long as 1.7 is a "testing the evolutionary path" > release, I think that's the right thing to do. Agreed. At the SD conference I gave a talk about the future of Python, and there was (again) a good suggestion about forwards compatibility. Starting with 1.7 (if not sooner), several Python 3000 features that necessarily have to be incompatible (like 1/2 yielding 0.5 instead of 0) could issue warnings when (or unless?) Python is invoked with a compatibility flag. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 22:21:54 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:21:54 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <14555.56434.974884.832078@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Someone noticed that socket.connect() and a few related GvR> functions (connect_ex() and bind()) take either a single GvR> (host, port) tuple or two separate arguments, but that only GvR> the tuple is documented. GvR> Similar to append(), I'd like to close this gap, and I've GvR> made the necessary changes. This will probably break lots of GvR> code. I don't agree that socket.connect() and friends need this fix. Yes, obviously append() needed fixing because of the application of Tim's Twelfth Enlightenment to the semantic ambiguity. But socket.connect() has no such ambiguity; you may spell it differently, but you know exactly what you mean. My suggestion would be to not break any code, but extend connect's interface to allow an optional second argument. 
Thus all of these calls would be legal: sock.connect(addr) sock.connect(addr, port) sock.connect((addr, port)) One nit on the documentation of the socket module. The second entry says: bind (address) Bind the socket to address. The socket must not already be bound. (The format of address depends on the address family -- see above.) Huh? What "above" part should I see? Note that I'm reading this doc off the web! -Barry From gstein at lyra.org Fri Mar 24 22:27:57 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:27:57 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > [Greg] > > And why *can't* we start on repackaging old module? I think the only > > reason that somebody came up with to NOT do it was "well, if we don't > > repackage the whole thing, then we should repackage nothing." Which, IMO, > > is totally bogus. We'll never get anywhere operating under that principle. > > The reason is backwards compatibility. Assume we create a package > "web" and move all web related modules into it: httplib, urllib, > htmllib, etc. Now for backwards compatibility, we add the web > directory to sys.path, so one can write either "import web.urllib" or > "import urllib". But that loads the same code twice! And in this > (carefully chosen :-) example, urllib actually has some state which > shouldn't be replicated. We don't add it to the path. Instead, we create new modules that look like: ---- httplib.py ---- from web.httplib import * ---- The only backwards-compat issue with this approach is that people who poke values into the module will have problems. I don't believe that any of the modules were designed for that, anyhow, so it would seem an acceptable to (effectively) disallow that behavior. 
> Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > door, and there's a lot of other stuff I need to do besides moving > modules around. Stuff that *you* need to do, sure. But there *are* a lot of us who can help here, and some who desire to spend their time moving modules. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:32:14 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:32:14 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > Greg Stein writes: > > > Or in the "network" package that was suggested a month ago? > > [Fred] > > +1 > > Which reminds me of another reason to wait: coming up with the right > package hierarchy is hard. (E.g. I find network too long; plus, does > htmllib belong there?) htmllib does not go there. Where does it go? Dunno. Leave it unless/until somebody comes up with a place for it. We package up obvious ones. We don't have to design a complete hierarchy. There seemed to be a general "good feeling" around some kind of network (protocol) package. Call it "net" if "network" is too long. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:27:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:27:51 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 16:21:54 EST." 
<14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <200003242127.QAA06269@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Someone noticed that socket.connect() and a few related > GvR> functions (connect_ex() and bind()) take either a single > GvR> (host, port) tuple or two separate arguments, but that only > GvR> the tuple is documented. > > GvR> Similar to append(), I'd like to close this gap, and I've > GvR> made the necessary changes. This will probably break lots of > GvR> code. > > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. > > My suggestion would be to not break any code, but extend connect's > interface to allow an optional second argument. Thus all of these > calls would be legal: > > sock.connect(addr) > sock.connect(addr, port) > sock.connect((addr, port)) You probably meant: sock.connect(addr) sock.connect(host, port) sock.connect((host, port)) since (host, port) is equivalent to (addr). > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Fred typically directs latex2html to break all sections apart. 
It's in the previous section: Socket addresses are represented as a single string for the AF_UNIX address family and as a pair (host, port) for the AF_INET address family, where host is a string representing either a hostname in Internet domain notation like 'daring.cwi.nl' or an IP address like '100.50.200.5', and port is an integral port number. Other address families are currently not supported. The address format required by a particular socket object is automatically selected based on the address family specified when the socket object was created. This also explains the reason for requiring a single argument: when using AF_UNIX, the second argument makes no sense! Frankly, I'm not sure what to do here -- it's more correct to require a single address argument always, but it's more convenient to allow two sometimes. Note that sendto(data, addr) only accepts the tuple form: you cannot write sendto(data, host, port). --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:28:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:28:32 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: <14555.56832.336242.378838@weyr.cnri.reston.va.us> Greg Stein writes: > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Would it make sense for one of these people with time on their hands to propose a specific mapping from old->new names? I think that would be a good first step, regardless of the implementation timing. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:29:44 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:29:44 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:27:57 PST." 
References: Message-ID: <200003242129.QAA06510@eric.cnri.reston.va.us> > We don't add it to the path. Instead, we create new modules that look > like: > > ---- httplib.py ---- > from web.httplib import * > ---- > > The only backwards-compat issue with this approach is that people who poke > values into the module will have problems. I don't believe that any of the > modules were designed for that, anyhow, so it would seem acceptable to > (effectively) disallow that behavior. OK, that's reasonable. I'll have to invent a different reason why I don't want this -- because I really don't! > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Hm. Moving modules requires painful and arcane CVS manipulations that can only be done by the few of us here at CNRI -- and I'm the only one left who's full time on Python. I'm still not convinced that it's a good plan. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:32:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:32:39 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <14555.57079.187670.916002@weyr.cnri.reston.va.us> Barry A. Warsaw writes: > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. Crock. 
The address representations have been fairly well defined for quite a while. Be explicit. > sock.connect(addr) This is the only legal signature. (host, port) is simply the form of addr for a particular address family. > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Definitely written for the paper document! Remind me about this again in a month and I'll fix it, but I don't want to play games with this little stuff until the 1.5.2p2 and 1.6 trees have been merged. Harrumph. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Fri Mar 24 22:37:41 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:37:41 -0800 (PST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Message-ID: On Fri, 24 Mar 2000, Greg Stein wrote: >... > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. I just want to emphasize this point some more. Python 1.6 has a defined timeline, with a defined set of minimal requirements. However! I don't believe that a corollary of that says we MUST ignore everything else. If those other options fit within the required timeline, then why not? (assuming we have adequate testing and doc to go with the changes) There are ample people who have time and inclination to contribute. If those contributions add positive benefit, then I see no reason to exclude them (other than on pure merit, of course). Note that some of the problems stem from CVS access. 
Much Guido-time could be saved by a commit-then-review model, rather than a review-then-Guido-commits model. Fred does this very well with the Doc/ area. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:38:48 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:38:48 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: >... > > We don't add it to the path. Instead, we create new modules that look > > like: > > > > ---- httplib.py ---- > > from web.httplib import * > > ---- > > > > The only backwards-compat issue with this approach is that people who poke > > values into the module will have problems. I don't believe that any of the > > modules were designed for that, anyhow, so it would seem acceptable to > > (effectively) disallow that behavior. > > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Fair enough. > > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > > door, and there's a lot of other stuff I need to do besides moving > > > modules around. > > > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > > help here, and some who desire to spend their time moving modules. > > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. I'm still not convinced that it's a > good plan. There are a number of ways to do this, and I'm familiar with all of them. It is a continuing point of strife in the Apache CVS repositories :-) But... it is premised on accepting the desire to move them, of course. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:38:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:38:51 -0500 Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Your message of "Fri, 24 Mar 2000 13:37:41 PST." References: Message-ID: <200003242138.QAA07621@eric.cnri.reston.va.us> > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- > commits model. Fred does this very well with the Doc/ area. Actually, I'm experimenting with this already: Unicode, list.append() and socket.connect() are done in this way! For renames it is really painful though, even if someone else at CNRI can do it. I'd like to see a draft package hierarchy please? Also, if you have some time, please review the bugs in the bugs list. Patches submitted with a corresponding PR# will be treated with priority! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Mar 24 22:40:48 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:40:48 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 Message-ID: <38DBE0E0.76A298FE@lemburg.com> Attached you find the latest update of the Unicode implementation. The patch is against the current CVS version. It includes the fix I posted yesterday for the core dump problem in codecs.c (was introduced by my previous patch set -- sorry), adds more tests for the codecs and two new parser markers "es" and "es#". 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Thu Mar 23 23:58:41 2000 +++ Python+Unicode/Lib/codecs.py Fri Mar 17 23:51:01 2000 @@ -46,7 +46,7 @@ handling schemes by providing the errors argument. These string values are defined: - 'strict' - raise an error (or a subclass) + 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/output/test_unicode Python+Unicode/Lib/test/output/test_unicode --- CVS-Python/Lib/test/output/test_unicode Fri Mar 24 22:21:26 2000 +++ Python+Unicode/Lib/test/output/test_unicode Sat Mar 11 00:23:21 2000 @@ -1,5 +1,4 @@ test_unicode Testing Unicode comparisons... done. -Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing unicodedata module... done. 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Thu Mar 23 23:58:47 2000 +++ Python+Unicode/Lib/test/test_unicode.py Fri Mar 24 00:29:43 2000 @@ -293,3 +293,33 @@ assert unicodedata.combining(u'\u20e1') == 230 print 'done.' + +# Test builtin codecs +print 'Testing builtin codecs...', + +assert unicode('hello','ascii') == u'hello' +assert unicode('hello','utf-8') == u'hello' +assert unicode('hello','utf8') == u'hello' +assert unicode('hello','latin-1') == u'hello' + +assert u'hello'.encode('ascii') == 'hello' +assert u'hello'.encode('utf-8') == 'hello' +assert u'hello'.encode('utf8') == 'hello' +assert u'hello'.encode('utf-16-le') == 'h\000e\000l\000l\000o\000' +assert u'hello'.encode('utf-16-be') == '\000h\000e\000l\000l\000o' +assert u'hello'.encode('latin-1') == 'hello' + +u = u''.join(map(unichr, range(1024))) +for encoding in ('utf-8', 'utf-16', 'utf-16-le', 'utf-16-be', + 'raw_unicode_escape', 'unicode_escape', 'unicode_internal'): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(256))) +for encoding in ('latin-1',): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(128))) +for encoding in ('ascii',): + assert unicode(u.encode(encoding),encoding) == u + +print 'done.' 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Thu Mar 23 23:58:48 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 24 22:29:35 2000 @@ -715,21 +715,126 @@ These markers are used by the PyArg_ParseTuple() APIs: - 'U': Check for Unicode object and return a pointer to it + "U": Check for Unicode object and return a pointer to it - 's': For Unicode objects: auto convert them to the + "s": For Unicode objects: auto convert them to the and return a pointer to the object's buffer. - 's#': Access to the Unicode object via the bf_getreadbuf buffer interface + "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). - 't#': Access to the Unicode object via the bf_getcharbuf buffer interface + "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the ). + "es": + Takes two parameters: encoding (const char *) and + buffer (char **). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + On output, a buffer of the needed size is allocated and + returned through *buffer as NULL-terminated string. + The encoded string may not contain embedded NULL characters. + The caller is responsible for free()ing the allocated *buffer + after usage. + + "es#": + Takes three parameters: encoding (const char *), + buffer (char **) and buffer_len (int *). 
+ + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) + on input. Output is then copied to *buffer. + + If *buffer is NULL, a buffer of the needed size is + allocated and output copied into it. *buffer is then + updated to point to the allocated memory area. The caller + is responsible for free()ing *buffer after usage. + + In both cases *buffer_len is updated to the number of + characters written (excluding the trailing NULL-byte). + The output buffer is assured to be NULL-terminated. + +Examples: + +Using "es#" with auto-allocation: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + int buffer_len = 0; + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + free(buffer); + return str; + } + +Using "es" with auto-allocation returning a NULL-terminated string: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + + if (!PyArg_ParseTuple(args, "es:test_parser", + encoding, &buffer)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromString(buffer); + free(buffer); + return str; + } + +Using "es#" with a pre-allocated buffer: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char _buffer[10]; + char *buffer = _buffer; + int buffer_len = sizeof(_buffer); + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + 
PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + return str; + } + File/Stream Output: ------------------- @@ -837,6 +942,7 @@ History of this Proposal: ------------------------- +1.3: Added new "es" and "es#" parser markers 1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and Only in CVS-Python/Objects: .#stringobject.c.2.59 Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/getargs.c Python+Unicode/Python/getargs.c --- CVS-Python/Python/getargs.c Sat Mar 11 10:55:21 2000 +++ Python+Unicode/Python/getargs.c Fri Mar 24 20:22:26 2000 @@ -178,6 +178,8 @@ } else if (level != 0) ; /* Pass */ + else if (c == 'e') + ; /* Pass */ else if (isalpha(c)) max++; else if (c == '|') @@ -654,6 +656,122 @@ break; } + case 'e': /* encoded string */ + { + char **buffer; + const char *encoding; + PyObject *u, *s; + int size; + + /* Get 'e' parameter: the encoding name */ + encoding = (const char *)va_arg(*p_va, const char *); + if (encoding == NULL) + return "(encoding is NULL)"; + + /* Get 's' parameter: the output buffer to use */ + if (*format != 's') + return "(unknown parser marker combination)"; + buffer = (char **)va_arg(*p_va, char **); + format++; + if (buffer == NULL) + return "(buffer is NULL)"; + + /* Convert object to Unicode */ + u = PyUnicode_FromObject(arg); + if (u == NULL) + return "string, unicode or text buffer"; + + /* Encode object; use default error handling */ + s = PyUnicode_AsEncodedString(u, + encoding, + NULL); + Py_DECREF(u); + if (s == NULL) + return "(encoding failed)"; + if (!PyString_Check(s)) { + Py_DECREF(s); + return 
"(encoder failed to return a string)"; + } + size = PyString_GET_SIZE(s); + + /* Write output; output is guaranteed to be + 0-terminated */ + if (*format == '#') { + /* Using buffer length parameter '#': + + - if *buffer is NULL, a new buffer + of the needed size is allocated and + the data copied into it; *buffer is + updated to point to the new buffer; + the caller is responsible for + free()ing it after usage + + - if *buffer is not NULL, the data + is copied to *buffer; *buffer_len + has to be set to the size of the + buffer on input; buffer overflow is + signalled with an error; buffer has + to provide enough room for the + encoded string plus the trailing + 0-byte + + - in both cases, *buffer_len is + updated to the size of the buffer + /excluding/ the trailing 0-byte + + */ + int *buffer_len = va_arg(*p_va, int *); + + format++; + if (buffer_len == NULL) + return "(buffer_len is NULL)"; + if (*buffer == NULL) { + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + } else { + if (size + 1 > *buffer_len) { + Py_DECREF(s); + return "(buffer overflow)"; + } + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + *buffer_len = size; + } else { + /* Using a 0-terminated buffer: + + - the encoded string has to be + 0-terminated for this variant to + work; if it is not, an error is raised + + - a new buffer of the needed size + is allocated and the data copied + into it; *buffer is updated to + point to the new buffer; the caller + is responsible for free()ing it + after usage + + */ + if (strlen(PyString_AS_STRING(s)) != size) + return "(encoded string without "\ + "NULL bytes)"; + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + } + Py_DECREF(s); + break; + } + case 'S': /* string object */ { PyObject **p = va_arg(*p_va, PyObject **); From fdrake at acm.org Fri Mar 24 22:40:38 2000 From: 
fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:40:38 -0500 (EST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: References: Message-ID: <14555.57558.939236.363358@weyr.cnri.reston.va.us> Greg Stein writes: > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- This is a non-problem; I'm willing to do the arcane CVS manipulations if the issue is Guido's time. What I will *not* do is do it piecemeal without a cohesive plan that Guido approves of at least 95%, and I'll be really careful to do that last 5% when he's not in the office. ;) > commits model. Fred does this very well with the Doc/ area. Thanks for the vote of confidence! The model that I use for the Doc/ area is more like "Fred reviews, Fred commits, and Guido can read it on python.org like everyone else." Works for me! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Fri Mar 24 22:45:38 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:45:38 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.57858.824301.693390@anthem.cnri.reston.va.us> One thing you can definitely do now which breaks no code: propose a package hierarchy for the standard library. From akuchlin at mems-exchange.org Fri Mar 24 22:46:28 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 16:46:28 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <14555.57908.151946.182639@amarok.cnri.reston.va.us> Here's a strawman codec for doing the \N{NULL} thing. Questions: 0) Is the code below correct? 1) What the heck would this encoding be called? 2) What does .encode() do? 
(Right now it escapes \N as \N{BACKSLASH}N.) 3) How can we store all those names? The resulting dictionary makes a 361K .py file; Python dumps core trying to parse it. (Another bug...) 4) What do you do with the error \N{...... no closing right bracket. Right now it stops at that point, and never advances any farther. Maybe it should assume it's an error if there's no } within the next 200 chars or some similar limit? 5) Do we need StreamReader/Writer classes, too? I've also added a script that parses the names out of the NameList.txt file at ftp://ftp.unicode.org/Public/UNIDATA/. --amk namecodec.py: ============= import codecs #from _namedict import namedict namedict = {'NULL': 0, 'START OF HEADING' : 1, 'BACKSLASH':ord('\\')} class NameCodec(codecs.Codec): def encode(self,input,errors='strict'): # XXX what should this do? Escape the # sequence \N as '\N{BACKSLASH}N'? return input.replace( '\\N', '\\N{BACKSLASH}N' ) def decode(self,input,errors='strict'): output = unicode("") last = 0 index = input.find( u'\\N{' ) while index != -1: output = output + unicode( input[last:index] ) used = index r_bracket = input.find( '}', index) if r_bracket == -1: # No closing bracket; bail out... break name = input[index + 3 : r_bracket] code = namedict.get( name ) if code is not None: output = output + unichr(code) elif errors == 'strict': raise ValueError, 'Unknown character name %s' % repr(name) elif errors == 'ignore': pass elif errors == 'replace': output = output + unichr( 0xFFFD ) last = r_bracket + 1 index = input.find( '\\N{', last) else: # Finally failed gently, no longer finding a \N{... 
output = output + unicode( input[last:] ) return len(input), output # Otherwise, we hit the break for an unterminated \N{...} return index, output if __name__ == '__main__': c = NameCodec() for s in [ r'b\lah blah \N{NULL} asdf', r'b\l\N{START OF HEADING}\N{NU' ]: used, s2 = c.decode(s) print repr( s2 ) s3 = c.encode(s) _, s4 = c.decode(s3) print repr(s3) assert s4 == s print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' )) print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' )) makenamelist.py =============== # Hack to extract character names from NamesList.txt # Output the repr() of the resulting dictionary import re, sys, string namedict = {} while 1: L = sys.stdin.readline() if L == "": break m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L) if m is not None: last_char = int(m.group(1), 16) if m.group(2) is not None: name = string.upper( m.group(2) ) if name not in ['', '']: namedict[ name ] = last_char # print name, last_char m = re.match('\t=\s*(.*)\s*(;.*)?', L) if m is not None: name = string.upper( m.group(1) ) names = string.split(name, ',') names = map(string.strip, names) for n in names: namedict[ n ] = last_char # print n, last_char # XXX and do what with this dictionary? print namedict From mal at lemburg.com Fri Mar 24 22:50:19 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:50:19 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 References: <38DBE0E0.76A298FE@lemburg.com> Message-ID: <38DBE31B.BCB342CA@lemburg.com> Oops, sorry, the patch file wasn't supposed to go to python-dev. 
Anyway, Greg's wish is included in there and MarkH should be happy now -- at least I hope he is ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Jasbahr at origin.EA.com Fri Mar 24 22:49:35 2000 From: Jasbahr at origin.EA.com (Asbahr, Jason) Date: Fri, 24 Mar 2000 15:49:35 -0600 Subject: [Python-Dev] Memory Management Message-ID: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Greetings! We're working on integrating our own memory manager into our project and the current challenge is figuring out how to make it play nice with Python (and SWIG). The approach we're currently taking is to patch 1.5.2 and augment the PyMem* macros to call external memory allocation functions that we provide. The idea is to easily allow the addition of third party memory management facilities to Python. Assuming 1) we get it working :-), and 2) we sync to the latest Python CVS and patch that, would this be a useful patch to give back to the community? Has anyone run up against this before? Thanks, Jason Asbahr Origin Systems, Inc. jasbahr at origin.ea.com From bwarsaw at cnri.reston.va.us Fri Mar 24 22:53:01 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 16:53:01 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> Message-ID: <14555.58301.790774.159381@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> You probably meant: | sock.connect(addr) | sock.connect(host, port) | sock.connect((host, port)) GvR> since (host, port) is equivalent to (addr). Doh, yes. :) GvR> Fred typically directs latex2html to break all sections GvR> apart. 
It's in the previous section: I know, I was being purposefully dense for effect :) Fred, is there some way to make the html contain a link to the previous section for the "see above" text? That would solve the problem I think. GvR> This also explains the reason for requiring a single GvR> argument: when using AF_UNIX, the second argument makes no GvR> sense! GvR> Frankly, I'm not sure what do here -- it's more correct to GvR> require a single address argument always, but it's more GvR> convenient to allow two sometimes. GvR> Note that sendto(data, addr) only accepts the tuple form: you GvR> cannot write sendto(data, host, port). Hmm, that /does/ complicate things -- it makes explaining the API more difficult. Still, in this case I think I'd lean toward liberal acceptance of input parameters. :) -Barry From bwarsaw at cnri.reston.va.us Fri Mar 24 22:57:01 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:57:01 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14555.58541.207868.496747@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> OK, that's reasonable. I'll have to invent a different GvR> reason why I don't want this -- because I really don't! Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't be persuaded to change your mind :) -Barry From fdrake at acm.org Fri Mar 24 23:10:41 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:10:41 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.58301.790774.159381@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> Message-ID: <14555.59361.460705.258859@weyr.cnri.reston.va.us> bwarsaw at cnri.reston.va.us writes: > I know, I was being purposefully dense for effect :) Fred, is there > some way to make the html contain a link to the previous section for > the "see above" text? That would solve the problem I think. No. I expect this to no longer be a problem when we push to SGML/XML, so I won't waste any time hacking around it. On the other hand, lots of places in the documentation refer to "above" and "below" in the traditional sense used in paper documents, and that doesn't work well for hypertext, even in the strongly traditional book-derivation way the Python manuals are done. As soon as it's not in the same HTML file, "above" and "below" break for a lot of people. So it still should be adjusted at an appropriate time. > Hmm, that /does/ complicate things -- it makes explaining the API more > difficult. Still, in this case I think I'd lean toward liberal > acceptance of input parameters. :) No -- all the more reason to be strict and keep the descriptions as simple as reasonable. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 23:10:32 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:10:32 -0500 Subject: [Python-Dev] Memory Management In-Reply-To: Your message of "Fri, 24 Mar 2000 15:49:35 CST." 
<11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <200003242210.RAA11434@eric.cnri.reston.va.us> > We're working on integrating our own memory manager into our project > and the current challenge is figuring out how to make it play nice > with Python (and SWIG). The approach we're currently taking is to > patch 1.5.2 and augment the PyMem* macros to call external memory > allocation functions that we provide. The idea is to easily allow > the addition of third party memory management facilities to Python. > Assuming 1) we get it working :-), and 2) we sync to the latest Python > CVS and patch that, would this be a useful patch to give back to the > community? Has anyone run up against this before? Check out the archives for patches at python.org looking for posts by Vladimir Marangozov. Vladimir has produced several rounds of patches with a very similar goal in mind. We're still working out some details -- but it shouldn't be too long, and I hope that his patches are also suitable for you. If not, discussion is required! --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 23:12:35 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 17:12:35 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <14555.59475.802130.434345@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> No -- all the more reason to be strict and keep the Fred> descriptions as simple as reasonable. At the expense of (IMO unnecessarily) breaking existing code? 
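[In Python terms, the liberal calling convention being argued for above amounts to a small normalization step in front of the real connect; the following is a hypothetical sketch of that idea, not the actual C socketmodule code:]

```python
# Hypothetical sketch of the liberal connect() signature under
# discussion -- NOT the actual socketmodule implementation.  It accepts
# both the documented tuple form and the two-argument spelling and
# normalizes them to a single address tuple.
def normalize_address(*args):
    if len(args) == 1:
        # connect((host, port)) -- the documented, tuple-only form;
        # for AF_UNIX this would be a single pathname string instead
        return args[0]
    if len(args) == 2:
        # connect(host, port) -- the undocumented two-argument form
        return (args[0], args[1])
    raise TypeError("expected an address tuple or a host and a port")

print(normalize_address(("spam.org", 80)))
print(normalize_address("spam.org", 80))
```

[Note that such a wrapper only makes sense for AF_INET; as Guido points out, sendto(data, addr) accepts only the tuple form, which is what makes the two-argument spelling hard to defend consistently.]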
From mal at lemburg.com Fri Mar 24 23:13:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 23:13:04 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DBE870.D88915B5@lemburg.com>

"Andrew M. Kuchling" wrote:
>
> Here's a strawman codec for doing the \N{NULL} thing. Questions:
>
> 0) Is the code below correct?

Some comments below.

> 1) What the heck would this encoding be called?

Ehm, 'unicode-with-smileys' I guess... after all, that's what motivated the thread ;-) Seriously, I'd go with 'unicode-named'. You can then stack it on top of 'unicode-escape' and get the best of both worlds...

> 2) What does .encode() do? (Right now it escapes \N as
> \N{BACKSLASH}N.)

.encode() should translate Unicode to a string. Since the named char thing is probably only useful on input, I'd say: don't do anything, except maybe return input.encode('unicode-escape').

> 3) How can we store all those names? The resulting dictionary makes a
> 361K .py file; Python dumps core trying to parse it. (Another bug...)

I've had the same experience with the large Unicode mapping tables... the trick is to split the dictionary definition into chunks and then use dict.update() to paste them together again.

> 4) What do you do with the error \N{...... no closing right bracket?
> Right now it stops at that point, and never advances any farther.
> Maybe it should assume it's an error if there's no } within the
> next 200 chars or some similar limit?

I'd suggest taking the upper bound of all Unicode name lengths as the limit.

> 5) Do we need StreamReader/Writer classes, too?

If you plan to have it registered with a codec search function, yes.
No big deal though, because you can use the Codec class as basis for them:

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

Then just drop the scripts into the encodings package dir and it should be usable via unicode(r'\N{SMILEY}','unicode-named') and u":-)".encode('unicode-named').

> I've also added a script that parses the names out of the NamesList.txt
> file at ftp://ftp.unicode.org/Public/UNIDATA/.
>
> --amk
>
> namecodec.py:
> =============
>
> import codecs
>
> #from _namedict import namedict
> namedict = {'NULL': 0, 'START OF HEADING' : 1,
>             'BACKSLASH': ord('\\')}
>
> class NameCodec(codecs.Codec):
>     def encode(self,input,errors='strict'):
>         # XXX what should this do?  Escape the
>         # sequence \N as '\N{BACKSLASH}N'?
>         return input.replace( '\\N', '\\N{BACKSLASH}N' )

You should return a string on output... input will be a Unicode object and the return value too if you don't add e.g. an .encode('unicode-escape').

>     def decode(self,input,errors='strict'):
>         output = unicode("")
>         last = 0
>         index = input.find( u'\\N{' )
>         while index != -1:
>             output = output + unicode( input[last:index] )
>             used = index
>             r_bracket = input.find( '}', index)
>             if r_bracket == -1:
>                 # No closing bracket; bail out...
>                 break
>
>             name = input[index + 3 : r_bracket]
>             code = namedict.get( name )
>             if code is not None:
>                 output = output + unichr(code)
>             elif errors == 'strict':
>                 raise ValueError, 'Unknown character name %s' % repr(name)

This could also be UnicodeError (it's a subclass of ValueError).

>             elif errors == 'ignore': pass
>             elif errors == 'replace':
>                 output = output + unichr( 0xFFFD )

u'\uFFFD' would save a call.

>             last = r_bracket + 1
>             index = input.find( '\\N{', last)
>         else:
>             # Finally failed gently, no longer finding a \N{...
>             output = output + unicode( input[last:] )
>             return len(input), output
>
>         # Otherwise, we hit the break for an unterminated \N{...}
>         return index, output

Note that .decode() must only return the decoded data. The "bytes read" integer was removed in order to make the Codec APIs compatible with the standard file object APIs.

> if __name__ == '__main__':
>     c = NameCodec()
>     for s in [ r'b\lah blah \N{NULL} asdf',
>                r'b\l\N{START OF HEADING}\N{NU' ]:
>         used, s2 = c.decode(s)
>         print repr( s2 )
>
>         s3 = c.encode(s)
>         _, s4 = c.decode(s3)
>         print repr(s3)
>         assert s4 == s
>
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' ))
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' ))
>
> makenamelist.py
> ===============
>
> # Hack to extract character names from NamesList.txt
> # Output the repr() of the resulting dictionary
>
> import re, sys, string
>
> namedict = {}
>
> while 1:
>     L = sys.stdin.readline()
>     if L == "": break
>
>     m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L)
>     if m is not None:
>         last_char = int(m.group(1), 16)
>         if m.group(2) is not None:
>             name = string.upper( m.group(2) )
>             if name not in ['',
>                             '']:
>                 namedict[ name ] = last_char
>                 # print name, last_char
>
>     m = re.match('\t=\s*(.*)\s*(;.*)?', L)
>     if m is not None:
>         name = string.upper( m.group(1) )
>         names = string.split(name, ',')
>         names = map(string.strip, names)
>         for n in names:
>             namedict[ n ] = last_char
>             # print n, last_char
>
> # XXX and do what with this dictionary?
> print namedict
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://www.python.org/mailman/listinfo/python-dev

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fdrake at acm.org Fri Mar 24 23:12:42 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 17:12:42 -0500 (EST) Subject: [Python-Dev] Memory Management In-Reply-To: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <14555.59482.61317.992089@weyr.cnri.reston.va.us> Asbahr, Jason writes: > community? Has anyone run up against this before? You should talk to Vladimir Marangozov; he's done a fair bit of work dealing with memory management in Python. You probably want to read the chapter he contributed to the Python/C API document for the release earlier this week. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Fri Mar 24 23:19:50 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 16:19:50 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.59910.631130.241930@beluga.mojam.com> Guido> Which reminds me of another reason to wait: coming up with the Guido> right package hierarchy is hard. (E.g. I find network too long; Guido> plus, does htmllib belong there?) Ah, another topic for python-dev. Even if we can't do the packaging right away, we should be able to hash out the structure. Skip From guido at python.org Fri Mar 24 23:25:01 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:25:01 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 17:10:41 EST." 
<14555.59361.460705.258859@weyr.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <200003242225.RAA13408@eric.cnri.reston.va.us> > bwarsaw at cnri.reston.va.us writes: > > I know, I was being purposefully dense for effect :) Fred, is there > > some way to make the html contain a link to the previous section for > > the "see above" text? That would solve the problem I think. [Fred] > No. I expect this to no longer be a problem when we push to > SGML/XML, so I won't waste any time hacking around it. > On the other hand, lots of places in the documentation refer to > "above" and "below" in the traditional sense used in paper documents, > and that doesn't work well for hypertext, even in the strongly > traditional book-derivation way the Python manuals are done. As soon > as it's not in the same HTML file, "above" and "below" break for a lot > of people. So it still should be adjusted at an appropriate time. My approach to this: put more stuff on the same page! I personally favor putting an entire chapter on one page; even if you split the top-level subsections this wouldn't have happened. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Fri Mar 24 23:40:54 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:40:54 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: Guido wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! I'm glad this organize-the-library-in-packages initiative seems to be moving towards concentrating on the organization, rather than just starting to put obvious things in the obvious places. 
Personally, i *crave* sensible, discoverable organization. The only thing i like less than complicated disorganization is complicated misorganization - and i think that just diving in and doing the "obvious" placements would have the terrible effect of making it harder, not easier, to move eventually to the right arrangement. Ken klm at digicool.com From akuchlin at mems-exchange.org Fri Mar 24 23:45:20 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 17:45:20 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DBE870.D88915B5@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> Message-ID: <14555.61440.613940.50492@amarok.cnri.reston.va.us> M.-A. Lemburg writes: >.encode() should translate Unicode to a string. Since the >named char thing is probably only useful on input, I'd say: >don't do anything, except maybe return input.encode('unicode-escape'). Wait... then you can't stack it on top of unicode-escape, because it would already be Unicode escaped. >> 4) What do you with the error \N{...... no closing right bracket. >I'd suggest to take the upper bound of all Unicode name >lengths as limit. Seems like a hack. >Note that .decode() must only return the decoded data. >The "bytes read" integer was removed in order to make >the Codec APIs compatible with the standard file object >APIs. Huh? Why does Misc/unicode.txt describe decode() as "Decodes the object input and returns a tuple (output object, length consumed)"? Or are you talking about a different .decode() method? -- A.M. Kuchling http://starship.python.net/crew/amk/ "Ruby's dead?" "Yes." "Ah me. That's the trouble with mortals. They do that. Not to worry, eh?" 
-- Dream and Pharamond, in SANDMAN #46: "Brief Lives:6" From gmcm at hypernet.com Fri Mar 24 23:50:12 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 24 Mar 2000 17:50:12 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <1258184279-6957124@hypernet.com> [Guido] > Someone noticed that socket.connect() and a few related functions > (connect_ex() and bind()) take either a single (host, port) tuple or > two separate arguments, but that only the tuple is documented. > > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. This will indeed cause great wailing and gnashing of teeth. I've been criticized for using the tuple form in the Sockets HOWTO (in fact I foolishly changed it to demonstrate both forms). > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I say give 'em something to whine about. put-sand-in-the-vaseline-ly y'rs - Gordon From klm at digicool.com Fri Mar 24 23:55:43 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:55:43 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > > >>>>> "GvR" == Guido van Rossum writes: > > GvR> OK, that's reasonable. I'll have to invent a different > GvR> reason why I don't want this -- because I really don't! 
> > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) Maybe i'm just a slave to my organization mania, but i'd suggest the following order change of 5 and 6, plus an addition; from: 5 now: Flat is better than nested. 6 now: Sparse is better than dense. to: 5 Sparse is better than dense. 6 Flat is better than nested 6.5 until it gets too dense. or-is-it-me-that-gets-too-dense'ly yrs, ken klm at digicool.com (And couldn't the humor page get hooked up a bit better? That was definitely a fun part of maintaining python.org...) From gstein at lyra.org Sat Mar 25 02:19:18 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 17:19:18 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > One thing you can definitely do now which breaks no code: propose a > package hierarchy for the standard library. I already did! http://www.python.org/pipermail/python-dev/2000-February/003761.html *grumble* -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 25 05:19:33 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:33 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <001001bf9611$52e960a0$752d153f@tim> [GregS proposes a partial packaging of std modules for 1.6, Guido objects on spurious grounds, GregS refutes that, Guido agrees] > I'll have to invent a different reason why I don't want this -- because > I really don't! This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you to fill in . All you have to do now is come up with a pithy way to say "if it's something Guido is so interested in that he wants to be deeply involved in it himself, but it comes at a time when he's buried under prior commitments, then tough tulips, it waits". 
shades-of-the-great-renaming-ly y'rs - tim From tim_one at email.msn.com Sat Mar 25 05:19:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:36 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: <001101bf9611$544239e0$752d153f@tim> [Guido] > OK, that's reasonable. I'll have to invent a different > reason why I don't want this -- because I really don't! [Barry] > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) No no no no no: "namespaces are one honking great idea ..." is the controlling one here: Guido really *does* want this! It's a question of timing, in the sense of "never is often better than *right* now", but to be eventually modified by "now is better than never". These were carefully designed to support any position whatsoever, you know . although-in-any-particular-case-there's-only-one-true-interpretation-ly y'rs - tim From guido at python.org Sat Mar 25 05:19:41 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 23:19:41 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 17:19:18 PST." References: Message-ID: <200003250419.XAA25751@eric.cnri.reston.va.us> > > One thing you can definitely do now which breaks no code: propose a > > package hierarchy for the standard library. > > I already did! > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > *grumble* You've got to be kidding. That's not a package hierarchy proposal, it's just one package (network). Without a comprehensive proposal I'm against a partial reorganization: without a destination we can't start marching. Naming things is very contentious -- everybody has an opinion. To pick the right names you must see things in perspective. 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From moshez at math.huji.ac.il Sat Mar 25 09:45:28 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 10:45:28 +0200 (IST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID:

On Thu, 23 Mar 2000 gvwilson at nevex.com wrote:

> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:

I'd like to know what you mean by "class" method. (I do know C++ and Java, so I have some idea...). Specifically, my question is: how does a class method access class variables? They can't be totally unqualified (because that's very unpythonic). If they are qualified by the class's name, I see it as a very mild improvement on the current situation. You could suggest, for example, qualifying class variables by "class" (so you'd do things like: class.x = 1), but I'm not sure I like it. On the whole, I think the much bigger issue is how we denote class methods.

Also, one slight problem with your method of denoting class methods: currently, it is possible to add an instance method to a class at run time by something like

class C:
    pass

def foo(self):
    pass

C.foo = foo

In your suggestion, how do you view the possibility of adding class methods to a class? (Note that "foo", above, is also perfectly usable as a plain function.)

I want to note that Edward suggested denotation by a separate namespace:

C.foo = foo              # foo is an instance method
C.__methods__.foo = foo  # foo is a class method

The biggest problem with that suggestion is that it doesn't address the common case of defining it textually inside the class definition.
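[Editorial aside: Moshe's run-time attachment trick still works in modern Python and is easy to verify. This sketch (names purely illustrative) shows that an instance created before the assignment picks up the new method:]

```python
class C:
    pass

def foo(self):
    # behaves exactly like a method defined textually in the class body
    return "foo called on a %s instance" % type(self).__name__

c = C()          # instance created *before* foo exists on the class
C.foo = foo      # attach the plain function as an instance method

print(c.foo())   # -> 'foo called on a C instance'
```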
> I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. Currently, I use "_" for that purpose, after I heard the idea from Fredrik Lundh. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sat Mar 25 10:26:23 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:26:23 -0800 (PST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: <200003250419.XAA25751@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > > One thing you can definitely do now which breaks no code: propose a > > > package hierarchy for the standard library. > > > > I already did! > > > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > > > *grumble* > > You've got to be kidding. That's not a package hierarchy proposal, > it's just one package (network). > > Without a comprehensive proposal I'm against a partial reorganization: > without a destination we can't start marching. Not kidding at all. I said before that I don't think we can do everything all at once. I *do* think this is solvable with a greedy algorithm rather than waiting for some nebulous completion point. > Naming things is very contentious -- everybody has an opinion. To > pick the right names you must see things in perspective. Sure. And those diverse opinions are why I don't believe it is possible to do all at once. The task is simply too large to tackle in one shot. IMO, it must be solved incrementally. I'm not even going to attempt to try to define a hierarchy for all those modules. I count 137 on my local system. Let's say that I *do* try... 
some are going to end up "forced" rather than obeying some obvious grouping. If you do it a chunk at a time, then you get the obvious, intuitive groupings. Try for more, and you just bung it all up. For discussion's sake: can you provide a rationale for doing it all at once? In the current scenario, modules just appear at some point. After a partial reorg, some modules appear at a different point. "No big whoop." Just because module A is in a package doesn't imply that module B must also be in a package. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 25 10:35:39 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:35:39 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <001001bf9611$52e960a0$752d153f@tim> Message-ID: On Fri, 24 Mar 2000, Tim Peters wrote: > [GregS proposes a partial packaging of std modules for 1.6, Guido objects on > spurious grounds, GregS refutes that, Guido agrees] > > > I'll have to invent a different reason why I don't want this -- because > > I really don't! > > This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you > to fill in . All you have to do now is come up with a pithy way to > say "if it's something Guido is so interested in that he wants to be deeply > involved in it himself, but it comes at a time when he's buried under prior > commitments, then tough tulips, it waits". No need for Pythonic Theses. I don't see anybody disagreeing with the end goal. The issue comes up with *how* to get there. I say "do it incrementally" while others say "do it all at once." Personally, I don't think it is possible to do all at once. As a corollary, if you can't do it all at once, but you *require* that it be done all at once, then you have effectively deferred the problem. To put it another way, Guido has already invented a reason to not do it: he just requires that it be done all at once. Result: it won't be done. [ not saying this was Guido's intent or desire... 
but this is how I read the result :-) ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 25 10:55:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 11:55:12 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.34371.749039.946891@beluga.mojam.com> Message-ID: On Fri, 24 Mar 2000, Skip Montanaro wrote: > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. You're right, but I'd like this to be a 1.7 change. It's just that I plan to suggest a great-renaming-fest for 1.7 modules, and then namespace wouldn't be cluttered when you don't need it. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sat Mar 25 11:16:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 12:16:23 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Here's a reason: there shouldn't be changes we'll retract later -- we need to come up with the (more or less) right hierarchy the first time, or we'll do a lot of work for nothing. > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. Hmmmmm....this is a big problem. Maybe we need to have more people with access to the CVS? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From mal at lemburg.com Sat Mar 25 11:47:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 25 Mar 2000 11:47:30 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> <14555.61440.613940.50492@amarok.cnri.reston.va.us> Message-ID: <38DC9942.3C4E4B92@lemburg.com>

"Andrew M. Kuchling" wrote:
>
> M.-A. Lemburg writes:
> >.encode() should translate Unicode to a string. Since the
> >named char thing is probably only useful on input, I'd say:
> >don't do anything, except maybe return input.encode('unicode-escape').
>
> Wait... then you can't stack it on top of unicode-escape, because it
> would already be Unicode escaped.

Sorry for the mixup (I guess yesterday wasn't my day...). I had stream codecs in mind: these are stackable, meaning that you can wrap one codec around another. And it's also their interface API that was changed -- not the basic stateless encoder/decoder ones. Stacking of .encode()/.decode() must be done "by hand" in e.g. the way I described above. Another approach would be subclassing the unicode-escape Codec and then calling the base class method.

> >> 4) What do you do with the error \N{...... no closing right bracket?
> >I'd suggest to take the upper bound of all Unicode name
> >lengths as limit.
>
> Seems like a hack.

It is... but what other way would there be?

> >Note that .decode() must only return the decoded data.
> >The "bytes read" integer was removed in order to make
> >the Codec APIs compatible with the standard file object
> >APIs.
>
> Huh? Why does Misc/unicode.txt describe decode() as "Decodes the
> object input and returns a tuple (output object, length consumed)"?
> Or are you talking about a different .decode() method?

You're right... I was thinking about .read() and .write().
.decode() should return a tuple, just as documented in unicode.txt.

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mhammond at skippinet.com.au Sat Mar 25 14:20:59 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 26 Mar 2000 00:20:59 +1100 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

[Greg writes]
> I'm not even going to attempt to try to
> define a hierarchy for all those modules. I count 137 on my local system.
> Let's say that I *do* try... some are going to end up "forced" rather than
> obeying some obvious grouping. If you do it a chunk at a time, then you
> get the obvious, intuitive groupings. Try for more, and you just bung it
> all up.
...
> Just because module A is in a package doesn't imply that module B must
> also be in a package.

I agree with Greg - not every module will fit into a package. But I also agree with Guido - we _should_ attempt to go through the 137 modules and put the ones that fit into logical groupings. Greg is probably correct with his selection for "net", but a general evaluation is still a good thing. A view of the bigger picture will help to quell debates over the structure, and only leave us with the squabbles over the exact spelling :-)

+2 on ... err .... -1 on ... errr - awww - screw that--ly, Mark.

From tismer at tismer.com Sat Mar 25 14:35:50 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 25 Mar 2000 14:35:50 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DCC0B6.2A7D0EF1@tismer.com>

"Andrew M. Kuchling" wrote: ...
> 3) How can we store all those names? The resulting dictionary makes a
> 361K .py file; Python dumps core trying to parse it. (Another bug...)
This is simply not the place to use a dictionary. You don't need fast lookup from names to codes, but something that supports incremental search. This would enable PythonWin to show a pop-up list after you typed the first letters.

I'm working on a common substring analysis that turns each entry into 3 to 5 small integers. You then encode these in an order-preserving way. That means the resulting code table is still lexically ordered, and access to the entries is done via bisection. It will take me some more time to get that, but it will not be larger than 60k, or I drop it. Also note that all the names use uppercase letters and space only -- an opportunity to use simple context encoding and use just 4 bits most of the time.

...
> I've also added a script that parses the names out of the NamesList.txt
> file at ftp://ftp.unicode.org/Public/UNIDATA/.

Is there any reason why you didn't use the UnicodeData.txt file? I mean, do I cover everything if I continue to use that?

ciao - chris

--
Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From Vladimir.Marangozov at inrialpes.fr Sat Mar 25 15:59:55 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 25 Mar 2000 15:59:55 +0100 (CET) Subject: [Python-Dev] Windows and PyObject_NEW Message-ID: <200003251459.PAA09181@python.inrialpes.fr>

For MarkH, Guido and the Windows experienced:

I've been reading Jeffrey Richter's "Advanced Windows" last night in order to better understand why PyObject_NEW is implemented differently for Windows. Again, I feel uncomfortable with this, especially now, when I'm dealing with the memory aspect of Python's object constructors/destructors.
Some time ago, Guido elaborated on why PyObject_NEW uses malloc() on the user's side, before calling _PyObject_New (on Windows, cf. objimpl.h):

[Guido]
> I can explain the MS_COREDLL business:
>
> This is defined on Windows because the core is in a DLL. Since the
> caller may be in another DLL, and each DLL (potentially) has a
> different default allocator, and (in pre-Vladimir times) the
> type-specific deallocator typically calls free(), we (Mark & I)
> decided that the allocation should be done in the type-specific
> allocator. We changed the PyObject_NEW() macro to call malloc() and
> pass that into _PyObject_New() as a second argument.

While I agree with this, from reading chapters 5-9 of (a French copy of) the book (translated backwards here):

5. Win32 Memory Architecture
6. Exploring Virtual Memory
7. Using Virtual Memory in Your Applications
8. Memory Mapped Files
9. Heaps

I can't find any radical Windows specificities for memory management. On Windows, like on the rest of the OSes, the (virtual & physical) memory allocated for a process is common and seems to be accessible from all DLLs involved in an executable. Things like page sharing, copy-on-write, private process memory, etc. are conceptually all the same on Windows and Unix.

Now, the backwards binary compatibility argument aside (assuming that extensions get recompiled when a new Python version comes out), my concern is that with the introduction of PyObject_NEW *and* PyObject_DEL, there's no point in having separate implementations for Windows and Unix any more (or I'm really missing something and I fail to see what it is). User objects would be allocated *and* freed by the core DLL (at least the object headers). Even if several DLLs use different allocators, this shouldn't be a problem if what's obtained via PyObject_NEW is freed via PyObject_DEL. This Python memory would be allocated from the Python core DLL's regions/pages/heaps.
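[Editorial aside: the symmetry argument above can be restated as an invariant -- every block must be returned to the allocator that produced it. The real code is C macros, but a toy Python model (purely illustrative, not CPython's implementation; all names are made up) shows why pairing PyObject_NEW with PyObject_DEL sidesteps the per-DLL allocator question:]

```python
class Heap:
    """Toy stand-in for one DLL's default allocator."""
    def __init__(self, name):
        self.name = name
        self.live = set()

    def alloc(self):
        block = object()
        self.live.add(block)
        return block

    def free(self, block):
        if block not in self.live:
            # the real-world analogue is heap corruption, not a clean error
            raise RuntimeError("block returned to wrong heap: " + self.name)
        self.live.discard(block)

core_dll = Heap("python-core.dll")  # services PyObject_NEW *and* PyObject_DEL
ext_dll = Heap("extension.pyd")     # the extension's own malloc()/free()

obj = core_dll.alloc()
core_dll.free(obj)                  # fine: symmetric _NEW/_DEL pairing

obj = core_dll.alloc()
try:
    ext_dll.free(obj)               # the pre-PyObject_DEL hazard
except RuntimeError:
    pass                            # mismatched allocators must be avoided
```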
And I believe that the memory allocated by the core DLL is accessible from the other DLLs of the process. (I haven't seen evidence to the contrary, but tell me if this is not true.) I thought that maybe Windows malloc() uses different heaps for the different DLLs, but that's fine too, as long as the _NEW/_DEL symmetry is respected and all heaps are accessible from all DLLs (which seems to be the case...), but: in the beginning of Chapter 9, Heaps, I read the following:

"""
...About Win32 heaps (compared to Win16 heaps)...

* There is only one kind of heap (it doesn't have any particular name,
  like "local" or "global" on Win16, because it's unique)

* Heaps are always local to a process. The contents of a process heap are
  not accessible from the threads of another process. A large number of
  Win16 applications use the global heap as a way of sharing data between
  processes; this change in the Win32 heaps is often a source of problems
  for porting Win16 applications to Win32.

* One process can create several heaps in its addressing space and can
  manipulate them all.

* A DLL does not have its own heap. It uses the heaps as part of the
  addressing space of the process. However, a DLL can create a heap in
  the addressing space of a process and reserve it for its own use.
  Since several 16-bit DLLs share data between processes by using the
  local heap of a DLL, this change is a source of problems when porting
  Win16 apps to Win32...
"""

This last paragraph confuses me. On one hand, it's stated that all heaps can be manipulated by the process; OTOH, a DLL can reserve a heap for personal use within that process (implying the heap is r/w protected from the other DLLs?!?). The rest of this chapter does not explain how this "private reservation" can be done, so some of you would probably want to chime in and explain this to me.
Going back to PyObject_NEW: if it turns out that all heaps are accessible from all DLLs involved in the process, I would probably lobby for unifying the implementation of _PyObject_NEW/_New and _PyObject_DEL/_Del for Windows and Unix. Actually, on Windows, object allocation does not depend on a central Python core memory allocator. Therefore, with the patches I'm working on, changing the core allocator would work (would be changed for real) only for platforms other than Windows. Next, if it's possible to unify the implementation, it would also be possible to expose and officialize in the C API a new function set: PyObject_New() and PyObject_Del() (without leading underscores). For now, due to the implementation difference on Windows, we're forced to use the macro versions PyObject_NEW/DEL. Please tell me clearly what would be wrong on Windows if a) & b) & c):

a) we have PyObject_New(), PyObject_Del()
b) their implementation is platform independent (no MS_COREDLL diffs, we retain the non-Windows variant)
c) they're both used systematically for all object types

-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From gmcm at hypernet.com Sat Mar 25 16:46:01 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Sat, 25 Mar 2000 10:46:01 -0500 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: <1258123323-10623548@hypernet.com> Vladimir Marangozov

> ... And I believe that the memory allocated
> by the core DLL is accessible from the other DLL's of the process.
> (I haven't seen evidence on the opposite, but tell me if this is not true)

This is true. Or, I should say, it all boils down to HeapAlloc(heap, flags, bytes), and malloc is going to use the _crtheap.

> In the beginning of Chapter 9, Heaps, I read the following:
>
> """
> ...About Win32 heaps (compared to Win16 heaps)...
>
> * There is only one kind of heap (it doesn't have any particular name,
>   like "local" or "global" on Win16, because it's unique)
>
> * Heaps are always local to a process. The contents of a process heap is
>   not accessible from the threads of another process. A large number of
>   Win16 applications use the global heap as a way of sharing data between
>   processes; this change in the Win32 heaps is often a source of problems
>   for porting Win16 applications to Win32.
>
> * One process can create several heaps in its addressing space and can
>   manipulate them all.
>
> * A DLL does not have its own heap. It uses the heaps as part of the
>   addressing space of the process. However, a DLL can create a heap in
>   the addressing space of a process and reserve it for its own use.
>   Since several 16-bit DLLs share data between processes by using the
>   local heap of a DLL, this change is a source of problems when porting
>   Win16 apps to Win32...
> """
>
> This last paragraph confuses me. On one hand, it's stated that all heaps
> can be manipulated by the process, and OTOH, a DLL can reserve a heap for
> personal use within that process (implying the heap is r/w protected for
> the other DLLs ?!?).

At any time, you can create a new heap handle: HeapCreate(options, initsize, maxsize). Nothing special about the "dll" context here. On Win9x, only someone who knows about the handle can manipulate the heap. (On NT, you can enumerate the handles in the process.) I doubt very much that you would break anybody's code by removing the Windows-specific behavior. But it seems to me that unless Python always uses the default malloc, those of us who write C++ extensions will have to override operator new? I'm not sure. I've used placement new to allocate objects in a memory-mapped file, but I've never tried to muck with the global memory policy of a C++ program.
- Gordon

From akuchlin at mems-exchange.org Sat Mar 25 18:58:56 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Sat, 25 Mar 2000 12:58:56 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DCC0B6.2A7D0EF1@tismer.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DCC0B6.2A7D0EF1@tismer.com> Message-ID: <14556.65120.22727.524616@newcnri.cnri.reston.va.us> Christian Tismer writes:

>This is simply not the place to use a dictionary.
>You don't need fast lookup from names to codes,
>but something that supports incremental search.
>This would enable PythonWin to show a pop-up list after
>you typed the first letters.

Hmm... one could argue that PythonWin or IDLE should provide their own database for incremental searching; I was planning on following Bill Tutt's suggestion of generating a perfect minimal hash for the names. gperf isn't up to the job, but I found an algorithm that should be OK. Just got to implement it now... But if your approach pays off, it'll be superior to a perfect hash.

>Is there any reason why you didn't use the UnicodeData.txt file,
>I mean do I cover everything if I continue to use that?

Oops; I saw the NameList file and just went for it; maybe it should use the full UnicodeData.txt. --amk

From moshez at math.huji.ac.il Sat Mar 25 19:10:44 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:10:44 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Mark Hammond wrote:

> But I also agree with Guido - we _should_ attempt to go through the 137

Where did you come up with that number? I counted many more -- not quite sure how many, but certainly more. Well, here's a tentative suggestion I worked out today. This is just to have something to quibble about.
In the interest of rushing it out of the door, there are a few modules (explicitly mentioned) which I have said nothing about.

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
    dospath
    posixpath
    macpath
    nturl2path
    ntpath
    macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
exceptions
os
types
UserDict
UserList
user
site
locale
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase

========== Modules not handled ============
formatter
getopt
pprint
pty
repr
tty
errno
operator
pure
readline
resource
select
signal
socket
struct
syslog
termios

Well, if you got this far, you certainly deserve...

congratulations-ly y'rs, Z.

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From DavidA at ActiveState.com Sat Mar 25 19:28:30 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:28:30 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

> db
>     anydbm
>     whichdb
>     bsddb
>     dbm
>     dbhash
>     dumbdbm
>     gdbm

This made me think of one issue which is worth considering -- is there a mechanism for third-party packages to hook into the standard naming hierarchy? It'd be weird not to have the oracle and sybase modules within the db toplevel package, for example. --david ascher

From moshez at math.huji.ac.il Sat Mar 25 19:30:26 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:30:26 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote:

> This made me think of one issue which is worth considering -- is there a
> mechanism for third-party packages to hook into the standard naming
> hierarchy? It'd be weird not to have the oracle and sybase modules within
> the db toplevel package, for example.

My position is that any 3rd party module decides for itself where it wants to live -- once we've formalized the framework. Consider PyGTK/PyGnome, PyQT/PyKDE -- they should live in the UI package too...

From DavidA at ActiveState.com Sat Mar 25 19:50:14 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:50:14 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

> On Sat, 25 Mar 2000, David Ascher wrote:
>
> > This made me think of one issue which is worth considering -- is there a
> > mechanism for third-party packages to hook into the standard naming
> > hierarchy? It'd be weird not to have the oracle and sybase modules within
> > the db toplevel package, for example.
> > My position is that any 3rd party module decides for itself where it wants
> > to live -- once we formalized the framework. Consider PyGTK/PyGnome,
> > PyQT/PyKDE -- they should live in the UI package too...

That sounds good in theory, but I can see possible problems down the line:

1) The current mapping between package names and directory structure means that installing a third-party package hierarchy in a different place on disk than the standard library requires some work on the import mechanisms (this may have been discussed already) and a significant amount of user education.

2) We either need a 'registration' mechanism whereby people can claim a name in the standard hierarchy, or expect conflicts. As far as I can gather, in the Perl world registration occurs by submission to CPAN. Correct?

One alternative is to go the Java route, which would then mean, I think, that some core modules are placed very high in the hierarchy (the equivalent of the java. subtree), and some others are relegated to a lower subtree (the equivalent of com.sun).

Anyway, I agree with Guido on this one -- naming is a contentious issue fraught with long-term implications. Let's not rush into a decision just yet. --david

From guido at python.org Sat Mar 25 19:56:20 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Mar 2000 13:56:20 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Sat, 25 Mar 2000 01:35:39 PST." References: Message-ID: <200003251856.NAA09636@eric.cnri.reston.va.us>

> I say "do it incrementally" while others say "do it all at once."
> Personally, I don't think it is possible to do all at once. As a
> corollary, if you can't do it all at once, but you *require* that it be
> done all at once, then you have effectively deferred the problem. To put
> it another way, Guido has already invented a reason to not do it: he just
> requires that it be done all at once. Result: it won't be done.

Bullshit, Greg.
(I don't normally like to use such strong words, but since you're being confrontational here...) I'm all for doing it incrementally -- but I want the plan for how to do it made up front. That doesn't require all the details to be worked out -- but it requires a general idea about what kind of things we will have in the namespace and what kinds of names they get. An organizing principle, if you like. If we were to decide later that we go for a Java-like deep hierarchy, the network package would have to be moved around again -- what a waste.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From moshez at math.huji.ac.il Sat Mar 25 20:35:37 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 21:35:37 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote:

> > My position is that any 3rd party module decides for itself where it wants
> > to live -- once we formalized the framework. Consider PyGTK/PyGnome,
> > PyQT/PyKDE -- they should live in the UI package too...
>
> That sounds good in theory, but I can see possible problems down the line:
>
> 1) The current mapping between package names and directory structure means
> that installing a third party package hierarchy in a different place on disk
> than the standard library requires some work on the import mechanisms (this
> may have been discussed already) and a significant amount of user education.

Ummmm....

1.a) If the work of the import-sig produces something (which I suspect it will), it's more complicated -- you could have JAR-like files with hierarchies inside.

1.b) Installation is the domain of the distutils-sig. I seem to remember Greg Ward saying something about installing packages.

> 2) We either need a 'registration' mechanism whereby people can claim a name
> in the standard hierarchy or expect conflicts. As far as I can gather, in
> the Perl world registration occurs by submission to CPAN.
> Correct?

Yes. But this is no worse than the current situation, where people pick a toplevel name. I agree a registration mechanism would be helpful.

> One alternative is to go the Java route, which would then mean, I think,
> that some core modules are placed very high in the hierarchy (the equivalent
> of the java. subtree), and some others are deprecated to lower subtree (the
> equivalent of com.sun).

Personally, I *hate* the Java mechanism -- see Stallman's position on why GNU Java packages use gnu.* rather than org.gnu.* for some of the reasons. I really, really like the Perl mechanism, and I think we would do well to consider whether something like that would suit us, with minor modifications. (Remember that lwall copied the Pythonic module mechanism, so Perl and Python modules are quite similar.)

> Anyway, I agree with Guido on this one -- naming is a contentious issue
> wrought with long-term implications. Let's not rush into a decision just
> yet.

I agree. That's why I pushed out the straw-man proposal. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From bwarsaw at cnri.reston.va.us Sat Mar 25 21:07:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 15:07:27 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: <14557.7295.451011.36533@anthem.cnri.reston.va.us> I guess I was making a request for a more comprehensive list. People are asking to packagize the entire directory, so I'd like to know what organization they'd propose for all the modules. -Barry

From bwarsaw at cnri.reston.va.us Sat Mar 25 21:20:09 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Sat, 25 Mar 2000 15:20:09 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14557.8057.896921.693908@anthem.cnri.reston.va.us>

>>>>> "MZ" == Moshe Zadka writes:

MZ> Hmmmmm....this is a big problem. Maybe we need to have more
MZ> people with access to the CVS?

To make changes like this, you don't just need write access to CVS; you need physical access to the repository filesystem. It's not possible to provide this access to non-CNRI'ers. -Barry

From gstein at lyra.org Sat Mar 25 21:40:59 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 12:40:59 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Barry A. Warsaw wrote:

> >>>>> "MZ" == Moshe Zadka writes:
>
> MZ> Hmmmmm....this is a big problem. Maybe we need to have more
> MZ> people with access to the CVS?
>
> To make changes like this, you don't just need write access to CVS,
> you need physical access to the repository filesystem. It's not
> possible to provide this access to non-CNRI'ers.

Unless the CVS repository was moved to, say, SourceForge. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/

From bwarsaw at cnri.reston.va.us Sat Mar 25 22:00:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 25 Mar 2000 16:00:39 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) References: Message-ID: <14557.10487.736544.336550@anthem.cnri.reston.va.us>

>>>>> "MZ" == Moshe Zadka writes:

MZ> Personally, I *hate* the Java mechanism -- see Stallman's
MZ> position on why GNU Java packages use gnu.* rather then
MZ> org.gnu.* for some of the reasons.

Actually, it's Per Bothner's position: http://www.gnu.org/software/java/why-gnu-packages.txt and I agree with him. I kind of wish that JimH had chosen simply `python' as JPython's top-level package hierarchy, but that's too late to change now.
-Barry

From bwarsaw at cnri.reston.va.us Sat Mar 25 22:03:08 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 16:03:08 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: <14557.10636.504088.517078@anthem.cnri.reston.va.us>

>>>>> "GS" == Greg Stein writes:

GS> Unless the CVS repository was moved to, say, SourceForge.

I didn't want to rehash that, but yes, you're absolutely right! -Barry

From gstein at lyra.org Sat Mar 25 22:13:00 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:13:00 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.10636.504088.517078@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000 bwarsaw at cnri.reston.va.us wrote:

> >>>>> "GS" == Greg Stein writes:
>
> GS> Unless the CVS repository was moved to, say, SourceForge.
>
> I didn't want to rehash that, but yes, you're absolutely right!

Me neither, ergo the smiley :-) Just felt inclined to mention it, and I think the conversation stopped last time at that point; not sure it ever was "hashed" :-). But it is only a discussion to raise if checkins-via-CNRI-guys becomes a true bottleneck. Which it hasn't and doesn't look to be. Constrained? Yes. Bottleneck? No. Cheers, -g -- Greg Stein, http://www.lyra.org/

From jeremy at cnri.reston.va.us Sat Mar 25 22:22:09 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 25 Mar 2000 16:22:09 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: References: Message-ID: <14557.4689.858620.578102@walden>

>>>>> "MH" == Mark Hammond writes:

MH> [Greg writes]
>> I'm not even going to attempt to try to define a hierarchy for
>> all those modules. I count 137 on my local system. Let's say
>> that I *do* try... some are going to end up "forced" rather than
>> obeying some obvious grouping. If you do it a chunk at a time,
>> then you get the obvious, intuitive groupings.
Try for more, and
>> you just bung it all up.

MH> I agree with Greg - every module will not fit into a package.

Sure. No one is arguing with that :-). Where I disagree with Greg is that we shouldn't approach this piecemeal. A greedy algorithm can lead to a locally optimal solution that isn't right for the whole library. A name or grouping might make sense on its own, but isn't sufficiently clear when taking all 137-odd modules into account.

MH> But I also agree with Guido - we _should_ attempt to go through
MH> the 137 modules and put the ones that fit into logical
MH> groupings. Greg is probably correct with his selection for
MH> "net", but a general evaluation is still a good thing. A view
MH> of the bigger picture will help to quell debates over the
MH> structure, and only leave us with the squabbles over the exact
MH> spelling :-)

x1.5 on this. I'm not sure which direction you ended up thinking this was (+ or -), but whichever direction it was, I like it. Jeremy

From gstein at lyra.org Sat Mar 25 22:40:48 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:40:48 -0800 (PST) Subject: [Python-Dev] voting numbers Message-ID: Hey... just thought I'd drop off a description of the "formal" mechanism that the ASF uses for voting, since it has been seen here and there on this group :-)

+1  "I'm all for it. Do it!"
+0  "Seems cool and acceptable, but I can also live without it"
-0  "Not sure this is the best thing to do, but I'm not against it."
-1  "Veto. And here is my reasoning."

Strictly speaking, there is no vetoing here, other than by Guido. For changes to Apache (as opposed to bug fixes), it depends on where the development is. In the early stages, it is reasonably open and people work straight against CVS (except for really big design changes). In the late stage, it requires three +1 votes during discussion of a patch before it goes in. Here on python-dev, it would seem that the votes are a good way to quickly let Guido know people's feelings about topic X or Y.
On the patches mailing list, the voting could actually be quite a useful measure for the people with CVS commit access. If a patch gets a -1, then its commit should wait until reason X has been resolved. Note that it can be resolved in two ways: the person lifts their veto (after some amount of persuasion or explanation), or the patch is updated to address the concerns (well, unless the veto is against the concept of the patch entirely :-). If a patch gets a few +1 votes, then it can probably go straight in. Note that the Apache guys sometimes say things like "+1 on concept", meaning they like the idea but haven't reviewed the code. Do we formalize on using these? Not really suggesting that. But if myself (and others) drop these things into mail notes, then we may as well have a description of just what the heck is going on :-) Cheers, -g -- Greg Stein, http://www.lyra.org/

From moshez at math.huji.ac.il Sun Mar 26 00:27:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:27:18 +0200 (IST) Subject: [Python-Dev] Q: repr.py vs. pprint.py Message-ID: Is there any reason to keep two separate modules with simple-formatting functions? I think pprint is somewhat more sophisticated, but in the worst case, we can just dump them both in the same file (the only thing would be that pprint would export "repr", in addition to "saferepr" (among others)). (Just bumped into this in my reorg suggestion) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From moshez at math.huji.ac.il Sun Mar 26 00:32:38 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:32:38 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 Message-ID: Here's a second version of the straw man proposal for the reorganization of modules in packages. Note that I'm treating it as a strictly 1.7 proposal, so I don't care a "lot" about backwards compatibility.
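[An editorial aside: even a sweeping renaming like this one can keep the old flat names importable during a transition with a few lines of aliasing. A hedged sketch, using today's importlib for illustration -- the "oldjson" name is purely hypothetical, and the straw-man packages in this thread never shipped in this form:]

```python
import importlib
import sys

# Hypothetical mapping from old flat names to new dotted homes.
# "oldjson" -> "json" is a stand-in pair chosen only so this sketch
# runs; a real table would map e.g. the old flat names to the new
# package locations.
RENAMED = {
    "oldjson": "json",
}

def install_aliases(renamed=RENAMED):
    # Register each module under its old flat name in sys.modules,
    # so "import oldname" keeps working during a transition period.
    for old, new in renamed.items():
        sys.modules[old] = importlib.import_module(new)

install_aliases()

import oldjson  # the old flat name still resolves
print(oldjson.dumps([1, 2]))  # prints "[1, 2]"
```

Because the import statement consults sys.modules first, the alias costs nothing after the first lookup; deprecation warnings could be layered on top of the same table.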
I'm down to 4 unhandled modules, which means that if no one objects (and I'm sure someone will), this can be a plan of action. So get your objections ready, guys!

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
    dospath
    posixpath
    macpath
    nturl2path
    ntpath
    macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
    lowlevel
        socket
        select
    terminal
        termios
        pty
        tty
        readline
    syslog
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase
exceptions
os
types
UserDict
UserList
user
site
locale
pure
formatter
getopt
signal
pprint

========== Modules not handled ============
errno
resource
operator
struct

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From DavidA at ActiveState.com Sun Mar 26 00:39:51 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 15:39:51 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

> I really, really, like the Perl mechanism, and I think we would do well
> to think if something like that wouldn't suit us, with minor
> modifications.

The biggest modification which I think is needed to a Perl-like organization is that IMO there is value in knowing which packages are 'blessed' by Guido. In other words, some sort of Q/A mechanism would be good, if it can be kept simple. [Alternatively, let's not put a Q/A mechanism in place and my employer can make money selling that information, the way they do for Perl! =)]

> (Remember that lwall copied the Pythonic module mechanism,
> so Perl and Python modules are quite similar)

That's stretching things a bit (the part after the 'so' doesn't follow from the part before), as there is a lot more to the nature of module systems, but the point is well taken. --david

From moshez at math.huji.ac.il Sun Mar 26 06:44:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 06:44:02 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote:

> The biggest modification which I think is needed to a Perl-like organization
> is that IMO there is value in knowing what packages are 'blessed' by Guido.
> In other words, some sort of Q/A mechanism would be good, if it can be kept
> simple.

You've got a point. Does anyone know how the perl-porters decide what modules to put in source.tar.gz? -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From ping at lfw.org Sun Mar 26 07:01:58 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 21:01:58 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote:

> Here's a second version of the straw man proposal for the reorganization
> of modules in packages. Note that I'm treating it as a strictly 1.7
> proposal, so I don't care a "lot" about backwards compatibility.

Hey, this looks pretty good. For the most part i agree with your layout. Here are a few notes...

> net
> [...]
> server
> [...]

Good.

> text
> [...]
> xml
>     whatever the xml-sig puts here
> mail
>     rfc822
>     mime
>         MimeWriter
>         mimetools
>         mimify
>         mailcap
>         mimetypes
>         base64
>         quopri
>     mailbox
>     mhlib
> binhex

I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) For example, why text.binhex but text.mail.mime.base64?

> parse
>     string
>     re
>     regex
>     reconvert
>     regex_syntax
>     regsub
>     shlex
>     ConfigParser
>     linecache
>     multifile
>     netrc

The "re" module, in particular, will get used a lot, and it's not clear why these all belong under "parse". I suggest dropping "parse" and moving these up. What's "multifile" doing here instead of with the rest of the mail/mime stuff?

> bin
> [...]

I like this. Good idea.

>     gzip
>     zlib
>     aifc

Shouldn't "aifc" be under "sound"?

> image
> [...]
> sound
> [...]
> db
> [...]

Yup.

> math
> [...]
> time
> [...]

Looks good.

> interpreter
> [...]

How about just "interp"?

> security
> [...]
> file
> [...]
>     lowlevel
>         socket
>         select

Why the separate "lowlevel" branch? Why doesn't "socket" go under "net"?

>     terminal
>         termios
>         pty
>         tty
>         readline

Why does "terminal" belong under "file"? Maybe it could go under "ui"? Hmm... "pty" doesn't really belong.

>     syslog

Hmm...
> serialize
>     pickle
>     cPickle
>     shelve
>     xdrlib
>     copy
>     copy_reg

"copy" doesn't really fit here under "serialize", and "serialize" is kind of a long name. How about a "data types" package? We could then put "struct", "UserDict", "UserList", "pprint", and "repr" here.

data
    copy
    copy_reg
    pickle
    cPickle
    shelve
    xdrlib
    struct
    UserDict
    UserList
    pprint
    repr

On second thought, maybe "struct" fits better under "bin".

> threads
> [...]
> ui
> [...]

Uh huh.

> internal
>     _codecs
>     _locale
>     _tkinter
>     pcre
>     strop
>     posix

Not sure this is a good idea. It means the Unicode work lives under both "unicode" and "internal._codecs", Tk is split between "ui" and "internal._tkinter", regular expressions are split between "text.re" and "internal.pcre". I can see your motivation for getting "posix" out of the way, but i suspect this is likely to confuse people.

> users
>     pwd
>     grp
>     nis

Hmm. Yes, i suppose so.

> sgi
> [...]
> unicode
> [...]

Indeed.

> os
> UserDict
> UserList
> exceptions
> types
> operator
> user
> site

Yeah, these are all top-level (except maybe UserDict and UserList, see above).

> locale

I think "locale" belongs under "math" with "fpformat" and the others. It's for numeric formatting.

> pure

What the heck is "pure"?

> formatter

This probably goes under "text".

> struct

See above under "data". I can't decide whether "struct" should be part of "data" or "bin". Hmm... probably "bin" -- since, unlike the serializers under "data", "struct" does not actually specify a serialization format, it only provides fairly low-level operations.
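[An editorial aside on the distinction being drawn here: struct packs values into a byte layout the caller dictates, while pickle emits a self-describing stream that records types and structure. A small illustration in modern Python, not from the original mail:]

```python
import pickle
import struct

values = (1, 2, 3)

# struct: the caller dictates the exact byte layout -- here three
# big-endian unsigned 16-bit integers, exactly 6 bytes, no metadata.
packed = struct.pack(">3H", *values)

# pickle: the stream is self-describing, so the reader needs no
# out-of-band layout description -- but the bytes are opaque.
blob = pickle.dumps(values)

print(len(packed))                   # 6
print(struct.unpack(">3H", packed))  # (1, 2, 3)
print(pickle.loads(blob))            # (1, 2, 3)
```

The struct side only works if both ends agree on the ">3H" format string, which is exactly why it reads as a low-level operation rather than a serialization format.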
-- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 07:58:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 07:58:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > I'm not convinced "mime" needs a separate branch here. > (This is the deepest part of the tree, and at three levels > small alarm bells went off in my head.) I've had my problems with that too, but it seemed too many modules were mime-specific. > For example, why text.binhex but text.mail.mime.base64? Actually, I thought about this (this isn't random at all): base64 encoding is part of the mime standard, together with quoted-printable. Binhex isn't. I don't know if you find it reason enough, and it may be smarter just having a text.encode.{quopri,uu,base64,binhex} > > parse > > string > > re > > regex > > reconvert > > regex_syntax > > regsub > > shlex > > ConfigParser > > linecache > > multifile > > netrc > > The "re" module, in particular, will get used a lot, and from import re Doesn't seem too painful. > and it's not clear why these all belong under "parse". These are all used for parsing data (which does not have some pre-written parser). I had problems with the name too... > What's "multifile" doing here instead of with the rest > of the mail/mime stuff? It's also useful generally. > Shouldn't "aifc" be under "sound"? You're right. > > interpreter > [...] > > How about just "interp"? I've no *strong* feelings, just a vague "don't abbrev." hunch > Why the separate "lowlevel" branch? Because it is -- most Python code will use one of the higher level modules. > Why doesn't "socket" go under "net"? What about UNIX domain sockets? Again, no *strong* opinion, though. > > terminal > > termios > > pty > > tty > > readline > > Why does "terminal" belong under "file"? 
Because it is (a special kind of file) > > serialize > > > pickle > > cPickle > > shelve > > xdrlib > > copy > > copy_reg > > "copy" doesn't really fit here under "serialize", and > "serialize" is kind of a long name. I beg to disagree -- "copy" is frequently close to serialization, both in the model (serializing to a "data structure") and in real life (that's the way people copy stuff in Java, and UNIX too: think tar cvf - | tar xvf -) What's more, copy_reg is used both for copy and for pickle. I do like the idea of "data-types" package, but it needs to be ironed out a bit. > > internal > > _codecs > > _locale > > _tkinter > > pcre > > strop > > posix > > Not sure this is a good idea. It means the Unicode > work lives under both "unicode" and "internal._codecs", > Tk is split between "ui" and "internal._tkinter", > regular expressions are split between "text.re" and > "internal.pcre". I can see your motivation for getting > "posix" out of the way, but i suspect this is likely to > confuse people. You mistook my motivation -- I just want unadvertised modules (AKA internal use modules) to live in a carefully segregated section of the namespace. How would this confuse people? No one imports _tkinter or pcre, so no one would notice the change. > > locale > > I think "locale" belongs under "math" with "fpformat" and > the others. It's for numeric formatting. Only? And anyway, I doubt many people will think like that. > > pure > > What the heck is "pure"? A module that helps work with Purify. > > formatter > > This probably goes under "text". You're right. > Well, this leaves a few system-like modules that didn't > really fit elsewhere for me: > > pty > tty > termios > syslog > select > getopt > signal > errno > resource > > They all seem to be Unix-related. How about putting these > in a "unix" or "system" package? "select", "signal" aren't UNIX specific. "getopt" is used for generic argument processing, so it isn't really UNIX specific. 
And I don't like the name "system" either. But I have no constructive proposals about those either. so-i'll-just-shut-up-now-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From dan at cgsoftware.com Sun Mar 26 08:05:44 2000 From: dan at cgsoftware.com (Daniel Berlin) Date: Sat, 25 Mar 2000 22:05:44 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > "select", "signal" aren't UNIX specific. Huh? How not? Can you name a non-UNIX that is providing them? (BeOS wouldn't count, select is broken, and nobody uses signals.) and if you can, is it providing them for something other than "UNIX/POSIX compatibility" > "getopt" is used for generic argument processing, so it isn't really UNIX > specific. It's a POSIX.2 function. I consider that UNIX. > And I don't like the name "system" either. But I have no > constructive proposals about those either. > > so-i'll-just-shut-up-now-ly y'rs, Z. > -- just-picking-nits-ly y'rs, Dan From moshez at math.huji.ac.il Sun Mar 26 08:32:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 08:32:33 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Daniel Berlin wrote: > > > "select", "signal" aren't UNIX specific. > Huh? > How not? > Can you name a non-UNIX that is providing them? Win32. Both of them. I've even used select there. > and if you can, is it providing them for something other than "UNIX/POSIX > compatibility" I don't know what it provides them for, but I've *used* *select* on *WinNT*. I don't see why Python should make me feel bad when I'm doing that. > > "getopt" is used for generic argument processing, so it isn't really UNIX > > specific. > > It's a POSIX.2 function. > I consider that UNIX. Well, the argument style it processes is not unheard of in other OSes, and it's nice to have command line apps that have a common ui. 
That's it! "getopt" belongs in the ui package! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:23:45 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:23:45 -0800 (PST) Subject: [Python-Dev] cPickle and cStringIO Message-ID: Are there any objections to including

    try:
        from cPickle import *
    except:
        pass

in pickle and

    try:
        from cStringIO import *
    except:
        pass

in StringIO? -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 09:14:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 09:14:10 +0200 (IST) Subject: [Python-Dev] cPickle and cStringIO In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Are there any objections to including
>
>     try:
>         from cPickle import *
>     except:
>         pass
>
> in pickle and
>
>     try:
>         from cStringIO import *
>     except:
>         pass
>
> in StringIO?

Yes, until Python types are subclassable. Currently, one can inherit from pickle.Pickler/Unpickler and StringIO.StringIO. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:37:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:37:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Okay, here's another shot at it. 
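[The conditional-import idiom discussed above can also be used from application code; a minimal sketch of the pattern, with module names as in Python 1.x (on builds without the C accelerator, the pure-Python module is used instead):]

```python
# Prefer the C implementation when it is available, else fall back to
# the pure-Python one.  (The thread proposes putting `from cPickle
# import *` inside pickle.py itself; this user-level variant shows the
# same pattern without touching the library.)
try:
    import cPickle as best_pickle   # fast C implementation (Python 1.x)
except ImportError:
    import pickle as best_pickle    # pure-Python fallback

data = best_pickle.dumps({"spam": 1})
assert best_pickle.loads(data) == {"spam": 1}
```

Note Moshe's objection still applies to the `from cPickle import *` form: code that subclasses pickle.Pickler or StringIO.StringIO would silently get the non-subclassable C types instead.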
Notice a few things:

 - no text.mime package
 - encoders moved to text.encode
 - Unix stuff moved to unix package (no file.lowlevel, file.terminal)
 - aifc moved to bin.sound package
 - struct moved to bin package
 - locale moved to math package
 - linecache moved to interp package
 - data-type stuff moved to data package
 - modules in internal package moved to live with their friends

Modules that are deprecated or not really intended to be imported are listed in parentheses (to give a better idea of the "real" size of each package). cStringIO and cPickle are parenthesized in hopeful anticipation of agreement on my last message...

net
    urlparse
    urllib
    ftplib
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    telnetlib
    httplib
    cgi
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore

text
    re                  # general-purpose parsing
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mailbox
        mhlib
    encode              # i'm also ok with moving text.encode.* to text.*
        binhex
        uu
        base64
        quopri
        MimeWriter
        mimify
        mimetools
        mimetypes
        multifile
    mailcap             # special-purpose file parsing
    shlex
    ConfigParser
    netrc
    formatter
    (string, strop, pcre, reconvert, regex, regex_syntax, regsub)

bin
    gzip
    zlib
    chunk
    struct
    image
        imghdr
        colorsys        # a bit unsure, but doesn't go anywhere else
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        aifc
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev

db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm

math
    math                # library functions
    cmath
    fpectl              # type-related
    fpetest
    array
    mpz
    fpformat            # formatting
    locale
    bisect              # algorithm: also unsure, but doesn't go anywhere else
    random              # randomness
    whrandom
    crypt               # cryptography
    md5
    rotor
    sha

time
    calendar
    time
    tzparse
    sched
    timing

interp
    new
    linecache           # handling .py files
    py_compile
    code                # manipulating internal objects
    codeop
    dis
    traceback
    compileall
    keyword             # interpreter constants
    token
    symbol
    tokenize            # parsing
    parser
    bdb                 # development
    pdb
    profile
    pyclbr
    tabnanny
    pstats
    rlcompleter         # this might go in "ui"...

security
    Bastion
    rexec
    ihooks

file
    dircache
    path -- a virtual module which would do a from path import *
    nturl2path
    macurl2path
    filecmp
    fileinput
    StringIO
    glob
    fnmatch
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    (dospath, posixpath, macpath, ntpath, cStringIO)

data
    pickle
    shelve
    xdrlib
    copy
    copy_reg
    UserDict
    UserList
    pprint
    repr
    (cPickle)

threads
    thread
    threading
    Queue
    mutex

ui
    _tkinter
    curses
    Tkinter
    cmd
    getpass
    getopt
    readline

users
    pwd
    grp
    nis

sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv

unicode
    _codecs
    codecs
    unicodedata
    unicodedatabase

unix
    errno
    resource
    signal
    posix
    posixfile
    socket
    select
    syslog
    fcntl
    termios
    pty
    tty
    _locale

exceptions
sys
os
types
user
site
pure
operator

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:40:27 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:40:27 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Hey, while we're at it... as long as we're renaming modules, what do you all think of getting rid of that "lib" suffix? As in:

> net
>     urlparse
>     url
>     ftp
>     gopher
>     imap
>     pop
>     nntp
>     smtp
>     telnet
>     http
>     cgi
>     server
[...]
> text
>     re              # general-purpose parsing
>     sgml
>     html
>     htmlentitydefs
[...]

"import net.ftp" seems nicer to me than "import ftplib". We could also just stick htmlentitydefs.entitydefs in html and deprecate htmlentitydefs. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:53:06 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:53:06 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > > For example, why text.binhex but text.mail.mime.base64? 
> > Actually, I thought about this (this isn't random at all): base64 encoding > is part of the mime standard, together with quoted-printable. Binhex > isn't. I don't know if you find it reason enough, and it may be smarter > just having a text.encode.{quopri,uu,base64,binhex} I think i'd like that better, yes. > > and it's not clear why these all belong under "parse". > > These are all used for parsing data (which does not have some pre-written > parser). I had problems with the name too... And parsing is what the "text" package is about anyway. I say move them up. (See the layout in my other message. Notice most of the regular-expression stuff is deprecated anyway, so it's not like there are really that many.) > > Why doesn't "socket" go under "net"? > > What about UNIX domain sockets? Again, no *strong* opinion, though. Bleck, you're right. Well, i think we just have to pick one or the other here, and i think most people would guess "net" first. (You can think of it as IPC, and file IPC-related things under the "net" category...?) > > Why does "terminal" belong under "file"? > > Because it is (a special kind of file) Only in Unix. It's Unix that likes to think of all things, including terminals, as files. > I do like the idea of "data-types" package, but it needs to be ironed > out a bit. See my other message for a possible suggested hierarchy... > > > internal [...] > You mistook my motivation -- I just want unadvertised modules (AKA > internal use modules) to live in a carefully segregated section of the > namespace. How would this confuse people? No one imports _tkinter or pcre, > so no one would notice the change. I think it makes more sense to classify modules by their topic rather than their exposure. (For example, you wouldn't move deprecated modules to a "deprecated" package.) Keep in mind that (well, at least to me) the main point of any naming hierarchy is to avoid name collisions. "internal" doesn't really help that purpose. 
You also want to be sure (or as sure as you can) that modules will be obvious to find in the hierarchy. An "internal" package creates a distinction orthogonal to the topic-matter distinction we're using for the rest of the packages, which *potentially* introduces the question "well... is this module internal or not?" for every other module. Yes, admittedly this is only "potentially", but i hope you see the abstract point i'm trying to make... > > > locale > > > > I think "locale" belongs under "math" with "fpformat" and > > the others. It's for numeric formatting. > > Only? And anyway, I doubt many people will think like that. Yeah, it is pretty much only for numeric formatting. The more generic locale stuff seems to be in _locale. > > They all seem to be Unix-related. How about putting these > > in a "unix" or "system" package? > > "select", "signal" aren't UNIX specific. Yes, but when they're available on other systems they're an attempt to emulate Unix or Posix functionality, aren't they? > Well, the argument style it processes is not unheard of in other OSes, and > it's nice to have command line apps that have a common ui. That's it! > "getopt" belongs in the ui package! I like ui.getopt. It's a pretty good idea. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 10:05:49 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:05:49 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: +1. I've had minor nits, but nothing is perfect, and this is definitely "good enough". Now we'll just have to wait until the BDFL says something... -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:06:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Hey, while we're at it... as long as we're renaming modules, > what do you all think of getting rid of that "lib" suffix? +0 -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:19:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:19:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > > "select", "signal" aren't UNIX specific. > > Yes, but when they're available on other systems they're an > attempt to emulate Unix or Posix functionality, aren't they? I think "signal" is ANSI C, but I'm not sure. no-other-comments-ly y'rs, Z. From gstein at lyra.org Sun Mar 26 13:52:53 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 03:52:53 -0800 (PST) Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <1258123323-10623548@hypernet.com> Message-ID: On Sat, 25 Mar 2000, Gordon McMillan wrote: >... > I doubt very much that you would break anybody's code by > removing the Windows specific behavior. > > But it seems to me that unless Python always uses the > default malloc, those of us who write C++ extensions will have > to override operator new? I'm not sure. I've used placement > new to allocate objects in a memory mapped file, but I've never > tried to muck with the global memory policy of a C++ program. Actually, the big problem arises when you have debug vs. non-debug DLLs. malloc() uses different heaps based on the debug setting. 
As a result, it is a bad idea to call malloc() from a debug DLL and free() it from a non-debug DLL. If the allocation pattern is fixed, then things may be okay. IF. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:02:40 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:02:40 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: >... > [ tree ] This is a great start. I have two comments: 1) keep it *very* shallow. depth just makes it conceptually difficult. 2) you're pushing too hard. modules do not *have* to go into a package. there are some placements that you've made which are very questionable... it appears they are done for movement's sake rather than for being "right" I'm off to sleep, but will look into specific comments tomorrow or so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:14:32 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:14:32 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003251856.NAA09636@eric.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Guido van Rossum wrote: > > I say "do it incrementally" while others say "do it all at once." > > Personally, I don't think it is possible to do all at once. As a > > corollary, if you can't do it all at once, but you *require* that it be > > done all at once, then you have effectively deferred the problem. To put > > it another way, Guido has already invented a reason to not do it: he just > > requires that it be done all at once. Result: it won't be done. > > Bullshit, Greg. (I don't normally like to use such strong words, but > since you're being confrontational here...) Fair enough, and point accepted. Sorry. I will say, tho, that you've taken this slightly out of context. The next paragraph explicitly stated that I don't believe you had this intent. 
I just felt that coming up with a complete plan before doing anything would be prone to failure. You asked to invent a new reason :-), so I said you had one already :-) Confrontational? Yes, guilty as charged. I was a bit frustrated. > I'm all for doing it incrementally -- but I want the plan for how to > do it made up front. That doesn't require all the details to be > worked out -- but it requires a general idea about what kind of things > we will have in the namespace and what kinds of names they get. An > organizing principle, if you like. If we were to decide later that we > go for a Java-like deep hierarchy, the network package would have to > be moved around again -- what a waste. All righty. So I think there is probably a single question that I have here: Moshe posted a large breakdown of how things could be packaged. He and Ping traded a number of comments, and more will be coming as soon as people wake up :-) However, if you are only looking for a "general idea", then should python-dev'ers nit pick the individual modules, or just examine the general breakdown and hierarchy? thx, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sun Mar 26 14:09:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:09:02 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > This is a great start. I have two comments: > > 1) keep it *very* shallow. depth just makes it conceptually difficult. I tried, and Ping shallowed it even more. BTW: Anyone who cares to comment, please comment on Ping's last suggestion. I pretty much agree with the changes he made. > 2) you're pushing too hard. modules do not *have* to go into a package. > there are some placements that you've made which are very > questionable... 
> it appears they are done for movement's sake rather
> than for being "right"

Well, I'm certainly sorry I gave that impression -- the reason I wasn't "right" wasn't that, it was more my desire to be "fast" -- I wanted to have some proposal out the door, since it is harder to argue about something concrete. The biggest proof of concept that we all agree is that no one seriously took objections to anything -- there were just some minor nits to pick. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 14:11:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:11:10 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > Moshe posted a large breakdown of how things could be packaged. He and > Ping traded a number of comments, and more will be coming as soon as > people wake up :-) Just a general comment -- it's so much fun to live in a different zone than all of you guys. just-wasting-time-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sun Mar 26 14:23:57 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:23:57 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > On Sun, 26 Mar 2000, Greg Stein wrote: >... > > 2) you're pushing too hard. modules do not *have* to go into a package. > > there are some placements that you've made which are very > > questionable... it appears they are done for movement's sake rather > > than for being "right" > > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete. 
> The biggest proof of concept that we all agree is that > no one seriously took objections to anything -- there were just some minor > nits to pick. Not something to apologize for! :-) Well, the indicator was the line in your original post about "unhandled modules" and the conversation between you and Ping with statements along the lines of "wasn't sure where to put this." I say just leave it then :-) If a module does not make *obvious* sense to be in a package, then it should not be there. For example: locale. That is not about numbers or about text. It has general utility. If there was an i18n package, then it would go there. Otherwise, don't force it somewhere else. Other packages are similar, so don't single out my comment about locale. Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Sun Mar 26 20:09:15 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 26 Mar 2000 10:09:15 -0800 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I maintain that a general principle re: what the aim of this reorg is, is needed before the partitioning of the space can make sense. What Moshe and Ping have is a good stab at partitioning of a subspace of the total space of Python modules and packages, i.e., the standard library. If we limit the aim of the reorg to cover just that subspace, then that's fine and Ping's proposal seems grossly fine to me. 
If we want to have a Perl-like packaging, then we _need_ to take into account all known Python modules of general utility, such as the database modules, the various GUI packages, the mx* packages, Aaron's work, PIL, etc., etc. Ignoring those means that the dataset used to decide the partitioning function is highly biased. Given the larger dataset, locale might very well fit in a not-toplevel location. I know that any organizational scheme is going to be optimal at best at its inception, and that as history happens, it will become suboptimal. However, it's important to know what the space being partitioned is supposed to look like. A final comment: there's a history and science to this kind of organization, which is part of library science. I suspect there is quite a bit of knowledge available as to organizing principles to do it right. It would be nice if someone could research it a bit and summarize the basic principles to the rest of us. I agree with Greg that we need high-level input from Guido on this. --david 'academic today' ascher From ping at lfw.org Sun Mar 26 22:34:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 26 Mar 2000 12:34:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I goofed. I apologize. Moshe and Greg are right: locale isn't just about numbers. I just read the comment at the top of locale.py: "Support for number formatting using the current locale settings" and didn't notice the from _locale import * a couple of lines down. 
"import locale; dir(locale)" didn't work for me because for some reason there's no _locale built-in on my system (Red Hat 6.1, python-1.5.1-10). So i looked for 'def's and they all looked like they had to do with numeric formatting. My mistake. "locale", at least, belongs at the top level. Other candidates for top-level: bisect # algorithm struct # more general than "bin" or "data" colorsys # not really just for image file formats yuvconvert # not really just for image file formats rlcompleter # not really part of the interpreter dl # not really just about files Alternatively, we could have: ui.rlcompleter, unix.dl (It would be nice, by the way, to replace "bisect" with an "algorithm" module containing some nice pedagogical implementations of things like bisect, quicksort, heapsort, Dijkstra's algorithm etc.) The following also could be left at the top-level, since they seem like applications (i.e. they probably won't get imported by code, only interactively). No strong opinion on this. bdb pdb pyclbr tabnanny profile pstats Also... i was avoiding calling the "unix" package "posix" because we already have a "posix" module. But wait... the proposed tree already contains "math" and "time" packages. If there is no conflict (is there a conflict?) then the "unix" package should probably be named "posix". -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From moshez at math.huji.ac.il Mon Mar 27 07:35:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 07:35:23 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Ka-Ping Yee wrote: > The following also could be left at the top-level, since > they seem like applications (i.e. they probably won't > get imported by code, only interactively). No strong > opinion on this. 
> bdb > pdb > pyclbr > tabnanny > profile > pstats Let me just state my feelings about the interpreter package: since Python programs are probably the most suited to reasoning about Python programs (among other things, thanks to the strong introspection capabilities of Python), many Python modules were written to supply a convenient interface to that introspection. These modules are *only* needed by programs dealing with Python programs, and hence should live in a well defined part of the namespace. I regret calling it "interpreter" though: "Python" is a better name (something like the java.lang package) > Also... i was avoiding calling the "unix" package "posix" > because we already have a "posix" module. But wait... the > proposed tree already contains "math" and "time" packages. Yes. That was a hard decision I made, and I'm sort of waiting for Guido to veto it: it would negate the easy backwards compatible path of providing a toplevel module for each module which is moved somewhere else which does "from import *". > If there is no conflict (is there a conflict?) then the > "unix" package should probably be named "posix". I hardly agree. "dl", for example, is a common function on unices, but it is not part of the POSIX standard. I think "posix" module should have POSIX functions, and the "unix" package should deal with functionality available on real-life unices. standards-are-fun-aren't-they-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf at artcom-gmbh.de Mon Mar 27 08:52:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 08:52:25 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Moshe Zadka at "Mar 27, 2000 7:35:23 am" Message-ID: Hi! Moshe Zadka wrote: > Yes. 
> That was a hard decision I made, and I'm sort of waiting for Guido to
> veto it: it would negate the easy backwards compatible path of providing
> a toplevel module for each module which is moved somewhere else which does
> "from import *".

If the result of this renaming initiative will be that I can't use

    import sys, os, time, re, struct, cPickle, parser
    import Tkinter; Tk=Tkinter; del Tkinter

anymore in Python 1.x and instead I have to change this into (for example):

    from posix import time
    from text import re
    from bin import struct
    from Python import parser
    from ui import Tkinter; ...

I would really really *HATE* this change!

[side note: The 'from MODULE import ...' form is evil and I have abandoned its use in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 programs got bigger and bigger. With 20+ software developers working on a ~1,000,000 LOC of Modula-2 software system, this decision proved itself well. The situation with Python is comparable. Avoiding 'from ... import' rewards itself later, when your software has grown bigger and when it comes to maintenance by people not familiar with the used modules.]

Maybe I didn't understand what this new subdivision of the standard library should achieve. The library documentation provides an existing logical subdivision into chapters, which group the library into several kinds of services. IMO this subdivision could be discussed and possibly revised. But at the moment I got the impression that it was simply ignored. Why? What's so bad with it? Why is a subdivision on the documentation level not sufficient? Why should modules be moved into packages? I don't get it. 
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Mon Mar 27 09:09:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 09:09:18 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Peter Funk wrote:

> If the result of this renaming initiative will be that I can't use
> import sys, os, time, re, struct, cPickle, parser
> import Tkinter; Tk=Tkinter; del Tkinter
> anymore in Python 1.x and instead I have to change this into (for example):
> from posix import time
from time import time
> from text import re
> from bin import struct
> from Python import parser
> from ui import Tkinter; ...

Yes.

> I would really really *HATE* this change!

Well, I'm sorry to hear that -- I've been waiting for this change to happen for a long time.

> [side note: The 'from MODULE import ...' form is evil and I have abandoned its use
> in favor of the 'import MODULE' form in 1987 or so, as our Modula-2
> programs got bigger and bigger. With 20+ software developers working
> on a ~1,000,000 LOC of Modula-2 software system, this decision
> proved itself well.

Well, yes. Though syntactically equivalent,

    from package import module

is the recommended way to use packages, unless there is a specific need.

> Maybe I didn't understand what this new subdivision of the standard
> library should achieve.

Namespace cleanup. Too many toplevel names seem evil to some of us.

> Why is a subdivision on the documentation level not sufficient?
> Why should modules be moved into packages? I don't get it.

To allow a greater number of modules to live without worrying about namespace collision. -- Moshe Zadka . 
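[Moshe's "from package import module" recommendation can be demonstrated today with a package/submodule pair that already exists, os.path -- the proposed text.re layout would read the same way, leaving call sites unchanged:]

```python
# Bind the submodule to a short local name, so code need not be fully
# qualified.  Under the proposed reorganization, `from text import re`
# would work identically (the `text` package is part of the straw-man
# proposal, not something that exists yet).
from os import path

# Call sites then use path.splitext(...), not os.path.splitext(...)
assert path.splitext("spam.py") == ("spam", ".py")
```

This is why the line-length worry about writing text.re.compile(...) everywhere need not arise: one import line restores the short name.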
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Mon Mar 27 10:08:57 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 27 Mar 2000 00:08:57 -0800 (PST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: Hi, Peter. Your question as to the purpose of module reorganization is well worth asking, and perhaps we should stand back for a while and try to really answer it well first. I think that my answers for your question would be: 1. To alleviate potential namespace collision. 2. To permit talking about packages as a unit. I hereby solicit other reasons from the rest of the group... Reason #1 is not a serious problem yet, but i think i've seen a few cases where it might start to be an issue. Reason #2 has to do with things like assigning people responsibility for taking care of a particular package, or making commitments about which packages will be available with which distributions or platforms. Hence, for example, the idea of the "unix" package. Neither of these reasons necessitates a deep and holy hierarchy, so we certainly want to keep it shallow and simple if we're going to do this at all. > If the result of this renaming initiative will be that I can't use > import sys, os, time, re, struct, cPickle, parser > import Tkinter; Tk=Tkinter; del Tkinter > anymore in Python 1.x and instead I have to change this into (for example): > form posix import time > from text import re > from bin import struct > from Python import parser > from ui import Tkinter; ... Won't import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser also work? ...i hope? > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it?
I did look at the documentation for some guidance in arranging the modules, though admittedly it didn't direct me much. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From pf at artcom-gmbh.de Mon Mar 27 10:35:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 10:35:50 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Ka-Ping Yee at "Mar 27, 2000 0: 8:57 am" Message-ID: Hi! > > import sys, os, time, re, struct, cPickle, parser [...] Ka-Ping Yee: > Won't > > import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser > > also work? ...i hope? That is even worse. Then it is not only the 'import' sections, which I usually keep at the top of my modules, that have to be changed: for example, 're.compile(...' has to be changed into 'text.re.compile(...' all over the place, possibly breaking the 'Maximum Line Length' styleguide rule. Regards, Peter From pf at artcom-gmbh.de Mon Mar 27 12:16:48 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 12:16:48 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? Message-ID: String objects have grown methods since 1.5.2. So it makes sense to provide a class 'UserString' similar to 'UserList' and 'UserDict', so that there is a standard base class to inherit from, if someone has the desire to extend the string methods. What do you think? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From fdrake at acm.org Mon Mar 27 17:12:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 27 Mar 2000 10:12:55 -0500 (EST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.31351.783771.472320@weyr.cnri.reston.va.us> Moshe Zadka writes: > Well, I'm certainly sorry I gave that impression -- the reason I wans't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete. The biggest prrof of concept that we all agree is that > no one seriously took objections to anything -- there were just some minor > nits to pick. It's *really easy* to argue about something concrete. ;) It's just harder to misunderstand the specifics of the proposal. It's too early to say what people think; not enough people have had time to look at the proposals yet. On the other hand, I think it's great that we have a proposal to discuss. I'll make my comments once I've had time to read through the last version posted. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 27 18:20:43 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 11:20:43 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Peter Funk said: > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. The library reference is pretty well disorganized at this point. I want to improve that for the 1.6 docs.
I received a suggestion a few months back, but haven't had a chance to dig into it, or even respond to the email. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 27 19:14:46 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 27 Mar 2000 12:14:46 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.38662.835289.499610@goon.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> That is even worse. So not only the 'import' sections, which I PF> usually keep at the top of my modules, have to be changed: This PF> way for example 're.compile(...' has to be changed into PF> 'text.re.compile(...' all over the place possibly breaking the PF> 'Maximum Line Length' styleguide rule. There is nothing wrong with changing only the import statement: from text import re The only problematic use of from ... import ... is from text.re import * which adds an unspecified set of names to the current namespace. Jeremy From moshez at math.huji.ac.il Mon Mar 27 19:59:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 19:59:34 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Message-ID: Peter Funk said: > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. Fred L. Drake, Jr. writes: > The library reference is pretty well disorganized at this point. I > want to improve that for the 1.6 docs. 
Let me just mention where my inspiration came from: shame of shames, it came from Perl. It's hard to use Perl's organization as is, because it doesn't view itself as a general-purpose language: so things like CGI.pm are toplevel, and regexes are part of the syntax. However, there are a lot of good hints there. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm at digicool.com Mon Mar 27 20:31:01 2000 From: klm at digicool.com (Ken Manheimer) Date: Mon, 27 Mar 2000 13:31:01 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Jeremy Hylton wrote: > >>>>> "PF" == Peter Funk writes: > > PF> That is even worse. So not only the 'import' sections, which I > PF> usually keep at the top of my modules, have to be changed: This > PF> way for example 're.compile(...' has to be changed into > PF> 'text.re.compile(...' all over the place possibly breaking the > PF> 'Maximum Line Length' styleguide rule. > > There is nothing wrong with changing only the import statement: > from text import re > > The only problematic use of from ... import ... is > from text.re import * > which adds an unspecified set of names to the current namespace. Actually, i think there's another important gotcha with from .. import which may be contributing to peter's sense of concern, but which i don't think needs to in this case. I also thought we had discussed providing transparency in general, at least for the 1.x series? The other gotcha i mean applies when the thing you're importing is a terminal, i.e. a non-module. Then, changes to the assignments of the names in the original module aren't reflected in the names you've imported - they're decoupled from the namespace of the original module.
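Ken's gotcha is easy to demonstrate; a minimal sketch, using a throwaway module object rather than a real library:

```python
import types

# Build a throwaway module with one attribute standing in for any
# "terminal" (non-module) name you might from-import.
mod = types.ModuleType("mod")
mod.value = 1

# "from mod import value" is, in effect, a one-time binding:
value = mod.value

mod.value = 2       # rebinding the name inside the original module...
assert value == 1   # ...is not seen through the from-imported name

# "import mod" stays coupled, because every use goes through the module:
assert mod.value == 2
```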
When the thing you're importing is, itself, a module, the same kind of thing *can* happen, but you're more generally concerned with tracking revisions to the contents of those modules, which is tracked ok in the thing you "from .. import"ed. I thought the other problem peter was objecting to, having to change the import sections in the first place, was going to be avoided in the 1.x series (if we do this kind of thing) by inherently extending the import path to include all the packages, so people need not change their code? Seems like most of this would be fairly transparent w.r.t. the operation of existing applications. Have i lost track of the discussion? Ken klm at digicool.com From moshez at math.huji.ac.il Mon Mar 27 20:55:35 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 20:55:35 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Ken Manheimer wrote: > I also thought we had discussed providing > transparency in general, at least of the 1.x series. ? Yes, but it would be clearly marked as deprecated in 1.7, print out error messages in 1.8 and won't work at all in 3000. (That's my view on the point, but I got the feeling this is where the wind is blowing). So the transparency mechanism is intended only to be "something backwards compatible"...it's not supposed to be a reason why things are ugly (I don't think they are, though). BTW: the transparency mechanism I suggested was not pushing things into the import path, but rather having toplevel modules which "from import *" from the modules that were moved. E.g., re.py would contain # Deprecated: don't import re, it won't work in future releases from text.re import * -- Moshe Zadka .
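Moshe's shim can be fleshed out a little. Here is a self-contained sketch that builds a throwaway "text.re" package on disk; the module names ("oldre" standing in for the compatibility re.py) and the warning text are illustrative, not the actual proposal:

```python
import os, sys, tempfile, textwrap, warnings

# Throwaway package layout: text/re.py plus a top-level shim oldre.py.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "text"))
open(os.path.join(tmp, "text", "__init__.py"), "w").close()
with open(os.path.join(tmp, "text", "re.py"), "w") as f:
    f.write("def compile(pattern):\n    return ('compiled', pattern)\n")
with open(os.path.join(tmp, "oldre.py"), "w") as f:
    f.write(textwrap.dedent("""\
        # Deprecated: don't import oldre, it won't work in future releases
        import warnings
        warnings.warn("oldre is deprecated; use text.re", DeprecationWarning)
        from text.re import *
        """))
sys.path.insert(0, tmp)

# Importing the shim warns once, then behaves like the relocated module.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import oldre

assert oldre.compile("x") == ("compiled", "x")
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```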
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Mon Mar 27 21:34:39 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:34:39 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.47055.604042.381126@beluga.mojam.com> Peter> The library documentation provides a existing logical subdivision Peter> into chapters, which group the library into several kinds of Peter> services. Perhaps it makes sense to revise the library reference manual's documentation to reflect the proposed package hierarchy once it becomes concrete. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From skip at mojam.com Mon Mar 27 21:52:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:52:08 -0600 (CST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.48104.34263.680278@beluga.mojam.com> Responding to an early item in this thread and trying to adapt to later items... Ping wrote: I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) It's not clear that mime should be beneath text/mail. Moshe moved it up a level, but not the way I would have done it. I think the mime stuff still belongs in a separate mime package. I wouldn't just sprinkle the modules under text. I see two possibilities: text>mime net>mime I prefer net>mime, because MIME and its artifacts are used heavily in networked applications where the content being transferred isn't text. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Mon Mar 27 22:05:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:05:32 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? 
In-Reply-To: <14559.47055.604042.381126@beluga.mojam.com> References: <14559.47055.604042.381126@beluga.mojam.com> Message-ID: <14559.48908.354425.313775@weyr.cnri.reston.va.us> Skip Montanaro writes: > Perhaps it makes sense to revise the library reference manual's > documentation to reflect the proposed package hierarchy once it becomes > concrete. I'd go for this. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Mon Mar 27 22:43:06 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 15:43:06 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? Message-ID: <200003272043.PAA18445@eric.cnri.reston.va.us> The _tkinter.c source code is littered with #ifdefs that mostly center around distinguishing between Tcl/Tk 8.0 and older versions. The two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2. Would it be reasonable to assume that everybody is using at least Tcl/Tk version 8.0? This would simplify the code somewhat. Or should I ask this in a larger forum? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 27 22:59:04 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:59:04 -0500 (EST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Guido van Rossum writes: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. Simplify! It's more important that the latest versions are supported than pre-8.0 versions. -Fred -- Fred L.
Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Mon Mar 27 23:31:30 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 27 Mar 2000 13:31:30 -0800 (PST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > The _tkinter.c source code is littered with #ifdefs that mostly center > > around distinguishing between Tcl/Tk 8.0 and older versions. The > > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > > > Would it be reasonable to assume that everybody is using at least > > Tcl/Tk version 8.0? This would simplify the code somewhat. > > Simplify! It's more important that the latest versions are > supported than pre-8.0 versions. I strongly agree. My motto is, "if the latest Python version doesn't work for you, then don't upgrade!" This is also Open Source -- they can easily get the source to the old _Tkinter if they want new Python + 7.x support. If you ask in a larger forum, then you are certain to get somebody to say, "yes... I need that support." Then you have yourself a quandary :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Mar 27 23:46:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 27 Mar 2000 23:46:50 +0200 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <009801bf9835$f85b87e0$34aab5d4@hagrid> Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. yes. 
if people are using older versions, they can always use the version shipped with 1.5.2. (has anyone actually tested that one with pre-8.0 versions, btw?) > Or should I ask this in a larger forum? maybe. maybe not. From jack at oratrix.nl Mon Mar 27 23:58:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 27 Mar 2000 23:58:56 +0200 Subject: [Python-Dev] 1.6 job list In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 12:16:23 +0200 (IST) , Message-ID: <20000327215901.ABA08F58C1@oratrix.oratrix.nl> Recently, Moshe Zadka said: > Here's a reason: there shouldn't be changes we'll retract later -- we > need to come up with the (more or less) right hierarchy the first time, > or we'll do a lot of work for nothing. I think I disagree here (hmm, it's probably better to say that I agree, but I agree on a tangent:-). I think we can be 100% sure that we're wrong the first time around, and we should plan for that. One of the reasons we're wrong is that the world is moving on. A module that at this point in time will reside at some level in the hierarchy may in a few years (or shorter) be one of a large family and be better off elsewhere in the hierarchy. It would be silly if it would have to stay where it was because of backward compatibility. If we plan for being wrong we can make the mistakes less painful. I think that a simple scheme where a module can say "I'm expecting the Python 1.6 namespace layout" would make transition to a completely different Python 1.7 namespace layout a lot less painful, because some agent could do the mapping. This can either happen at runtime (through a namespace, or through an import hook, or probably through other tricks as well) or optionally by a script that would do the translations. Of course this doesn't mean we should go off and hack in a couple of namespaces (hence my "agreeing on a tangent"), but it does mean that I think Greg's idea of not wanting to change everything at once has merit.
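In today's Python, Jack's runtime "agent" could be sketched as an import hook driven by a rename table (the table entries below are hypothetical, and a 1.x-era implementation would have used ihooks instead):

```python
import builtins

# Hypothetical mapping from old top-level names to their new locations.
RENAMED = {"re_old": "re"}

_real_import = builtins.__import__

def _mapping_import(name, *args, **kwargs):
    # Redirect renamed modules; everything else imports normally.
    return _real_import(RENAMED.get(name, name), *args, **kwargs)

builtins.__import__ = _mapping_import
try:
    import re_old            # transparently loads "re"
    assert re_old.__name__ == "re"
finally:
    builtins.__import__ = _real_import
```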
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pf at artcom-gmbh.de Tue Mar 28 00:11:39 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 00:11:39 +0200 (MEST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 27, 2000 3:43: 6 pm" Message-ID: Guido van Rossum: > Or should I ask this in a larger forum? Don't ask. Simply tell the people on comp.lang.python that support for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. Period. ;-) Regards, Peter From guido at python.org Tue Mar 28 00:17:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 17:17:33 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 00:11:39 +0200." References: Message-ID: <200003272217.RAA28910@eric.cnri.reston.va.us> > Don't ask. Simply tell the people on comp.lang.python that support > for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. > Period. ;-) OK, I'm convinced. We will drop pre-8.0 support. Could someone submit a set of patches? It would make sense to call #error if a pre-8.0 version is detected at compile-time!
--Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Mar 28 01:02:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 09:02:21 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: Sorry for the delay, but Gordon's reply was accurate so should have kept you going ;-) > I've been reading Jeffrey Richter's "Advanced Windows" last night in order > to try understanding better why PyObject_NEW is implemented > differently for > Windows. So that is where the heaps discussion came from :-) The problem is simply "too many heaps are available". > Again, I feel uncomfortable with this, especially now, when > I'm dealing with the memory aspect of Python's object > constructors/desctrs. It is this exact reason it was added in the first place. I believe this code predates the "_d" convention on Windows. AFAIK, this could be removed today and everything should work (but see below why it probably won't). MSVC allows you to choose from a number of CRT versions. Only in one of these versions is the CRTL completely shared between the .EXE and all the various .DLLs in the application. What was happening is that this macro ended up causing the "malloc" for a new object to occur in Python15.dll, but the Python type system meant that tp_dealloc() (to cleanup the object) was called in the DLL implementing the new type. Unless Python15.dll and our extension DLL shared the same CRTL (and hence the same malloc heap, fileno table etc) things would die. The DLL version of "free()" would complain, as it had never seen the pointer before. This change meant the malloc() and the free() were both implemented in the same DLL/EXE. This was particularly true with Debug builds. MSVC's debug CRTL implementations have some very nice debugging features (guard-blocks, block validity checks with debugger breakpoints when things go wrong, leak tracking, etc).
However, this means they use yet another heap. Mixing debug builds with release builds in Python is a recipe for disaster. Theoretically, the problem has largely gone away now that a) we have separate "_d" versions and b) the "official" position is to use the same CRTL as Python15.dll. However, it is still a minor FAQ on comp.lang.python why PyRun_ExecFile (or whatever) fails with mysterious errors - the reason is exactly the same - they are using a different CRTL, so the CRTL can't map the file pointers correctly, and we get unexplained IO errors. But now that this macro hides the malloc problem, there may be plenty of "home grown" extensions out there that do use a different CRTL and don't see any problems - mainly because they aren't throwing file handles around! Finally getting to the point of all this: We now also have the PyMem_* functions. This problem also doesn't exist if extension modules use these functions instead of malloc()/free(). We only ask them to change the PyObject allocations and deallocations, not the rest of their code, so it is no real burden. IMO, we should adopt these functions for most internal object allocations and the extension samples/docs. Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() type functions that are simply a thin layer over the fopen/fclose functions. If extension writers used these instead of fopen/fclose we would gain a few fairly intangible things - lose the minor FAQ, platforms that don't have fopen at all (eg, CE) would love you, etc. Mark. From mhammond at skippinet.com.au Tue Mar 28 03:04:11 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 11:04:11 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: Message-ID: [I wrote] > Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() Maybe I had something like PyFile_FromString in mind!! That-damn-time-machine-again-ly, Mark.
From moshez at math.huji.ac.il Tue Mar 28 07:36:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:36:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <14559.48104.34263.680278@beluga.mojam.com> Message-ID: On Mon, 27 Mar 2000, Skip Montanaro wrote: > Responding to an early item in this thread and trying to adapt to later > items... > > Ping wrote: > > I'm not convinced "mime" needs a separate branch here. (This is the > deepest part of the tree, and at three levels small alarm bells went off > in my head.) > > It's not clear that mime should be beneath text/mail. Moshe moved it up a > level, Actually, Ping moved it up a level. I only decided to agree with him retroactively... > I think the mime stuff still > belongs in a separate mime package. I wouldn't just sprinkle the modules > under text. I see two possibilities: > > text>mime > net>mime > > I prefer net>mime, I don't. MIME is not a "wire protocol" like all the other things in net -- it's used inside another wire protocol, like RFC822 or HTTP. If at all, I'd go for having a net/ mail/ mime/ Package, but Ping would yell at me again for nesting 3 levels. I could live with text/mime, because the mime format basically *is* text. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 28 07:47:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:47:13 +0200 (IST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. 
> > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. I want to ask a different question: when is Python going to officially support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate having several libraries of Tcl/Tk on my machine. (I assume you know the joke about Jews always answering a question with a question ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jack at oratrix.nl Tue Mar 28 10:55:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 10:55:56 +0200 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message by Ka-Ping Yee , Sat, 25 Mar 2000 23:37:11 -0800 (PST) , Message-ID: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> > Okay, here's another shot at it. Notice a few things: > ... > bin > ... > image ... > sound > ... These I don't like, I think image and sound should be either at toplevel, or otherwise in a separate package (mm?). I know images and sounds are customarily stored in binary files, but so are databases and other things. Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and chunk definitely belong together, but struct is a wholly different beast. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Tue Mar 28 11:01:51 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 11:01:51 +0200 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 20:30:26 +0200 (IST) , Message-ID: <20000328090151.86B59370CF2@snelboot.oratrix.nl> > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > the db toplevel package, for example. > > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... For separate modules, yes. For packages this is different. As a case in point, think of MacPython: it could stuff all mac-specific packages under the toplevel "mac", but it would probably be nicer if it could extend the existing namespace. It is a bit silly if mac users have to do "from mac.text.encoding import macbinary" but "from text.encoding import binhex", just because BinHex support happens to live in the core (purely for historical reasons). But maybe this holds only for the platform distributions; then it shouldn't be as much of a problem, as there aren't that many.
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From moshez at math.huji.ac.il Tue Mar 28 11:24:14 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 11:24:14 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > These I don't like, I think image and sound should be either at toplevel, or > otherwise in a separate package (mm?). I know images and sounds are > customarily stored in binary files, but so are databases and other things. Hmmm...I think of "bin" as "interface to binary files". Agreed that I don't have a good reason for separating gdbm from zlib. > Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and > chunk definitely belong together, but struct is a wholly different beast. I think Ping and I decided to move struct to toplevel. Ping, would you like to take your last proposal and fold into it the consensus changes, or should I? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Tue Mar 28 11:44:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:44:14 +0200 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <02c101bf989a$2ee35860$34aab5d4@hagrid> Guido van Rossum wrote: > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. > > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ).
> > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... proposal: if anyone changes the API for a fundamental module, and fails to update the standard library, the change is automatically "minus one'd" for each major module that no longer works :-) (in this case, that would be -5 or so...) From effbot at telia.com Tue Mar 28 11:55:19 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:55:19 +0200 Subject: [Python-Dev] Great Renaming? What is the goal? References: Message-ID: <02c901bf989b$be203d80$34aab5d4@hagrid> Peter Funk wrote: > Why should modules be moved into packages? I don't get it. fwiw, neither do I... I'm not so sure that Python really needs a simple reorganization of the existing set of standard library modules. just moving the modules around won't solve the real problems with the 1.5.2 std library... > IMO this subdivision could be discussed and possibly revised. here's one proposal: http://www.pythonware.com/people/fredrik/librarybook-contents.htm From gstein at lyra.org Tue Mar 28 12:09:44 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 28 Mar 2000 02:09:44 -0800 (PST) Subject: [Python-Dev] 3rd parties in the hierarchy (was: module reorg) In-Reply-To: <20000328090151.86B59370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > For separate modules, yes. For packages this is different. 
As a point in case > think of MacPython: it could stuff all mac-specific packages under the > toplevel "mac", but it would probably be nicer if it could extend the existing > namespace. It is a bit silly if mac users have to do "from mac.text.encoding > import macbinary" but "from text.encoding import binhex", just because BinHex > support happens to live in the core (purely for historical reasons). > > But maybe this holds only for the platform distributions, then it shouldn't be > as much of a problem as there aren't that many. Assuming that you use an archive like those found in my "small" distro or Gordon's distro, then this is no problem. The archive simply recognizes and maps "text.encoding.macbinary" to its own module. Another way to say it: stop thinking in terms of the filesystem as the sole mechanism for determining placement in the package hierarchy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Tue Mar 28 15:38:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:38:12 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 07:47:13 +0200." References: Message-ID: <200003281338.IAA29532@eric.cnri.reston.va.us> > I want to ask a different question: when is Python going to officially > support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate > having several libraries of Tcl/Tk on my machine. This is already in the CVS tree, except for the Windows installer. Python 1.6 will not install a separate complete Tcl installation; instead, it will install the needed Tcl/Tk files (Tcl/Tk 8.3 or newer) in the Python tree, so it won't affect existing Tcl/Tk installations. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 15:57:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:57:02 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 28 Mar 2000 11:44:14 +0200." <02c101bf989a$2ee35860$34aab5d4@hagrid> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <02c101bf989a$2ee35860$34aab5d4@hagrid> Message-ID: <200003281357.IAA29621@eric.cnri.reston.va.us> > proposal: if anyone changes the API for a fundamental module, and > fails to update the standard library, the change is automatically "minus > one'd" for each major module that no longer works :-) > > (in this case, that would be -5 or so...) Oops. Sigh. While we're pretending that this change goes in, could you point me to those five modules? Also, we need to add test cases to the standard test suite that would have found these! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Tue Mar 28 17:04:47 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 10:04:47 -0500 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: ; from ping@lfw.org on Sat, Mar 25, 2000 at 11:37:11PM -0800 References: Message-ID: <20000328100446.A2586@cnri.reston.va.us> On 25 March 2000, Ka-Ping Yee said: > Okay, here's another shot at it. Notice a few things: Damn, I started writing a response to Moshe's original proposal -- and *then* saw this massive thread. Oh well. Turns out I still have a few useful things to say: First, any organization scheme for the standard library (or anything else, for that matter) should have a few simple guidelines. Here are two: * "deep hierarchies considered harmful": ie. avoid sub-packages if at all possible * "everything should have a purpose": every top-level package should be describable with a single, clear sentence of plain language. 
Eg.: net - Internet protocols, data formats, and client/server infrastructure unix - Unix-specific system calls, protocols, and conventions And two somewhat open issues: * "as long as we're renaming...": maybe this would be a good time to standardize naming conventions, eg. "cgi" -> "cgilib" *or* "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> "mimewriter", etc. * "shared namespaces vs system namespaces": the Perl model of "nothing belongs to The System; anyone can add a module in Text:: or Net:: or whatever" works there because Perl doesn't have __init__ files or anything to distinguish module namespaces; they just are. Python's import mechanism would have to change to support this, and the fact that __init__ files may contain arbitrary code makes this feel like a very tricky change to make. Now specific comments... > net > urlparse > urllib > ftplib > gopherlib > imaplib > poplib > nntplib > smtplib > telnetlib > httplib > cgi Rename? Either cgi -> cgilib or foolib -> foo? > server > BaseHTTPServer > CGIHTTPServer > SimpleHTTPServer > SocketServer > asynchat > asyncore This is one good place for a sub-package. It's a also a good place to rename: the convention for Python module names seems to be all-lowercase; and "Server" is redundant when you're in the net.server package. How about: net.server.base_http net.server.cgi_http net.server.simple_http net.server.socket Underscores negotiable. They don't seem to be popular in module names, although sometimes they would be real life-savers. > text I think "text" should mean "plain old unstructured, un-marked-up ASCII text", where "unstructured, un-marked-up" really means "not structured or marked up in a well-known standard way". Or maybe not. I'm just trying to come up with an excuse for moving xml to top-level, which I think is where it belongs. 
Maybe the excuse should just be, "XML is really important and visible, and anyways Paul Prescod will raise a stink if it isn't put at top-level in Python package-space". > re # general-purpose parsing Top-level: this is a fundamental module that should be treated on a par with 'string'. (Well, except for building RE methods into strings... hmmMMmm...maybe... [no, I'm kidding!]) > sgmllib > htmllib > htmlentitydefs Not sure what to do about these. Someone referred somewhere to a "web" top-level package, which seems to have disappeared. If it reappars, it would be a good place for the HTML modules (not to mention a big chunk of "net") -- this would mainly be for "important and visible" (ie. PR) reasons, rather than sound technical reasons. > xml > whatever the xml-sig puts here Should be top-level. > mail > rfc822 > mailbox > mhlib "mail" should either be top-level or under "net". (Yes, I *know* it's not a wire-level protocol: that's what net.smtplib is for. But last time I checked, email is pretty useless without a network. And vice-versa.) Or maybe these all belong in a top-level "data" package: I'm starting to warm to that. > bin > gzip > zlib > chunk > struct > image > imghdr > colorsys # a bit unsure, but doesn't go anywhere else > imageop > imgfile > rgbimg > yuvconvert > sound > aifc > sndhdr > toaiff > audiodev > sunau > sunaudio > wave > audioop > sunaudiodev I agree with Jack: image and sound (audio?) should be top-level. I don't think I like the idea of an intervening "mm" or "multimedia" or "media" or what-have-you package, though. The other stuff in "bin" is kind of a grab-bag: "chunk" and "struct" might belong in the mythical "data" package. > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm Yup. 
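[The "db" grouping agreed on above is, with hindsight, roughly what later Python versions shipped as the `dbm` package, where the old anydbm/whichdb front ends became `dbm.open` and `dbm.whichdb`. A minimal sketch using those later names — the module layout is assumed from modern Python, not from anything in this thread:]

```python
import dbm
import os
import tempfile

# dbm.open picks whichever backend is available (gnu, ndbm, or dumb),
# playing the role the separate anydbm module played in 1.5.x.
path = os.path.join(tempfile.mkdtemp(), "example")
db = dbm.open(path, "c")          # "c" = create the database if needed
db[b"greeting"] = b"hello"
value = db[b"greeting"]           # values come back as bytes
db.close()

# dbm.whichdb sniffs the on-disk format, as the old whichdb module did.
backend = dbm.whichdb(path)
```

[The point of the grouping survives in the sketch: calling code names no concrete backend, so the same program runs against gdbm, ndbm, or the pure-Python fallback.]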
> math > math # library functions > cmath > fpectl # type-related > fpetest > array > mpz > fpformat # formatting > locale > bisect # algorithm: also unsure, but doesn't go anywhere else > random # randomness > whrandom > crypt # cryptography > md5 > rotor > sha Hmmm. "locale" has already been dealt with; obviously it should be top-evel. I think "array" should be top-level or under the mythical "data". Six crypto-related modules seems like enough to justify a top-level "crypt" package, though. > time > calendar > time > tzparse > sched > timing Yup. > interp > new > linecache # handling .py files [...] > tabnanny > pstats > rlcompleter # this might go in "ui"... I like "python" for this one. (But I'm not sure if tabnanny and rlcompleter belong there.) > security > Bastion > rexec > ihooks What does ihooks have to do with security? > file > dircache > path -- a virtual module which would do a from path import * > nturl2path > macurl2path > filecmp > fileinput > StringIO Lowercase for consistency? > glob > fnmatch > stat > statcache > statvfs > tempfile > shutil > pipes > popen2 > commands > dl No problem until these last two -- 'commands' is a Unix-specific thing that has very little to do with the filesystem per se, and 'dl' is (as I understand it) deep ju-ju with sharp edges that should probably be hidden away in the 'python' ('sys'?) package. Oh yeah, "dl" should be elsewhere -- "python" maybe? Top-level? Perhaps we need a "deepmagic" package for "dl" and "new"? ;-) > data > pickle > shelve > xdrlib > copy > copy_reg > UserDict > UserList > pprint > repr > (cPickle) Oh hey, it's *not* a mythical package! Guess I didn't read far enough ahead. I like it, but would add more stuff to it (obviously): 'struct', 'chunk', 'array' for starters. Should cPickle be renamed to fastpickle? > threads > thread > threading > Queue Lowercase? > ui > _tkinter > curses > Tkinter > cmd > getpass > getopt > readline > users > pwd > grp > nis These belong in "unix". 
Possibly "nis" belongs in "net" -- do any non-Unix OSes use NIS? > sgi > al > cd > cl > fl > fm > gl > misc (what used to be sgimodule.c) > sv Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there are a significant number of Sun/Solaris modules. Note that the respective trademark holders might get very antsy about who gets to put names in those namespaces -- that's exactly what happened with Sun, Solaris 8, and Perl. I believe the compromise they arrived at was that the "Solaris::" namespace remains open, but Sun gets the "Sun::" namespace. There should probably be a win32 package, for core registry access stuff if nothing else. There might someday be a "linux" package; it's highly unlikely there would be a "pc" or "alpha" package though. All of those argue over "irix" and "solaris" instead of "sgi" and "sun". Greg From gvwilson at nevex.com Tue Mar 28 17:45:10 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 28 Mar 2000 10:45:10 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: > > Greg Wilson > > If None becomes a keyword, I would like to ask whether it could be > > used to signal that a method is a class method, as opposed to an > > instance method: > I'd like to know what you mean by "class" method. (I do know C++ and > Java, so I have some idea...). Specifically, my question is: how does > a class method access class variables? They can't be totally > unqualified (because that's very unpythonic). If they are qualified by > the class's name, I see it as a very mild improvement on the current > situation. You could suggest, for example, to qualify class variables > by "class" (so you'd do things like: > > class.x = 1 > > ), but I'm not sure I like it. On the whole, I think it is a much > bigger issue on how be denote class methods. 
I don't like overloading the word 'class' this way, as it makes it difficult to distinguish a parent's 'foo' member and a child's 'foo' member: class Parent: foo = 3 ...other stuff... class Child(Parent): foo = 9 def test(): print class.foo # obviously 9, but how to get 3? I think that using the class's name instead of 'self' will be easy to explain, will look like it belongs in the language, will be unlikely to lead to errors, and will handle multiple inheritance with ease: class Child(Parent): foo = 9 def test(): print Child.foo # 9 print Parent.foo # 3 > Also, one slight problem with your method of denoting class methods: > currently, it is possible to add instance method at run time to a > class by something like > > class C: > pass > > def foo(self): > pass > > C.foo = foo > > In your suggestion, how do you view the possiblity of adding class > methods to a class? (Note that "foo", above, is also perfectly usable > as a plain function). Hm, I hadn't thought of this... :-( > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > > > instead of having to create throw-away variables to fill in slots in > > tuples that they don't care about. > > Currently, I use "_" for that purpose, after I heard the idea from > Fredrik Lundh. I do the same thing when I need to; I just thought that making assignment to "None" special would formalize this in a readable way. From jeremy at cnri.reston.va.us Tue Mar 28 19:31:48 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 12:31:48 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: <14560.60548.74378.613188@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: >> The only problematic use of from ... import ... 
is >> from text.re import * >> which adds an unspecified set of names to the current >> namespace. KLM> The other gotcha i mean applies when the thing you're importing KLM> is a terminal, ie a non-module. Then, changes to the KLM> assignments of the names in the original module aren't KLM> reflected in the names you've imported - they're decoupled from KLM> the namespace of the original module. This isn't an import issue. Some people simply don't understand that assignment (and import as form of assignment) is name binding. Import binds an imported object to a name in the current namespace. It does not affect bindings in other namespaces, nor should it. KLM> I thought the other problem peter was objecting to, having to KLM> change the import sections in the first place, was going to be KLM> avoided in the 1.x series (if we do this kind of thing) by KLM> inherently extending the import path to include all the KLM> packages, so people need not change their code? Seems like KLM> most of this would be fairly transparent w.r.t. the operation KLM> of existing applications. I'm not sure if there is consensus on backwards compatibility. I'm not in favor of creating a huge sys.path that includes every package's contents. It would be a big performance hit. Jeremy From moshez at math.huji.ac.il Tue Mar 28 19:36:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 19:36:47 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328100446.A2586@cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Greg Ward wrote: > * "deep hierarchies considered harmful": ie. avoid sub-packages if at > all possible > > * "everything should have a purpose": every top-level package should > be describable with a single, clear sentence of plain language. Good guidelines, but they aren't enough. 
And anyway, rules were meant to be broken <0.9 wink> > * "as long as we're renaming...": maybe this would be a good time to > standardize naming conventions, eg. "cgi" -> "cgilib" *or* > "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> > "mimewriter", etc. +1 > * "shared namespaces vs system namespaces": the Perl model of "nothing > belongs to The System; anyone can add a module in Text:: or Net:: or > whatever" works there because Perl doesn't have __init__ files or > anything to distinguish module namespaces; they just are. Python's > import mechanism would have to change to support this, and the fact > that __init__ files may contain arbitrary code makes this feel > like a very tricky change to make. Indeed. But I still feel that "few things should belong to the system" is quite a useful rule... (That's what I referred to when I said Perl's module system is more suited to CPAN (now there's a surprise)) > Rename? Either cgi -> cgilib or foolib -> foo? Yes. But I wanted the first proposal to be just about placing stuff, because that airs out more disagreements. > This is one good place for a sub-package. It's a also a good place to > rename: the convention for Python module names seems to be > all-lowercase; and "Server" is redundant when you're in the net.server > package. How about: > > net.server.base_http > net.server.cgi_http > net.server.simple_http > net.server.socket Hmmmmm......+0 > Underscores negotiable. They don't seem to be popular in module names, > although sometimes they would be real life-savers. Personally, I prefer underscores to CamelCase. > Or maybe not. I'm just trying to come up with an excuse for moving xml > to top-level, which I think is where it belongs. Maybe the excuse > should just be, "XML is really important and visible, and anyways Paul > Prescod will raise a stink if it isn't put at top-level in Python > package-space". I still think "xml" should be a brother to "html" and "sgml". 
Current political trends notwithstanding. > Not sure what to do about these. Someone referred somewhere to a "web" > top-level package, which seems to have disappeared. If it reappars, it > would be a good place for the HTML modules (not to mention a big chunk > of "net") -- this would mainly be for "important and visible" (ie. PR) > reasons, rather than sound technical reasons. I think the "web" package should be reinstated. But you won't like it: I'd put xml in web. > "mail" should either be top-level or under "net". (Yes, I *know* it's > not a wire-level protocol: that's what net.smtplib is for. But last > time I checked, email is pretty useless without a network. And > vice-versa.) Ummmm.....I'd disagree, but I lack the strength and the moral conviction. Put it under net and we'll call it a deal. > Or maybe these all belong in a top-level "data" package: I'm starting to > warm to that. Ummmm...I don't like the "data" package personally. It seems to disobey your second guideline. > I agree with Jack: image and sound (audio?) should be top-level. I > don't think I like the idea of an intervening "mm" or "multimedia" or > "media" or what-have-you package, though. Definitely multimedia. Okay, I'm bought. > Six crypto-related modules seems like enough to justify a top-level > "crypt" package, though. It seemed obvious to me that "crypt" should be under "math". But maybe that's just the mathematician in me speaking. > I like "python" for this one. (But I'm not sure if tabnanny and > rlcompleter belong there.) I agree, and I'm not sure about rlcompleter, but am sure about tabnanny. > What does ihooks have to do with security? Well, it was more or less written to support rexec. A weak argument, admittedly. > No problem until these last two -- 'commands' is a Unix-specific thing > that has very little to do with the filesystem per se Hmmmmm...it is on the same level with popen. Why not move popen too? 
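[For reference, commands and popen did end up traveling together: the interface of the old commands module survives in later Python versions as `subprocess.getstatusoutput`. A sketch of the equivalent call — the subprocess spelling is from modern Python, not anything available in this thread's era, and it runs the command through a shell just as commands did:]

```python
import subprocess

# Returns (exit_status, captured_output) -- the same shape the old
# commands.getstatusoutput returned; trailing newlines are stripped.
status, output = subprocess.getstatusoutput("echo hello")
```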
>, and 'dl' is (as I > understand it) deep ju-ju with sharp edges that should probably be > hidden away Ummmmmm.....not in the "python" package: it doesn't have anything to do with the interpreter. > Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there > are a significant number of Sun/Solaris modules. Note that the > respective trademark holders might get very antsy about who gets to put > names in those namespaces -- that's exactly what happened with Sun, > Solaris 8, and Perl. I believe the compromise they arrived at was that > the "Solaris::" namespace remains open, but Sun gets the "Sun::" > namespace. Ummmmm.....I don't see how they have any legal standing. I for one refuse to care about what Sun Microsystem thinks about names for Python packages. > There should probably be a win32 package, for core registry access stuff > if nothing else. And for all the other extensions in win32all Yep! (Just goes to show what happens when you decide to package based on a UNIX system) > All of those > argue over "irix" and "solaris" instead of "sgi" and "sun". Fine with me -- just wanted to move them out of my face -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From andy at reportlab.com Tue Mar 28 20:13:02 2000 From: andy at reportlab.com (Andy Robinson) Date: Tue, 28 Mar 2000 18:13:02 GMT Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <20000327170031.693531CDF6@dinsdale.python.org> References: <20000327170031.693531CDF6@dinsdale.python.org> Message-ID: <38e0f4cf.24247656@post.demon.co.uk> On Mon, 27 Mar 2000 12:00:31 -0500 (EST), Peter Funk wrote: > Do we need a UserString class? This will probably be useful on top of the i18n stuff in due course, so I'd like it. Something Mike Da Silva and I have discussed a lot is implementing a higher-level 'typed string' library on top of the Unicode stuff. 
A 'typed string' is like a string, but knows what encoding it is in - possibly Unicode, possibly a native encoding and embodies some basic type safety and convenience notions, like not being able to add a Shift-JIS and an EUC string together. Iteration would always be per character, not per byte; and a certain amount of magic would say that if the string was (say) Japanese, it would acquire a few extra methods for doing some Japan-specific things like expanding half-width katakana. Of course, we can do this anyway, but I think defining the API clearly in UserString is a great idea. - Andy Robinson From guido at python.org Tue Mar 28 21:22:43 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:22:43 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 18:13:02 GMT." <38e0f4cf.24247656@post.demon.co.uk> References: <20000327170031.693531CDF6@dinsdale.python.org> <38e0f4cf.24247656@post.demon.co.uk> Message-ID: <200003281922.OAA03113@eric.cnri.reston.va.us> > > Do we need a UserString class? > > This will probably be useful on top of the i18n stuff in due course, > so I'd like it. > > Something Mike Da Silva and I have discussed a lot is implementing a > higher-level 'typed string' library on top of the Unicode stuff. > A 'typed string' is like a string, but knows what encoding it is in - > possibly Unicode, possibly a native encoding and embodies some basic > type safety and convenience notions, like not being able to add a > Shift-JIS and an EUC string together. Iteration would always be per > character, not per byte; and a certain amount of magic would say that > if the string was (say) Japanese, it would acquire a few extra methods > for doing some Japan-specific things like expanding half-width > katakana. > > Of course, we can do this anyway, but I think defining the API clearly > in UserString is a great idea. Agreed. Please somebody send a patch! 
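[A UserString along the lines requested here could look like the following minimal sketch. The method selection and the Shift-JIS subclass are illustrative only (the class that eventually shipped has a much fuller API, and real encoding enforcement is not shown); the point is that every operation routes through `self.__class__`, so a "typed string" subclass keeps its type across operations:]

```python
class UserString:
    """Minimal string wrapper designed for subclassing."""
    def __init__(self, seq):
        self.data = str(seq)
    def __str__(self):
        return self.data
    def __repr__(self):
        return repr(self.data)
    def __len__(self):
        return len(self.data)
    def __eq__(self, other):
        if isinstance(other, UserString):
            other = other.data
        return self.data == other
    def __getitem__(self, index):
        # Slicing/indexing preserves the (sub)class.
        return self.__class__(self.data[index])
    def __add__(self, other):
        if isinstance(other, UserString):
            other = other.data
        return self.__class__(self.data + other)
    def upper(self):
        return self.__class__(self.data.upper())

class ShiftJISString(UserString):
    """Hypothetical 'typed string': a subclass that knows its encoding."""
    encoding = "shift-jis"

s = ShiftJISString("hello")
t = s + " world"          # result is still a ShiftJISString
```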
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 21:25:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:25:39 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 Message-ID: <200003281925.OAA03287@eric.cnri.reston.va.us> I'm hoping to release a first, rough alpha of Python 1.6 by April 1st (no joke!). Not everything needs to be finished by then, but I hope to have the current versions of distutil, expat, and sre in there. Anything else that needs to go into 1.6 and isn't ready yet? (Small stuff doesn't matter, everything currently in the patches queue can probably go in if it isn't rejected by then.) --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Tue Mar 28 21:40:24 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 11:40:24 -0800 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: > Anything else that needs to go into 1.6 and isn't ready yet? No one seems to have found time to figure out the mmap module support. --david From guido at python.org Tue Mar 28 21:33:29 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:33:29 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: Your message of "Tue, 28 Mar 2000 11:40:24 PST." References: Message-ID: <200003281933.OAA04896@eric.cnri.reston.va.us> > > Anything else that needs to go into 1.6 and isn't ready yet? > > No one seems to have found time to figure out the mmap module support. I wasn't even aware that that was a priority. If someone submits it, it will go in -- alpha 1 is not a total feature freeze, just a "testing the waters". 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Tue Mar 28 21:49:17 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 28 Mar 2000 21:49:17 +0200 Subject: [Python-Dev] First alpha release of Python 1.6 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <38E10CBD.C6B71D50@tismer.com> Guido van Rossum wrote: ... > Anything else that needs to go into 1.6 and isn't ready yet? Stackless Python of course, but it *is* ready yet. Just kidding. I will provide a compressed unicode database in a few days. That will be a non-Python-specific module, and (Marc or I) will provide a Python specific wrapper. This will probably not get ready until April 1. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Tue Mar 28 21:51:29 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 14:51:29 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <14561.3393.761177.776684@amarok.cnri.reston.va.us> David Ascher writes: >> Anything else that needs to go into 1.6 and isn't ready yet? >No one seems to have found time to figure out the mmap module support. The issue there is cross-platform compatibility; the Windows and Unix versions take completely different constructor arguments, so how should we paper over the differences? Unix arguments: (file descriptor, size, flags, protection) Win32 arguments:(filename, tagname, size) We could just say, "OK, the args are completely different between Win32 and Unix, despite it being the same function name". 
Maybe that's best, because there seems no way to reconcile those two different sets of arguments. -- A.M. Kuchling http://starship.python.net/crew/amk/ I'm here for the FBI, not the _Weekly World News_. -- Scully in X-FILES #1 From DavidA at ActiveState.com Tue Mar 28 22:06:09 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 12:06:09 -0800 Subject: [Python-Dev] mmapfile module In-Reply-To: <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I guess my approach would be to provide two platform-specific modules, and to figure out a high-level Python module which could provide a reasonable platform-independent interface on top of it. One problem with that approach is that I think that there is also great value in having a portable mmap interface in the C layer, where i see lots of possible uses in extension modules (much like the threads API). --david From guido at python.org Tue Mar 28 22:00:57 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 15:00:57 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Tue, 28 Mar 2000 12:06:09 PST." References: Message-ID: <200003282000.PAA11988@eric.cnri.reston.va.us> > > The issue there is cross-platform compatibility; the Windows and Unix > > versions take completely different constructor arguments, so how > > should we paper over the differences? 
> > > > Unix arguments: (file descriptor, size, flags, protection) > > Win32 arguments:(filename, tagname, size) > > > > We could just say, "OK, the args are completely different between > > Win32 and Unix, despite it being the same function name". Maybe > > that's best, because there seems no way to reconcile those two > > different sets of arguments. > > I guess my approach would be to provide two platform-specific modules, and > to figure out a high-level Python module which could provide a reasonable > platform-independent interface on top of it. One problem with that approach > is that I think that there is also great value in having a portable mmap > interface in the C layer, where i see lots of possible uses in extension > modules (much like the threads API). I don't know enough about this, but it seems that there might be two steps: *creating* a mmap object is necessarily platform-specific; but *using* a mmap object could be platform-neutral. What is the API for mmap objects? --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Tue Mar 28 22:07:25 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 15:07:25 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14560.60548.74378.613188@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > >>>>> "KLM" == Ken Manheimer writes: > > >> The only problematic use of from ... import ... is > >> from text.re import * > >> which adds an unspecified set of names to the current > >> namespace. > > KLM> The other gotcha i mean applies when the thing you're importing > KLM> is a terminal, ie a non-module. Then, changes to the > KLM> assignments of the names in the original module aren't > KLM> reflected in the names you've imported - they're decoupled from > KLM> the namespace of the original module. > > This isn't an import issue. 
Some people simply don't understand > that assignment (and import as form of assignment) is name binding. > Import binds an imported object to a name in the current namespace. > It does not affect bindings in other namespaces, nor should it. I know that - i was addressing the asserted evilness of from ... import ... and how it applied - and didn't - w.r.t. packages. > KLM> I thought the other problem peter was objecting to, having to > KLM> change the import sections in the first place, was going to be > KLM> avoided in the 1.x series (if we do this kind of thing) by > KLM> inherently extending the import path to include all the > KLM> packages, so people need not change their code? Seems like > KLM> most of this would be fairly transparent w.r.t. the operation > KLM> of existing applications. > > I'm not sure if there is consensus on backwards compatibility. I'm > not in favor of creating a huge sys.path that includes every package's > contents. It would be a big performance hit. Yes, someone reminded me that the other (better, i think) option is stub modules in the current places that do the "from ... import *" for the right values of "...". py3k finishes the migration by eliminating the stubs. Ken klm at digicool.com From gward at cnri.reston.va.us Tue Mar 28 22:29:55 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 15:29:55 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us>; from guido@python.org on Tue, Mar 28, 2000 at 02:25:39PM -0500 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <20000328152955.A3136@cnri.reston.va.us> On 28 March 2000, Guido van Rossum said: > I'm hoping to release a first, rough alpha of Python 1.6 by April 1st > (no joke!). > > Not everything needs to be finished by then, but I hope to have the > current versions of distutil, expat, and sre in there. 
We just need to do a bit of CVS trickery to put Distutils under the Python tree. I'd *like* for Distutils to have its own CVS existence at least until 1.6 is released, but it's not essential. Two of the big Distutils to-do items that I enumerated at IPC8 have been knocked off: the "dist" command has been completely redone (and renamed "sdist", for "source distribution"), as has the "install" command. The really major to-do items left for Distutils are: * implement the "bdist" command with enough marbles to generate RPMs and some sort of Windows installer (Wise?); Solaris packages, Debian packages, and something for the Mac would be nice too. * documentation (started, but only just) And there are some almost-as-important items: * Mac OS support; this has been started, at least for the unfashionable and clunky sounding MPW compiler; CodeWarrior support (via AppleEvents, I think) would be nice * test suite -- at least the fundamental Distutils marbles should get a good exercise; it would also be nice to put together a bunch of toy module distributions and make sure that "build" and "install" on them do the right things... all automatically, of course! * reduce number of tracebacks: right now, certain errors in the setup script or on the command line can result in a traceback, when they should just result in SystemExit with "error in setup script: ..." or "error on command line: ..." * fold in Finn Bock's JPython compat. patch * fold in Michael Muller's "pkginfo" patch * finish and fold in my Python 1.5.1 compat. patch (only necessary as long as Distutils has a life of its own, outside Python) Well, I'd better get cracking ... Guido, we can do the CVS thing any time; I guess I'll mosey on downstairs. 
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From effbot at telia.com Tue Mar 28 21:46:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 21:46:17 +0200 Subject: [Python-Dev] mmapfile module References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: <003501bf98ee$50097a20$34aab5d4@hagrid> Andrew M. Kuchling wrote: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I don't get this. Why expose low-level implementation details to the user (flags, protection, tagname)? (And how come the Windows implementation doesn't support read-only vs. read/write flags?) Unless the current implementation uses something radically different from mmap/MapViewOfFile, wouldn't an interface like: (filename, mode="rb", size=entire file, offset=0) be sufficient? (where mode can be "wb" or "wb+" or "rb+", optionally without the "b") From donb at init.com Tue Mar 28 22:46:06 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 15:46:06 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <200003282046.PAA18822@zippy.init.com> ...sorry to jump in on the middle of this one, but. A while back I put a lot of thought into how to support class methods and class attributes. 
I feel that I solved the problem in a fairly complete way though the solution does have some warts. Here's an example:

>>> class foo(base):
...     value = 10  # this is an instance attribute called 'value'
...                 # as usual, it is shared between all instances
...                 # until explicitly set on a particular instance
...
...     def set_value(self, x):
...         print "instance method"
...         self.value = x
...
...     #
...     # here comes the weird part
...     #
...     class __class__:
...         value = 5  # this is a class attribute called value
...
...         def set_value(cl, x):
...             print "class method"
...             cl.value = x
...
...         def set_instance_default_value(cl, x):
...             cl._.value = x
...
>>> f = foo()
>>> f.value
10
>>> foo.value = 20
>>> f.value
10
>>> f.__class__.value
20
>>> foo._.value
10
>>> foo._.value = 1
>>> f.value
1
>>> foo.set_value(100)
class method
>>> foo.value
100
>>> f.value
1
>>> f.set_value(40)
instance method
>>> f.value
40
>>> foo._.value
1
>>> ff = foo()
>>> foo.set_instance_default_value(15)
>>> ff.value
15
>>> foo._.set_value(ff, 5)
instance method
>>> ff.value
5
>>>

Is anyone still with me?

The crux of the problem is that in the current python class/instance implementation, classes don't have attributes of their own. All of those things that look like class attributes are really there as defaults for the instances. To support true class attributes a new name space must be invented. Since I wanted class objects to look like any other object, I chose to move the "instance defaults" name space under the underscore attribute. This allows the class's unqualified namespace to refer to its own attributes. Clear as mud, right?

In case you are wondering, yes, the code above is a working example. I released it a while back as the 'objectmodule' and just updated it to work with Python-1.5.2. The update has yet to be released. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi... 
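For readers following this thread later: the class-method half of what Donald sketches above did eventually become expressible without an extension module, when Python 2.2 added the `classmethod` builtin. Here is a minimal sketch in that later syntax (the `Counter` and `bump` names are invented for illustration; this covers class methods only, not Donald's separate class-attribute namespace or his `_` accessor):

```python
class Counter:
    _count = 0  # class-level state, shared by all instances

    @classmethod
    def bump(cls):
        # A true class method: it receives the class itself, so it
        # updates Counter._count whether it is called via the class
        # or via an instance.
        cls._count += 1
        return cls._count

print(Counter.bump())    # called on the class -> 1
print(Counter().bump())  # called on an instance -> 2
```

This is exactly the "new namespace" Guido argues against below: the method lives on the class, is shared by all instances, and never sees a particular instance.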
From akuchlin at mems-exchange.org Tue Mar 28 22:50:18 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 15:50:18 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <003501bf98ee$50097a20$34aab5d4@hagrid> References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> <003501bf98ee$50097a20$34aab5d4@hagrid> Message-ID: <14561.6922.415063.279939@amarok.cnri.reston.va.us> Fredrik Lundh writes: >(And how come the Windows implementation doesn't support >read-only vs. read/write flags?) Good point; that should be fixed. > (filename, mode="rb", size=entire file, offset=0) >be sufficient? (where mode can be "wb" or "wb+" or "rb+", >optionally without the "b") Hmm... maybe we can dispose of the PROT_* argument that way on Unix. But how would you specify MAP_SHARED vs. MAP_PRIVATE, or MAP_ANONYMOUS? (MAP_FIXED seems useless to a Python programmer.) Another character in the mode argument, or a flags argument? Worse, as you pointed out in the same thread, MAP_ANONYMOUS on OSF/1 doesn't want to take a file descriptor at all. Also, the tag name on Windows seems important, from Gordon McMillan's explanation of it: http://www.python.org/pipermail/python-dev/1999-November/002808.html -- A.M. Kuchling http://starship.python.net/crew/amk/ You mustn't kill me. You don't love me. You d-don't even know me. -- The Furies kill Abel, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Tue Mar 28 23:02:04 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:02:04 -0500 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Your message of "Tue, 28 Mar 2000 15:46:06 EST." <200003282046.PAA18822@zippy.init.com> References: <200003282046.PAA18822@zippy.init.com> Message-ID: <200003282102.QAA13041@eric.cnri.reston.va.us> > A while back I put a lot of thought into how to support class methods > and class attributes. 
I feel that I solved the problem in a fairly > complete way though the solution does have some warts. Here's an > example: [...] > Is anyone still with me? > > The crux of the problem is that in the current python class/instance > implementation, classes dont have attributes of their own. All of > those things that look like class attributes are really there as > defaults for the instances. To support true class attributes a new > name space must be invented. Since I wanted class objects to look > like any other object, I chose to move the "instance defaults" name > space under the underscore attribute. This allows the class's > unqualified namespace to refer to its own attributes. Clear as mud, > right? > > In case you are wondering, yes, the code above is a working example. > I released it a while back as the 'objectmodule' and just updated it > to work with Python-1.5.2. The update has yet to be released. This looks like it would break a lot of code. How do you refer to a superclass method? It seems that ClassName.methodName would refer to the class method, not to the unbound instance method. Also, moving the default instance attributes to a different namespace seems to be a semantic change that could change lots of things. I am still in favor of saying "Python has no class methods -- use module-global functions for that". Between the module, the class and the instance, there are enough namespaces -- we don't need another one. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:01:29 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:01:29 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003281922.OAA03113@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 2:22:43 pm" Message-ID: I wrote: > > > Do we need a UserString class? 
> > Andy Robinson: > > This will probably be useful on top of the i18n stuff in due course, > > so I'd like it. > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > higher-level 'typed string' library on top of the Unicode stuff. > > A 'typed string' is like a string, but knows what encoding it is in - > > possibly Unicode, possibly a native encoding and embodies some basic > > type safety and convenience notions, like not being able to add a > > Shift-JIS and an EUC string together. Iteration would always be per > > character, not per byte; and a certain amount of magic would say that > > if the string was (say) Japanese, it would acquire a few extra methods > > for doing some Japan-specific things like expanding half-width > > katakana. > > > > Of course, we can do this anyway, but I think defining the API clearly > > in UserString is a great idea. > Guido van Rossum: > Agreed. Please somebody send a patch! I feel unable to do what Andy proposed. What I had in mind was a simple wrapper class around the builtin string type, similar to UserDict and UserList, which can be used to derive other classes from. I use UserList and UserDict quite often and find them very useful. They are simple and powerful and easy to extend. Maybe the things Andy Robinson proposed above belong in a subclass which inherits from a simple UserString class? Do we need an additional UserUnicode class for unicode string objects? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Tue Mar 28 23:56:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:56:49 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 23:01:29 +0200." 
References: Message-ID: <200003282156.QAA13361@eric.cnri.reston.va.us> [Peter Funk] > > > > Do we need a UserString class? > > > > Andy Robinson: > > > This will probably be useful on top of the i18n stuff in due course, > > > so I'd like it. > > > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > > higher-level 'typed string' library on top of the Unicode stuff. > > > A 'typed string' is like a string, but knows what encoding it is in - > > > possibly Unicode, possibly a native encoding and embodies some basic > > > type safety and convenience notions, like not being able to add a > > > Shift-JIS and an EUC string together. Iteration would always be per > > > character, not per byte; and a certain amount of magic would say that > > > if the string was (say) Japanese, it would acquire a few extra methods > > > for doing some Japan-specific things like expanding half-width > > > katakana. > > > > > > Of course, we can do this anyway, but I think defining the API clearly > > > in UserString is a great idea. > > > Guido van Rossum: > > Agreed. Please somebody send a patch! [PF] > I feel unable to do, what Andy proposed. What I had in mind was a > simple wrapper class around the builtin string type similar to > UserDict and UserList which can be used to derive other classes from. Yes. I think Andy wanted his class to be a subclass of UserString. > I use UserList and UserDict quite often and find them very useful. > They are simple and powerful and easy to extend. Agreed. > May be the things Andy Robinson proposed above belong into a sub class > which inherits from a simple UserString class? Do we need > an additional UserUnicode class for unicode string objects? It would be great if there was a single UserString class which would work with either Unicode or 8-bit strings. I think that shouldn't be too hard, since it's just a wrapper. So why don't you give the UserString.py a try and leave Andy's wish alone? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:47:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:47:59 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> from Fredrik Lundh at "Mar 28, 2000 11:55:19 am" Message-ID: Hi! > Peter Funk wrote: > > Why should modules be moved into packages? I don't get it. > Fredrik Lundh: > fwiw, neither do I... Pheeewww... And I thought I'm the only one! ;-) > I'm not so sure that Python really needs a simple reorganization > of the existing set of standard library modules. just moving the > modules around won't solve the real problems with the 1.5.2 std > library... Right. I propose to leave the namespace flat. I'd like to argue along the lines of Brad J. Cox ---the author of the book "Object Oriented Programming - An Evolutionary Approach" Addison Wesley, 1987--- who proposes the idea of what he calls a "Software-IC": He looks closely at the design process of electronic engineers, who usually deal with large data books of prefabricated components. There are often hundreds of them in such a databook and most of them have terse and not very mnemonic names. But the engineers using them all day *know* after a short while that a 7400 chip is a TTL-chip containing 4 NAND gates. Nearly the same holds true for software engineers using Software-ICs like 're' or 'struct' as their daily building blocks. A software engineer who is already familiar with his/her building blocks has absolutely no advantage from a deeply nested namespace. Now for something completely different: Fredrik Lundh about the library documentation: > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Whether 'md5', 'getpass' and 'traceback' fit into a category 'Commonly Used Modules' is ....ummmm.... at least a bit questionable. 
But we should really focus the discussion on the structure of the documentation. Since many standard library modules belong to several logical categories at once, a true tree structured organization is simply not sufficient to describe everything. So it is important to set up pointers between related functionality. For example 'string.replace' is somewhat related to 're.sub', or 'getpass' is related to 'crypt'; however 'crypt' is related to 'md5' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From pf at artcom-gmbh.de Wed Mar 29 00:13:02 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 00:13:02 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: <200003282007.PAA12045@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 3: 7: 9 pm" Message-ID: Hi! Guido van Rossum:

> Modified Files:
> 	_tkinter.c
[...]
> *** 491,501 ****
>
>   v->interp = Tcl_CreateInterp();
> -
> - #if TKMAJORMINOR == 8001
> -   TclpInitLibraryPath(baseName);
> - #endif /* TKMAJORMINOR */
>
> ! #if defined(macintosh) && TKMAJORMINOR >= 8000
> !   /* This seems to be needed since Tk 8.0 */
>     ClearMenuBar();
>     TkMacInitMenus(v->interp);
> --- 475,481 ----
>
>   v->interp = Tcl_CreateInterp();
>
> ! #if defined(macintosh)
> !   /* This seems to be needed */
>     ClearMenuBar();
>     TkMacInitMenus(v->interp);
> ***************

Are you sure that the call to 'TclpInitLibraryPath(baseName);' is not required in Tcl/Tk 8.1, 8.2, 8.3 ? I would propose the following:

+#if TKMAJORMINOR >= 8001
+	TclpInitLibraryPath(baseName);
+#endif /* TKMAJORMINOR */

Here I quote from the Tcl8.3 source distribution:

/*
 *---------------------------------------------------------------------------
 *
 * TclpInitLibraryPath --
 *
 *	Initialize the library path at startup. We have a minor
 *	metacircular problem that we don't know the encoding of the
 *	operating system but we may need to talk to operating system
 *	to find the library directories so that we know how to talk to
 *	the operating system.
 *
 *	We do not know the encoding of the operating system.
 *	We do know that the encoding is some multibyte encoding.
 *	In that multibyte encoding, the characters 0..127 are equivalent
 *	to ascii.
 *
 *	So although we don't know the encoding, it's safe:
 *	    to look for the last slash character in a path in the encoding.
 *	    to append an ascii string to a path.
 *	    to pass those strings back to the operating system.
 *
 *	But any strings that we remembered before we knew the encoding of
 *	the operating system must be translated to UTF-8 once we know the
 *	encoding so that the rest of Tcl can use those strings.
 *
 *	This call sets the library path to strings in the unknown native
 *	encoding. TclpSetInitialEncodings() will translate the library
 *	path from the native encoding to UTF-8 as soon as it determines
 *	what the native encoding actually is.
 *
 *	Called at process initialization time.
 *
 * Results:
 *	None.
 */

Sorry, but I don't know enough about this in connection with the unicode patches and if we should pay attention to this. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From akuchlin at mems-exchange.org Wed Mar 29 00:21:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 17:21:07 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Peter Funk quoted: >Fredrik Lundh: >> I'm not so sure that Python really needs a simple reorganization >> of the existing set of standard library modules. 
just moving the >> modules around won't solve the real problems with the 1.5.2 std >> library... >Right. I propose to leave the namespace flat. I third that comment. Arguments against reorganizing for 1.6:

1) I doubt that we have time to do a good job of it for 1.6. (1.7, maybe.)

2) Right now there's no way for third-party extensions to add themselves to a package in the standard library. Once Python finds foo/__init__.py, it won't look for site-packages/foo/__init__.py, so if you grab, say, "crypto" as a package name in the standard library, it's forever lost to third-party extensions.

3) Rearranging the modules is a good chance to break backward compatibility in other ways. If you want to rewrite, say, httplib in a non-compatible way to support HTTP/1.1, then the move from httplib.py to net.http.py is a great chance to do that, and leave httplib.py as-is for old programs. If you just copy httplib.py, rewriting net.http.py is now harder, since you have to either maintain compatibility or break things *again* in the next version of Python.

4) We wanted to get 1.6 out fairly quickly, and therefore limited the number of features that would get in. (Vide the "Python 1.6 timing" thread last ... November, was it?) Packagizing is feature creep that'll slow things down.

Maybe we should start a separate list to discuss a package hierarchy for 1.7. But for 1.6, forget it. -- A.M. Kuchling http://starship.python.net/crew/amk/ Posting "Please send e-mail, since I don't read this group": Poster is rendered illiterate by a simple trepanation. -- Kibo, in the Happynet Manifesto From guido at python.org Wed Mar 29 00:24:46 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:24:46 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: Your message of "Wed, 29 Mar 2000 00:13:02 +0200." 
References: Message-ID: <200003282224.RAA13573@eric.cnri.reston.va.us> > Are you sure that the call to 'TclpInitLibraryPath(baseName);' > is not required in Tcl/Tk 8.1, 8.2, 8.3 ? > I would propose the following: > > +#if TKMAJORMINOR >= 8001 > + TclpInitLibraryPath(baseName); > +# endif /* TKMAJORMINOR */ It is an internal routine which shouldn't be called at all by the user. I believe it is called internally at the right time. Note that we now call Tcl_FindExecutable(), which *is* intended to be called by the user (and exists in all 8.x versions) -- maybe this causes TclpInitLibraryPath() to be called. I tested it on Solaris, with Tcl/Tk versions 8.0.4, 8.1.1, 8.2.3 and 8.3.0, and it doesn't seem to make any difference, as long as that version of Tcl/Tk has actually been installed. (When it's not installed, TclpInitLibraryPath() doesn't help either.) I still have to check this on Windows -- maybe it'll have to go back in. [...] > Sorry, but I don't know enough about this in connection with the > unicode patches and if we should pay attention to this. It seems to be allright... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 00:25:27 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:25:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Tue, 28 Mar 2000 17:21:07 EST." <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: <02c901bf989b$be203d80$34aab5d4@hagrid> <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: <200003282225.RAA13586@eric.cnri.reston.va.us> > Maybe we should start a separate list to discuss a package hierarchy > for 1.7. But for 1.6, forget it. Yes! Please! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From donb at init.com Wed Mar 29 00:56:03 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 17:56:03 -0500 Subject: [Python-Dev] None as a keyword / class methods References: <200003282046.PAA18822@zippy.init.com> <200003282102.QAA13041@eric.cnri.reston.va.us> Message-ID: <200003282256.RAA21080@zippy.init.com> Guido van Rossum wrote, > This looks like it would break a lot of code. Only if it were to replace the current implementation. Perhaps I inadvertly made that suggestion. It was not my intention. Another way to look at my post is to say that it was intended to point out why we cant have class methods in the current implementation... it's a name space issue. > How do you refer to a superclass method? It seems that > ClassName.methodName would refer to the class method, not to the > unbound instance method. Right. To get at the unbound instance methods you must go through the 'unbound accessor' which is accessed via the underscore. If you wanted to chain to a superclass method it would look like this: class child(parent): def do_it(self, x): z = parent._.do_it(self, x) return z > Also, moving the default instance attributes to a different > namespace seems to be a semantic change that could change lots of > things. I agree... and that's why I wouldnt suggest doing it to the current class/instance implementation. However, for those who insist on having class attributes and methods I think it would be cool to settle on a standard "syntax". > I am still in favor of saying "Python has no class methods -- use > module-global functions for that". Or use a class/instance implementation provided via an extension module rather than the built-in one. The class named 'base' shown in my example is a class designed for that purpose. > Between the module, the class and the instance, there are enough > namespaces -- we don't need another one. 
The topic comes up often enough to make me think some might disagree. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From moshez at math.huji.ac.il Wed Mar 29 01:24:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 01:24:29 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Andrew M. Kuchling wrote: > Peter Funk quoted: > >Fredrik Lundh: > >> I'm not so sure that Python really needs a simple reorganization > >> of the existing set of standard library modules. just moving the > >> modules around won't solve the real problems with the 1.5.2 std > >> library... > >Right. I propose to leave the namespace flat. > > I third that comment. Arguments against reorganizing for 1.6: Let me just note that my original great renaming proposal was titled "1.7". I'm certain I don't want it to affect the 1.6 release -- my god, it's almost alpha time and we don't even know how to reorganize. Strictly 1.7. > 4) We wanted to get 1.6 out fairly quickly, and therefore limited > the number of features that would get in. (Vide the "Python 1.6 > timing" thread last ... November, was it?) Packagizing is feature > creep that'll slow things down Oh yes. I'm waiting for that 1.6....I wouldn't want to stall it for the world. But this is a good chance as any to discuss reasons, before strategies. Here's why I believe we should re-organize Python modules: -- modules fall quite naturally into subpackages. Reducing the number of toplevel modules will lessen the clutter -- it would be easier to synchronize documentation and code (think "automatically generated documentation") -- it would enable us to move toward a CPAN-like module repository, together with the dist-sig efforts. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gmcm at hypernet.com Wed Mar 29 01:44:27 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 28 Mar 2000 18:44:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: Message-ID: <1257835425-27941123@hypernet.com> Andrew M. Kuchling wrote: [snip] > 2) Right now there's no way for third-party extensions to add > themselves to a package in the standard library. Once Python finds > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > if you grab, say, "crypto" as a package name in the standard library, > it's forever lost to third-party extensions. That way lies madness. While I'm happy to carp at Java for requiring "com", "net" or whatever as a top level name, their intent is correct: the names grabbed by the Python standard packages belong to no one but the Python standard packages. If you *don't* do that, upgrades are an absolute nightmare. Marc-Andre grabbed "mx". If (as I rather suspect ) he wants to remake the entire standard lib in his image, he's welcome to - *under* mx. What would happen if he (and everyone else) installed themselves *into* my core packages, then I decided I didn't want his stuff? More than likely I'd have to scrub the damn installation and start all over again. - Gordon From DavidA at ActiveState.com Wed Mar 29 02:01:57 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:01:57 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg Message-ID: I'm thrilled to see the extended call syntax patches go in! One less wart in the language! Jeremy ZitBlaster Hylton and Greg Noxzema Ewing! --david From pf at artcom-gmbh.de Wed Mar 29 01:53:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 01:53:50 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? 
In-Reply-To: <200003282156.QAA13361@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 4:56:49 pm" Message-ID: Hi! > [Peter Funk] > > > > > Do we need a UserString class? [...] Guido van Rossum: > So why don't you give the UserString.py a try and leave Andy's wish alone? Okay. Here we go. Could someone please keep a close eye on this? I've hacked it up in a hurry.

---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
"""
import sys

# XXX Totally untested and hacked up until 2:00 am with too little sleep ;-)

class UserString:
    def __init__(self, string=""):
        self.data = string
    def __repr__(self):
        return repr(self.data)
    def __cmp__(self, string):
        if isinstance(string, UserString):
            return cmp(self.data, string.data)
        else:
            return cmp(self.data, string)
    def __len__(self):
        return len(self.data)

    # methods defined in alphabetical order
    def capitalize(self):
        return self.__class__(self.data.capitalize())
    def center(self, width):
        return self.__class__(self.data.center(width))
    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)
    def encode(self, encoding=None, errors=None):  # XXX improve this?
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self):
        raise NotImplementedError
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self):
        return self.data.isdecimal()
    def isdigit(self):
        return self.data.isdigit()
    def islower(self):
        return self.data.islower()
    def isnumeric(self):
        return self.data.isnumeric()
    def isspace(self):
        return self.data.isspace()
    def istitle(self):
        return self.data.istitle()
    def isupper(self):
        return self.data.isupper()
    def join(self, seq):
        return self.data.join(seq)
    def ljust(self, width):
        return self.__class__(self.data.ljust(width))
    def lower(self):
        return self.__class__(self.data.lower())
    def lstrip(self):
        return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width):
        return self.__class__(self.data.rjust(width))
    def rstrip(self):
        return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self):
        return self.__class__(self.data.strip())
    def swapcase(self):
        return self.__class__(self.data.swapcase())
    def title(self):
        return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self):
        return self.__class__(self.data.upper())
    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, type(self.data)):
            return self.__class__(self.data + other)
        else:
            return self.__class__(self.data + str(other))
    def __radd__(self, other):
        if isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(str(other) + self.data)
    def __mul__(self, n):
        return self.__class__(self.data*n)
    __rmul__ = __mul__

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return [0]

if __name__ == "__main__":
    import sys
    sys.exit(_test()[0])

From effbot at telia.com Wed Mar 29 01:12:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 01:12:55 +0200 Subject: [Python-Dev] yeah! for Jeremy and Greg References: Message-ID: <012301bf990b$2a494c80$34aab5d4@hagrid> > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! but did he compile before checking in? ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : undeclared identifier (compile.c and opcode.h both mention this identifier, but nobody defines it... should it be CALL_FUNCTION_VAR, perhaps?) From guido at python.org Wed Mar 29 02:07:34 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:07:34 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 01:53:50 +0200." References: Message-ID: <200003290007.TAA16081@eric.cnri.reston.va.us> > > [Peter Funk] > > > > > > Do we need a UserString class? > [...] > Guido van Rossum: > > So why don't you give the UserString.py a try and leave Andy's wish alone? [Peter] > Okay. Here we go. Could someone please keep a close eye on this? > I've hacked it up in a hurry. Good job! Go get some sleep, and tomorrow morning when you're fresh, compare it to UserList. From visual inspection, you seem to be missing __getitem__ and __getslice__, and maybe more (of course not __set*__). 
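The two methods Guido flags as missing would look roughly like the following sketch, shown on a cut-down wrapper so the snippet stands alone (`MiniUserString` is an invented name; note that in 1.6-era code the slice case would be a separate `__getslice__` method, a hook that later Pythons folded into `__getitem__`):

```python
class MiniUserString:
    def __init__(self, string=""):
        self.data = string

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # In modern Python both indexing and slicing arrive here;
        # wrap the result so slices stay MiniUserString instances.
        return self.__class__(self.data[index])

s = MiniUserString("hello")
print(s[1].data)    # 'e'
print(s[1:4].data)  # 'ell'
```

The wrapping in `self.__class__(...)` mirrors the convention already used throughout Peter's class: every method that returns string data returns a new wrapper of the same class, so subclasses keep their type through these operations.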
--Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Wed Mar 29 02:13:24 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:13:24 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: On Wed, 29 Mar 2000, Fredrik Lundh wrote: > > I'm thrilled to see the extended call syntax patches go in! One less wart > > in the language! > > but did he compile before checking in? You beat me to it. I read David's message and got so excited i just had to try it right away. So i updated my CVS tree, did "make", and got the same error: make[1]: Entering directory `/home/ping/dev/python/dist/src/Python' gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c compile.c -o compile.o compile.c: In function `com_call_function': compile.c:1225: `CALL_FUNCTION_STAR' undeclared (first use in this function) compile.c:1225: (Each undeclared identifier is reported only once compile.c:1225: for each function it appears in.) make[1]: *** [compile.o] Error 1 > (compile.c and opcode.h both mention this identifier, but > nobody defines it... should it be CALL_FUNCTION_VAR, > perhaps?) But CALL_FUNCTION_STAR is mentioned in the comments... #define CALL_FUNCTION 131 /* #args + (#kwargs<<8) */ #define MAKE_FUNCTION 132 /* #defaults */ #define BUILD_SLICE 133 /* Number of items */ /* The next 3 opcodes must be contiguous and satisfy (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 */ #define CALL_FUNCTION_VAR 140 /* #args + (#kwargs<<8) */ #define CALL_FUNCTION_KW 141 /* #args + (#kwargs<<8) */ #define CALL_FUNCTION_VAR_KW 142 /* #args + (#kwargs<<8) */ The condition (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 doesn't make much sense, though... -- ?!ng From jeremy at cnri.reston.va.us Wed Mar 29 02:18:54 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 19:18:54 -0500 (EST) Subject: [Python-Dev] yeah! 
for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> References: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: <14561.19438.157799.810802@goon.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: >> I'm thrilled to see the extended call syntax patches go in! One >> less wart in the language! FL> but did he compile before checking in? Indeed, but not often enough :-). FL> ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : FL> undeclared identifier FL> (compile.c and opcode.h both mention this identifier, but nobody FL> defines it... should it be CALL_FUNCTION_VAR, perhaps?) This was a last minute change of names. I had previously compiled under the old names. The Makefile doesn't describe the dependency between opcode.h and compile.c. And the compile.o file I had worked, because the only change was to the name of a macro. It's too bad the Makefile doesn't have all the dependencies. It seems that it's necessary to do a make clean before checking in a change that affects many files. Jeremy From klm at digicool.com Wed Mar 29 02:30:05 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 19:30:05 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Message-ID: On Tue, 28 Mar 2000, David Ascher wrote: > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! Me too! Even the lisps i used to know (albeit ancient, according to eric) couldn't get it as tidy as this. (Silly me, now i'm imagining we're going to see operator assignments just around the bend. "Give them a tasty morsel, they ask for your dinner..."-) Ken klm at digicool.com From ping at lfw.org Wed Mar 29 02:35:54 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:35:54 -0600 (CST) Subject: [Python-Dev] yeah! 
for Jeremy and Greg In-Reply-To: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > > It's too bad the Makefile doesn't have all the dependencies. It seems > that it's necessary to do a make clean before checking in a change > that affects many files. I updated again and rebuilt. >>> def sum(*args): ... s = 0 ... for x in args: s = s + x ... return s ... >>> sum(2,3,4) 9 >>> sum(*[2,3,4]) 9 >>> x = (2,3,4) >>> sum(*x) 9 >>> def func(a, b, c): ... print a, b, c ... >>> func(**{'a':2, 'b':1, 'c':6}) 2 1 6 >>> func(**{'c':8, 'a':1, 'b':9}) 1 9 8 >>> *cool*. So does this completely obviate the need for "apply", then? apply(x, y, z) <==> x(*y, **z) -- ?!ng From guido at python.org Wed Mar 29 02:35:17 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:35:17 -0500 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Your message of "Tue, 28 Mar 2000 18:35:54 CST." References: Message-ID: <200003290035.TAA16278@eric.cnri.reston.va.us> > *cool*. > > So does this completely obviate the need for "apply", then? > > apply(x, y, z) <==> x(*y, **z) I think so (except for backwards compatibility). The 1.6 docs for apply should point this out! --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Wed Mar 29 02:42:20 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:42:20 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Message-ID: > I updated again and rebuilt. > > >>> def sum(*args): > ... s = 0 > ... for x in args: s = s + x > ... return s > ... > >>> sum(2,3,4) > 9 > >>> sum(*[2,3,4]) > 9 > >>> x = (2,3,4) > >>> sum(*x) > 9 > >>> def func(a, b, c): > ... print a, b, c > ... > >>> func(**{'a':2, 'b':1, 'c':6}) > 2 1 6 > >>> func(**{'c':8, 'a':1, 'b':9}) > 1 9 8 > >>> > > *cool*. 
But most importantly, IMO:

class SubClass(Class):
    def __init__(self, a, *args, **kw):
        self.a = a
        Class.__init__(self, *args, **kw)

Much neater. From bwarsaw at cnri.reston.va.us Wed Mar 29 02:46:11 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 19:46:11 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: <14561.21075.637108.322536@anthem.cnri.reston.va.us> Uh oh. Fresh CVS update and make clean, make:

-------------------- snip snip --------------------
Python 1.5.2+ (#20, Mar 28 2000, 19:37:38) [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> class Nums:
...     def __getitem__(self, i):
...         if i >= 10 or i < 0: raise IndexError
...         return i
...
>>> n = Nums()
>>> for i in n: print i
...
0
1
2
3
4
5
6
7
8
9
>>> sum(*n)
Traceback (innermost last):
  File "", line 1, in ?
SystemError: bad argument to internal function
-------------------- snip snip --------------------

-Barry From bwarsaw at cnri.reston.va.us Wed Mar 29 03:02:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 20:02:16 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Changing the definition of class Nums to

class Nums:
    def __getitem__(self, i):
        if 0 <= i < 10: return i
        raise IndexError
    def __len__(self):
        return 10

I.e. adding the __len__() method avoids the SystemError. Either the *arg call should not depend on the sequence being length-able, or it should error check that the length calculation doesn't return -1 or raise an exception. Looking at PySequence_Length() though, it seems that m->sq_length(s) can return -1 without setting a type_error.
So the fix is either to include a check for return -1 in PySequence_Length() when calling sq_length, or instance_length() should set a TypeError when it has no __len__() method and returns -1. I gotta run so I can't follow this through -- I'm sure I'll see the right solution from someone in tomorrow morning's email :) -Barry From ping at lfw.org Wed Mar 29 03:17:27 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 19:17:27 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Barry A. Warsaw wrote: > > Changing the definition of class Nums to > > class Nums: > def __getitem__(self, i): > if 0 <= i < 10: return i > raise IndexError > def __len__(self): > return 10 > > I.e. adding the __len__() method avoids the SystemError. It should be noted that "apply" has the same problem, with a different counterintuitive error message:

>>> n = Nums()
>>> apply(sum, n)
Traceback (innermost last):
  File "", line 1, in ?
AttributeError: __len__

-- ?!ng From jeremy at cnri.reston.va.us Wed Mar 29 04:59:26 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 21:59:26 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: References: Message-ID: <14561.29070.940238.542509@bitdiddle.cnri.reston.va.us> >>>>> "DA" == David Ascher writes:

DA> But most importantly, IMO:
DA> class SubClass(Class):
DA>     def __init__(self, a, *args, **kw):
DA>         self.a = a
DA>         Class.__init__(self, *args, **kw)
DA> Much neater.

This version of method overloading was what I liked most about Greg's patch. Note that I also prefer:

class SubClass(Class):
    super_init = Class.__init__

    def __init__(self, a, *args, **kw):
        self.a = a
        self.super_init(*args, **kw)

I've been happy to have all the overridden methods explicitly labelled at the top of a class lately. It is much easier to change the class hierarchy later.
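Jeremy's idiom, fleshed out into a runnable sketch. The Base class and attribute names here are made up for illustration; the point is that binding Base.__init__ to a class attribute names the superclass exactly once, and ordinary attribute access then binds super_init to self like any other method.

```python
class Base:
    def __init__(self, x):
        self.x = x

class SubClass(Base):
    # Name the overridden method once, at the top of the class;
    # descriptor lookup binds it to self on access, so calling
    # self.super_init(...) is calling Base.__init__(self, ...).
    super_init = Base.__init__

    def __init__(self, a, *args, **kw):
        self.a = a
        self.super_init(*args, **kw)
```

Rearranging the hierarchy later then means editing only the class statement and the super_init line at the top.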
Jeremy From gward at cnri.reston.va.us Wed Mar 29 05:15:00 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 22:15:00 -0500 Subject: [Python-Dev] __debug__ and py_compile Message-ID: <20000328221500.A3290@cnri.reston.va.us> Hi all -- a particularly active member of the Distutils-SIG brought the global '__debug__' flag to my attention, since I (and thus my code) didn't know if calling 'py_compile.compile()' would result in a ".pyc" or a ".pyo" file. It appears that, using __debug__, you can determine what you're going to get. Cool! However, it doesn't look like you can *choose* what you're going to get. Is this correct? Ie. does the presence/absence of -O when the interpreter starts up *completely* decide how code is compiled? Also, can I rely on __debug__ being there in the future? How about in the past? I still occasionally ponder making Distutils compatible with Python 1.5.1. Thanks -- Greg From guido at python.org Wed Mar 29 06:08:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 23:08:12 -0500 Subject: [Python-Dev] __debug__ and py_compile In-Reply-To: Your message of "Tue, 28 Mar 2000 22:15:00 EST." <20000328221500.A3290@cnri.reston.va.us> References: <20000328221500.A3290@cnri.reston.va.us> Message-ID: <200003290408.XAA17991@eric.cnri.reston.va.us> > a particularly active member of the Distutils-SIG brought the > global '__debug__' flag to my attention, since I (and thus my code) > didn't know if calling 'py_compile.compile()' would result in a ".pyc" > or a ".pyo" file. It appears that, using __debug__, you can determine > what you're going to get. Cool! > > However, it doesn't look like you can *choose* what you're going to > get. Is this correct? Ie. does the presence/absence of -O when the > interpreter starts up *completely* decide how code is compiled? Correct. You (currently) can't change the opt setting of the compiler. 
(It was part of the compiler restructuring to give more freedom here; this has been pushed back to 1.7.) > Also, can I rely on __debug__ being there in the future? How about in > the past? I still occasionally ponder making Distutils compatible with > Python 1.5.1. __debug__ is as old as the assert statement, going back to at least 1.5.0. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Wed Mar 29 07:35:51 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 07:35:51 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <1257835425-27941123@hypernet.com> Message-ID: On Tue, 28 Mar 2000, Gordon McMillan wrote: > What would happen if he (and everyone else) installed > themselves *into* my core packages, then I decided I didn't > want his stuff? More than likely I'd have to scrub the damn > installation and start all over again. I think Greg Stein answered that objection, by reminding us that the filesystem isn't the only way to set up a package hierarchy. In particular, even with Python's current module system, there is no need to scrub installations: Python core modules go (under UNIX) in /usr/local/lib/python1.5, and 3rd party modules go in /usr/local/lib/python1.5/site-packages. Need to remove stuff? Remove whatever is in /usr/local/lib/python1.5/site-packages. Need to upgrade? Just backup /usr/local/lib/python1.5/site-packages, remove /usr/local/lib/python1.5/, install, and move 3rd party modules back from backup. This becomes even easier if the standard installation is in a JAR-like file, and 3rd party modules are also in a JAR-like file, but specified to be in their natural place. Wow! That was a long rant! Anyway, I already expressed my preference of the Perl way, over the Java way. For one thing, I don't want to have to register a domain just so I could distribute Python code -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Wed Mar 29 07:42:34 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 00:42:34 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.38858.41246.28460@anthem.cnri.reston.va.us> >>>>> "BAW" == Barry A Warsaw writes: BAW> Uh oh. Fresh CVS update and make clean, make: >>> sum(*n) | Traceback (innermost last): | File "", line 1, in ? | SystemError: bad argument to internal function Here's a proposed patch that will cause a TypeError to be raised instead. -Barry -------------------- snip snip -------------------- Index: abstract.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/abstract.c,v retrieving revision 2.33 diff -c -r2.33 abstract.c *** abstract.c 2000/03/10 22:55:18 2.33 --- abstract.c 2000/03/29 05:36:21 *************** *** 860,866 **** PyObject *s; { PySequenceMethods *m; ! if (s == NULL) { null_error(); return -1; --- 860,867 ---- PyObject *s; { PySequenceMethods *m; ! int size = -1; ! if (s == NULL) { null_error(); return -1; *************** *** 868,877 **** m = s->ob_type->tp_as_sequence; if (m && m->sq_length) ! return m->sq_length(s); ! type_error("len() of unsized object"); ! return -1; } PyObject * --- 869,879 ---- m = s->ob_type->tp_as_sequence; if (m && m->sq_length) ! size = m->sq_length(s); ! if (size < 0) ! type_error("len() of unsized object"); ! 
return size;
}

PyObject *

Index: ceval.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/ceval.c,v
retrieving revision 2.169
diff -c -r2.169 ceval.c
*** ceval.c 2000/03/28 23:49:16 2.169
--- ceval.c 2000/03/29 05:39:00
***************
*** 1636,1641 ****
--- 1636,1649 ----
  			break;
  		}
  		nstar = PySequence_Length(stararg);
+ 		if (nstar < 0) {
+ 			if (!PyErr_Occurred())
+ 				PyErr_SetString(
+ 					PyExc_TypeError,
+ 					"len() of unsized object");
+ 			x = NULL;
+ 			break;
+ 		}
  	}
  	if (nk > 0) {
  		if (kwdict == NULL) {

From bwarsaw at cnri.reston.va.us Wed Mar 29 07:46:19 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 29 Mar 2000 00:46:19 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: <14561.39083.748093.694726@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes: | It should be noted that "apply" has the same problem, with a | different counterintuitive error message: >> n = Nums() apply(sum, n) | Traceback (innermost last): | File "", line 1, in ? | AttributeError: __len__ The patch I just posted fixes this too. The error message ain't great, but at least it's consistent with the direct call. -Barry

-------------------- snip snip --------------------
Traceback (innermost last):
  File "/tmp/doit.py", line 15, in ?
    print apply(sum, n)
TypeError: len() of unsized object

From pf at artcom-gmbh.de Wed Mar 29 08:30:22 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 08:30:22 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 7:44:42 am" Message-ID: Hi! > On Wed, 29 Mar 2000, Peter Funk wrote: > > > class UserString: > > def __init__(self, string=""): > > self.data = string > ^^^^^^^ Moshe Zadka wrote: > Why do you feel there is a need to default?
Strings are immutable I had something like this in my mind: class MutableString(UserString): """Python strings are immutable objects. But of course this can be changed in a derived class implementing the missing methods. >>> s = MutableString() >>> s[0:5] = "HUH?" """ def __setitem__(self, char): .... def __setslice__(self, i, j, substring): .... > What about __int__, __long__, __float__, __str__, __hash__? > And what about __getitem__ and __contains__? > And __complex__? I was obviously too tired and too eager to get this out! Thanks for reviewing and responding so quickly. I will add them. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Wed Mar 29 08:51:30 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 08:51:30 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > Moshe Zadka wrote: > > Why do you feel there is a need to default? Strings are immutable > > I had something like this in my mind: > > class MutableString(UserString): > """Python strings are immutable objects. But of course this can > be changed in a derived class implementing the missing methods. Then add the default in the constructor for MutableString.... eagerly-waiting-for-UserString.py-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Wed Mar 29 09:03:53 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 09:03:53 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes Message-ID: I'm starting to compile a list of changes from 1.5.2 to 1.6. 
Here's what I came up with so far

-- string objects now have methods (though they are still immutable)
-- unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders
-- "in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__
-- SRE is the new regular expression engine. re.py became an interface to the same engine. The new engine fully supports unicode regular expressions.
-- Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect
-- Some modules were made obsolete
   -- filecmp.py (supersedes the old cmp.py and dircmp.py modules)
   -- tabnanny.py (make sure the source file doesn't assume a specific tab-width)
   -- win32reg (win32 registry editor)
   -- unicode module, and codecs package
-- New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)
-- _tkinter now uses the object, rather than string, interface to Tcl.

Please e-mail me personally if you think of any other changes, and I'll try to integrate them into a complete "changes" document. Thanks in advance -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From esr at thyrsus.com Wed Mar 29 09:21:29 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 29 Mar 2000 02:21:29 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: ; from Moshe Zadka on Wed, Mar 29, 2000 at 09:03:53AM +0200 References: Message-ID: <20000329022129.A15539@thyrsus.com> Moshe Zadka : > -- _tkinter now uses the object, rather than string, interface to Tcl. Hm, does this mean that the annoying requirement to do explicit gets and sets to move data between the Python world and the Tcl/Tk world is gone? -- Eric S. Raymond "A system of licensing and registration is the perfect device to deny gun ownership to the bourgeoisie."
-- Vladimir Ilyich Lenin From moshez at math.huji.ac.il Wed Mar 29 09:22:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 09:22:54 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <20000329022129.A15539@thyrsus.com> Message-ID: On Wed, 29 Mar 2000, Eric S. Raymond wrote: > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. > > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone? I doubt it. It's just that Python and Tcl have such a different outlook on variables that I don't think it can be glossed over. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf at artcom-gmbh.de Wed Mar 29 11:16:17 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 11:16:17 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 8:51:30 am" Message-ID: Hi! Moshe Zadka: > eagerly-waiting-for-UserString.py-ly y'rs, Z. Well, I've added the missing methods. Unfortunately I ran out of time now and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still missing. Regards, Peter

---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
""" from types import StringType, UnicodeType import sys class UserString: def __init__(self, string): self.data = string def __str__(self): return str(self.data) def __repr__(self): return repr(self.data) def __int__(self): return int(self.data) def __long__(self): return long(self.data) def __float__(self): return float(self.data) def __hash__(self): return hash(self.data) def __cmp__(self, string): if isinstance(string, UserString): return cmp(self.data, string.data) else: return cmp(self.data, string) def __contains__(self, char): return char in self.data def __len__(self): return len(self.data) def __getitem__(self, index): return self.__class__(self.data[index]) def __getslice__(self, start, end): start = max(start, 0); end = max(end, 0) return self.__class__(self.data[start:end]) def __add__(self, other): if isinstance(other, UserString): return self.__class__(self.data + other.data) elif isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(self.data + other) else: return self.__class__(self.data + str(other)) def __radd__(self, other): if isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(other + self.data) else: return self.__class__(str(other) + self.data) def __mul__(self, n): return self.__class__(self.data*n) __rmul__ = __mul__ # the following methods are defined in alphabetical order: def capitalize(self): return self.__class__(self.data.capitalize()) def center(self, width): return self.__class__(self.data.center(width)) def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) def encode(self, encoding=None, errors=None): # XXX improve this? 
if encoding: if errors: return self.__class__(self.data.encode(encoding, errors)) else: return self.__class__(self.data.encode(encoding)) else: return self.__class__(self.data.encode()) def endswith(self, suffix, start=0, end=sys.maxint): return self.data.endswith(suffix, start, end) def find(self, sub, start=0, end=sys.maxint): return self.data.find(sub, start, end) def index(self, sub, start=0, end=sys.maxint): return self.data.index(sub, start, end) def isdecimal(self): return self.data.isdecimal() def isdigit(self): return self.data.isdigit() def islower(self): return self.data.islower() def isnumeric(self): return self.data.isnumeric() def isspace(self): return self.data.isspace() def istitle(self): return self.data.istitle() def isupper(self): return self.data.isupper() def join(self, seq): return self.data.join(seq) def ljust(self, width): return self.__class__(self.data.ljust(width)) def lower(self): return self.__class__(self.data.lower()) def lstrip(self): return self.__class__(self.data.lstrip()) def replace(self, old, new, maxsplit=-1): return self.__class__(self.data.replace(old, new, maxsplit)) def rfind(self, sub, start=0, end=sys.maxint): return self.data.rfind(sub, start, end) def rindex(self, sub, start=0, end=sys.maxint): return self.data.rindex(sub, start, end) def rjust(self, width): return self.__class__(self.data.rjust(width)) def rstrip(self): return self.__class__(self.data.rstrip()) def split(self, sep=None, maxsplit=-1): return self.data.split(sep, maxsplit) def splitlines(self, maxsplit=-1): return self.data.splitlines(maxsplit) def startswith(self, prefix, start=0, end=sys.maxint): return self.data.startswith(prefix, start, end) def strip(self): return self.__class__(self.data.strip()) def swapcase(self): return self.__class__(self.data.swapcase()) def title(self): return self.__class__(self.data.title()) def translate(self, table, deletechars=""): return self.__class__(self.data.translate(table, deletechars)) def upper(self): return 
self.__class__(self.data.upper()) class MutableString(UserString): """mutable string objects Python strings are immutable objects. This has the advantage, that strings may be used as dictionary keys. If this property isn't needed and you insist on changing string values in place instead, you may cheat and use MutableString. But the purpose of this class is an educational one: to prevent people from inventing their own mutable string class derived from UserString and than forget thereby to remove (override) the __hash__ method inherited from ^UserString. This would lead to errors that would be very hard to track down. A faster and better solution is to rewrite the program using lists.""" def __init__(self, string=""): self.data = string def __hash__(self): raise TypeError, "unhashable type (it is mutable)" def __setitem__(self, index, sub): if index < 0 or index >= len(self.data): raise IndexError self.data = self.data[:index] + sub + self.data[index+1:] def __delitem__(self, index): if index < 0 or index >= len(self.data): raise IndexError self.data = self.data[:index] + self.data[index+1:] def __setslice__(self, start, end, sub): start = max(start, 0); end = max(end, 0) if isinstance(sub, UserString): self.data = self.data[:start]+sub.data+self.data[end:] elif isinstance(sub, StringType) or isinstance(sub, UnicodeType): self.data = self.data[:start]+sub+self.data[end:] else: self.data = self.data[:start]+str(sub)+self.data[end:] def __delslice__(self, start, end): start = max(start, 0); end = max(end, 0) self.data = self.data[:start] + self.data[end:] def immutable(self): return UserString(self.data) def _test(): s = UserString("abc") u = UserString(u"efg") # XXX add some real tests here? return 0 if __name__ == "__main__": sys.exit(_test()) From mal at lemburg.com Wed Mar 29 11:34:21 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 29 Mar 2000 11:34:21 +0200 Subject: [Python-Dev] Great Renaming? What is the goal? 
References: <1257835425-27941123@hypernet.com> Message-ID: <38E1CE1D.7899B1BC@lemburg.com> Gordon McMillan wrote: > > Andrew M. Kuchling wrote: > [snip] > > 2) Right now there's no way for third-party extensions to add > > themselves to a package in the standard library. Once Python finds > > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > > if you grab, say, "crypto" as a package name in the standard library, > > it's forever lost to third-party extensions. > > That way lies madness. While I'm happy to carp at Java for > requiring "com", "net" or whatever as a top level name, their > intent is correct: the names grabbed by the Python standard > packages belong to no one but the Python standard > packages. If you *don't* do that, upgrades are an absolute > nightmare. > > Marc-Andre grabbed "mx". If (as I rather suspect ) he > wants to remake the entire standard lib in his image, he's > welcome to - *under* mx. Right, that's the way I see it too. BTW, where can I register the "mx" top-level package name ? Should these be registered in the NIST registry ? Will the names registered there be honored ? > What would happen if he (and everyone else) installed > themselves *into* my core packages, then I decided I didn't > want his stuff? More than likely I'd have to scrub the damn > installation and start all over again. That's a no-no, IMHO. Unless explicitly allowed, packages should *not* install themselves as subpackages to other existing top-level packages. If they do, it's their problem if the hierarchy changes... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > Hi! > > Moshe Zadka: > > eagerly-waiting-for-UserString.py-ly y'rs, Z. > > Well, I've added the missing methods. Unfortunately I ran out of time now and > a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still > missing. Great work, Peter! I really like UserString. However, I have two issues with MutableString: 1. It shouldn't share implementation with UserString, otherwise your algorithms are not behaving with correct big-O properties. It should probably use a char-array (from the array module) as the internal representation. 2. It shouldn't share interface with UserString, since it doesn't have a proper implementation of __hash__. All in all, I probably disagree with making MutableString a subclass of UserString. If I have time later today, I'm hoping to be able to make my own MutableString. From pf at artcom-gmbh.de Wed Mar 29 12:35:32 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 12:35:32 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 11:59:47 am" Message-ID: Hi! > > Moshe Zadka: > > > eagerly-waiting-for-UserString.py-ly y'rs, Z. > > > On Wed, 29 Mar 2000, Peter Funk wrote: > > Well, I've added the missing methods. Unfortunately I ran out of time now and > > a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still > > missing. > Moshe Zadka schrieb: > Great work, Peter! I really like UserString. However, I have two issues > with MutableString: > > 1. It shouldn't share implementation with UserString, otherwise your > algorithms are not behaving with correct big-O properties. It should > probably use a char-array (from the array module) as the internal > representation. Hmm.... I don't understand what you mean with 'big-O properties'. The internal representation of any object should be considered ... umm ... internal. > > 2.
It shouldn't share interface with UserString, since it doesn't have a > proper implementation of __hash__. What's wrong with my implementation of __hash__ raising a TypeError with the message 'unhashable object'? This is the same behaviour you get if you try to use some other mutable object as a dictionary key:

>>> l = []
>>> d = { l : 'foo' }
Traceback (innermost last):
  File "", line 1, in ?
TypeError: unhashable type

> All in all, I probably disagree with making MutableString a subclass of > UserString. If I have time later today, I'm hoping to be able to make my > own MutableString As I tried to point out in the docstring of 'MutableString', I don't want people to actually start using the 'MutableString' class. My intention was to prevent people from trying to invent their own, and then probably wrong, MutableString class derived from UserString. Only Newbies will really ever need mutable strings in Python (see FAQ). Maybe my 'MutableString' idea belongs somewhere in the yet-to-be-written src/Doc/libuserstring.tex. But since Newbies tend to ignore docs ... Sigh. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gmcm at hypernet.com Wed Mar 29 13:07:20 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 29 Mar 2000 06:07:20 -0500 Subject: [Python-Dev] Great Renaming? What is the goal?
You mean when Greg said: >Assuming that you use an archive like those found in my "small" distro or > Gordon's distro, then this is no problem. The archive simply recognizes > and maps "text.encoding.macbinary" to its own module. I don't know what this has to do with it. When we get around to the 'macbinary' part, we have already established that 'text.encoding' is the parent which should supply 'macbinary'. > In > particular, even with Python's current module system, there is no need to > scrub installations: Python core modules go (under UNIX) in > /usr/local/lib/python1.5, and 3rd party modules go in > /usr/local/lib/python1.5/site-packages. And if there's a /usr/local/lib/python1.5/text/encoding, there's no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. I believe you could hack up an importer that did allow this, and I think you'd be 100% certifiable if you did. Just look at the surprise factor. Hacking stuff into another package is just as evil as math.pi = 42. > Anyway, I already expressed my preference of the Perl way, over the Java > way. For one thing, I don't want to have to register a domain just so I > could distribute Python code. I haven't the foggiest what the "Perl way" is; I wouldn't be surprised if it relied on un-Pythonic sociological factors. I already said the Java mechanics are silly; uniqueness is what matters. When Python packages start selling in the four and five figure range, then a registry mechanism will likely be necessary. - Gordon
It should > > probably use a char-array (from the array module) as the internal > > representation. > > Hmm.... I don't understand what you mean by 'big-O properties'. > The internal representation of any object should be considered ... > umm ... internal. Yes, but s[0] = 'a' should take O(1) time, not O(len(s)). > > 2. It shouldn't share interface with UserString, since it doesn't have a > > proper implementation of __hash__. > > What's wrong with my implementation of __hash__ raising a TypeError with > the message 'unhashable object'? A subtype shouldn't change contracts of its supertypes. hash() was implicitly contracted as "raising no exceptions". -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From guido at python.org Wed Mar 29 14:26:56 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:26:56 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: Your message of "Wed, 29 Mar 2000 02:21:29 EST." <20000329022129.A15539@thyrsus.com> References: <20000329022129.A15539@thyrsus.com> Message-ID: <200003291226.HAA18216@eric.cnri.reston.va.us> > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. Eric Raymond: > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone? Not sure what you are referring to -- this should be completely transparent to Python/Tkinter users. If you are thinking of the way Tcl variables are created and manipulated in Python, no, this doesn't change, alas (Tcl variables aren't objects -- they are manipulated through get and set commands. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:32:16 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:32:16 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Wed, 29 Mar 2000 11:34:21 +0200." <38E1CE1D.7899B1BC@lemburg.com> References: <1257835425-27941123@hypernet.com> <38E1CE1D.7899B1BC@lemburg.com> Message-ID: <200003291232.HAA18234@eric.cnri.reston.va.us> > > Marc-Andre grabbed "mx". If (as I rather suspect) he > > wants to remake the entire standard lib in his image, he's > > welcome to - *under* mx. > > Right, that's the way I see it too. BTW, where can I register > the "mx" top-level package name? Should these be registered > in the NIST registry? Will the names registered there be > honored? I think the NIST registry is a failed experiment -- too cumbersome to maintain or consult.
We can do this the same way as common law handles trade marks: if you have used it as your brand name long enough, even if you didn't register, someone else cannot grab it away from you. > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > That's a no-no, IMHO. Unless explicitly allowed, packages > should *not* install themselves as subpackages to other > existing top-level packages. If they do, it's their problem > if the hierarchy changes... Agreed. Although some people seem to *want* this. Probably because it's okay to do that in Java and (apparently?) in Perl. And C++, probably. It all probably stems back to Lisp. I admit that I didn't see this subtlety when I designed Python's package architecture. It's too late to change (e.g. because of __init__.py). Is it a problem though? Let's be open-minded about this and think about whether we want to allow this or not, and why... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:35:33 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:35:33 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 13:21:09 +0200." References: Message-ID: <200003291235.HAA18249@eric.cnri.reston.va.us> > > What's wrong with my implementation of __hash__ raising a TypeError with > > the message 'unhashable object'? > > A subtype shouldn't change contracts of its supertypes. hash() was > implicitly contracted as "raising no exceptions". Let's not confuse subtypes and subclasses. One of the things implicit in the discussion on types-sig is that not every subclass is a subtype! Yes, this violates something we all learned from C++ -- but it's a great insight.
No time to explain it more, but for me, Peter's subclassing UserString for MutableString to borrow implementation is fine. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Wed Mar 29 15:49:24 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 15:49:24 +0200 (MEST) Subject: [Python-Dev] NIST Registry (was Great Renaming? What is the goal?) In-Reply-To: <200003291232.HAA18234@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 29, 2000 7:32:16 am" Message-ID: Hi! Guido van Rossum: > I think the NIST registry is a failed experiment -- too cumbersome to > maintain or consult. The WEB frontend of the NIST registry is not that bad --- if you are even aware of the fact that such a beast exists! I have used Python since 1994 and discovered the NIST registry incidentally a few weeks ago, when I was really looking for something about the Win32 registry and used the search engine on www.python.org. My first thought was: What a neat clever idea! I think this is an example of how the Python community suffers from poor advertising of good ideas. > We can do this the same way as common law > handles trade marks: if you have used it as your brand name long > enough, even if you didn't register, someone else cannot grab it away > from you. Okay. But a more formal registry wouldn't hurt. Something like the global module index from the current docs supplemented with all the contributed modules which can currently be found at www.vex.net would be a useful resource. Regards, Peter From moshez at math.huji.ac.il Wed Mar 29 16:15:36 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 16:15:36 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003291235.HAA18249@eric.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Guido van Rossum wrote: > Let's not confuse subtypes and subclasses.
One of the things implicit > in the discussion on types-sig is that not every subclass is a > subtype! Yes, this violates something we all learned from C++ -- but > it's a great insight. No time to explain it more, but for me, Peter's > subclassing UserString for MutableString to borrow implementation is > fine. Oh, I agree with this. An earlier argument which got snipped in the discussion is why it's a bad idea to borrow implementation (a totally different argument) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fdrake at acm.org Wed Mar 29 18:02:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 11:02:13 -0500 (EST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: References: Message-ID: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Moshe Zadka writes: > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) Weren't these in 1.5.2? I think filecmp is documented in the released docs... ah, no, I'm safe. ;) > Please e-mail me personally if you think of any other changes, and I'll > try to integrate them into a complete "changes" document. The documentation is updated. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Wed Mar 29 18:57:51 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 29 Mar 2000 10:57:51 -0600 Subject: [Python-Dev] CVS woes... Message-ID: <200003291657.KAA22177@beluga.mojam.com> Does anyone else besides me have trouble getting their Python tree to sync with the CVS repository? I've tried all manner of flags to "cvs update", most recently "cvs update -d -A ." with no success. There are still some files I know Fred Drake has patched that show up as different and it refuses to pick up Lib/robotparser.py. I'm going to blast my current tree and start anew after saving one or two necessary files. 
Any thoughts you might have would be much appreciated. (Private emails please, unless for some reason you think this should be a python-dev topic. I only post here because I suspect most of the readers use CVS to keep in frequent sync and may have some insight.) Thx, -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Wed Mar 29 19:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 19:06:59 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Fred L. Drake, Jr. wrote: > > Moshe Zadka writes: > > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) > > Weren't these in 1.5.2? I think filecmp is documented in the > released docs... ah, no, I'm safe. ;) Tabnanny wasn't a module, and filecmp wasn't there at all. > The documentation is updated. ;) Yes, but it was released as a late part of 1.5.2. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Wed Mar 29 18:38:00 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 18:38:00 +0200 Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <01b701bf999d$267b6740$34aab5d4@hagrid> Skip wrote: > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. note that robotparser doesn't show up on cvs.python.org either. maybe cnri's cvs admins should look into this...
From fdrake at acm.org Wed Mar 29 20:20:14 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 13:20:14 -0500 (EST) Subject: [Python-Dev] CVS woes... In-Reply-To: <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <14562.18782.465814.696099@seahag.cnri.reston.va.us> Skip Montanaro writes: > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses You should be aware that many of the more recent documentation patches have been in the 1.5.2p2 branch (release-1.5.2p1-patches, I think), rather than the development head. I'm hoping to begin the merge in the next week. I also have a few patches that I haven't had time to look at yet, and I'm not inclined to make any changes until I've merged the 1.5.2p2 docs with the 1.6 tree, mostly to keep the merge from being any more painful than I already expect it to be. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Wed Mar 29 20:22:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 13:22:57 -0500 (EST) Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> <01b701bf999d$267b6740$34aab5d4@hagrid> Message-ID: <14562.18945.407398.812930@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> note that robotparser doesn't show up on cvs.python.org FL> either. maybe cnri's cvs admins should look into this... I've just resync'd python/dist and am doing a fresh checkout now. Looks like Lib/robotparser.py is there now. -Barry From guido at python.org Wed Mar 29 20:23:38 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 13:23:38 -0500 Subject: [Python-Dev] CVS woes... In-Reply-To: Your message of "Wed, 29 Mar 2000 10:57:51 CST." 
<200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <200003291823.NAA20134@eric.cnri.reston.va.us> > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. My bad. When I move or copy a file around in the CVS repository directly instead of using cvs commit, I have to manually call a script that updates the mirror. I've done that now, and robotparser.py should now be in the mirror. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Wed Mar 29 21:06:14 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Wed, 29 Mar 2000 14:06:14 -0500 Subject: [Python-Dev] Distutils now in Python CVS tree Message-ID: <20000329140613.A5850@cnri.reston.va.us> Hi all -- Distutils is now available through the Python CVS tree *in addition to its own CVS tree*. That is, if you keep on top of developments in the Python CVS tree, then you will be tracking the latest Distutils code in Lib/distutils. Or, you can keep following the Distutils through its own CVS tree. (This is all done through one itty-bitty little symlink in the CNRI CVS repository, and It Just Works. Cool.) Note that only the 'distutils' subdirectory of the distutils distribution is tracked by Python: that is, changes to the documentation, test suites, and example setup scripts are *not* reflected in the Python CVS tree. If you follow neither Python nor Distutils CVS updates, this doesn't affect you. If you've been following Distutils CVS updates, you can continue to do so as you've always done (and as is documented on the Distutils "Anonymous CVS" web page). 
If you've been following Python CVS updates, then you are now following most Distutils CVS updates too -- as long as you do "cvs update -d", of course. If you're interested in following updates in the Distutils documentation, tests, examples, etc. then you should follow the Distutils CVS tree directly. If you've been following *both* Python and Distutils CVS updates, and hacking on the Distutils, then you should pick one or the other as your working directory. If you submit patches, it doesn't really matter if they're relative to the top of the Python tree, the top of the Distutils tree, or what -- I'll probably figure it out. However, it's probably best to continue sending Distutils patches to distutils-sig at python.org, *or* direct to me (gward at python.net) for trivial patches. Unless Guido says otherwise, I don't see a compelling reason to send Distutils patches to patches at python.org. In related news, the distutils-checkins list is probably going to go away, and all Distutils checkin messages will go python-checkins instead. Let me know if you avidly follow distutils-checkins, but do *not* want to follow python-checkins -- if lots of people respond (doubtful, as distutils-checkins only had 3 subscribers last I checked!), we'll reconsider. Greg From fdrake at acm.org Wed Mar 29 21:28:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 14:28:19 -0500 (EST) Subject: [Python-Dev] Re: [Distutils] Distutils now in Python CVS tree In-Reply-To: <20000329140525.A5842@cnri.reston.va.us> References: <20000329140525.A5842@cnri.reston.va.us> Message-ID: <14562.22867.998809.897214@seahag.cnri.reston.va.us> Greg Ward writes: > Distutils is now available through the Python CVS tree *in addition to > its own CVS tree*. That is, if you keep on top of developments in the > Python CVS tree, then you will be tracking the latest Distutils code in > Lib/distutils. Or, you can keep following the Distutils through its own > CVS tree. 
(This is all done through one itty-bitty little symlink in > the CNRI CVS repository, and It Just Works. Cool.) Greg, You may want to point out the legalese requirements for patches to the Python tree. ;( That means the patches should probably go to patches at python.org or you should ensure an archive of all the legal statements is maintained at CNRI. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping at lfw.org Wed Mar 29 23:44:31 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 29 Mar 2000 15:44:31 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: On Tue, 28 Mar 2000, Fredrik Lundh wrote: > > > IMO this subdivision could be discussed and possibly revised. > > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Wow. I don't think i hardly ever use any of the modules in your "Commonly Used Modules" category. Except traceback, from time to time, but that's really the only one! Hmm. I'd arrange things a little differently, though i do like the category for Data Representation (it should probably go next to Data Storage though). I would prefer a separate group for interpreter-and-development-related things. The "File Formats" group seems weak... to me, its contents would better belong in a "parsing" or "text processing" classification. urlparse definitely goes with urllib. These comments are kind of random, i know... maybe i'll try putting together another grouping if i have any time. -- ?!ng From adustman at comstar.net Thu Mar 30 02:57:06 2000 From: adustman at comstar.net (Andy Dustman) Date: Wed, 29 Mar 2000 19:57:06 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003290150.UAA17819@eric.cnri.reston.va.us> Message-ID: I had to make the following one-line change to socketmodule.c so that it would link properly with openssl-0.9.4. 
In studying the openssl include files, I found: #define SSLeay_add_ssl_algorithms() SSL_library_init() SSL_library_init() seems to be the "correct" call nowadays. I don't know why this isn't being picked up. I also don't know how well the module works, other than it imports, but I sure would like to try it with Zope/ZServer/Medusa... -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" Index: socketmodule.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Modules/socketmodule.c,v retrieving revision 1.98 diff -c -r1.98 socketmodule.c *** socketmodule.c 2000/03/24 20:56:56 1.98 --- socketmodule.c 2000/03/30 00:49:09 *************** *** 2384,2390 **** return; #ifdef USE_SSL SSL_load_error_strings(); ! SSLeay_add_ssl_algorithms(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; --- 2384,2390 ---- return; #ifdef USE_SSL SSL_load_error_strings(); ! SSL_library_init(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; From gstein at lyra.org Thu Mar 30 04:54:27 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 29 Mar 2000 18:54:27 -0800 (PST) Subject: [Python-Dev] installation points (was: Great Renaming? What is the goal?) In-Reply-To: <1257794452-30405909@hypernet.com> Message-ID: On Wed, 29 Mar 2000, Gordon McMillan wrote: > Moshe Zadka wrote: > > On Tue, 28 Mar 2000, Gordon McMillan wrote: > > > What would happen if he (and everyone else) installed > > > themselves *into* my core packages, then I decided I didn't > > > want his stuff? More than likely I'd have to scrub the damn > > > installation and start all over again. 
> > > > I think Greg Stein answered that objection, by reminding us that the > > filesystem isn't the only way to set up a package hierarchy. > > You mean when Greg said: > >Assuming that you use an archive like those found in my "small" distro or > > Gordon's distro, then this is no problem. The archive simply recognizes > > and maps "text.encoding.macbinary" to its own module. > > I don't know what this has to do with it. When we get around > to the 'macbinary' part, we have already established that > 'text.encoding' is the parent which should supply 'macbinary'. good point... > > In > > particular, even with Python's current module system, there is no need to > > scrub installations: Python core modules go (under UNIX) in > > /usr/local/lib/python1.5, and 3rd party modules go in > > /usr/local/lib/python1.5/site-packages. > > And if there's a /usr/local/lib/python1.5/text/encoding, there's > no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. > > I believe you could hack up an importer that did allow this, and > I think you'd be 100% certifiable if you did. Just look at the > surprise factor. > > Hacking stuff into another package is just as evil as math.pi = > 42. Not if the package was designed for it. For a "package" like "net", it would be perfectly acceptable to allow third-parties to define that as their installation point. And yes, assume there is an importer that looks into the installed archives for modules. In the example, the harder part is determining where the "text.encoding" package is loaded from. And yah: it may be difficult to arrange for the text.encoding importer to allow archive searching.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas.heller at ion-tof.com Thu Mar 30 21:30:25 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Thu, 30 Mar 2000 21:30:25 +0200 Subject: [Python-Dev] Metaclasses, customizing attribute access for classes Message-ID: <021c01bf9a7e$662327c0$4500a8c0@thomasnotebook> Dear Python-developers, Recently I played with metaclasses from within python, also with Jim Fulton's ExtensionClass. I even tried to write my own metaclass in a C-extension, using the famous Don Beaudry hook. It seems that ExtensionClass does not quite do what I want. Metaclasses implemented in python are somewhat slow, and writing them is a lot of work. Writing a metaclass in C is even more work... Well, what do I want? Often, I use the following pattern: class X: def __init__ (self): self.delegate = anObjectImplementedInC(...) def __getattr__ (self, key): return self.delegate.dosomething(key) def __setattr__ (self, key, value): self.delegate.doanotherthing(key, value) def __delattr__ (self, key): self.delegate.doevenmore(key) This is too slow (for me). So what I would like to do is: class X: def __init__ (self): self.__dict__ = aMappingObject(...) and now aMappingObject will automatically receive all the setattr, getattr, and delattr calls. The *only* thing which is required for this is to remove the restriction that the __dict__ attribute must be a dictionary. This is only a small change to classobject.c (which unfortunately I have only implemented for 1.5.2, not for the CVS version). The performance impact for this change is unnoticeable in pystone. What do you think? Should I prepare a patch? Any chance that this can be included in a future python version?
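[Editor's sketch: the delegation pattern described in the message above, fleshed out into a runnable form. The Delegate class here is a hypothetical pure-Python stand-in for the C object (anObjectImplementedInC) and its dosomething/doanotherthing/doevenmore method names; note that the delegate itself must be installed via self.__dict__ to avoid recursing into __setattr__.]

```python
class Delegate:
    """Hypothetical pure-Python stand-in for anObjectImplementedInC(...)."""
    def __init__(self):
        self._data = {}

    def dosomething(self, key):            # attribute lookup
        try:
            return self._data[key]
        except KeyError:
            raise AttributeError(key)

    def doanotherthing(self, key, value):  # attribute store
        self._data[key] = value

    def doevenmore(self, key):             # attribute delete
        del self._data[key]


class X:
    def __init__(self):
        # Write straight into the instance dict so that installing the
        # delegate does not itself go through __setattr__.
        self.__dict__['delegate'] = Delegate()

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for everything
        # except 'delegate' itself.
        return self.__dict__['delegate'].dosomething(key)

    def __setattr__(self, key, value):
        self.__dict__['delegate'].doanotherthing(key, value)

    def __delattr__(self, key):
        self.__dict__['delegate'].doevenmore(key)


x = X()
x.spam = 42
print(x.spam)  # prints 42; every access is routed through the delegate
```

Every attribute read, write, and delete on X takes an extra Python-level call, which is the overhead the proposed __dict__ replacement would eliminate.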
Thomas Heller From petrilli at amber.org Thu Mar 30 21:52:02 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Thu, 30 Mar 2000 14:52:02 -0500 Subject: [Python-Dev] Unicode compile Message-ID: <20000330145202.B9078@trump.amber.org> I don't know how much memory other people have in their machines, but on this machine (128Mb), I get the following trying to compile a CVS checkout of Python: gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c ./unicodedatabase.c:53482: virtual memory exhausted I hope that this is a temporary thing, or that we ship the database in some other manner, but I would argue that you should be able to compile Python on a machine with 32Mb of RAM at MOST.... for an idea of how much VM this machine has, I have 256Mb of SWAP on top of it. Chris -- | Christopher Petrilli | petrilli at amber.org From guido at python.org Thu Mar 30 22:12:22 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:12:22 -0500 Subject: [Python-Dev] Unicode compile In-Reply-To: Your message of "Thu, 30 Mar 2000 14:52:02 EST." <20000330145202.B9078@trump.amber.org> References: <20000330145202.B9078@trump.amber.org> Message-ID: <200003302012.PAA22062@eric.cnri.reston.va.us> > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or that we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, I have 256Mb of SWAP on top of it. I'm not sure how to fix this, short of reading the main database from a file. Marc-Andre?
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Thu Mar 30 22:14:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 30 Mar 2000 22:14:55 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> Message-ID: <38E3B5BF.2D00F930@tismer.com> Christopher Petrilli wrote: > > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or that we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, I have 256Mb of SWAP on top of it. I had similar effects, which made me work on a compressed database (see older messages). Due to time limits, I will not be ready before 1.6.a1 is out. And then quite a lot of other changes will be necessary from Marc, since the API changes quite a lot. But it will definitely be a module of less than 20 KB, proven. ciao - chris(2) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 30 22:14:27 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Thu, 30 Mar 2000 15:14:27 -0500 (EST) Subject: [Python-Dev] Unicode compile In-Reply-To: <200003302012.PAA22062@eric.cnri.reston.va.us> References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <14563.46499.555853.413690@amarok.cnri.reston.va.us> Guido van Rossum writes: >I'm not sure how to fix this, short of reading the main database from >a file. Marc-Andre? Turning off optimization may help. (Or it may not -- it might be creating the data structures for a large static table that's the problem.) --amk From akuchlin at mems-exchange.org Thu Mar 30 22:22:02 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 15:22:02 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003282000.PAA11988@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> Message-ID: <14563.46954.70800.706245@amarok.cnri.reston.va.us> Guido van Rossum writes: >I don't know enough about this, but it seems that there might be two >steps: *creating* a mmap object is necessarily platform-specific; but >*using* a mmap object could be platform-neutral. > >What is the API for mmap objects? You create them; Unix wants a file descriptor, and Windows wants a filename. Then they behave like buffer objects, like mutable strings. I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of interface. If someone can suggest a way to handle the extra flags such as MAP_SHARED and the Windows tag argument, I'll happily implement it. Maybe just keyword arguments that differ across platforms? open(filename, mode, [tag = 'foo',] [flags = mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a file descriptor on Unix through a separate openfd() function. I'm also strongly tempted to rename the module from mmapfile to just 'mmap'. 
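[Editor's sketch: a rough picture of the file-oriented interface being discussed above. The open_mmap name and the mode handling are assumptions, written against the mmap module as it later shipped; only the portable access= argument is exercised, with the platform-specific extras (tag=..., flags=...) left out.]

```python
import mmap
import os
import tempfile

def open_mmap(filename, mode="r"):
    # Hypothetical front end of the proposed kind: open by filename,
    # map the whole file, return the mmap object.
    if mode == "r":
        osflags, access = os.O_RDONLY, mmap.ACCESS_READ
    elif mode in ("r+", "w+"):
        osflags, access = os.O_RDWR, mmap.ACCESS_WRITE
    else:
        raise ValueError("unsupported mode: %r" % (mode,))
    fd = os.open(filename, osflags)
    try:
        size = os.fstat(fd).st_size
        # mmap dup()s the descriptor, so we can close our copy below.
        return mmap.mmap(fd, size, access=access)
    finally:
        os.close(fd)

# Tiny demonstration against a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello mmap")
os.close(fd)
m = open_mmap(path)
first_word = m[:5]   # mmap objects slice like (mutable) strings
m.close()
os.remove(path)
```

A separate openfd() entry point, as mentioned above, would skip the os.open step and take the descriptor directly.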
I'd suggest waiting until the interface is finalized before adding the module to the CVS tree -- which means after 1.6a1 -- but I can add the module as it stands if you like. Guido, let me know if you want me to do that. -- A.M. Kuchling http://starship.python.net/crew/amk/ A Puck is harder by far to hurt than some little lord of malice from the lands of ice and snow. We Pucks are old and hard and wild... -- Robin Goodfellow, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Thu Mar 30 22:23:42 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:23:42 -0500 Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: Your message of "Wed, 29 Mar 2000 19:57:06 EST." References: Message-ID: <200003302023.PAA22350@eric.cnri.reston.va.us> > I had to make the following one-line change to socketmodule.c so that it > would link properly with openssl-0.9.4. In studying the openssl include > files, I found: > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > why this isn't being picked up. I also don't know how well the module > works, other than it imports, but I sure would like to try it with > Zope/ZServer/Medusa... Strange -- the version of OpenSSL I have also calls itself 0.9.4 ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have SSL_library_init(). I wonder what gives... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Mar 30 22:25:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:25:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 15:22:02 EST." 
<14563.46954.70800.706245@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> Message-ID: <200003302025.PAA22367@eric.cnri.reston.va.us> > Guido van Rossum writes: > >I don't know enough about this, but it seems that there might be two > >steps: *creating* a mmap object is necessarily platform-specific; but > >*using* a mmap object could be platform-neutral. > > > >What is the API for mmap objects? [AMK] > You create them; Unix wants a file descriptor, and Windows wants a > filename. Then they behave like buffer objects, like mutable strings. > > I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of > interface. If someone can suggest a way to handle the extra flags > such as MAP_SHARED and the Windows tag argument, I'll happily > implement it. Maybe just keyword arguments that differ across > platforms? open(filename, mode, [tag = 'foo',] [flags = > mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a > file descriptor on Unix through a separate openfd() function. Yes, keyword args seem to be the way to go. To avoid an extra function you could add a fileno=... kwarg, in which case the filename is ignored or required to be "". > I'm > also strongly tempted to rename the module from mmapfile to just > 'mmap'. Sure. > I'd suggest waiting until the interface is finalized before adding the > module to the CVS tree -- which means after 1.6a1 -- but I can add the > module as it stands if you like. Guido, let me know if you want me to > do that. Might as well check it in -- the alpha is going to be rough and I expect another alpha to come out shortly to correct the biggest problems. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Mar 30 22:22:08 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 30 Mar 2000 22:22:08 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <38E3B770.6CD61C37@lemburg.com> Guido van Rossum wrote: > > > I don't know how much memory other people have in their machiens, but > > in this machine (128Mb), I get the following trying to compile a CVS > > checkout of Python: > > > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > > ./unicodedatabase.c:53482: virtual memory exhausted > > > > I hope that this is a temporary thing, or we ship the database some > > other manner, but I would argue that you should be able to compile > > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > > much VM this machine has, i have 256Mb of SWAP on top of it. > > I'm not sure how to fix this, short of reading the main database from > a file. Marc-Andre? Hmm, the file compiles fine on my 64MB Linux machine with about 100MB of swap. What gcc version do you use ? Anyway, once Christian is ready with his compact replacement I think we no longer have to worry about that chunk of static data :-) Reading in the data from a file is not a very good solution, because it would override the OS optimizations for static data in object files (like e.g. swapping in only those pages which are really needed, etc.). An alternative solution would be breaking the large table into several smaller ones and accessing it via a redirection function. 
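[Editor's sketch, not part of the original message.] The smaller-tables-plus-redirection idea can be illustrated in Python: one big lookup table becomes a set of fixed-size pages plus an index, with identical pages shared so the static data shrinks. The function names here are illustrative only, not taken from unicodedatabase.c:

```python
PAGE_SHIFT = 8  # 256 entries per page; an assumed split size

def build_pages(table):
    # Split one large table into fixed-size pages plus an index table.
    # Identical pages are stored once and shared via the index.
    size = 1 << PAGE_SHIFT
    pages, index, seen = [], [], {}
    for start in range(0, len(table), size):
        page = tuple(table[start:start + size])
        if page not in seen:
            seen[page] = len(pages)
            pages.append(page)
        index.append(seen[page])
    return index, pages

def lookup(index, pages, codepoint):
    # The "redirection function": high bits pick a page via the index,
    # low bits pick the entry within that page.
    page = pages[index[codepoint >> PAGE_SHIFT]]
    return page[codepoint & ((1 << PAGE_SHIFT) - 1)]
```

In C, each page (or group of pages) could then live in its own static array, keeping any single compilation unit small enough for the compiler.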
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From adustman at comstar.net Thu Mar 30 23:12:51 2000 From: adustman at comstar.net (Andy Dustman) Date: Thu, 30 Mar 2000 16:12:51 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003302023.PAA22350@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > I had to make the following one-line change to socketmodule.c so that it > > would link properly with openssl-0.9.4. In studying the openssl include > > files, I found: > > > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > > why this isn't being picked up. I also don't know how well the module > > works, other than it imports, but I sure would like to try it with > > Zope/ZServer/Medusa... > > Strange -- the version of OpenSSL I have also calls itself 0.9.4 > ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have > SSL_library_init(). > > I wonder what gives... I don't know. Right after I made the patch, I found that 0.9.5 is available, and I was able to successfully compile against that version (with the patch). -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" From akuchlin at mems-exchange.org Thu Mar 30 23:19:45 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:19:45 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302025.PAA22367@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> Message-ID: <14563.50417.909045.81868@amarok.cnri.reston.va.us> Guido van Rossum writes: >Might as well check it in -- the alpha is going to be rough and I >expect another alpha to come out shortly to correct the biggest >problems. Done -- just doing my bit to ensure the first alpha is rough! :) My next task is to add the Expat module. My understanding is that it's OK to add Expat itself, too; where should I put all that code? Modules/expat/* ? -- A.M. Kuchling http://starship.python.net/crew/amk/ I'll bring the Kindly Ones down on his blasted head. -- Desire, in SANDMAN #31: "Three Septembers and a January" From fdrake at acm.org Thu Mar 30 23:29:58 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:29:58 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <14563.51030.24773.587972@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Done -- just doing my bit to ensure the first alpha is rough! :) > > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Do you have documentation for this? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Thu Mar 30 23:30:35 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:30:35 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51030.24773.587972@seahag.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> Message-ID: <14563.51067.560938.367690@amarok.cnri.reston.va.us> Fred L. Drake, Jr. writes: > Do you have documentation for this? Somewhere at home, I think, but not here at work. I'll try to get it checked in before 1.6alpha1, but don't hold me to that. --amk From guido at python.org Thu Mar 30 23:31:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:31:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:19:45 EST." <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <200003302131.QAA22897@eric.cnri.reston.va.us> > Done -- just doing my bit to ensure the first alpha is rough! :) When the going gets rough, the rough get going :-) > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Whoa... Not sure. This will give issues with Patrice, at least (even if it is pure Open Source -- given the size). I'd prefer to add instructions to Setup.in about where to get it. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Thu Mar 30 23:34:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Thu, 30 Mar 2000 16:34:55 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51067.560938.367690@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> <14563.51067.560938.367690@amarok.cnri.reston.va.us> Message-ID: <14563.51327.190466.477566@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Somewhere at home, I think, but not here at work. I'll try to get it > checked in before 1.6alpha1, but don't hold me to that. The date isn't important; I'm not planning to match alpha/beta releases with Doc releases. I just want to be sure it gets in soon so that the debugging process can kick in for that as well. ;) Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Thu Mar 30 23:34:02 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:34:02 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:31:58 EST." <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <200003302134.QAA22939@eric.cnri.reston.va.us> > Whoa... Not sure. This will give issues with Patrice, at least (even > if it is pure Open Source -- given the size). For those outside CNRI -- Patrice is CNRI's tough IP lawyer. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Mar 30 23:48:13 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:48:13 -0500 (EST) Subject: [Python-Dev] Expat module In-Reply-To: <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <14563.52125.401817.986919@amarok.cnri.reston.va.us> Guido van Rossum writes: >> My next task is to add the Expat module. My understanding is that >> it's OK to add Expat itself, too; where should I put all that code? >> Modules/expat/* ? > >Whoa... Not sure. This will give issues with Patrice, at least (even >if it is pure Open Source -- given the size). I'd prefer to add >instructions to Setup.in about where to get it. Fair enough; I'll just add the module itself, then, and we can always change it later. Should we consider replacing the makesetup/Setup.in mechanism with a setup.py script that uses the Distutils? You'd have to compile a minipython with just enough critical modules -- strop and posixmodule are probably the most important ones -- in order to run setup.py. It's something I'd like to look at for 1.6, because then you could be much smarter in automatically enabling modules. -- A.M. Kuchling http://starship.python.net/crew/amk/ This is the way of Haskell or Design by Contract of Eiffel. This one is like wearing a XV century armor, you walk very safely but in a very tiring way. -- Manuel Gutierrez Algaba, 26 Jan 2000 From guido at python.org Fri Mar 31 00:41:45 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 17:41:45 -0500 Subject: [Python-Dev] Expat module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:48:13 EST." 
<14563.52125.401817.986919@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <200003302241.RAA23050@eric.cnri.reston.va.us> > Fair enough; I'll just add the module itself, then, and we can always > change it later. OK. > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. If you can come up with something that works well enough, that would be great. (Although I'm not sure where the distutils come in.) We still need to use configure/autoconf though. Hardcoding a small complement of modules is no problem. (Why do you think you need strop though? Remember we have string methods!) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Fri Mar 31 01:03:39 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 09:03:39 +1000 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/PC python_nt.rc,1.8,1.9 In-Reply-To: <200003302259.RAA23266@eric.cnri.reston.va.us> Message-ID: This is the version number as displayed by Windows Explorer in the "properties" dialog. Mark. > Modified Files: > python_nt.rc > Log Message: > Seems there was a version string here that still looked > like 1.5.2. 
> > > Index: python_nt.rc > ========================================================== > ========= > RCS file: /projects/cvsroot/python/dist/src/PC/python_nt.rc,v > retrieving revision 1.8 > retrieving revision 1.9 > diff -C2 -r1.8 -r1.9 > *** python_nt.rc 2000/03/29 01:50:50 1.8 > --- python_nt.rc 2000/03/30 22:59:09 1.9 > *************** > *** 29,34 **** > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,5,2,3 > ! PRODUCTVERSION 1,5,2,3 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > --- 29,34 ---- > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,6,0,0 > ! PRODUCTVERSION 1,6,0,0 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > From effbot at telia.com Fri Mar 31 00:40:51 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 00:40:51 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? Message-ID: <00b701bf9a99$022339c0$34aab5d4@hagrid> at this time, SRE uses types instead of classes for compiled patterns and matches. these classes provide a documented interface, and a bunch of internal attributes, for example: RegexObjects: code -- a PCRE code object pattern -- the source pattern groupindex -- maps group names to group indices MatchObjects: regs -- same as match.span()? groupindex -- as above re -- the pattern object used for this match string -- the target string used for this match the problem is that some other modules use these attributes directly. for example, xmllib.py uses the pattern attribute, and other code I've seen uses regs to speed things up. in SRE, I would like to get rid of all these (except possibly for the match.string attribute). opinions? From guido at python.org Fri Mar 31 01:31:43 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 18:31:43 -0500 Subject: [Python-Dev] SRE: what to do with undocumented attributes? 
In-Reply-To: Your message of "Fri, 31 Mar 2000 00:40:51 +0200." <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <200003302331.SAA24895@eric.cnri.reston.va.us> > at this time, SRE uses types instead of classes for compiled > patterns and matches. these classes provide a documented > interface, and a bunch of internal attributes, for example: > > RegexObjects: > > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices > > MatchObjects: > > regs -- same as match.span()? > groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match > > the problem is that some other modules use these attributes > directly. for example, xmllib.py uses the pattern attribute, and > other code I've seen uses regs to speed things up. > > in SRE, I would like to get rid of all these (except possibly for > the match.string attribute). > > opinions? Sounds reasonable. All std lib modules that violate this will need to be fixed once sre.py replaces re.py. (Checkin of sre is next.) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Fri Mar 31 01:40:16 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 18:40:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Fredrik Lundh writes: >RegexObjects: > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices pattern and groupindex are documented in the Library Reference, and they're part of the public interface. .code is not, so you can drop it. >MatchObjects: > regs -- same as match.span()? 
> groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match .re and .string are documented. I don't see a reference to MatchObject.groupindex anywhere, and .regs isn't documented, so those two can be ignored; xmllib or whatever external modules use them are being very naughty, so go ahead and break them. -- A.M. Kuchling http://starship.python.net/crew/amk/ Imagine a thousand thousand fireflies of every shape and color; Oh, that was Baghdad at night in those days. -- From SANDMAN #50: "Ramadan" From effbot at telia.com Fri Mar 31 01:05:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 01:05:15 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> Message-ID: <00e901bf9a9c$6c036240$34aab5d4@hagrid> Andrew wrote: > >RegexObjects: > > code -- a PCRE code object > > pattern -- the source pattern > > groupindex -- maps group names to group indices > > pattern and groupindex are documented in the Library Reference, and > they're part of the public interface. hmm. I could have sworn... guess I didn't look carefully enough (or someone's used his time machine again :-). oh well, more bloat... btw, "pattern" doesn't make much sense in SRE -- who says the pattern object was created by re.compile? guess I'll just set it to None in other cases (e.g. sregex, sreverb, sgema...) From bwarsaw at cnri.reston.va.us Fri Mar 31 02:35:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 19:35:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14563.62148.860971.360871@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> hmm. I could have sworn... 
guess I didn't look carefully FL> enough (or someone's used his time machine again :-). Yep, sorry. If it's documented as in the public interface, it should be kept. Anything else can go (he says without yet grep'ing through his various code bases). -Barry From bwarsaw at cnri.reston.va.us Fri Mar 31 06:34:15 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 23:34:15 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> Message-ID: <14564.10951.90258.729547@anthem.cnri.reston.va.us> >>>>> "Guido" == Guido van Rossum writes: Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted Guido> into 80-char lines by GvR. Can we change the 8-space-tab rule for all new C code that goes in? I know that we can't practically change existing code right now, but for new C code, I propose we use no tab characters, and we use a 4-space block indentation. -Barry From DavidA at ActiveState.com Fri Mar 31 07:07:02 2000 From: DavidA at ActiveState.com (David Ascher) Date: Thu, 30 Mar 2000 21:07:02 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Heretic! 
+1, FWIW =) From bwarsaw at cnri.reston.va.us Fri Mar 31 07:16:48 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:16:48 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <14564.13504.310866.835201@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: DA> Heretic! DA> +1, FWIW =) I hereby offer to so untabify and reformat any C code in the standard distribution that Guido will approve of. -Barry From mhammond at skippinet.com.au Fri Mar 31 07:16:26 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 15:16:26 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Message-ID: +1 for me too. It also brings all source files under the same guidelines (rather than separate ones for .py and .c) Mark. From bwarsaw at cnri.reston.va.us Fri Mar 31 07:40:16 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:40:16 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: Message-ID: <14564.14912.629414.970309@anthem.cnri.reston.va.us> >>>>> "MH" == Mark Hammond writes: MH> +1 for me too. It also brings all source files under the same MH> guidelines (rather than separate ones for .py and .c) BTW, I further propose that if Guido lets me reformat the C code, that we freeze other checkins for the duration and I temporarily turn off the python-checkins email. That is, unless you guys /want/ to be bombarded with boatloads of useless diffs. :) -Barry From pf at artcom-gmbh.de Fri Mar 31 08:45:45 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 08:45:45 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....)
In-Reply-To: <14564.14912.629414.970309@anthem.cnri.reston.va.us> from "bwarsaw@cnri.reston.va.us" at "Mar 31, 2000 0:40:16 am" Message-ID: Hi! sigh :-( > >>>>> "MH" == Mark Hammond writes: > > MH> +1 for me too. It also brings all source files under the same > MH> guidelines (rather than separate ones for .py and .c) bwarsaw at cnri.reston.va.us: > BTW, I further propose that if Guido lets me reformat the C code, that > we freeze other checkins for the duration and I temporarily turn off > the python-checkins email. That is, unless you guys /want/ to be > bombarded with boatloads of useless diffs. :) -1 for C reformatting. The 4-space indentation seems reasonable for Python sources, but I disagree for C code. C is not Python. Let me cite a very prominent member of the open source community (pasted from /usr/src/linux/Documentation/CodingStyle): Chapter 1: Indentation Tabs are 8 characters, and thus indentations are also 8 characters. There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3. Rationale: The whole idea behind indentation is to clearly define where a block of control starts and ends. Especially when you've been looking at your screen for 20 straight hours, you'll find it a lot easier to see how the indentation works if you have large indentations. Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. In short, 8-char indents make things easier to read, and have the added benefit of warning you when you're nesting your functions too deep. Heed that warning. Also, the Python interpreter has no strong relationship with the Linux kernel; I agree with Linus on this topic.
Python source code is another thing: Python identifiers are usually longer due to qualifying and Python operands are often lists, tuples or the like, so lines contain more stuff. disliking-yet-another-white-space-discussion-ly y'rs - peter From mhammond at skippinet.com.au Fri Mar 31 09:11:50 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 17:11:50 +1000 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: > Rationale: The whole idea behind indentation is to > clearly define where > a block of control starts and ends. Especially when Ironically, this statement is a strong argument for insisting on Python using real tab characters! "Clearly define" is upgraded to "used to define". > 80-character terminal screen. The answer to that is > that if you need > more than 3 levels of indentation, you're screwed > anyway, and should fix > your program. Yeah, right! int foo() { // one level for the privilege of being here. switch (bar) { // uh oh - running out of room... case WTF: // Oh no - if I use an "if" statement, // my code is "screwed"?? } } > disliking-yet-another-white-space-discussion-ly y'rs - peter Like-death-and-taxes-ly y'rs - Mark. From moshez at math.huji.ac.il Fri Mar 31 10:04:32 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 10:04:32 +0200 (IST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302134.QAA22939@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > Whoa... Not sure. This will give issues with Patrice, at least (even > > if it is pure Open Source -- given the size). > > For those outside CNRI -- Patrice is CNRI's tough IP lawyer. It was understandable from the context... Personally, I'd rather it was folded in by value, and not by reference: one reason is versioning problems, and another is pure laziness on my part.
what-do-you-have-when-you-got-a-lawyer-up-to-his-neck-in-the-sand-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Fri Mar 31 09:42:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 09:42:04 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <38E456CC.1A49334A@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "Guido" == Guido van Rossum writes: > > Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 > Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted > Guido> into 80-char lines by GvR. > > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Why not just leave new code formatted as it is (except maybe to bring the used TAB width to the standard 8 spaces used throughout the Python C source code) ? BTW, most of the new unicode stuff uses 4-space indents. Unfortunately, it mixes whitespace and tabs since Emacs c-mode doesn't do the python-mode magic yet (is there a way to turn it on ?). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Fri Mar 31 11:14:49 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:14:49 +0200 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) References: Message-ID: <01ae01bf9af1$927b1940$34aab5d4@hagrid> Peter Funk wrote: > Also the Python interpreter has no strong relationship with Linux kernel > a agree with Linus on this topic. 
Python source code is another thing: > Python identifiers are usually longer due to qualifiying and Python > operands are often lists, tuples or the like, so lines contain more stuff. you're just guessing, right? (if you check, you'll find that the actual difference is very small. iirc, that's true for c, c++, java, python, tcl, and probably a few more languages. dunno about perl, though... :-) From effbot at telia.com Fri Mar 31 11:17:42 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:17:42 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <01b501bf9af1$f9b44500$34aab5d4@hagrid> M.-A. Lemburg wrote: > Why not just leave new code formatted as it is (except maybe > to bring the used TAB width to the standard 8 spaces used throughout > the Python C source code) ? > > BTW, most of the new unicode stuff uses 4-space indents. > Unfortunately, it mixes whitespace and tabs since Emacs > c-mode doesn't do the python-mode magic yet (is there a > way to turn it on ?). http://www.jwz.org/doc/tabs-vs-spaces.html contains some hints. From moshez at math.huji.ac.il Fri Mar 31 13:24:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 13:24:05 +0200 (IST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes Message-ID: Here is a new list of things that will change in the next release. Thanks to all the people who gave me hints and information! If you have anything you think I missed, or mistreated, please e-mail me personally -- I'll post an updated version soon. Obligatory ========== A lot of bug-fixes, some optimizations, many improvements in the documentation Core changes ============ Deleting objects is safe even for deeply nested data structures. Long/int unifications: long integers can be used in seek() calls, as slice indexes. 
str(1L) --> '1', not '1L' (repr() is still the same) Builds on NT Alpha UnboundLocalError is raised when a local variable is undefined long, int take optional "base" parameter string objects now have methods (though they are still immutable) unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders "in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__ New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw) Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect New modules =========== winreg - Windows registry interface. Distutils - tools for distributing Python modules robotparser - parse a robots.txt file (for writing web spiders) linuxaudio - audio for Linux mmap - treat a file as a memory buffer sre - regular expressions (fast, supports unicode) filecmp - supersedes the old cmp.py and dircmp.py modules tabnanny - check Python sources for tab-width dependence unicode - support for unicode codecs - support for Unicode encoders/decoders Module changes ============== re - changed to be a frontend to sre readline, ConfigParser, cgi, calendar, posix, xmllib, aifc, chunk, wave, random, shelve, nntplib - minor enhancements socket, httplib, urllib - optional OpenSSL support _tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0) Tool changes ============ IDLE -- complete overhaul (Andrew, I'm still waiting for the expat support and integration to add to this list -- other than that, please contact me if you want something less telegraphic ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Fri Mar 31 14:01:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 31 Mar 2000 04:01:21 -0800 (PST) Subject: [Python-Dev] Roundup et al.
Message-ID: Hi -- there was some talk on this list earlier about nosy lists, managing patches, and such things, so i just wanted to mention, for anybody interested, that i threw together Roundup very quickly for you to try out. http://www.lfw.org/python/ There's a tar file there -- it's very messy code, and i apologize (it was hastily hacked out of the running prototype implementation), but it should be workable enough to play with. There's a test installation to play with at http://www.lfw.org/ping/roundup/roundup.cgi Dummy user:password pairs are test:test, spam:spam, eggs:eggs. A fancier design, still in the last stages of coming together (which will be my submission to the Software Carpentry contest) is up at http://crit.org/http://www.lfw.org/ping/sctrack.html and i welcome your thoughts and comments on that if you have the spare time (ha!) and generous inclination to contribute them. Thank you and apologies for the interruption. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 31 14:10:45 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 07:10:45 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Your message of "Thu, 30 Mar 2000 23:34:15 EST." <14564.10951.90258.729547@anthem.cnri.reston.va.us> References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <200003311210.HAA29010@eric.cnri.reston.va.us> > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Actually, this one was formatted for 8-space indents but using 4-space tabs, so in my editor it looked like 16-space indents! 
Given that we don't want to change existing code, I'd prefer to stick with 1-tab 8-space indents. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Fri Mar 31 15:10:06 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 15:10:06 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 In-Reply-To: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Guido van Rossum wrote: > + Christian Tismer > + Christian Tismer Ummmmm....I smell something fishy here. Are there two Christian Tismers? That would explain how Christian has so much time to work on Stackless. Well, between the both of them, Guido will have no chance but to put Stackless in the standard distribution. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fredrik at pythonware.com Fri Mar 31 15:16:16 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 15:16:16 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: <000d01bf9b13$4be1db00$0500a8c0@secret.pythonware.com> > Tracy Tims > + Christian Tismer > + Christian Tismer > R Lindsay Todd two christians? From bwarsaw at cnri.reston.va.us Fri Mar 31 15:55:13 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 08:55:13 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <14564.44609.221250.471147@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> BTW, most of the new unicode stuff uses 4-space indents. 
M> Unfortunately, it mixes whitespace and tabs since Emacs M> c-mode doesn't do the python-mode magic yet (is there a M> way to turn it on ?). (setq indent-tabs-mode nil) I could add that to the "python" style. And to zap all your existing tab characters: C-M-h M-x untabify RET -Barry From skip at mojam.com Fri Mar 31 16:04:46 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 08:04:46 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: Message-ID: <14564.45182.460160.589244@beluga.mojam.com> Moshe, I would highlight those bits that are likely to warrant a little closer scrutiny. The list.{append,insert,...} and socket.connect change certainly qualify. Perhaps split the Core Changes section into two subsections, one set of changes likely to require some adaptation and one set that should be backwards-compatible. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Fri Mar 31 16:47:31 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 09:47:31 -0500 Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: Your message of "Fri, 31 Mar 2000 08:04:46 CST." <14564.45182.460160.589244@beluga.mojam.com> References: <14564.45182.460160.589244@beluga.mojam.com> Message-ID: <200003311447.JAA29633@eric.cnri.reston.va.us> See what I've done to Moshe's list: http://www.python.org/1.6/ --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Fri Mar 31 17:28:56 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 09:28:56 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> References: <14564.45182.460160.589244@beluga.mojam.com> <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14564.50232.734778.152933@beluga.mojam.com> Guido> See what I've done to Moshe's list: http://www.python.org/1.6/ Looks good. Attached are a couple nitpicky diffs. 
Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.6.diff Type: application/octet-stream Size: 1263 bytes Desc: diffs to 1.6 Release Notes URL: From guido at python.org Fri Mar 31 17:47:56 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 10:47:56 -0500 Subject: [Python-Dev] Windows installer pre-prelease Message-ID: <200003311547.KAA15538@eric.cnri.reston.va.us> The Windows installer is always hard to get just right. If you have a moment, go to http://www.python.org/1.6/ and download the Windows Installer prerelease. Let me know what works, what doesn't! I've successfully installed it on Windows NT 4.0 and on Windows 98, both with default install target and with a modified install target. I'd love to hear that it also installs cleanly on Windows 95. Please test IDLE from the start menu! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Fri Mar 31 18:18:43 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:18:43 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14563.52125.401817.986919@amarok.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Mar 30, 2000 at 04:48:13PM -0500 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <20000331111842.A8060@cnri.reston.va.us> On 30 March 2000, Andrew M. Kuchling said: > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. 
> It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Obviously, I'd love to see the Distutils used to build parts of the Python library. Some possible problems:

* Distutils relies heavily on the sys, os, string, and re modules, so those would have to be built and included in the mythical mini-python (as would everything they rely on -- strop, pcre, ... ?)

* Distutils currently assumes that it's working with an installed Python -- it doesn't know anything about working in the Python source tree. I think this could be fixed just by tweaking the distutils.sysconfig module, but there might be subtle assumptions elsewhere in the code.

* I haven't written the mythical Autoconf-in-Python yet, so we'd still have to rely on either the configure script or user intervention to find out whether library X is installed, and where its header and library files live (for X in zlib, tcl, tk, ...).

Of course, the configure script would still be needed to build the mini-python, so it's not going away any time soon. Greg From skip at mojam.com Fri Mar 31 18:26:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 10:26:55 -0600 (CST) Subject: [Python-Dev] Distutils for the std.
library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> Message-ID: <14564.53711.803509.962248@beluga.mojam.com> Greg> * Distutils relies heavily on the sys, os, string, and re Greg> modules, so those would have to be built and included in the Greg> mythical mini-python (as would everything they rely on -- Greg> strop, pcre, ... ?) With string methods in 1.6, reliance on the string and strop modules should be lessened or eliminated, right? re and os may need a tweak or two to use string methods themselves. The sys module is always available. Perhaps it would make sense to put sre(module)?.c into the Python directory where sysmodule.c lives. That way, a Distutils-capable mini-python could be built without messing around in the Modules directory at all... -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Fri Mar 31 18:25:11 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 18:25:11 +0200 (IST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Greg Ward wrote: > Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Well, it's not like it's not a lot of work, but it could be done, with liberal interpretation of "mini": include in "mini" Python *all* modules which do not rely on libraries not distributed with the Python core -- zlib, expat and Tkinter go right out the window, but most everything else can stay. That way, Distutils can use all modules it currently uses . 
The other problem, file-location, is a problem I have talked about earlier: it *cannot* be assumed that the default place for putting new libraries is the same place the Python interpreter resides, for many reasons. Why not ask the user explicitly? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward at cnri.reston.va.us Fri Mar 31 18:29:33 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:29:33 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14564.53711.803509.962248@beluga.mojam.com>; from skip@mojam.com on Fri, Mar 31, 2000 at 10:26:55AM -0600 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> <14564.53711.803509.962248@beluga.mojam.com> Message-ID: <20000331112933.B8060@cnri.reston.va.us> On 31 March 2000, Skip Montanaro said: > With string methods in 1.6, reliance on the string and strop modules should > be lessened or eliminated, right? re and os may need a tweak or two to use > string methods themselves. The sys module is always available. Perhaps it > would make sense to put sre(module)?.c into the Python directory where > sysmodule.c lives. That way, a Distutils-capable mini-python could be built > without messing around in the Modules directory at all... But I'm striving to maintain compatibility with (at least) Python 1.5.2 in Distutils. That need will fade with time, but it's not going to disappear the moment Python 1.6 is released. (Guess I'll have to find somewhere else to play with string methods and extended call syntax).
Greg From thomas.heller at ion-tof.com Fri Mar 31 19:09:41 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 31 Mar 2000 19:09:41 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils msvccompiler.py References: <200003311653.LAA08175@thrak.cnri.reston.va.us> Message-ID: <038701bf9b33$e7c49240$4500a8c0@thomasnotebook> > Simplified Thomas Heller's registry patch: just assign all those > HKEY_* and Reg* names once, rather than having near-duplicate code > in the two import attempts. Your change won't work, the function names in win32api and winreg are not the same: Example: win32api.RegEnumValue <-> winreg.EnumValue > > Also dropped the leading underscore on all the imported symbols, > as it's not appropriate (they're not local to this module). Are they used anywhere else? Or do you think they *could* be used somewhere else? Thomas Heller From mal at lemburg.com Fri Mar 31 12:19:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 12:19:58 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> <01b501bf9af1$f9b44500$34aab5d4@hagrid> Message-ID: <38E47BCE.94E4E012@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Why not just leave new code formatted as it is (except maybe > > to bring the used TAB width to the standard 8 spaces used throughout > > the Python C source code) ? > > > > BTW, most of the new unicode stuff uses 4-space indents. > > Unfortunately, it mixes whitespace and tabs since Emacs > > c-mode doesn't do the python-mode magic yet (is there a > > way to turn it on ?). > > http://www.jwz.org/doc/tabs-vs-spaces.html > contains some hints. Ah, cool. 
Thanks :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Fri Mar 31 20:56:40 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 20:56:40 +0200 (MEST) Subject: [Python-Dev] 'make install' should create lib/site-packages IMO In-Reply-To: <200003311513.KAA00790@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 10:13:20 am" Message-ID: Hi! Guido van Rossum: [...] > Modified Files: > Makefile.in > Log Message: > Added distutils and distutils/command to LIBSUBDIRS. Noted by Andrew > Kuchling. [...] > ! LIBSUBDIRS= lib-old lib-tk test test/output encodings \ > ! distutils distutils/command $(MACHDEPS) [...] What about 'site-packages'? SuSE added this to their Python packaging and I think it is a good idea to have an empty 'site-packages' directory installed by default. Regards, Peter From akuchlin at mems-exchange.org Fri Mar 31 22:16:53 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 31 Mar 2000 15:16:53 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00e901bf9a9c$6c036240$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14565.1973.361549.291817@amarok.cnri.reston.va.us> Fredrik Lundh writes: >btw, "pattern" doesn't make much sense in SRE -- who says >the pattern object was created by re.compile? guess I'll just >set it to None in other cases (e.g. sregex, sreverb, sgema...) Good point; I can imagine fabulously complex patterns assembled programmatically, for which no summary could be made. I guess there could be another attribute that also gives the class (module? function?) used to compile the pattern, but more likely, the pattern attribute should be deprecated and eventually dropped. -- A.M. 
Kuchling http://starship.python.net/crew/amk/ You know how she is when she gets an idea into her head. I mean, when one finally penetrates. -- Desire describes Delirium, in SANDMAN #41: "Brief Lives:1" From pf at artcom-gmbh.de Fri Mar 31 22:14:41 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 22:14:41 +0200 (MEST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 9:47:31 am" Message-ID: Hi! Guido van Rossum : > See what I've done to Moshe's list: http://www.python.org/1.6/ Very fine, but I have a few small annotations:

1. 'linuxaudio' has been renamed to 'linuxaudiodev'.

2. The following text: "_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)." looks a bit misleading, since it is not explicit about version 8.0.x. I suggest the following wording: "_tkinter - supports Tcl/Tk from version 8.0 up to the current 8.3. Support for versions older than 8.0 has been dropped."

3. 'src/Tools/i18n/pygettext.py' by Barry should be mentioned. This is a very useful utility. I suggest to append the following text: "New utility pygettext.py -- Python equivalent of xgettext(1). A message text extraction tool used for internationalizing applications written in Python."

Regards, Peter From fdrake at acm.org Fri Mar 31 22:30:00 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 31 Mar 2000 15:30:00 -0500 (EST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14565.2760.665022.206361@seahag.cnri.reston.va.us> Peter Funk writes: > I suggest the following wording: ... > a very useful utility. I suggest to append the following text: Peter, I'm beginning to figure this out -- you really just want to get published! ;) You forgot the legalese. ;( -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From guido at python.org Fri Mar 31 23:30:42 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 16:30:42 -0500 Subject: [Python-Dev] Python 1.6 alpha 1 released Message-ID: <200003312130.QAA04361@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 1 to the Python website: http://www.python.org/1.6/ Probably the biggest news (if you hadn't heard the rumors) is Unicode support. More news on the above webpage. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in a 1.6 final release around June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From gandalf at starship.python.net Fri Mar 31 23:56:16 2000 From: gandalf at starship.python.net (Vladimir Ulogov) Date: Fri, 31 Mar 2000 16:56:16 -0500 (EST) Subject: [Python-Dev] Re: Python 1.6 alpha 1 released In-Reply-To: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: Guido, """where you used to write sock.connect(host, port) you must now write sock.connect((host, port))""" Is it possible to keep the old notation? I understand (according to your past mail about the parameters of connect) this may not be what you had in mind, but we do use this notation a lot, and for us it will mean creating a workaround for the socket.connect function. It's inconvenient. In general, I think socket.connect(Host, Port) looks prettier :)) than socket.connect((Host, Port)) Vladimir
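[The change Vladimir quotes applies to every method that used to pack multiple positional arguments into a tuple implicitly -- list.{append, insert, remove, count} and socket.connect, per Moshe's changes list. A minimal sketch of the old versus new spelling, written here in modern Python syntax, where the old form simply raises TypeError; the host and port values are placeholders, and no connection is actually attempted since the old spelling fails before any network I/O:]

```python
import socket

# list.append shows the same change as socket.connect: one argument,
# an explicit tuple, instead of several arguments packed implicitly.
items = []
items.append((1, 2))          # new spelling: a single tuple argument
try:
    items.append(3, 4)        # old 1.5.2 spelling
except TypeError:
    print("append(3, 4) rejected")

# socket.connect follows the same rule: pass one (host, port) tuple.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.connect("localhost", 4242)   # old spelling, no longer accepted
except TypeError:
    print("connect(host, port) rejected")
finally:
    sock.close()

print(items)                  # [(1, 2)]
```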