From rrr at ronadam.com Thu Nov 1 16:58:32 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 01 Nov 2007 10:58:32 -0500 Subject: [Python-ideas] str(, base=) as complement to int(, base=) In-Reply-To: <47289EA3.7090005@cheimes.de> References: <47289996.9000304@ronadam.com> <47289EA3.7090005@cheimes.de> Message-ID: <4729F7A8.4090803@ronadam.com> Christian Heimes wrote: >> Or should it be a function in the math or string module? > > Why do you want to hide the function somewhere instead of putting the > functionality in an obvious place. In Python 3000 the str() builtin has > two optional arguments: > > str(s, [encoding, [errors]]) > > Isn't base 2 or base 16 just another kind of encoding? IMHO the > intergers 2, 8 or 16 can be treated as a form of encoding just as > "ascii" or "latin-1". > > Christian See Guido's reply about it not being a str() constructor. Sense int types don't have non-special methods it can't be an int method. I don't think it's needed often enough to justify making it a global builtin function. That leaves putting it in either the string or math module. I don't think of it as hiding. I think of it a grouping which makes it easier to find rather than harder to find. Cheers, Ron From adam at atlas.st Thu Nov 1 18:39:58 2007 From: adam at atlas.st (Adam Atlas) Date: Thu, 1 Nov 2007 13:39:58 -0400 Subject: [Python-ideas] str(, base=) as complement to int(, base=) In-Reply-To: References: Message-ID: <203E0116-42D5-4059-9659-D5A6527F4E3C@atlas.st> On 31 Oct 2007, at 06:02, Christian Heimes wrote: > I know it's not a killer feature but it feels right to have a > complement. How do you like the idea? > > Christian How about extending the int type's (and other numeric types', perhaps) implementation of __format__ (for py3k -- PEP 3101) so that it can take an optional format specifier component indicating the base? From guido at python.org Thu Nov 1 19:52:25 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Nov 2007 11:52:25 -0700 Subject: [Python-ideas] str(, base=) as complement to int(, base=) In-Reply-To: <203E0116-42D5-4059-9659-D5A6527F4E3C@atlas.st> References: <203E0116-42D5-4059-9659-D5A6527F4E3C@atlas.st> Message-ID: We go over this about once a year. The conclusion is always the same: there isn't enough use for bases other than 2, 8, 10, 16 to bother including anything, and these are already covered by bin(), oct(), str() and hex(). (bin() is in 3.0 and to be backported to 2.6.) On 11/1/07, Adam Atlas wrote: > > On 31 Oct 2007, at 06:02, Christian Heimes wrote: > > I know it's not a killer feature but it feels right to have a > > complement. How do you like the idea? > > > > Christian > > How about extending the int type's (and other numeric types', perhaps) > implementation of __format__ (for py3k -- PEP 3101) so that it can > take an optional format specifier component indicating the base? > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Thu Nov 1 20:13:02 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 1 Nov 2007 15:13:02 -0400 Subject: [Python-ideas] str(, base=) as complement to int(, base=) In-Reply-To: References: <203E0116-42D5-4059-9659-D5A6527F4E3C@atlas.st> Message-ID: On 11/1/07, Guido van Rossum wrote: > We go over this about once a year. 
The conclusion is always the same: > there isn't enough use for bases other than 2, 8, 10, 16 to bother > including anything, and these are already covered by bin(), oct(), > str() and hex(). (bin() is in 3.0 and to be backported to 2.6.) Of course, if part of the deal were dropping bin, oct, and hex, that might be a good trade. But it may already be too late even for Py3. -jJ From greg.ewing at canterbury.ac.nz Thu Nov 1 23:48:21 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Nov 2007 11:48:21 +1300 Subject: [Python-ideas] str(, base=) as complement to int(, base=) In-Reply-To: <4729F7A8.4090803@ronadam.com> References: <47289996.9000304@ronadam.com> <47289EA3.7090005@cheimes.de> <4729F7A8.4090803@ronadam.com> Message-ID: <472A57B5.8010102@canterbury.ac.nz> Ron Adam wrote: > > That leaves putting it in either the string or math module. I don't think it belongs in the math module, because that's supposed to correspond 1-1 with what's in the C math library. -- Greg From bborcic at gmail.com Fri Nov 9 15:39:59 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 15:39:59 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) Message-ID: Title says it all. Got used to += et al. My mind often expects augmented assignment syntax to exist uniformly for whatever transform. If I am not mistaken, python syntax doesn't permit augmented assignment operators to sit between parens so that )= wouldn't risk confusing quick machine- or eye-scans to match parens. Cheers, BB From jimjjewett at gmail.com Fri Nov 9 16:20:11 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 9 Nov 2007 10:20:11 -0500 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: On 11/9/07, Boris Borcic wrote: > Title says it all. Got used to += et al. My mind often expects > augmented assignment syntax to exist uniformly for whatever > transform. Agreed. Whether it is worth the costs is a different question. I'm not sure it is, and I'm sure it isn't with this particular syntax. > If I am not mistaken, python syntax doesn't permit augmented > assignment operators to sit between parens so that )= wouldn't > risk confusing quick machine- or eye-scans to match parens. There are plenty of tools (and plenty of eyes, including mine) that don't use the full ruleset. A parenthesis inside a string has no syntactic meaning. In practice, it still messes up some syntax colorings. (1, 2, """3, 4) """, 5) I don't think there is any reason to encourage the use of unmatched parentheses for any purpose. -jJ From fredrik.johansson at gmail.com Fri Nov 9 16:24:22 2007 From: fredrik.johansson at gmail.com (Fredrik Johansson) Date: Fri, 9 Nov 2007 16:24:22 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: <3d0cebfb0711090724p5fecb5c5pc23d44db8a4f0c84@mail.gmail.com> On Nov 9, 2007 3:39 PM, Boris Borcic wrote: > > Title says it all. Got used to += et al. My mind often expects augmented > assignment syntax to exist uniformly for whatever transform. > > If I am not mistaken, python syntax doesn't permit augmented assignment > operators to sit between parens so that )= wouldn't risk confusing quick > machine- or eye-scans to match parens. Would the statement ( x )= f represent the ordinary assignment x=f or would it become a syntax error? Fredrik From eduardo.padoan at gmail.com Fri Nov 9 16:22:30 2007 From: eduardo.padoan at gmail.com (Eduardo O. 
Padoan) Date: Fri, 9 Nov 2007 13:22:30 -0200 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: On Nov 9, 2007 12:39 PM, Boris Borcic wrote: > > Title says it all. Got used to += et al. My mind often expects augmented > assignment syntax to exist uniformly for whatever transform. > > If I am not mistaken, python syntax doesn't permit augmented assignment > operators to sit between parens so that )= wouldn't risk confusing quick > machine- or eye-scans to match parens. > Bizarre syntax. Close-parens should close something. Also, al it saves is 1 char. -- http://www.advogato.org/person/eopadoan/ Bookmarks: http://del.icio.us/edcrypt From bborcic at gmail.com Fri Nov 9 18:24:49 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 18:24:49 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: <3d0cebfb0711090724p5fecb5c5pc23d44db8a4f0c84@mail.gmail.com> References: <3d0cebfb0711090724p5fecb5c5pc23d44db8a4f0c84@mail.gmail.com> Message-ID: Fredrik Johansson wrote: > On Nov 9, 2007 3:39 PM, Boris Borcic wrote: >> Title says it all. Got used to += et al. My mind often expects augmented >> assignment syntax to exist uniformly for whatever transform. >> >> If I am not mistaken, python syntax doesn't permit augmented assignment >> operators to sit between parens so that )= wouldn't risk confusing quick >> machine- or eye-scans to match parens. > > Would the statement > > ( x )= f > > represent the ordinary assignment x=f or would it become a syntax error? Ah, indeed I almost itemized my remark about current python syntax with a (1), to add a "(2) makes closing parens before an augmented assignment (part of) a superfluous construct". I'd make it a syntax error, to answer your question. I'd be interested in examples out of the "wild". Cheers, BB > > Fredrik From bborcic at gmail.com Fri Nov 9 18:29:43 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 18:29:43 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Jim Jewett wrote: > On 11/9/07, Boris Borcic wrote: > >> Title says it all. Got used to += et al. My mind often expects >> augmented assignment syntax to exist uniformly for whatever >> transform. > > Agreed. > > Whether it is worth the costs is a different question. I'm not sure > it is, and I'm sure it isn't with this particular syntax. > >> If I am not mistaken, python syntax doesn't permit augmented >> assignment operators to sit between parens so that )= wouldn't >> risk confusing quick machine- or eye-scans to match parens. > > There are plenty of tools (and plenty of eyes, including mine) that > don't use the full ruleset. > > A parenthesis inside a string has no syntactic meaning. In practice, > it still messes up some syntax colorings. > > (1, 2, """3, 4) > > """, 5) Point was, in a syntactically correct program, the proposed operator can not occur /at all/ inside the span of an opened parenthesis, so this type of confusion isn't possible. BB > > I don't think there is any reason to encourage the use of unmatched > parentheses for any purpose. 
> > -jJ From bborcic at gmail.com Fri Nov 9 18:51:35 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 18:51:35 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: <3d0cebfb0711090724p5fecb5c5pc23d44db8a4f0c84@mail.gmail.com> References: <3d0cebfb0711090724p5fecb5c5pc23d44db8a4f0c84@mail.gmail.com> Message-ID: Fredrik Johansson wrote: > On Nov 9, 2007 3:39 PM, Boris Borcic wrote: >> Title says it all. Got used to += et al. My mind often expects augmented >> assignment syntax to exist uniformly for whatever transform. >> >> If I am not mistaken, python syntax doesn't permit augmented assignment >> operators to sit between parens so that )= wouldn't risk confusing quick >> machine- or eye-scans to match parens. > > Would the statement > > ( x )= f > > represent the ordinary assignment x=f or would it become a syntax error? Ah, and what about (x,y)=f - more likely to already exist in the wild, isn't it ? Well, if ')=' was an augmented assignment operator, I'd say (x,y)=f should parse as a destructuring assignment as it already does while (x)=f should become a syntax error. I admit it's debatable, of course. I think a case could be made in terms of lookahead tokens in favor of that solution (all other things equal). Cheers, BB From bborcic at gmail.com Fri Nov 9 19:00:18 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 19:00:18 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Eduardo O. Padoan wrote: > On Nov 9, 2007 12:39 PM, Boris Borcic wrote: >> Title says it all. Got used to += et al. My mind often expects augmented >> assignment syntax to exist uniformly for whatever transform. >> >> If I am not mistaken, python syntax doesn't permit augmented assignment >> operators to sit between parens so that )= wouldn't risk confusing quick >> machine- or eye-scans to match parens. >> > > Bizarre syntax. Close-parens should close something. Also, al it saves > is 1 char. Typical motivating usecase is like for other augmented assignment Just as a[] += n saves both the typing and the computation of an over a[] += a[] + n and an temporary variable assignment over temp = a[temp]=a[temp]+n so would a[] )= f save over a[] = f(a[]) etc. More than "just 1 char", anyway. BB From steven.bethard at gmail.com Fri Nov 9 19:07:42 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 9 Nov 2007 11:07:42 -0700 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: On Nov 9, 2007 7:39 AM, Boris Borcic wrote: > Title says it all. Got used to += et al. My mind often expects augmented > assignment syntax to exist uniformly for whatever transform. I'm not really a Guido channeler, but I'd guess this has about a 0% chance of ever making it into Python. Function calls in Python are indicated by () following the function name. Your proposal puts the parentheses (or one of them) *before* the function name. Breaking the consistency here seems like an *extremely* bad idea. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From george.sakkis at gmail.com Fri Nov 9 19:16:26 2007 From: george.sakkis at gmail.com (George Sakkis) Date: Fri, 9 Nov 2007 13:16:26 -0500 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: <91ad5bf80711091016s732f163asf0ac533e7e7841ac@mail.gmail.com> On Nov 9, 2007 7:39 AM, Boris Borcic wrote: > Title says it all. Got used to += et al. My mind often expects augmented > assignment syntax to exist uniformly for whatever transform. And the "most inane proposal in python-ideas" award goes to... ;-) From bborcic at gmail.com Fri Nov 9 19:33:00 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 19:33:00 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Steven Bethard wrote: > On Nov 9, 2007 7:39 AM, Boris Borcic wrote: >> Title says it all. Got used to += et al. My mind often expects augmented >> assignment syntax to exist uniformly for whatever transform. > > I'm not really a Guido channeler, but I'd guess this has about a 0% > chance of ever making it into Python. > > Function calls in Python are indicated by () following the function > name. Your proposal puts the parentheses (or one of them) *before* > the function name. Breaking the consistency here seems like an > *extremely* bad idea. I contend that x )= f captures some perfume of the invariant you mention, although I admit there is no comparably simple formula for the relaxed invariant (if indeed it exists). Note that current python syntax requires any ) to follow a ( that it balances, so that's not one but two rules broken in coordination. (-1)*(-1)==(+1)-ly yours, Boris Borcic -- What happened to our chief humorist and python zen master, BTW ? From guido at python.org Fri Nov 9 19:40:28 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Nov 2007 10:40:28 -0800 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Boris, give it up. That syntax is never going to fly. If you have to ask why, you're just not cut out to be a language designer. On Nov 9, 2007 10:33 AM, Boris Borcic wrote: > Steven Bethard wrote: > > On Nov 9, 2007 7:39 AM, Boris Borcic wrote: > >> Title says it all. Got used to += et al. My mind often expects augmented > >> assignment syntax to exist uniformly for whatever transform. > > > > I'm not really a Guido channeler, but I'd guess this has about a 0% > > chance of ever making it into Python. > > > > Function calls in Python are indicated by () following the function > > name. Your proposal puts the parentheses (or one of them) *before* > > the function name. Breaking the consistency here seems like an > > *extremely* bad idea. > > > I contend that x )= f captures some perfume of the invariant you mention, > although I admit there is no comparably simple formula for the relaxed invariant > (if indeed it exists). > > Note that current python syntax requires any ) to follow a ( that it balances, > so that's not one but two rules broken in coordination. > > (-1)*(-1)==(+1)-ly yours, > > Boris Borcic > -- > What happened to our chief humorist and python zen master, BTW ? 
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From bborcic at gmail.com Fri Nov 9 19:54:03 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 19:54:03 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: <91ad5bf80711091016s732f163asf0ac533e7e7841ac@mail.gmail.com> References: <91ad5bf80711091016s732f163asf0ac533e7e7841ac@mail.gmail.com> Message-ID: George Sakkis wrote: > On Nov 9, 2007 7:39 AM, Boris Borcic wrote: > >> Title says it all. Got used to += et al. My mind often expects augmented >> assignment syntax to exist uniformly for whatever transform. > > And the "most inane proposal in python-ideas" award goes to... ;-) [Is the name of "sarkkasm" (sarkkism ?) the one obvious way to make your remark escape appropriate qualification as...] But please be precise, are you saying (1) that it is inane to suggest that x=f(x) has enough in common with say x=x%n that special syntax paralleling the latter's shorthand x%=n could or would make sense for the former ? (2) that the proposed choice of special syntax is "most inane". In case you mean only (2), please back your claim with some facts, by proposing "less inane" special syntax. Cheers, BB From adam at atlas.st Fri Nov 9 20:24:46 2007 From: adam at atlas.st (Adam Atlas) Date: Fri, 9 Nov 2007 14:24:46 -0500 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: <91ad5bf80711091016s732f163asf0ac533e7e7841ac@mail.gmail.com> References: <91ad5bf80711091016s732f163asf0ac533e7e7841ac@mail.gmail.com> Message-ID: <38D4AA4E-2708-4E4F-BBCC-381B62F2961B@atlas.st> On 9 Nov 2007, at 13:16, George Sakkis wrote: > On Nov 9, 2007 7:39 AM, Boris Borcic wrote: > >> Title says it all. Got used to += et al. My mind often expects >> augmented >> assignment syntax to exist uniformly for whatever transform. > > And the "most inane proposal in python-ideas" award goes to... ;-) I can top that. Instead of "x )= f", I propose one of the following: - x $?$%$?666= f - x =^_^= f - x ?= f - x 8======D f From bborcic at gmail.com Fri Nov 9 20:37:29 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 20:37:29 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Guido van Rossum wrote: > Boris, give it up. That syntax is never going to fly. If you have to > ask why, you're just not cut out to be a language designer. Guido, I did not intend to pose as a language designer. I just bumped for the nth time on a corner of the language and came up with the closest approximation to a solution I could invent, expecting the (actual and potential) language designers of the forum to find a better solution if any can be dreamed up. Maybe I was mistaken about this newsgroup's purpose, but imho playing the devil's advocate is a perfectly honorable manner to push ideas (as opposed to designs). I must admit I wasn't expecting the discussion to rely so quickly on involving my character. In conclusion, I guess I'm warranted to take this to mean "we can dream up no appropriate syntax". Regards, Boris --- PS,FYI : a notation borne from letting parens live independent lives, and indeed could fly http://en.wikipedia.org/wiki/Bra-ket_notation > > On Nov 9, 2007 10:33 AM, Boris Borcic wrote: >> Steven Bethard wrote: >>> On Nov 9, 2007 7:39 AM, Boris Borcic wrote: >>>> Title says it all. 
Got used to += et al. My mind often expects augmented >>>> assignment syntax to exist uniformly for whatever transform. >>> I'm not really a Guido channeler, but I'd guess this has about a 0% >>> chance of ever making it into Python. >>> >>> Function calls in Python are indicated by () following the function >>> name. Your proposal puts the parentheses (or one of them) *before* >>> the function name. Breaking the consistency here seems like an >>> *extremely* bad idea. >> >> I contend that x )= f captures some perfume of the invariant you mention, >> although I admit there is no comparably simple formula for the relaxed invariant >> (if indeed it exists). >> >> Note that current python syntax requires any ) to follow a ( that it balances, >> so that's not one but two rules broken in coordination. >> >> (-1)*(-1)==(+1)-ly yours, >> >> Boris Borcic >> -- >> What happened to our chief humorist and python zen master, BTW ? >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > From bborcic at gmail.com Fri Nov 9 20:42:35 2007 From: bborcic at gmail.com (Boris Borcic) Date: Fri, 09 Nov 2007 20:42:35 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: <38D4AA4E-2708-4E4F-BBCC-381B62F2961B@atlas.st> References: <91ad5bf80711091016s732f163asf0ac533e7e7841ac@mail.gmail.com> <38D4AA4E-2708-4E4F-BBCC-381B62F2961B@atlas.st> Message-ID: Adam Atlas wrote: >> And the "most inane proposal in python-ideas" award goes to... ;-) > > I can top that. Instead of "x )= f", I propose one of the following: > > - x $?$%$?666= f > - x =^_^= f > - x ?= f > - x 8======D f That's self-contradictory, or "most" doesn't denote a superlative. From bwinton at latte.ca Fri Nov 9 21:05:32 2007 From: bwinton at latte.ca (Blake Winton) Date: Fri, 09 Nov 2007 15:05:32 -0500 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: <4734BD8C.6050600@latte.ca> Some people wrote: >>>> Function calls in Python are indicated by () following the function name. >>> I contend that "x )= f" captures some perfume of the invariant you mention, But not enough of it. A syntax of "x ()= f" would seem to have more chance of being accepted. But I would still give it no more than 0.1% chance, based on the potential confusion between it and "x() = f"... > In conclusion, I guess I'm warranted to take this to mean "we can > dream up no appropriate syntax". If I were you, I would take it more as "that suggestion is too Functional (or perhaps just too confusing) for Python." (If you're looking for a language that has filed all the corners off, might I suggest Scheme. No, seriously, I'm not making a parenthesis joke here.) Later, Blake. From tjreedy at udel.edu Fri Nov 9 21:11:16 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 9 Nov 2007 15:11:16 -0500 Subject: [Python-ideas] x )= f as shorthand for x=f(x) References: Message-ID: "Boris Borcic" wrote in message news:fh1rhm$ui$1 at ger.gmane.org... | | Title says it all. Got used to += et al. My mind often expects augmented | assignment syntax to exist uniformly for whatever transform. I the analogy can be improved. 
x += y # abbreviates x = x + y # which could have been defined to have been written x = +(x,y) # and which usually *is* equivalent to x = type(x).__add__(x,y) Hence by analogy, I would rewrite x = f(x,y) # as x f= y # ;-) Making the obvious generalization to n params, and specializing to one, gives x f= tjr From jimjjewett at gmail.com Fri Nov 9 22:12:55 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 9 Nov 2007 16:12:55 -0500 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Boris, I'm posting this publicly because you aren't the first to feel this way, so I think an answer should be archived. On 11/9/07, Boris Borcic wrote: > Guido van Rossum wrote: > > Boris, give it up. That syntax is never going to fly. If you have to > > ask why, you're just not cut out to be a language designer. > I did not intend to pose as a language designer. Suggesting a change in python is acting (in a small way) as a language designer. > came up with the closest approximation to a solution I could invent, Which is fine. The catch is that no one -- not even Guido -- gets everything right the first time. There is a natural desire to just tweak the proposal to work, or even to explain why things are already OK. For a good proposal, you need to do this to make it great. Unfortunately, that turns out to be running in circles for the proposals that -- like most proposals -- turn out to be dead ends. So you need to be willing to step back and figure out (1) How important the problem really is. (2) How expensive the proposed solutions really are. > I must admit I wasn't expecting the discussion to rely so quickly on > involving my character. I don't think that was anyone's intent. I suspect you were thinking of lines like: > > That syntax is never going to fly. If you have to > > ask why, you're just not cut out to be a language designer. These don't mean you're bad person; they just mean that you don't yet know how to answer those two questions the same way Guido (for example) would. > In conclusion, I guess I'm warranted to take this to mean "we can > dream up no appropriate syntax". Yes, but there is also a question about whether to do it at all. Remember that x = f(x) is one step of reduce -- and reduce is something Guido wants to take back out of the language because, in practice, it is too confusing. (a) Is this operation frequent enough to be worth a syntactic shortcut? Would it actually make the code easier to read? (b) Is the sort of code that uses this operation something that should be encouraged? Or is making it hard a *good* thing that steers people towards other idioms? > PS,FYI : a notation borne from letting parens live independent lives, > and indeed could fly http://en.wikipedia.org/wiki/Bra-ket_notation The question isn't whether it is possible, but whether it is worth the cost. The costs are different for physics and for a generic programming language -- and different still for Python in particular. -jJ From g.brandl at gmx.net Fri Nov 9 23:40:49 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 09 Nov 2007 23:40:49 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Terry Reedy schrieb: > "Boris Borcic" wrote in > message news:fh1rhm$ui$1 at ger.gmane.org... > | > | Title says it all. Got used to += et al. My mind often expects augmented > | assignment syntax to exist uniformly for whatever transform. > > I the analogy can be improved. 
> > x += y # abbreviates > x = x + y # which could have been defined to have been written > x = +(x,y) # and which usually *is* equivalent to x = type(x).__add__(x,y) Hah, I have the solution! x ?= f unicode-ly yours, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From ntoronto at cs.byu.edu Fri Nov 9 23:51:26 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Fri, 09 Nov 2007 15:51:26 -0700 Subject: [Python-ideas] Raw strings return compiled regexps Message-ID: <4734E46E.2050709@cs.byu.edu> It seems like every time somebody has issues with raw strings, the canonical answer is "don't use them for that, use the for regular expressions". What if they just returned regular expression objects? As in r''.match('') That would guarantee they didn't get abused for anything else. It would break a lot of code, too. :) Quick question, if someone has the time: is there any way to test equivalence of regular expressions? If we had intersection and an emptiness test (both of which are easy in the theoretical construct, but harder to do in practice), it'd be easy. I may be able to fake intersection using lookahead and such, but there's no emptiness test that I know of. Thanks in advance, Neil From stephen at xemacs.org Sat Nov 10 00:26:21 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 10 Nov 2007 08:26:21 +0900 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: <87d4ujm25e.fsf@uwakimon.sk.tsukuba.ac.jp> Boris Borcic writes: > I must admit I wasn't expecting the discussion to rely so quickly > on involving my character. Some people have natural talent for a particular kind of design, some don't. If one doesn't, it's no big deal, s/he still can contribute, even to design---but coming up with original ideas is likely to waste her/his time and that of others. (I don't say it's impossible to develop it as a skill, but it would take real work.) Why not take Guido's comment literally, "*if* you don't have it," and think about the "litmus test" he described? (Ie, think about why this proposal is unattractive.) Of course, there is an implication that you *don't* have it, but it will be better all around if you ignore that implication, and leave it an open question as long as you want to contribute in this way. > In conclusion, I guess I'm warranted to take this to mean "we can > dream up no appropriate syntax". I wouldn't say "impossible". However, the senior developers who have spoken up clearly think that your proposal (a) is not an improvement over x = f(x) in most use cases (and IMO often would be worse, because x += y expresses accumulation of y, while x = y expresses replacement) and (b) seems to have very few, if any, appropriate use cases. So "why bother?" is the message. > PS,FYI : a notation borne from letting parens live independent lives, > and indeed could fly http://en.wikipedia.org/wiki/Bra-ket_notation As I understand it, the bra-ket notation arose in physics because both the bra part and the ket part make sense as operators, but only in the lefthand role for the bra, and righthand role for the ket. So they don't really live independent lives, any more than the dx and the dy do in conventional calculus. 
However, in your syntax you do (c) lose the kind of implied symmetry that the bra-ket and infinitesimal notations have. You could "fix" that by using the notation "apply-and-assign" x ()= f, but that syntax already has a meaning in python, and runs even more forcefully into STeVe's criticism that parens are a postfix operator, not infix. Note that I myself can come up with criticisms like (a), (b), and (c) but to the best of my knowledge I've never invented any useful syntax. I-always-wanted-to-be-a-language-designer-too-ly y'rs, From lists at cheimes.de Sat Nov 10 02:37:53 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 10 Nov 2007 02:37:53 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Georg Brandl wrote: > Hah, I have the solution! > > x ?= f > > unicode-ly yours, Georg has even written a Python enhancement proposals about the topic: http://www.python.org/dev/peps/pep-3117/ It should be hard to get the idea into it ... *just kidding* Christian From greg at krypto.org Sat Nov 10 08:04:40 2007 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 9 Nov 2007 23:04:40 -0800 Subject: [Python-ideas] Raw strings return compiled regexps In-Reply-To: <4734E46E.2050709@cs.byu.edu> References: <4734E46E.2050709@cs.byu.edu> Message-ID: <52dc1c820711092304y2d88c403q26e1d4e8a1cbf4fd@mail.gmail.com> Interesting idea. Rather than breaking a lot of code you could have it be a subclass of string that also adds the regular expression object methods. Trivial to prototype such a type: import re class rstr(str): def __init__(self, x): str.__init__(self, x) self.__re = None def match(self, *args, **kwargs): if not self.__re: self.__re = re.compile(self) return self.__re.match(*args, **kwargs) def search(self, *args, **kwargs): if not self.__re: self.__re = re.compile(self) return self.__re.search(*args, **kwargs) def set_re_flags(self, flags): if self.__re: raise RuntimeError('flags may only be set once before the first use as a regular expression.') self.__re = re.compile(self, flags) Regardless, count me as +0 on the concept. It seems neat but also smells fishy. -gps On 11/9/07, Neil Toronto wrote: > > It seems like every time somebody has issues with raw strings, the > canonical answer is "don't use them for that, use the for regular > expressions". > > What if they just returned regular expression objects? As in > > r''.match('') > > That would guarantee they didn't get abused for anything else. It would > break a lot of code, too. :) > > Quick question, if someone has the time: is there any way to test > equivalence of regular expressions? If we had intersection and an > emptiness test (both of which are easy in the theoretical construct, but > harder to do in practice), it'd be easy. I may be able to fake > intersection using lookahead and such, but there's no emptiness test > that I know of. > > Thanks in advance, > Neil > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at cheimes.de Sun Nov 11 21:43:27 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 11 Nov 2007 21:43:27 +0100 Subject: [Python-ideas] Enable tab completion for interactive sessions Message-ID: Hello fellow Pythonistas! Python has a very useful feature a lot of people don't know about. It's tab completion for the interactive shell. 
http://docs.python.org/lib/module-rlcompleter.html Tab completion is very useful for introspection and quick tests in an interactive shell. rlcompleter isn't enable by default - and it shouldn't. But I like to add a cmd line option and an env var to load and enable the rlcompleter in an interactive session. The right place to enable the feature is in Modules/main.c:460 if ((Py_InspectFlag || (command == NULL && filename == NULL && module == NULL)) && isatty(fileno(stdin))) { code run when -i is given or neither command nor filename nor module is set and stdin is an interactive terminal. } At the moment the code block loads just the readline module. I also like to load the rlcompleter module and invoke readline.parse_and_bind("tab: complete") there. Options: (1) always enable tab completion for interactive shells w/o a command, module and filename. (2) only enable rlcompleter when the -i flag or PYTHONINTERACTIVE env var is set. (3) add a new command flag and env var to enable the completer when (1) or (2) is true Christian From adam at atlas.st Mon Nov 12 05:52:43 2007 From: adam at atlas.st (Adam Atlas) Date: Sun, 11 Nov 2007 23:52:43 -0500 Subject: [Python-ideas] Pause (sort of a 'deep yield'?) Message-ID: Generator-based coroutines are great, but I've thought of some interesting cases where it would help to be able to sort of yield to an outer scope (beyond the parent scope) while being able to resume. I'm thinking this would make the most sense as a kind of exception, with an added "resume" method which would resume execution at the point at which the exception was raised. (They'd also have a throw() method for continuing execution but raising an exception, and a close() method, as with generators in Python >= 2.5.) Here's an example to demonstrate what I'm talking about: def a(): print 'blah' p = pause 7 # like using `yield` as an expression # but it raises "PauseException" (or whatever) print p return (p, 123) def b(): return a() try: print b() except PauseException, e: print e.value e.reusme(3) #prints: # blah # 7 # 3 # (3, 123) Normally you'd subclass PauseException so you can catch specific known instances of pausing in your application. If no outer scope can handle a pause, then the program should exit as with any other exception. For more practical use cases, I'm mainly thinking about asynchronous programming, things like Twisted; I see a lot of interesting possibilities there. But here's a simpler example... Suppose we have WSGI 2.0, and, as expected, it is rid of start_response() and the resulting write() callable. And suppose we want to write an adaptor to allow WSGI 1.0 applications to be used as WSGI 2.0 applications. We want to do this by creating a write() which pauses and sends the value to an outer wrapper which interleaves any write()en output with the WSGI 1.0 app's returned app_iter into a single generator. It would go something like this: class StartRespPause (PauseException): pass class WritePause (PauseException): pass class wsgi_adaptor (object): def __init__(self, app): self.app = app def _write(self, data): pause WritePause(data) # Interrupts this frame and returns control to the first outer frame # that catches WritePause. # If the `pause` statement/expression is given a PauseException # instance, it raises that; if it is given a PauseException subclass, # it raises that with None; if it gets another value `v`, it raises # PauseException(v). def _start_response(self, status, response_headers, exc_info=None): # [...irrelevant exc_info handling stuff here...] 
pause (status, response_headers) return self._write def _app_iter(self, environ): try: for v in self.app(environ, self._start_response): yield v except WritePause, e: yield e.value e.resume() # This part of the syntax is perhaps a little troublesome -- the # body of a `try` block might cause multiple pauses, so an `except` # block catching a PauseException subclass has the possibility of # running multiple times. This is the correct behaviour, but it is # somewhat counterintuitive given the huge precedent for at most # one `except` block to execute, once, for a given `try` block. # Perhaps there could be some syntax other than `except`, but of # course we'd rather keep the number of reserved words down. def __call__(self, environ): # [...whatever other bridging is needed...] try: app_iter = self.app_iter(environ) except StartRespPause, e: status, response_headers = e.value e.resume() return (status, response_headers, app_iter) Thinking about environments like Twisted, it seems to me that this could make Deferreds/callbacks [almost?] entirely unnecessary. PEP 342 (Coroutines via Enhanced Generators) speaks of using "a simple co- routine scheduler or 'trampoline function' [which] would let coroutines 'call' each other without blocking -- a tremendous boon for asynchronous applications", but I think pauses would simplify this even further; it would allow these matters to be mostly invisible outside the innermost potentially blocking functions. Basically, it "would let coroutines 'call' each other without blocking", but now without the quotes around the word 'call'. :) The PEP gives the simple example of "data = (yield nonblocking_read(my_socket, nbytes))", but with pauses, we could forget about yields -- we'd be able to program almost exactly as with traditional blocking operations. "data = read(my_socket, nbytes)". Only potentially blocking functions would have to be concerned with pausing; read() would pause to an outer scheduler/trampoline/Twisted- type reactor, which, when data was available, would resume the paused read() function (giving it the data similarly to generator.send()), which would then return the value to the calling function exactly as a synchronous function would. From guido at python.org Mon Nov 12 17:51:40 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 12 Nov 2007 08:51:40 -0800 Subject: [Python-ideas] Enable tab completion for interactive sessions In-Reply-To: References: Message-ID: You can already enable this by copying those few lines into your $PYTHONSTARTUP file. People who are truly into completion should be using iPython anyway. :-) On Nov 11, 2007 12:43 PM, Christian Heimes wrote: > Hello fellow Pythonistas! > > Python has a very useful feature a lot of people don't know about. It's > tab completion for the interactive shell. > http://docs.python.org/lib/module-rlcompleter.html > > Tab completion is very useful for introspection and quick tests in an > interactive shell. rlcompleter isn't enable by default - and it > shouldn't. But I like to add a cmd line option and an env var to load > and enable the rlcompleter in an interactive session. > > The right place to enable the feature is in > > Modules/main.c:460 > if ((Py_InspectFlag || (command == NULL && filename == NULL && module == > NULL)) && isatty(fileno(stdin))) { > > code run when -i is given or neither command nor filename nor module > is set and stdin is an interactive terminal. > > } > > At the moment the code block loads just the readline module. 
I also like > to load the rlcompleter module and invoke readline.parse_and_bind("tab: > complete") there. > > Options: > (1) always enable tab completion for interactive shells w/o a command, > module and filename. > (2) only enable rlcompleter when the -i flag or PYTHONINTERACTIVE env > var is set. > (3) add a new command flag and env var to enable the completer when (1) > or (2) is true > > Christian > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From arno at marooned.org.uk Mon Nov 12 20:03:16 2007 From: arno at marooned.org.uk (Arnaud Delobelle) Date: Mon, 12 Nov 2007 19:03:16 +0000 Subject: [Python-ideas] Pause (sort of a 'deep yield'?) In-Reply-To: References: Message-ID: On 12 Nov 2007, at 04:52, Adam Atlas wrote: > Generator-based coroutines are great, but I've thought of some > interesting cases where it would help to be able to sort of yield to > an outer scope (beyond the parent scope) while being able to resume. > I'm thinking this would make the most sense as a kind of exception, > with an added "resume" method which would resume execution at the > point at which the exception was raised. (They'd also have a throw() > method for continuing execution but raising an exception, and a > close() method, as with generators in Python >= 2.5.) > > Here's an example to demonstrate what I'm talking about: > > def a(): > print 'blah' > p = pause 7 # like using `yield` as an expression > # but it raises "PauseException" (or whatever) > print p > return (p, 123) > > def b(): > return a() > > try: > print b() > except PauseException, e: > print e.value > e.reusme(3) > > #prints: > # blah > # 7 > # 3 > # (3, 123) It seems to me it has the full power of call/cc & co. It would allow to turn the clock back to any previous state of an execution stack (unless I misunderstand what you mean by 'pause'). Here is a simple example: def getstate(): pause return try: getstate() except PauseException, here: pass # code line 1 # code line 2 ... here.resume() # This line takes us back to code line 1 So the whole stack should be saved each time a pause happens (unless a stackless approach is adopted). -- Arnaud From ntoronto at cs.byu.edu Tue Nov 13 08:23:40 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Tue, 13 Nov 2007 00:23:40 -0700 Subject: [Python-ideas] Required to call superclass __init__ Message-ID: <473950FC.10202@cs.byu.edu> I'm not talking about having the runtime call the superclass __init__ for you, as I am aware of the arguments over it and I am against it myself. I'm talking about checking whether it's been called within a subclass's own __init__. There are many kinds of objects with such complex underpinnings or initialization that leaving out the call to superclass __init__ would be disastrous. There are two situations I can think of where enforcing its invocation could be useful: a corporate environment and a teaching environment. (I've done the former and I'm working in the latter.) If someone forgets to call a superclass __init__, problems may not show up until much later. Even if they do show up immediately, it's almost never obvious what the real problem is, especially to someone who is new to programming or is working on someone else's code. I've got a working prototype metaclass and class instance (require_super) and decorator (super_required). 
Decorating a require_super method with @super_required will require any
subclass override to call its superclass method, or it throws a TypeError
upon exiting the subclass method. Here's how it works on the __init__
problem:

class A(require_super):
    @super_required
    def __init__(self):
        pass

a = A()  # No problem

class B(A):
    def __init__(self):
        super(B, self).__init__()

b = B()  # No problem

class C(B):
    def __init__(self):
        pass  # this could be a problem

c = C()  # TypeError: C.__init__: no super call

class D(C):
    def __init__(self):
        super(D, self).__init__()

d = D()  # TypeError: C.__init__: no super call

As long as A.__init__ is eventually called, it doesn't raise a TypeError.

There's not much magic involved (as metaclasses go), just explicit and
implicit method wrappers, and no crufty-looking magic words in the
subclasses. Not calling the superclass method results in immediate runtime
feedback. I've tested this on a medium-small, real-life single-inheritance
hierarchy and it seems to work just fine. (I *think* it should work with
multiple inheritance.)

Two questions:

1. Is the original problem (missed superclass method calls) big enough to
   warrant language, runtime, or library support for a similar solution?

2. Does anybody but me think this is a great idea?

Neil

From phd at phd.pp.ru Tue Nov 13 10:12:46 2007
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 13 Nov 2007 12:12:46 +0300
Subject: [Python-ideas] Required to call superclass __init__
In-Reply-To: <473950FC.10202@cs.byu.edu>
References: <473950FC.10202@cs.byu.edu>
Message-ID: <20071113091246.GC15166@phd.pp.ru>

On Tue, Nov 13, 2007 at 12:23:40AM -0700, Neil Toronto wrote:
> I've got a working prototype metaclass and class instance
> (require_super) and decorator (super_required).

Chicken and egg problem, in my eyes. If the user is clever enough to use
the class and the decorator isn't she clever enough to call inherited
__init__?

Oleg.
--
Oleg Broytmann     http://phd.pp.ru/     phd at phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.

From jimjjewett at gmail.com Tue Nov 13 15:36:41 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 13 Nov 2007 09:36:41 -0500
Subject: [Python-ideas] Required to call superclass __init__
In-Reply-To: <20071113091246.GC15166@phd.pp.ru>
References: <473950FC.10202@cs.byu.edu> <20071113091246.GC15166@phd.pp.ru>
Message-ID: 

On 11/13/07, Oleg Broytmann wrote:
> On Tue, Nov 13, 2007 at 12:23:40AM -0700, Neil Toronto wrote:
> > I've got a working prototype metaclass and class instance
> > (require_super) and decorator (super_required).

Is this restricted to __init__ (and __new__?) or could it be used on
any method?

Is there (and should there be?) a way around it, by catching the
TypeError? By creating a decoy object to call super on?

> Chicken and egg problem, in my eyes. If the user is clever enough to use
> the class and the decorator isn't she clever enough to call inherited
> __init__?

It may not be the same user.

A library or framework writer would create the base class and use the
decorator to (somewhat) ensure that subclasses meet the full interface
requirements.

A subclass writer should call the super.__init__ because it is in the
API, but Neil's metaclass makes it easier to debug if they forget.

-jJ
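To make the mechanism concrete before Neil's follow-up, here is a
much-simplified, flag-based sketch of the kind of check being discussed.
It is only an illustration -- not Neil's metaclass, which appears in full
later in the thread -- and the Base and check_base_init names are
invented:

class Base(object):
    def __init__(self):
        # The framework's base class records that its __init__ really ran.
        self._base_init_done = True

def check_base_init(cls):
    # Wrap cls.__init__ so that a forgotten superclass call fails loudly
    # as soon as an instance is created.
    original = cls.__init__
    def __init__(self, *args, **kwargs):
        original(self, *args, **kwargs)
        if not getattr(self, '_base_init_done', False):
            raise TypeError('%s.__init__: no super call' % cls.__name__)
    cls.__init__ = __init__
    return cls

class Good(Base):
    def __init__(self):
        super(Good, self).__init__()
Good = check_base_init(Good)

class Bad(Base):
    def __init__(self):
        pass
Bad = check_base_init(Bad)

Good()   # fine
Bad()    # TypeError: Bad.__init__: no super call

The real prototype generalizes this to arbitrary methods and removes the
need for the explicit wrapping call.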
From ntoronto at cs.byu.edu Tue Nov 13 17:09:31 2007
From: ntoronto at cs.byu.edu (Neil Toronto)
Date: Tue, 13 Nov 2007 09:09:31 -0700
Subject: [Python-ideas] Required to call superclass __init__
In-Reply-To: 
References: <473950FC.10202@cs.byu.edu> <20071113091246.GC15166@phd.pp.ru>
Message-ID: <4739CC3B.1090205@cs.byu.edu>

Jim Jewett wrote:
> On 11/13/07, Oleg Broytmann wrote:
>> On Tue, Nov 13, 2007 at 12:23:40AM -0700, Neil Toronto wrote:
>>> I've got a working prototype metaclass and class instance
>>> (require_super) and decorator (super_required).
>
> Is this restricted to __init__ (and __new__?) or could it be used on
> any method?

It can be used on any method.

> Is there (and should there be?) a way around it, by catching the
> TypeError? By creating a decoy object to call super on?

Definitely should be, and I made one because I plan on using this myself.
:) Currently, you can set self._super = True (or self.____super = True)
instead of doing the superclass method call. (Yes, it currently litters
the class instance with flags, but that's an implementation detail.) If
you're not going to call the superclass method, you need to state that
explicitly.

class C(B):
    def __init__(self):
        self.__init__super = True

c = C()  # No problem

I've fiddled with the idea of having a redecoration with @super_required
remove the requirement from the current method but place it back on
future overrides. Maybe a @super_not_required could remove it completely.

>> Chicken and egg problem, in my eyes. If the user is clever enough to use
>> the class and the decorator isn't she clever enough to call inherited
>> __init__?
>
> It may not be the same user.
>
> A library or framework writer would create the base class and use the
> decorator to (somewhat) ensure that subclasses meet the full interface
> requirements.
>
> A subclass writer should call the super.__init__ because it is in the
> API, but Neil's metaclass makes it easier to debug if they forget.

Exactly so.

Neil

From mark at qtrac.eu Wed Nov 14 09:07:16 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 14 Nov 2007 08:07:16 +0000
Subject: [Python-ideas] python3: subtle change to new input()
Message-ID: <200711140807.16677.mark@qtrac.eu>

Hi,

In Python 3, input() returns an empty string in two situations: blank
lines and EOF. Here's a little program that uses it:

print("enter numbers one per line; blank line to quit")
count = 0
total = 0
while True:
    line = input()
    if not line:  # EOF or blank line
        break
    n = int(line)
    total += n
    count += 1
print("count =", count, "total =", total)

If input() returned None on EOF you could write this:

print("enter numbers one per line; EOF (^D or ^Z) to quit")
count = 0
total = 0
while True:
    line = input()
    if line is None:  # EOF
        break
    elif not line:  # Blank line
        continue
    n = int(line)
    total += n
    count += 1
print("count =", count, "total =", total)

The advantage of this second approach is that you can accept blank lines,
which is often more convenient if using < on the command line to read
stdin. Furthermore, if you replaced input() with the None returning one in
the first example, it will work just the same as before. So I think that
returning None on EOF gives a subtle improvement without breaking much.
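The effect Mark describes can also be had today with a tiny wrapper around
input(); the sketch below is purely illustrative (the input_or_none name
is invented) and relies on input() raising EOFError at end of input, as
later posts in this thread confirm:

def input_or_none(prompt=""):
    # None at EOF; "" only for a genuinely blank line.
    try:
        return input(prompt)
    except EOFError:
        return None

With it, the second program above works unchanged apart from calling
input_or_none() instead of input().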
-- Mark Summerfield, Qtrac Ltd., www.qtrac.eu From ntoronto at cs.byu.edu Wed Nov 14 11:38:05 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Wed, 14 Nov 2007 03:38:05 -0700 Subject: [Python-ideas] Required to call superclass __init__ In-Reply-To: <4739CC3B.1090205@cs.byu.edu> References: <473950FC.10202@cs.byu.edu> <20071113091246.GC15166@phd.pp.ru> <4739CC3B.1090205@cs.byu.edu> Message-ID: <473AD00D.2070502@cs.byu.edu> Since I know you're all dying to see the code... ;) This works for instance methods, classmethods, staticmethods (if cls or self is the first parameter, as in __new__), and probably most decorated methods. Current Issues I Can Think Of: - @classmethod overrides that don't redecorate with @classmethod always raise TypeError (maybe not such a bad thing) - exceptions can cause entries in the __supercalls__ set to accumulate unboundedly (can be fixed) Anyway, here's a concrete implementation. Is the "missed super call" problem big or annoying enough to warrant having language, runtime, or library support for a similar solution? import threading import types class _supercall_set(object): def __init__(self, *args, **kwargs): # Removing threading.local() should make this faster # but not thread-safe self.loc = threading.local() self.loc.s = set(*args, **kwargs) def add(self, key): self.loc.s.add(key) def discard(self, key): self.loc.s.discard(key) def __contains__(self, key): return self.loc.s.__contains__(key) def __repr__(self): return self.loc.s.__repr__() def _unwrap_rewrap(func): '''For supported types (classmethod, staticmethod, function), returns the actual function and a function to re-wrap it, if necessary. Raises TypeError if func's type isn't supported.''' if isinstance(func, classmethod): return func.__get__(func).im_func, type(func) elif isinstance(func, staticmethod): return func.__get__(func), type(func) elif isinstance(func, types.FunctionType): return func, lambda func: func raise TypeError('unsupported type %s' % type(func)) def super_required(func): ''' Marks a method as requiring subclass overrides to call it, either directly or via a super() call. Works with all undecorated methods, classmethods, staticmethods (fragile: only if 'cls' or 'self' is the first parameter) including __new__, and probably most other decorated methods. Correct operation is guaranteed only when the method is in a subclass of require_super. If a super_required override has a superclass method that is also super_required, the override will not be required to call the superclass method, either directly or via a super() call. The superclass call requirement can be cancelled for a method and methods of the same name in all future subclasses using the super_not_required decorator. The implementation should be as thread-safe as the classes it's used in. Recursion should work as long as the last, innermost call calls the superclass method. (It's usually best to avoid it.) This is not robust to method injection, but then again, what is? 
Examples: class A(require_super): @super_required def __init__(self): pass class B(A): def __init__(self): pass b = B() # TypeError: B.__init__: no super call # B.__init__ needs a super(B, self).__init__() class C(require_super): @super_required @classmethod # order of decorators doesn't matter def clsmeth(cls): pass class D(C): @classmethod def clsmeth(cls): pass d = D() d.clsmeth() # TypeError: D.clsmeth: no super call # C.clsmeth needs a super(C, cls).clsmeth() ''' func, rewrap = _unwrap_rewrap(func) name = func.func_name def super_wrapper(self_or_cls, *args, **kwargs): retval = func(self_or_cls, *args, **kwargs) # Flag that the super call happened self_or_cls.__supercalls__.discard((id(self_or_cls), name)) return retval super_wrapper.func_name = func.func_name super_wrapper.func_doc = func.func_doc super_wrapper.__super_required__ = True # Pass it down return rewrap(super_wrapper) def super_not_required(func): '''Marks a method as no longer requiring subclass overrides to call it. This is only meaningful for methods in subclasses of require_super.''' func.__super_required__ = False return func def _get_sub_wrapper(func, class_name, method_name): '''Returns a wrapper function that: 1. Adds key to __supercalls__ 2. Calls the wrapped function 3. Checks for key in __supercalls__ - if there, raises TypeError''' def sub_wrapper(self_or_cls, *args, **kwargs): key = (id(self_or_cls), method_name) self_or_cls.__supercalls__.add(key) retval = func(self_or_cls, *args, **kwargs) if key not in self_or_cls.__supercalls__: return retval self_or_cls.__supercalls__.discard(key) raise TypeError("%s.%s: no super call" % (class_name, method_name)) sub_wrapper.func_name = func.func_name sub_wrapper.func_doc = func.func_doc sub_wrapper.__super_required__ = True # Pass it down return sub_wrapper class _require_super_meta(type): def __new__(typ, cls_name, bases, dct): # Search through all attributes for method_name, func in dct.items(): try: func, rewrap = _unwrap_rewrap(func) except TypeError: continue # unsupported type if hasattr(func, '__super_required__'): continue # decorated - don't wrap it again # See if a base class's method is __super_required__ for base in bases: try: if getattr(base, method_name).__super_required__: break except AttributeError: pass # not there or no __super_required__ else: continue # outer loop # Wrap up the function newfunc = _get_sub_wrapper(func, cls_name, method_name) dct[method_name] = rewrap(newfunc) return type.__new__(typ, cls_name, bases, dct) class require_super(object): '''Inheriting from require_super makes super_required and super_not_required decorators work.''' __metaclass__ = _require_super_meta # This will be visible to classes and instances __supercalls__ = _supercall_set() From g.brandl at gmx.net Wed Nov 14 14:43:03 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 14 Nov 2007 14:43:03 +0100 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: <200711140807.16677.mark@qtrac.eu> References: <200711140807.16677.mark@qtrac.eu> Message-ID: Mark Summerfield schrieb: > Hi, > > In Python 3, input() returns an empty string in two situations: blank > lines and EOF. Could this be a platform issue? Here, on Linux, input() raises EOFError on EOF. 
Georg From mark at qtrac.eu Wed Nov 14 15:00:39 2007 From: mark at qtrac.eu (Mark Summerfield) Date: Wed, 14 Nov 2007 14:00:39 +0000 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: References: <200711140807.16677.mark@qtrac.eu> Message-ID: <200711141400.39876.mark@qtrac.eu> On 2007-11-14, Georg Brandl wrote: > Mark Summerfield schrieb: > > Hi, > > > > In Python 3, input() returns an empty string in two situations: blank > > lines and EOF. > > Could this be a platform issue? Here, on Linux, input() raises EOFError > on EOF. Sorry, you're quite right... -- Mark Summerfield, Qtrac Ltd., www.qtrac.eu From guido at python.org Wed Nov 14 15:28:34 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 06:28:34 -0800 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: <200711141400.39876.mark@qtrac.eu> References: <200711140807.16677.mark@qtrac.eu> <200711141400.39876.mark@qtrac.eu> Message-ID: On Nov 14, 2007 6:00 AM, Mark Summerfield wrote: > On 2007-11-14, Georg Brandl wrote: > > Mark Summerfield schrieb: > > > Hi, > > > > > > In Python 3, input() returns an empty string in two situations: blank > > > lines and EOF. > > > > Could this be a platform issue? Here, on Linux, input() raises EOFError > > on EOF. > > Sorry, you're quite right... Mark, did it return "" on your platform? Then please file a bug. I can't quite tell if that's the case or if you simply misread the docs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Wed Nov 14 15:35:09 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 14 Nov 2007 09:35:09 -0500 Subject: [Python-ideas] Required to call superclass __init__ In-Reply-To: <473AD00D.2070502@cs.byu.edu> References: <473950FC.10202@cs.byu.edu> <20071113091246.GC15166@phd.pp.ru> <4739CC3B.1090205@cs.byu.edu> <473AD00D.2070502@cs.byu.edu> Message-ID: On 11/14/07, Neil Toronto wrote: > Current Issues I Can Think Of: > ... Is the "missed super call" > problem big or annoying enough to warrant having language, > runtime, or library support for a similar solution? recipe, yes. Library or more? I'm not sure -- and I don't think this is ready yet. It feels too complicated, as though there may still be plenty of simplifications that should happen before it gets frozen. I don't yet see what those simplifications should actually be, but maybe someone else will if you publish and wait long enough. -jJ From lists at cheimes.de Wed Nov 14 16:57:46 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 14 Nov 2007 16:57:46 +0100 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: References: <200711140807.16677.mark@qtrac.eu> Message-ID: <473B1AFA.7080900@cheimes.de> Georg Brandl wrote: > Mark Summerfield schrieb: >> Hi, >> >> In Python 3, input() returns an empty string in two situations: blank >> lines and EOF. > > Could this be a platform issue? Here, on Linux, input() raises EOFError > on EOF. 
I think it's more likely a subtle difference between platforms: On Linux >>> r = input() Traceback (most recent call last): File "", line 1, in EOFError >>> r = input() [1]+ Stopped ./python $ fg 1 ./python >>> On Windows >>> r = input() ^D >>> r '\x04' >>> r = input() ^Z Traceback (most recent call last): File "", line 1, in EOFError Christian From mark at qtrac.eu Wed Nov 14 17:37:56 2007 From: mark at qtrac.eu (Mark Summerfield) Date: Wed, 14 Nov 2007 16:37:56 +0000 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: References: <200711140807.16677.mark@qtrac.eu> <200711141400.39876.mark@qtrac.eu> Message-ID: <200711141637.56229.mark@qtrac.eu> On 2007-11-14, Guido van Rossum wrote: > On Nov 14, 2007 6:00 AM, Mark Summerfield wrote: > > On 2007-11-14, Georg Brandl wrote: > > > Mark Summerfield schrieb: > > > > Hi, > > > > > > > > In Python 3, input() returns an empty string in two situations: blank > > > > lines and EOF. > > > > > > Could this be a platform issue? Here, on Linux, input() raises EOFError > > > on EOF. > > > > Sorry, you're quite right... > > Mark, did it return "" on your platform? Then please file a bug. I > can't quite tell if that's the case or if you simply misread the docs. It isn't a Python 3 bug. I confused myself with my tests. Sorry! And the docs are perfectly okay... well, apart from "stripping a trailing newline". On Unices that's fine but I don't know if Windows consoles actually send \r\n or whatever, in which case, assuming input() does the right cross-platform thing, maybe "stripping the trailing line termination character(s)" would be more accurate. (What went wrong: My little program worked fine when I used it interactively. But then I ran it using a file of data redirected from stdin, that didn't produce an EOFError. But the reason was that my test file had a blank line in it, so the program correctly broke out of the while loop at that point and stopped reading, so never reached EOF. Once I removed the blank line the program correctly terminated with an unhandled EOFError.) -- Mark Summerfield, Qtrac Ltd., www.qtrac.eu From bborcic at gmail.com Wed Nov 14 17:55:23 2007 From: bborcic at gmail.com (Boris Borcic) Date: Wed, 14 Nov 2007 17:55:23 +0100 Subject: [Python-ideas] x )= f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Jim Jewett wrote: > Boris, > > I'm posting this publicly because you aren't the first to feel this > way, so I think an answer should be archived. [...] Ah, thanks for caring, Jim. And for your nice explanations. Stephen J. Turnbull wrote: [...] > > Why not take Guido's comment literally, "*if* you don't have it," and > think about the "litmus test" he described? (Ie, think about why this > proposal is unattractive.) It's like a judge silencing a advocate by saying "It's no, and if you can't plead the other side's view now that it's over, this means you don't have what it takes to be judge". Now the competent advocate is deferential to the judge and in general won't dream he could replace the judge any more than he would ignore any judge's simple demand for silence. But he will nevertheless recognize that the test the judge proposes is one by which to recognize a competent advocate foremost, and a competent judge only subsidiarily if ever. IOW, deciding given pros and cons isn't the same as listing them. And if courts tend to distribute the role of listing the pros, that of listing the cons, and that of deciding, to three distinct persons or parties, it's not without good reasons, imo. And... 
I've a "good" enough personal history of driving myself into undecidable dilemmas, thanks. The above is what I was first tempted to reply in short to Guido, but felt it was rather OT, so I settled on a shortcut. 'nough said. [...] > I-always-wanted-to-be-a-language-designer-too-ly y'rs, But-I-never-really-did-ly y'rs, Boris From lists at cheimes.de Wed Nov 14 18:29:52 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 14 Nov 2007 18:29:52 +0100 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: <200711141637.56229.mark@qtrac.eu> References: <200711140807.16677.mark@qtrac.eu> <200711141400.39876.mark@qtrac.eu> <200711141637.56229.mark@qtrac.eu> Message-ID: Mark Summerfield wrote: > And the docs are perfectly okay... well, apart from "stripping a > trailing newline". On Unices that's fine but I don't know if Windows > consoles actually send \r\n or whatever, in which case, assuming input() > does the right cross-platform thing, maybe "stripping the trailing line > termination character(s)" would be more accurate. Microsoft's stdio lib is using \n as newline for stdin, stdout and stderr. Does it answer your question? Christian From rhamph at gmail.com Wed Nov 14 18:49:20 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 14 Nov 2007 10:49:20 -0700 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: Message-ID: (ugh, this was supposed to go to python-ideas, not python-list. No wonder I got no responses to this email!) (I've had trouble getting response for collaboration on a PEP. Perhaps I'm the only interested party?) Although py3k raises an exception for completely unsortable types, it continues to silently do the wrong thing for non-symmetric types that overload comparison operator with special meanings. >>> a = set([1]) >>> b = set([2, 5]) >>> c = set([1, 2]) >>> sorted([a, c, b]) [{1}, {1, 2}, {2, 5}] >>> sorted([a, b, c]) [{1}, {2, 5}, {1, 2}] To solve this I propose a revived cmp (as per the previous thread[1]), which is the preferred path for orderings. The rich comparison operators will be simple wrappers for cmp() (ensuring an exception is raised if they're not merely comparing for equality.) Thus, set would need 7 methods defined (6 rich comparisons plus __cmp__, although it could skip __eq__ and __ne__), whereas nearly all other types (int, list, etc) need only __cmp__. Code which uses <= to compare sets would be assumed to want subset operations. Generic containers should use cmp() exclusively. [1] http://mail.python.org/pipermail/python-3000/2007-October/011072.html -- Adam Olsen, aka Rhamphoryncus From guido at python.org Wed Nov 14 18:54:50 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 09:54:50 -0800 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: Message-ID: Are you sure you're solving a real problem? On Nov 14, 2007 9:49 AM, Adam Olsen wrote: > (ugh, this was supposed to go to python-ideas, not python-list. No > wonder I got no responses to this email!) > > (I've had trouble getting response for collaboration on a PEP. > Perhaps I'm the only interested party?) > > Although py3k raises an exception for completely unsortable types, it > continues to silently do the wrong thing for non-symmetric types that > overload comparison operator with special meanings. 
> > >>> a = set([1]) > >>> b = set([2, 5]) > >>> c = set([1, 2]) > >>> sorted([a, c, b]) > [{1}, {1, 2}, {2, 5}] > >>> sorted([a, b, c]) > [{1}, {2, 5}, {1, 2}] > > To solve this I propose a revived cmp (as per the previous thread[1]), > which is the preferred path for orderings. The rich comparison > operators will be simple wrappers for cmp() (ensuring an exception is > raised if they're not merely comparing for equality.) > > Thus, set would need 7 methods defined (6 rich comparisons plus > __cmp__, although it could skip __eq__ and __ne__), whereas nearly all > other types (int, list, etc) need only __cmp__. > > Code which uses <= to compare sets would be assumed to want subset > operations. Generic containers should use cmp() exclusively. > > > [1] http://mail.python.org/pipermail/python-3000/2007-October/011072.html > > -- > Adam Olsen, aka Rhamphoryncus > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Wed Nov 14 19:29:52 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 14 Nov 2007 19:29:52 +0100 Subject: [Python-ideas] Make obj[] valid syntax? Message-ID: I'm not sure if this is a good idea or not, but - hey - this is python.ideas ;-) The following statements currently raise a SyntaxError: obj[] = something x = obj[] I propose to make these statements valid syntax. 'obj[]' should behave like 'obj[()]' does: Call __getitem__ or __setitem__ with an empty tuple. My use case is in a COM library (comtypes). Some COM properties require one or more arguments; this is not a problem since one could write obj.prop[1, 2, 3] Sometimes, however, arguments are optional. Unfortunately one has to write obj.prop[()] to pass an empty tuple to __getitem__ or __setitem__, which looks strange imo. Comments? Thomas From phd at phd.pp.ru Wed Nov 14 19:34:40 2007 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 14 Nov 2007 21:34:40 +0300 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: Message-ID: <20071114183440.GC30836@phd.pp.ru> On Wed, Nov 14, 2007 at 07:29:52PM +0100, Thomas Heller wrote: > 'obj[]' should behave like 'obj[()]' does: I remember it was discussed and rejected a year or two ago. Still -1 from me. Explicit [()] is better than implicit []. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From guido at python.org Wed Nov 14 19:35:16 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 10:35:16 -0800 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: Message-ID: Why can't you use call syntax, i.e. obj.prop(1, 2, 3)? On Nov 14, 2007 10:29 AM, Thomas Heller wrote: > I'm not sure if this is a good idea or not, but - hey - this > is python.ideas ;-) > > The following statements currently raise a SyntaxError: > > obj[] = something > x = obj[] > > I propose to make these statements valid syntax. > 'obj[]' should behave like 'obj[()]' does: > Call __getitem__ or __setitem__ with an empty tuple. > > My use case is in a COM library (comtypes). > > Some COM properties require one or more arguments; this is > not a problem since one could write > obj.prop[1, 2, 3] > > Sometimes, however, arguments are optional. Unfortunately > one has to write > obj.prop[()] > to pass an empty tuple to __getitem__ or __setitem__, > which looks strange imo. > > Comments? 
> > Thomas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Wed Nov 14 19:37:48 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 14 Nov 2007 19:37:48 +0100 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > Why can't you use call syntax, i.e. obj.prop(1, 2, 3)? Because I cannot set the property in this way: obj.prop(1, 2, 3) = "foo" Of course I know that obj.set_prop(1, 2, 3, "foo") would work. From theller at ctypes.org Wed Nov 14 19:38:34 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 14 Nov 2007 19:38:34 +0100 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: <20071114183440.GC30836@phd.pp.ru> References: <20071114183440.GC30836@phd.pp.ru> Message-ID: Oleg Broytmann schrieb: > On Wed, Nov 14, 2007 at 07:29:52PM +0100, Thomas Heller wrote: >> 'obj[]' should behave like 'obj[()]' does: > > I remember it was discussed and rejected a year or two ago. Still -1 > from me. Explicit [()] is better than implicit []. However: obj[(1, 2, 3)] is the same as obj[1, 2, 3] From guido at python.org Wed Nov 14 19:45:06 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 10:45:06 -0800 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: Message-ID: On Nov 14, 2007 10:37 AM, Thomas Heller wrote: > Guido van Rossum schrieb: > > Why can't you use call syntax, i.e. obj.prop(1, 2, 3)? > > Because I cannot set the property in this way: > > obj.prop(1, 2, 3) = "foo" > > Of course I know that obj.set_prop(1, 2, 3, "foo") would work. And can't you arrange for obj.prop = "foo" to work as well as obj.prop[1,2,3] = "foo"? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Wed Nov 14 19:47:56 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 14 Nov 2007 11:47:56 -0700 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: Message-ID: On Nov 14, 2007 10:54 AM, Guido van Rossum wrote: > Are you sure you're solving a real problem? I see it as part of a problem we've already decided to solve, by making types with no reasonable ordering raise TypeError. > On Nov 14, 2007 9:49 AM, Adam Olsen wrote: > > (ugh, this was supposed to go to python-ideas, not python-list. No > > wonder I got no responses to this email!) > > > > (I've had trouble getting response for collaboration on a PEP. > > Perhaps I'm the only interested party?) > > > > Although py3k raises an exception for completely unsortable types, it > > continues to silently do the wrong thing for non-symmetric types that > > overload comparison operator with special meanings. > > > > >>> a = set([1]) > > >>> b = set([2, 5]) > > >>> c = set([1, 2]) > > >>> sorted([a, c, b]) > > [{1}, {1, 2}, {2, 5}] > > >>> sorted([a, b, c]) > > [{1}, {2, 5}, {1, 2}] > > > > To solve this I propose a revived cmp (as per the previous thread[1]), > > which is the preferred path for orderings. The rich comparison > > operators will be simple wrappers for cmp() (ensuring an exception is > > raised if they're not merely comparing for equality.) > > > > Thus, set would need 7 methods defined (6 rich comparisons plus > > __cmp__, although it could skip __eq__ and __ne__), whereas nearly all > > other types (int, list, etc) need only __cmp__. 
> > > > Code which uses <= to compare sets would be assumed to want subset > > operations. Generic containers should use cmp() exclusively. > > > > > > [1] http://mail.python.org/pipermail/python-3000/2007-October/011072.html > > > > -- > > Adam Olsen, aka Rhamphoryncus > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- Adam Olsen, aka Rhamphoryncus From theller at ctypes.org Wed Nov 14 19:59:15 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 14 Nov 2007 19:59:15 +0100 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > On Nov 14, 2007 10:37 AM, Thomas Heller wrote: >> Guido van Rossum schrieb: >> > Why can't you use call syntax, i.e. obj.prop(1, 2, 3)? >> >> Because I cannot set the property in this way: >> >> obj.prop(1, 2, 3) = "foo" >> >> Of course I know that obj.set_prop(1, 2, 3, "foo") would work. > > And can't you arrange for obj.prop = "foo" to work as well as > obj.prop[1,2,3] = "foo"? > Sure, but this requires to use [] for setting and () for getting the property. From phd at phd.pp.ru Wed Nov 14 20:00:57 2007 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 14 Nov 2007 22:00:57 +0300 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: <20071114183440.GC30836@phd.pp.ru> Message-ID: <20071114190057.GA32728@phd.pp.ru> On Wed, Nov 14, 2007 at 07:38:34PM +0100, Thomas Heller wrote: > Oleg Broytmann schrieb: > > On Wed, Nov 14, 2007 at 07:29:52PM +0100, Thomas Heller wrote: > >> 'obj[]' should behave like 'obj[()]' does: > > > > I remember it was discussed and rejected a year or two ago. Still -1 > > from me. Explicit [()] is better than implicit []. > > However: obj[(1, 2, 3)] is the same as obj[1, 2, 3] 1, 2, 3 is a tuple, and () is a tuple, should there be a syntax for an empty tuple without parenthesis? Thomas, there were many arguments in the previous discussion. This one was there, too. But finally the proposal was rejected. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From guido at python.org Wed Nov 14 19:54:31 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 10:54:31 -0800 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: Message-ID: On Nov 14, 2007 10:47 AM, Adam Olsen wrote: > On Nov 14, 2007 10:54 AM, Guido van Rossum wrote: > > Are you sure you're solving a real problem? > > I see it as part of a problem we've already decided to solve, by > making types with no reasonable ordering raise TypeError. I think we're reaching the land of diminishing returns though. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Nov 14 19:55:39 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 10:55:39 -0800 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: <20071114183440.GC30836@phd.pp.ru> Message-ID: On Nov 14, 2007 10:38 AM, Thomas Heller wrote: > Oleg Broytmann schrieb: > > On Wed, Nov 14, 2007 at 07:29:52PM +0100, Thomas Heller wrote: > >> 'obj[]' should behave like 'obj[()]' does: > > > > I remember it was discussed and rejected a year or two ago. Still -1 > > from me. Explicit [()] is better than implicit []. > > However: obj[(1, 2, 3)] is the same as obj[1, 2, 3] So what? 
x = is not equivalent to x = () -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Wed Nov 14 20:03:19 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 14 Nov 2007 12:03:19 -0700 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: Message-ID: On Nov 14, 2007 11:54 AM, Guido van Rossum wrote: > On Nov 14, 2007 10:47 AM, Adam Olsen wrote: > > On Nov 14, 2007 10:54 AM, Guido van Rossum wrote: > > > Are you sure you're solving a real problem? > > > > I see it as part of a problem we've already decided to solve, by > > making types with no reasonable ordering raise TypeError. > > I think we're reaching the land of diminishing returns though. Aye. If we don't want to readd __cmp__ for other reasons then it's not worthwhile. If we do readd __cmp__ then it's basically free. So the real question is if there's enough support behind __cmp__.. which I kind of doubt at this point. -- Adam Olsen, aka Rhamphoryncus From guido at python.org Wed Nov 14 20:18:14 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 11:18:14 -0800 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: Message-ID: On Nov 14, 2007 11:03 AM, Adam Olsen wrote: > On Nov 14, 2007 11:54 AM, Guido van Rossum wrote: > > On Nov 14, 2007 10:47 AM, Adam Olsen wrote: > > > On Nov 14, 2007 10:54 AM, Guido van Rossum wrote: > > > > Are you sure you're solving a real problem? > > > > > > I see it as part of a problem we've already decided to solve, by > > > making types with no reasonable ordering raise TypeError. > > > > I think we're reaching the land of diminishing returns though. > > Aye. If we don't want to readd __cmp__ for other reasons then it's > not worthwhile. If we do readd __cmp__ then it's basically free. That depends -- while __cmp__ may be faster to compare lists or tuples, __lt__ is faster when comparing ints or strings. > So the real question is if there's enough support behind __cmp__.. > which I kind of doubt at this point. If nobody volunteers to help write a PEP at this point, I will have to agree with that conclusion. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Wed Nov 14 20:22:48 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 14 Nov 2007 20:22:48 +0100 Subject: [Python-ideas] Make obj[] valid syntax? In-Reply-To: References: <20071114183440.GC30836@phd.pp.ru> Message-ID: Guido van Rossum schrieb: > So what? > > x = > > is not equivalent to > > x = () I won't argue this with you ;-) >> > Oleg Broytmann schrieb: >>> > > On Wed, Nov 14, 2007 at 07:29:52PM +0100, Thomas Heller wrote: >>>> > >> 'obj[]' should behave like 'obj[()]' does: >>> > > >>> > > I remember it was discussed and rejected a year or two ago. Still -1 >>> > > from me. Explicit [()] is better than implicit []. >> > >> > However: obj[(1, 2, 3)] is the same as obj[1, 2, 3] > > 1, 2, 3 is a tuple, and () is a tuple, should there be a syntax for an > empty tuple without parenthesis? > > Thomas, there were many arguments in the previous discussion. This one > was there, too. But finally the proposal was rejected. I see that my proposal probably won't fly. 
This encourages me to describe my full wish, just for fun: It would be nice if I could have positional AND keyword arguments for __getitem__ and __setitem__, so that I could write code like this (COM has named parameters also): x = obj.prop[1, 2, lcid=0] x = obj.prop[] obj.prop[1, 2, lcid=0] = "foo" obj.prop[] = "foo" or even x = obj.prop[1, 2, lcid=0] x = obj.prop[] x = obj.prop # same as previous line (now how would THAT work?) obj.prop[1, 2, lcid=0] = "foo" obj.prop[] = "foo" obj.prop = "foo" # same as previous line I retract my proposal. VB-ly, yours Thomas From tjreedy at udel.edu Wed Nov 14 22:43:44 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 14 Nov 2007 16:43:44 -0500 Subject: [Python-ideas] python3: subtle change to new input() References: <200711140807.16677.mark@qtrac.eu> <473B1AFA.7080900@cheimes.de> Message-ID: "Christian Heimes" wrote in message news:473B1AFA.7080900 at cheimes.de... | I think it's more likely a subtle difference between platforms: | | On Linux | >>> r = input() | Traceback (most recent call last): | File "", line 1, in | EOFError | >>> r = input() | [1]+ Stopped ./python | $ fg 1 | ./python | >>> | | On Windows | >>> r = input() | ^D | >>> r | '\x04' | >>> r = input() | ^Z | Traceback (most recent call last): | File "", line 1, in | EOFError 1. Would it be sensibly possible to equalize the behavior? (Your def of 'sensibly'.) a. ^D and ^Z both raise EOF on all systems. b. Only ^D on all systems c. ^D on all systems and ^Z also on Windows. Would it be a good idea? For many current Windows users, Python will be the only contact with an imitation-DOS console window and the need for EOF input, so strict imitation of old, semi-obsolete DOS mode behavior seems not necesarry. tjr From tjreedy at udel.edu Wed Nov 14 23:04:41 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 14 Nov 2007 17:04:41 -0500 Subject: [Python-ideas] Make obj[] valid syntax? References: <20071114183440.GC30836@phd.pp.ru> <20071114190057.GA32728@phd.pp.ru> Message-ID: "Oleg Broytmann" wrote in message news:20071114190057.GA32728 at phd.pp.ru... | 1, 2, 3 is a tuple, and () is a tuple, should there be a syntax for an | empty tuple without parenthesis? The only thing I have thought of is a bare comma, but I like that even less than the () exception and expect most would agree ;-) From steven.bethard at gmail.com Wed Nov 14 23:05:44 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 14 Nov 2007 15:05:44 -0700 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: References: <200711140807.16677.mark@qtrac.eu> <473B1AFA.7080900@cheimes.de> Message-ID: On Nov 14, 2007 2:43 PM, Terry Reedy wrote: > 1. Would it be sensibly possible to equalize the behavior? (Your def of > 'sensibly'.) > a. ^D and ^Z both raise EOF on all systems. > b. Only ^D on all systems > c. ^D on all systems and ^Z also on Windows. > > Would it be a good idea? > > For many current Windows users, Python will be the only contact with an > imitation-DOS console window and the need for EOF input, so strict > imitation of old, semi-obsolete DOS mode behavior seems not necesarry. There's already a single way of spelling this on both systems: quit() $python Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> quit() $ $python Python 2.5.1 (r251:54863, Nov 12 2007, 09:59:19) [GCC 3.4.6 20060404 (Red Hat 3.4.6-8)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> quit() $ STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From guido at python.org Wed Nov 14 23:15:34 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 14 Nov 2007 14:15:34 -0800 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: References: <200711140807.16677.mark@qtrac.eu> <473B1AFA.7080900@cheimes.de> Message-ID: On Nov 14, 2007 1:43 PM, Terry Reedy wrote: > For many current Windows users, Python will be the only contact with an > imitation-DOS console window and the need for EOF input, so strict > imitation of old, semi-obsolete DOS mode behavior seems not necesarry. We're not doing any of the imitation. On both Linux and Windows we're getting whatever the OS provides. Note that on Linux you can change the EOF character using the stty command. I wouldn't be surprised if there was a way to change this setting in Windows too. But I'd be opposed to Python messing with it -- while some users may never have seen a DOS box before, others use them all the time, and Python should work out of the box for the latter too. Those afraid of DOS boxes should probably use IDLE anyway. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Thu Nov 15 02:41:33 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 Nov 2007 14:41:33 +1300 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: Message-ID: <473BA3CD.2050305@canterbury.ac.nz> Adam Olsen wrote: > Thus, set would need 7 methods defined (6 rich comparisons plus > __cmp__, although it could skip __eq__ and __ne__) With the 4-valued __cmp__ that I proposed, it would only need __cmp__, I think. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From luke.stebbing at gmail.com Thu Nov 15 03:01:50 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Wed, 14 Nov 2007 18:01:50 -0800 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: <473BA3CD.2050305@canterbury.ac.nz> References: <473BA3CD.2050305@canterbury.ac.nz> Message-ID: On 11/14/07, Greg Ewing wrote: > Adam Olsen wrote: > > Thus, set would need 7 methods defined (6 rich comparisons plus > > __cmp__, although it could skip __eq__ and __ne__) > > With the 4-valued __cmp__ that I proposed, it would > only need __cmp__, I think. set only needs 4 values, but other types need more. See PEP 207, Proposed Resolutions, #3: http://www.python.org/dev/peps/pep-0207/ IMO, such things should not use comparison operators, but I think I'm in the minority. Luke From greg.ewing at canterbury.ac.nz Thu Nov 15 03:01:52 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 Nov 2007 15:01:52 +1300 Subject: [Python-ideas] python3: subtle change to new input() In-Reply-To: References: <200711140807.16677.mark@qtrac.eu> <473B1AFA.7080900@cheimes.de> Message-ID: <473BA890.4080904@canterbury.ac.nz> Terry Reedy wrote: > strict > imitation of old, semi-obsolete DOS mode behavior ...which I think was already obsolete when it was inherited from CP/M... 
-- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Nov 15 03:21:37 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 Nov 2007 15:21:37 +1300 Subject: [Python-ideas] cmp and sorting non-symmetric types In-Reply-To: References: <473BA3CD.2050305@canterbury.ac.nz> Message-ID: <473BAD31.4080205@canterbury.ac.nz> Luke Stebbing wrote: > set only needs 4 values, but other types need more. A type can always override the 6 separate methods if it needs to. I'm not proposing to replace these, only to provide a simpler alternative that covers most use cases. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Nov 15 00:55:23 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 Nov 2007 12:55:23 +1300 Subject: [Python-ideas] Raw strings return compiled regexps In-Reply-To: <4734E46E.2050709@cs.byu.edu> References: <4734E46E.2050709@cs.byu.edu> Message-ID: <473B8AEB.8050000@canterbury.ac.nz> Neil Toronto wrote: > What if they just returned regular expression objects? That would force the re module to be part of the core, which would not be a good thing. Also, raw strings are good for more than just regexps. The fact that there are a few things that they're *not* good for doesn't mean they should be restricted to regexps. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From scott+python-ideas at scottdial.com Thu Nov 15 09:14:25 2007 From: scott+python-ideas at scottdial.com (Scott Dial) Date: Thu, 15 Nov 2007 03:14:25 -0500 Subject: [Python-ideas] Required to call superclass __init__ In-Reply-To: References: <473950FC.10202@cs.byu.edu> <20071113091246.GC15166@phd.pp.ru> <4739CC3B.1090205@cs.byu.edu> <473AD00D.2070502@cs.byu.edu> Message-ID: <473BFFE1.2040804@scottdial.com> Jim Jewett wrote: > I don't yet > see what those simplifications should actually be, but maybe someone > else will if you publish and wait long enough. > The first thing I noticed was that the naming scheme is confusing. Between required_super and super_required, neither of them indicate to me which is the function decorator and which is the base class. Furthermore, I don't see why required_super (the base class) needs a distinct name. Perhaps I am being a bit to clever, but couldn't we just overload the __new__ method of the base class. def _super_required(func): ... class super_required(object): ... def __new__(cls, *func): if len(func) > 0: return _super_required(*func) return object.__new__(cls) Leaving your example now being spelled as: class A(super_required): @super_required def __init__(self): pass I can't think of a case that the the base class would ever be passed arguments, so this seems ok and rids us of the naming oddities. 
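Fleshing that out just enough to run -- the enforcement machinery is stubbed down to a simple marker, and _mark_super_required is only an illustrative stand-in for the real decorator, so this shows nothing more than the dispatch in __new__:

    def _mark_super_required(func):
        # Stand-in for the real decorator: tag the function and hand it back.
        func.__super_required__ = True
        return func

    class super_required(object):
        def __new__(cls, *args):
            if len(args) == 1 and callable(args[0]):
                # Used as a decorator: super_required(func)
                return _mark_super_required(args[0])
            # Used as a base class being instantiated normally.
            # (Only ambiguous if a subclass is ever constructed with a
            # single callable argument.)
            return object.__new__(cls)

    class A(super_required):
        @super_required
        def __init__(self):
            pass

Instantiating A() still works because __new__ sees no extra arguments, and super_required(func) never runs __init__ since the object returned is not an instance of the class.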
-Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From bborcic at gmail.com Thu Nov 15 11:48:13 2007 From: bborcic at gmail.com (Boris Borcic) Date: Thu, 15 Nov 2007 11:48:13 +0100 Subject: [Python-ideas] x @f as shorthand for x=f(x) In-Reply-To: References: Message-ID: Terry Reedy wrote: > Making the obvious generalization to n params, and specializing to one, > gives > > x f= Been there, saw that :) But hadn't seen 'target decorators', eg @f x Possibly shorthandable as x @f Cheers, BB From luke.stebbing at gmail.com Thu Nov 15 12:18:20 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Thu, 15 Nov 2007 03:18:20 -0800 Subject: [Python-ideas] x @f as shorthand for x=f(x) In-Reply-To: References: Message-ID: On 11/15/07, Boris Borcic wrote: > Possibly shorthandable as > > x @f Hey, I can actually read that one. It sounds like "x, apply f". Luke From ntoronto at cs.byu.edu Sat Nov 17 20:27:47 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Sat, 17 Nov 2007 12:27:47 -0700 Subject: [Python-ideas] Optional extra globals dict for function objects Message-ID: <473F40B3.9030804@cs.byu.edu> I set out trying to redo the 3.0 autosuper metaclass in 2.5 without bytecode hacking and ran into a problem: a function's func_globals isn't polymorphic. That is, the interpreter uses PyDict_* calls to access it, and in one case (LOAD_GLOBAL), actually inlines PyDict_GetItem manually. If it weren't for this, I could have easily done 3.0 super without bytecode hacking, by making a custom dict that allows another dict to shadow it, and putting the new super object in the shadowing dict. I know it's for performance, and that if func_globals were made polymorphic, it'd bring the pystone benchmark to its knees, begging for a quick and merciful death. That's not what I'm proposing. I propose adding a read-only attribute func_extra_globals to the function object, default NULL. In the interpreter loop, global lookups try func_extra_globals first if it's not NULL. It's accessed using PyObject_* functions. Here are the reasons I think this is a good idea: - It should have near zero impact on performance in the general case because NULL checks are quick. There would be another attribute in the frame object (f_extra_globals), almost always NULL. - Language enhancement prototypes that currently use bytecode hacking could be accomplished with a method wrapper and a func_extra_globals dict. The prototypes could be pure Python, and thus more general, less brittle, and easier to get right. Hacking closures is nasty business. - I'm sure lots of other stuff that I can't think of, where it'd be nice to dynamically add information to a method or function that can be accessed as a variable. Pure-Python function preambles whose results can be seen by the original function would be pretty sweet. - Because func_extra_globals would be read-only and default NULL, it'd almost always be obvious when it's getting messed with. A wrapper/decorator or a metaclass, and a call to types.FunctionType() would signal that. - func_globals would almost never have to be overridden: for most purposes (besides security), shadowing it is actually better, as it leaves the function's module fully accessible. Anybody else think it's awesome? :) How about opinions of major suckage? If it helps acceptance, I'd be willing to make a patch for this. It looks pretty straightforward. 
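For comparison, roughly the closest one can get today is to rebuild a function around a merged *copy* of its globals, which loses the live view of the module that func_extra_globals would keep (with_extra_globals is just an illustrative helper name):

    import types

    def with_extra_globals(func, extra):
        # Same code object, new globals dict: the module globals plus extras.
        merged = dict(func.func_globals)
        merged.update(extra)
        return types.FunctionType(func.func_code, merged, func.func_name,
                                  func.func_defaults, func.func_closure)

    def f():
        return shadowed          # resolved through f's globals at call time

    g = with_extra_globals(f, {'shadowed': 42})
    assert g() == 42             # f() itself would still raise NameError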
Neil From brett at python.org Sat Nov 17 21:46:39 2007 From: brett at python.org (Brett Cannon) Date: Sat, 17 Nov 2007 12:46:39 -0800 Subject: [Python-ideas] Optional extra globals dict for function objects In-Reply-To: <473F40B3.9030804@cs.byu.edu> References: <473F40B3.9030804@cs.byu.edu> Message-ID: On Nov 17, 2007 11:27 AM, Neil Toronto wrote: > I set out trying to redo the 3.0 autosuper metaclass in 2.5 without > bytecode hacking and ran into a problem: a function's func_globals isn't > polymorphic. That is, the interpreter uses PyDict_* calls to access it, > and in one case (LOAD_GLOBAL), actually inlines PyDict_GetItem manually. > If it weren't for this, I could have easily done 3.0 super without > bytecode hacking, by making a custom dict that allows another dict to > shadow it, and putting the new super object in the shadowing dict. > > I know it's for performance, and that if func_globals were made > polymorphic, it'd bring the pystone benchmark to its knees, begging for > a quick and merciful death. That's not what I'm proposing. > > I propose adding a read-only attribute func_extra_globals to the > function object, default NULL. In the interpreter loop, global lookups > try func_extra_globals first if it's not NULL. It's accessed using > PyObject_* functions. > My initial response is "eww". I say this as I don't want to complicate the scoping rules anymore than they are. This adds yet another place to check for things. While it might not be a nasty performance hit (although you neglect to say what happens if something is not found in func_extra_globals; do you check func_globals as well? That will be a penalty hit), it does complicate semantics slightly. > Here are the reasons I think this is a good idea: > > - It should have near zero impact on performance in the general case > because NULL checks are quick. There would be another attribute in the > frame object (f_extra_globals), almost always NULL. > That is only true if you skip a func_globals check if the func_extra_globals check doesn't happen. > - Language enhancement prototypes that currently use bytecode hacking > could be accomplished with a method wrapper and a func_extra_globals > dict. The prototypes could be pure Python, and thus more general, less > brittle, and easier to get right. Hacking closures is nasty business. Which are what? the auto-super example is not exactly common. > > - I'm sure lots of other stuff that I can't think of, where it'd be nice > to dynamically add information to a method or function that can be > accessed as a variable. Pure-Python function preambles whose results can > be seen by the original function would be pretty sweet. Basing an idea on unknown potential is not a good reason to add something to the language. I don't think the Air Force needs to protect against flying pigs just because there is the possibility someone might genetically engineer some to carry nuclear bombs. =) > > - Because func_extra_globals would be read-only and default NULL, it'd > almost always be obvious when it's getting messed with. A > wrapper/decorator or a metaclass, and a call to types.FunctionType() > would signal that. Read-only? Then how are you supposed to set this? Do you want to introduce something like __build_class__ for functions and methods? Requiring the use of Types.FunctionType() will be a pain and dilute the usefulness. 
> > - func_globals would almost never have to be overridden: for most > purposes (besides security), shadowing it is actually better, as it > leaves the function's module fully accessible. > If that's the case why worry about func_extra_globals? =) It solves %95 of the uses you might have (and I suspect 94% of the uses are "I don't need to muck with func_globals"). > Anybody else think it's awesome? :) How about opinions of major suckage? > I'm -1 on the idea personally. > If it helps acceptance, I'd be willing to make a patch for this. It > looks pretty straightforward. It always helps acceptance, it's just a question of whether it will push it over the edge into actually being accepted. -Brett From ntoronto at cs.byu.edu Mon Nov 19 20:00:04 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Mon, 19 Nov 2007 12:00:04 -0700 Subject: [Python-ideas] Explicit self argument, implicit super argument Message-ID: <4741DD34.1070106@cs.byu.edu> (Disclaimer: I have no issue with "self." and "super." attribute access, which is what most people think of when they think "implicit self".) While showing a coworker a bytecode hack I made this weekend - it allows insertion of arbitrary function parameters into an already-existing function - he asked for a use case. I gave him this: class A(object): # ... def method(x, y): self.x = x super.method(y) where 'method' is replaced by this method wrapper via metaclass or decorator: def method_wrapper(self, *args, **kwargs): return hacked_method(self, super(cls, self), *args, **kwargs) These hackish details aren't important, the resulting "A.method" is. It occurred to me that explicit self and implicit super is semantically inconsistent. Here's Python 3000's version of the above (please compare): class A(object): def method(self, x, y): self.x = x super.method(y) Why have a magic "super" local but not a magic "self" local? From a *general usage* standpoint, the only reason I can think of (which is not necessarily the only one, which is why I'm asking) is that a person might want to change the name of "self", like so: class AddLike(object): # ... def __add__(a, b): # return something def __radd__(b, a): # return something But reverse binary special methods are the only case where it's not extremely bad form. Okay, two reasons for explicit self: backward compatibility, but 2to3 would make it a non-issue. From an *implementation standpoint*, making self implicit - a cell variable like super, for example - would wreak havoc with the current bound/unbound method distinction, but I'm not so sure that's a bad thing. Neil From guido at python.org Mon Nov 19 20:11:21 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 19 Nov 2007 11:11:21 -0800 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <4741DD34.1070106@cs.byu.edu> References: <4741DD34.1070106@cs.byu.edu> Message-ID: The reason for explicit self in method definition signatures is semantic consistency. If you write class C: def foo(self, x, y): ... This really *is* the same as writing class C: pass def foo(self, x, y): ... C.foo = foo And of course it works the other way as well: you really *can* invoke foo with an explicit argument for self as follows: class D(C): ... C.foo(D(), 1, 2) IOW it's not an implementation hack -- it is a semantic device. --Guido On Nov 19, 2007 11:00 AM, Neil Toronto wrote: > (Disclaimer: I have no issue with "self." and "super." attribute access, > which is what most people think of when they think "implicit self".) 
> > While showing a coworker a bytecode hack I made this weekend - it allows > insertion of arbitrary function parameters into an already-existing > function - he asked for a use case. I gave him this: > > class A(object): > # ... > def method(x, y): > self.x = x > super.method(y) > > > where 'method' is replaced by this method wrapper via metaclass or > decorator: > > def method_wrapper(self, *args, **kwargs): > return hacked_method(self, super(cls, self), *args, **kwargs) > > > These hackish details aren't important, the resulting "A.method" is. > > It occurred to me that explicit self and implicit super is semantically > inconsistent. Here's Python 3000's version of the above (please compare): > > class A(object): > def method(self, x, y): > self.x = x > super.method(y) > > > Why have a magic "super" local but not a magic "self" local? From a > *general usage* standpoint, the only reason I can think of (which is not > necessarily the only one, which is why I'm asking) is that a person > might want to change the name of "self", like so: > > class AddLike(object): > # ... > def __add__(a, b): > # return something > def __radd__(b, a): > # return something > > > But reverse binary special methods are the only case where it's not > extremely bad form. Okay, two reasons for explicit self: backward > compatibility, but 2to3 would make it a non-issue. > > From an *implementation standpoint*, making self implicit - a cell > variable like super, for example - would wreak havoc with the current > bound/unbound method distinction, but I'm not so sure that's a bad thing. > > Neil > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From luke.stebbing at gmail.com Mon Nov 19 21:20:28 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Mon, 19 Nov 2007 12:20:28 -0800 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: References: <4741DD34.1070106@cs.byu.edu> Message-ID: On 11/19/07, Guido van Rossum wrote: > The reason for explicit self in method definition signatures is > semantic consistency. If you write > > class C: > def foo(self, x, y): ... > > This really *is* the same as writing > > class C: > pass > > def foo(self, x, y): ... > C.foo = foo What about an instancemethod decorator? @instancemethod(C) def foo(x, y): ... > And of course it works the other way as well: you really *can* invoke > foo with an explicit argument for self as follows: > > class D(C): > ... > > C.foo(D(), 1, 2) Couldn't __builtin__.__super__ be used? It would look pretty weird if you invoked a method higher up the MRO, though. I find that these cases come up rarely in my code, while I forget the 'self' argument much more frequently, but YMMV. Luke From ntoronto at cs.byu.edu Mon Nov 19 21:42:16 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Mon, 19 Nov 2007 13:42:16 -0700 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: References: <4741DD34.1070106@cs.byu.edu> Message-ID: <4741F528.6000400@cs.byu.edu> Guido van Rossum wrote: > The reason for explicit self in method definition signatures is > semantic consistency. If you write > > class C: > def foo(self, x, y): ... > > This really *is* the same as writing > > class C: > pass > > def foo(self, x, y): ... 
> C.foo = foo > > And of course it works the other way as well: you really *can* invoke > foo with an explicit argument for self as follows: > > class D(C): > ... > > C.foo(D(), 1, 2) > > IOW it's not an implementation hack -- it is a semantic device. Ah, thanks, that helps. (I'll be able to sleep tonight. :D) This semantic device, of course, would really suck if applied to "super": d = D() C.foo(d, super(C, d), 1, 2) # strange and hideous which is a great reason that the new "super" is implicit. (Before I continue, please understand that I'm not arguing for a language change. Responses to my last two ideas have shown me that I need to thoroughly understand why things are as they are right now while considering a change, and long before advocating one. It also goes over better with the language designers.) Now, correct me if I'm wrong, but it seems there are only two use cases for DistantParentOfD.method(D_instance, ...): 1. The Good Case: you know the "next-method" as determined by the MRO isn't the right one to call. Multiple inheritance can twist you into this sort of behavior, though if it does, your design likely needs reconsideration. 2. The Evil Case: you know the override method as defined by D isn't the one you want for your extra-special D instance. This should be possible but never encouraged. Because the runtime enforces isinstance(D_instance, D), everything else can be handled with D_instance.method(...) or self.method() or super.method(). We know that #1 and #2 above are the uncommon cases, which is why the new "super", which covers the common ones, doesn't cover those. Is it right to say that the explicit "self" parameter only exists to enable those two uncommon cases? Of course, if self were implicit, there would still need to be a way to spell DistantParentOfD.method(D_instance, ...). Being the uncommon case, maybe it shouldn't have a nice spelling: as_parent(C, D_instance).method(...) Trying-to-understandingly-yours, Neil From arno at marooned.org.uk Mon Nov 19 22:54:50 2007 From: arno at marooned.org.uk (Arnaud Delobelle) Date: Mon, 19 Nov 2007 21:54:50 +0000 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <4741F528.6000400@cs.byu.edu> References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> Message-ID: On 19 Nov 2007, at 20:42, Neil Toronto wrote: > [...] > Now, correct me if I'm wrong, but it seems there are only two use > cases > for DistantParentOfD.method(D_instance, ...): > > 1. The Good Case: you know the "next-method" as determined by the MRO > isn't the right one to call. Multiple inheritance can twist you into > this sort of behavior, though if it does, your design likely needs > reconsideration. > > 2. The Evil Case: you know the override method as defined by D isn't > the > one you want for your extra-special D instance. This should be > possible > but never encouraged. > > Because the runtime enforces isinstance(D_instance, D), everything > else > can be handled with D_instance.method(...) or self.method() or > super.method(). We know that #1 and #2 above are the uncommon cases, > which is why the new "super", which covers the common ones, doesn't > cover those. > > Is it right to say that the explicit "self" parameter only exists to > enable those two uncommon cases? 
Self being explicit makes it less selfish :) To illustrate, I like that you can do: class Foo(str): def mybar(self): class Bar(str): def madeby(me): return "I am %s and I was made by %s" % (me, self) return Bar >>> foo=Foo("foo") >>> bar=foo.mybar() >>> Bar=foo.mybar() >>> bar=Bar("bar") >>> print bar.madeby() I am bar and I was made by foo This depends on 'self' being explicit and is not related to super. I didn't know about implicit super, it's probably great but my initial reaction is that I don't like it :( Why not: class Foo: @with_super def bar(super, self, x, y): super.bar(x, y) ... -- Arnaud From ntoronto at cs.byu.edu Mon Nov 19 23:03:56 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Mon, 19 Nov 2007 15:03:56 -0700 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> Message-ID: <4742084C.6070008@cs.byu.edu> Arnaud Delobelle wrote: > Self being explicit makes it less selfish :) > To illustrate, I like that you can do: > > class Foo(str): > def mybar(self): > class Bar(str): > def madeby(me): > return "I am %s and I was made by %s" % (me, self) > return Bar > > >>> foo=Foo("foo") > >>> #bar=foo.mybar() # typo > >>> Bar=foo.mybar() > >>> bar=Bar("bar") > >>> print bar.madeby() > I am bar and I was made by foo Ah, I see. If self were passed implicitly, you would need to make a Bar.__init__ that received and stored the outer self. I think I'd call this a third uncommon case. Outside functional idioms, common is usually flat. > This depends on 'self' being explicit and is not related to super. > I didn't know about implicit super, it's probably great but my initial > reaction is that I don't like it :( > > Why not: > > class Foo: > @with_super > def bar(super, self, x, y): > super.bar(x, y) > ... Probably because it's way too common to require a decorator for it. Users would have to make "always use @with_super" into a coding habit. (Sort of like "self" actually.) It'd also be yet another thing to keep in mind while reading code: did this method use @with_super or not? Neil From greg.ewing at canterbury.ac.nz Tue Nov 20 01:33:37 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 20 Nov 2007 13:33:37 +1300 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <4741F528.6000400@cs.byu.edu> References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> Message-ID: <47422B61.50703@canterbury.ac.nz> Neil Toronto wrote: > Because the runtime enforces isinstance(D_instance, D), everything else > can be handled with D_instance.method(...) or self.method() or > super.method(). But super() is not a general replacement for explicit inherited method calls. It's only appropriate in special, quite restricted circumstances. 
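For what it's worth, a decorator along those lines can be sketched today; the wrinkle is that super(cls, self) needs the class object, which doesn't exist yet while the class body executes, so the wrapping has to happen after the class statement (with_super and the other names below are illustrative only):

    def with_super(cls):
        def decorate(func):
            def wrapper(self, *args, **kwargs):
                # Hand the bound super object in as the first argument.
                return func(super(cls, self), self, *args, **kwargs)
            wrapper.func_name = func.func_name
            return wrapper
        return decorate

    class Base(object):
        def bar(self, x, y):
            return (x, y)

    class Foo(Base):
        def bar(sup, self, x, y):
            return sup.bar(x, y)

    # Applied after the fact, once Foo exists:
    Foo.bar = with_super(Foo)(Foo.__dict__['bar'])

    assert Foo().bar(1, 2) == (1, 2)

A metaclass could apply the wrapping automatically and hide the manual reassignment, but that only moves the habit around rather than removing it.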
-- Greg From luke.stebbing at gmail.com Tue Nov 20 02:27:30 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Mon, 19 Nov 2007 17:27:30 -0800 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> Message-ID: On 11/19/07, Arnaud Delobelle wrote: > Self being explicit makes it less selfish :) > To illustrate, I like that you can do: > > class Foo(str): > def mybar(self): > class Bar(str): > def madeby(me): > return "I am %s and I was made by %s" % (me, self) > return Bar > How about: class Foo(str): def mybar(): outer = self class Bar(str): def madeby(): return "I am %s and I was made by %s" % (self, outer) return Bar Luke From greg.ewing at canterbury.ac.nz Tue Nov 20 02:23:40 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 20 Nov 2007 14:23:40 +1300 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <4741DD34.1070106@cs.byu.edu> References: <4741DD34.1070106@cs.byu.edu> Message-ID: <4742371C.1070706@canterbury.ac.nz> Neil Toronto wrote: > class A(object): > def method(self, x, y): > self.x = x > super.method(y) Is that really how it's going to be? What if self isn't called 'self'? I would rather see super.method(self, y) > From an *implementation standpoint*, making self implicit - a cell > variable like super, for example - would wreak havoc with the current > bound/unbound method distinction, but I'm not so sure that's a bad thing. What happens to explicit inherited method calls? If they become impossible or awkward, it's very definitely a bad thing. -- Greg From luke.stebbing at gmail.com Tue Nov 20 02:50:08 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Mon, 19 Nov 2007 17:50:08 -0800 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <4742371C.1070706@canterbury.ac.nz> References: <4741DD34.1070106@cs.byu.edu> <4742371C.1070706@canterbury.ac.nz> Message-ID: On 11/19/07, Greg Ewing wrote: > Neil Toronto wrote: > > > class A(object): > > def method(self, x, y): > > self.x = x > > super.method(y) > > Is that really how it's going to be? What if self isn't > called 'self'? > > I would rather see > > super.method(self, y) PEP 3135 specifies that the first argument of the method is used, regardless of name: http://www.python.org/dev/peps/pep-3135/#specification Luke From ntoronto at cs.byu.edu Tue Nov 20 03:37:09 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Mon, 19 Nov 2007 19:37:09 -0700 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> Message-ID: <47424855.7020904@cs.byu.edu> Luke Stebbing wrote: > On 11/19/07, Arnaud Delobelle wrote: >> Self being explicit makes it less selfish :) >> To illustrate, I like that you can do: >> >> class Foo(str): >> def mybar(self): >> class Bar(str): >> def madeby(me): >> return "I am %s and I was made by %s" % (me, self) >> return Bar >> > > How about: > > class Foo(str): > def mybar(): > outer = self > class Bar(str): > def madeby(): > return "I am %s and I was made by %s" % (self, outer) > return Bar Good point. I actually like this better, since it forces the outer scope self to have a different name, removing a source of confusion. Back down to two uncommon use cases so far, then. 
Neil From arno at marooned.org.uk Tue Nov 20 08:05:51 2007 From: arno at marooned.org.uk (Arnaud Delobelle) Date: Tue, 20 Nov 2007 07:05:51 -0000 (GMT) Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> Message-ID: <51974.82.46.172.40.1195542351.squirrel@www.marooned.org.uk> On Tue, November 20, 2007 1:27 am, Luke Stebbing wrote: > On 11/19/07, Arnaud Delobelle wrote: >> Self being explicit makes it less selfish :) >> To illustrate, I like that you can do: >> >> class Foo(str): >> def mybar(self): >> class Bar(str): >> def madeby(me): >> return "I am %s and I was made by %s" % (me, self) >> return Bar >> > > How about: > > class Foo(str): > def mybar(): > outer = self > class Bar(str): > def madeby(): > return "I am %s and I was made by %s" % (self, outer) > return Bar > > > Luke > I suppose, though it's a waste of a cell IMHO ;) -- Arnaud From ntoronto at cs.byu.edu Tue Nov 20 09:50:47 2007 From: ntoronto at cs.byu.edu (ntoronto at cs.byu.edu) Date: Tue, 20 Nov 2007 01:50:47 -0700 (MST) Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <47422B61.50703@canterbury.ac.nz> References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> <47422B61.50703@canterbury.ac.nz> Message-ID: <33030.10.7.75.26.1195548647.squirrel@mail.cs.byu.edu> > Neil Toronto wrote: >> Because the runtime enforces isinstance(D_instance, D), everything else >> can be handled with D_instance.method(...) or self.method() or >> super.method(). > > But super() is not a general replacement for explicit inherited > method calls. It's only appropriate in special, quite restricted > circumstances. Exactly. There are two common method-calling cases, and an uncommon one. In order of expected number of occurrences, with #3 being quite low: 1. self.method(...) 2. super.method(...) 3. DistantParent.method(self, ...) (either to get out of the MRO or because you're feeling evil - two use cases for it) If self were only implicitly available, #3 would need a new spelling, as you say. That's not hard to do, and I've already suggested as_parent(DistantParent, self).method(...) as an alternate spelling for the uncommon cases. That's not to say I'm advocating such a thing for Python 3.0 - just showing that it's possible to cover the current known use cases. Actually, I suspect there aren't any more use cases, as all correct ways of calling the method (those that don't raise an exception) are covered, and implicit self would still be as accessible from anywhere as explicit self is. Would saving six keystrokes per method, reducing noise in every method header, and removing the need for a habit (always including self in the parameter list) be enough to justify a change? I'm going to guess either "no" or "not right now". If I were doing it from scratch, I'd make self and super into keywords, and change method binding to return a function with them included in the locals somehow. 
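For what it's worth, the as_parent() spelling above could be sketched with today's explicit self roughly like this (purely hypothetical -- it just binds one specific class's methods to the instance, bypassing the MRO the same way DistantParent.method(self, ...) does now):

class as_parent(object):
    def __init__(self, cls, obj):
        self._cls = cls
        self._obj = obj
    def __getattr__(self, name):
        method = getattr(self._cls, name)      # that class's method, not the MRO's
        obj = self._obj
        def call(*args, **kwargs):
            return method(obj, *args, **kwargs)
        return call

# so the uncommon case would read:  as_parent(DistantParent, self).method(...)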
Neil From jimjjewett at gmail.com Tue Nov 20 16:15:56 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 20 Nov 2007 10:15:56 -0500 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <33030.10.7.75.26.1195548647.squirrel@mail.cs.byu.edu> References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> <47422B61.50703@canterbury.ac.nz> <33030.10.7.75.26.1195548647.squirrel@mail.cs.byu.edu> Message-ID: On 11/20/07, ntoronto at cs.byu.edu wrote: > > Neil Toronto wrote: > >> Because the runtime enforces isinstance(D_instance, D), everything else > >> can be handled with D_instance.method(...) or self.method() or > >> super.method(). > > But super() is not a general replacement for explicit inherited > > method calls. It's only appropriate in special, quite restricted > > circumstances. I would say it it almost always appropriate. The times it fails are when (1) You want to change the name of the method. Fair enough -- but you can usually forward to self.othername (2) You want to change the arguments of the method. Changing the signature is generally a bad idea, though it is tolerable for constructors. (3) You're explicitly managing the order of super-calls (==> fragile, and the inheritance is already a problem) (4) Backwards compatibility with some other class that uses explicit class names instead of super. Number 4 is pretty common still, but it is just a backwards compatibility hack that makes code more fragile. > Would saving six keystrokes per method, reducing noise in every method > header, and removing the need for a habit (always including self in the > parameter list) be enough to justify a change? I'm going to guess either > "no" or "not right now". If I were doing it from scratch, I'd make self > and super into keywords, and change method binding to return a function > with them included in the locals somehow. Agreed. The fact that method parameter lists look different at the definition and call sites is an annoying wart, but ... too late to change. -jJ From jimjjewett at gmail.com Tue Nov 20 20:30:06 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 20 Nov 2007 14:30:06 -0500 Subject: [Python-ideas] Optional extra globals dict for function objects In-Reply-To: <473F40B3.9030804@cs.byu.edu> References: <473F40B3.9030804@cs.byu.edu> Message-ID: On 11/17/07, Neil Toronto wrote: > I set out trying to redo the 3.0 autosuper metaclass > in 2.5 without bytecode hacking and ran into a problem: > a function's func_globals isn't polymorphic. > That is, the interpreter uses PyDict_* calls to access it, > and in one case (LOAD_GLOBAL), actually inlines > PyDict_GetItem manually. (1) Is this just one of the "this must be a real dict, not just any mapping" limits, or is there something else I'm missing? (2) Isn't the func_globals already (a read-only reference to) the module's __dict__? So is this really about changing the promise of the module type, instead of just about func_globals? Note that weakening the module.__dict__ promise to only meeting the dict API would make it easier to implement the various speed-up-globals suggestions. And to be honest, I think that assuming a UserDict.DictMixin wouldn't be that bad. How often is a module's dict used for anything time-critical except get (and maybe set, delete, iterate)? > If it weren't for this, I could have easily done 3.0 super > without bytecode hacking, by making a custom dict that > allows another dict to shadow it, and putting the new > super object in the shadowing dict. ... 
> I propose adding a read-only attribute func_extra_globals > to the function object, default NULL. In the interpreter loop, > global lookups try func_extra_globals first if it's not NULL. Would this really be a global dict though, or just a closure inserted between the func and the normal globals? Is the real problem that you can't change which variables are in a closure (rather than fully global) after the function is compiled? -jJ From ntoronto at cs.byu.edu Tue Nov 20 21:44:51 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Tue, 20 Nov 2007 13:44:51 -0700 Subject: [Python-ideas] Optional extra globals dict for function objects In-Reply-To: References: <473F40B3.9030804@cs.byu.edu> Message-ID: <47434743.4010305@cs.byu.edu> Jim Jewett wrote: > On 11/17/07, Neil Toronto wrote: >> I set out trying to redo the 3.0 autosuper metaclass >> in 2.5 without bytecode hacking and ran into a problem: >> a function's func_globals isn't polymorphic. >> That is, the interpreter uses PyDict_* calls to access it, >> and in one case (LOAD_GLOBAL), actually inlines >> PyDict_GetItem manually. > > (1) Is this just one of the "this must be a real dict, not just any > mapping" limits, or is there something else I'm missing? That's all it is, yes. > (2) Isn't the func_globals already (a read-only reference to) the > module's __dict__? So is this really about changing the promise of > the module type, instead of just about func_globals? My original question was about extending (with an optional dictionary) the behavior of a function with regard to its func_globals. Because of speed concerns, I didn't suggest weakening the type constraint to allow just anything that meets the dict API. > Note that weakening the module.__dict__ promise to only meeting the > dict API would make it easier to implement the various > speed-up-globals suggestions. By "implement" do you mean proof-of-concept, final, or both? At least for proof-of-concept, I totally agree. And thanks for the use case (which sort of applies to my original flawed idea), my lack of which Brett has raked me over the coals for. :) (But it didn't hurt much!) > And to be honest, I think that assuming > a UserDict.DictMixin wouldn't be that bad. How often is a module's > dict used for anything time-critical except get (and maybe set, > delete, iterate)? I doubt that delete and iterate are common enough that they'd have to be regarded as time-critical. Maybe set - maybe. It hardly happens (especially compared to get), and when it does, it's almost never in a time-critical inner loop. DictMixin is currently pure Python. That's a speed concern that wouldn't be *too* hard to address, I suppose. >> I propose adding a read-only attribute func_extra_globals >> to the function object, default NULL. In the interpreter loop, >> global lookups try func_extra_globals first if it's not NULL. > > Would this really be a global dict though, or just a closure inserted > between the func and the normal globals? Basically a customizable closure, yeah. > Is the real problem that you can't change which variables are in a > closure (rather than fully global) after the function is compiled? Really, that's it. That's why I made the silly bytecode hack to insert function parameters, which actually works better than augmenting a function's globals with a polymorphic dict. Assuming func_globals is a DictMixin is intriguing, though. 
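The limitation we keep referring to is easy to see from Python code. A small experiment on the CPython being discussed here (2.5-era; the exact behaviour may differ in other versions):

class ShadowingDict(dict):
    def __getitem__(self, key):
        print "lookup:", key               # never reached from inside f()
        return dict.__getitem__(self, key)

ns = ShadowingDict(x=41)
exec "def f(): return x + 1" in ns         # f's func_globals is the ShadowingDict
f = dict.__getitem__(ns, "f")              # fetch f without our override
print f()                                  # prints 42; "lookup: x" never appears,
                                           # because LOAD_GLOBAL uses PyDict_GetItem
                                           # directly and bypasses __getitem__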
Neil From greg.ewing at canterbury.ac.nz Wed Nov 21 00:48:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Nov 2007 12:48:36 +1300 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <33030.10.7.75.26.1195548647.squirrel@mail.cs.byu.edu> References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> <47422B61.50703@canterbury.ac.nz> <33030.10.7.75.26.1195548647.squirrel@mail.cs.byu.edu> Message-ID: <47437254.3070505@canterbury.ac.nz> ntoronto at cs.byu.edu wrote: > There are two common method-calling cases, and an uncommon one. > In order of expected number of occurrences, with #3 being quite low: > > 1. self.method(...) > > 2. super.method(...) > > 3. DistantParent.method(self, ...) You're still missing an important case. I would rank them as 1. self.method(...) 2. DirectParent.method(self, ...) 3. super.method(...) 4. DistantParent.method(self, ...) Anything that made number 2 difficult would be unacceptable. -- Greg From greg.ewing at canterbury.ac.nz Wed Nov 21 01:43:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Nov 2007 13:43:20 +1300 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: References: <4741DD34.1070106@cs.byu.edu> <4741F528.6000400@cs.byu.edu> <47422B61.50703@canterbury.ac.nz> <33030.10.7.75.26.1195548647.squirrel@mail.cs.byu.edu> Message-ID: <47437F28.2040405@canterbury.ac.nz> Jim Jewett wrote: > I would say it it almost always appropriate. The times it fails are when (5) Someone multiply-inherits from your class, and you end up calling one of their methods instead of yours, when neither your method or their method is expecting this to happen. Plus various other problems. There's a good discussion of the issues here: http://fuhm.net/super-harmful/ -- Greg From aligrudi at gmail.com Wed Nov 21 05:54:46 2007 From: aligrudi at gmail.com (Ali Gholami Rudi) Date: Wed, 21 Nov 2007 08:24:46 +0330 Subject: [Python-ideas] Explicit self argument, implicit super argument In-Reply-To: <4741DD34.1070106@cs.byu.edu> References: <4741DD34.1070106@cs.byu.edu> Message-ID: <20071121045446.GA2695@oojibishe> On Mon, Nov 19, 2007 at 12:00:04PM -0700, Neil Toronto wrote: > (Disclaimer: I have no issue with "self." and "super." attribute access, > which is what most people think of when they think "implicit self".) I don't feel easy about the new super either (maybe from a different perspective than Neil's). Why should self be passed to methods using a parameter but super should use magic (something like a global name that holds different objects in different places). Instead of making self implicit, I'd like super to use less magic. I much preferred super(self).foo(*args). Some magic for finding the surrounding class might be needed but at least we don't use the first parameter of a method implicitly. (I don't see this in the alternative proposals sections of :PEP:`3135`). It can be made backward compatible, too. I have not read new super discussions; So sorry if it has been already discussed. -- Ali -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From ntoronto at cs.byu.edu Thu Nov 22 16:40:49 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Thu, 22 Nov 2007 08:40:49 -0700 Subject: [Python-ideas] Fast global cacheless lookup Message-ID: <4745A301.5090201@cs.byu.edu> I have a hack coded up against r59068 in which LOAD_GLOBAL is even faster than LOAD_FAST. It'll be the same with STORE_GLOBAL and the *_NAME opcodes after I'm done with it, and it should be fully transparent to Python code. (That is, you can go ahead and swap out __builtins__ and crazy junk like that and everything should work as it did before.) Regression tests all pass, except test_gc on functions - I've got a refcount bug somewhere. Here's the microbenchmark I've been using to test LOAD_GLOBAL and LOAD_FAST: import timeit import dis def test_local_get(): x = 0 x; x; x; #... and 397 more of them if __name__ == '__main__': print dis.dis(test_local_get.func_code) print timeit.Timer('test_local_get()', 'from locals_test import test_local_get').timeit() The globals test puts 'x' in module scope, and the builtins test changes 'x' to 'len' and doesn't assign it to 0. Output right now: r59068 locals: 15.57 sec myhack locals: 15.61 sec (increase is probably insignificant or random) r59068 globals: 23.61 sec myhack globals: 15.14 sec (!) r59068 builtins: 28.08 sec myhack builtins: 15.26 sec (!!) Of course, it's no good if it slows everything else way the heck down. So 10 rounds of pybench says: r59068: mean 8.92, std 0.05 myhack: mean 8.99, std 0.04 From what I see in pybench, globals access is severely underrepresented compared to real programs, so those numbers aren't representative of the possible difference in real-life performance. Jim Jewett gave me the idea here: http://mail.python.org/pipermail/python-ideas/2007-November/001207.html "Note that weakening the module.__dict__ promise to only meeting the dict API would make it easier to implement the various speed-up-globals suggestions." I didn't exactly do that, but it did get me thinking. The other proposals for speeding up globals access seemed to do their darndest to leave PyDictObject alone and ended up hideously complicated because of it. Here's the main idea for this one: What if a frame could maintain an array of pointers right into a dictionary's entry table? A global lookup would then consist of a couple of pointer dereferences, and any value change would show up immediately to the frame. There was a dangerous dangling pointer problem inherent in that, so I formalized an update mechanism using an observer pattern. Here's how it works. Arbitrary objects can register themselves with a dictionary as "entry observers". The dictionary keeps track of all the registered observers, and for certain events, makes a call to each one to tell them that something has changed. The entry observers get pointers to entries via PyDict_GetEntry, which is just like PyDict_GetItem, except it returns a PyDictEntry * right from the dictionary's entry table. The dict notifies its observers on delitem, pop, popitem, resize and clear. Nothing else is necessary - nothing else will change the address of or invalidate an entry. There are very, very few changes in PyDictObject. In the general case, the pointer to the list of observers is NULL, and the only additional slowdown is when delitem, pop, popitem, resize and clear check that and move on - but those aren't called often. 
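In rough pure-Python terms, the observable-dict side looks something like this (only a model of the idea -- the real observers are C structs holding PyDictEntry pointers, so don't read it as the implementation):

class ObservableDict(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def _notify(self):
        for observer in self._observers:
            observer.entries_changed(self)

    # Only operations that can move or drop entries notify.  Plain item
    # assignment doesn't need to: an existing entry keeps its address, so an
    # observer holding a pointer to it sees the new value for free.  (Resize
    # happens inside the C setitem when the table grows, so it has no separate
    # method here, but it notifies too.)
    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self._notify()

    def pop(self, *args):
        value = dict.pop(self, *args)
        self._notify()
        return value

    def popitem(self):
        item = dict.popitem(self)
        self._notify()
        return item

    def clear(self):
        dict.clear(self)
        self._notify()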
So get, set, iter, contains, etc., are all exactly as fast as they were before. The biggest performance hit is when a highly-observed dict like __builtin__.__dict__ resizes, but that's rare enough to not worry about. To speed up globals access, an auxiliary object to functions and frames registers itself as an observer to func_globals and __builtins__. It makes an array of PyDictEntry pointers corresponding to func_code.co_names. PyEval_EvalFrameEx indexes that array first for global values, and updates it if there's one it couldn't find when the function was created. That's pretty much it. There are corner cases I still have to address, like what happens if someone replaces or deletes __builtins__, but it should be fairly easy to monitor that. I'd love to hear your comments, everyone. I've glossed over a lot of implementation details, but I've tried to make the main ideas clear. Neil From guido at python.org Thu Nov 22 16:46:21 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 22 Nov 2007 07:46:21 -0800 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <4745A301.5090201@cs.byu.edu> References: <4745A301.5090201@cs.byu.edu> Message-ID: Cool! Are you willing to show the code yet (bugs and all)? Personally, I'm not sure that it's worth doing this for STORE_GLOBAL (which should be rarely used in properly written code). Some questions: - what's the space & time impact for a dict with no watchers? - does this do anything for builtins? - could this be made to work for instance variables? - what about exec(src, ns) where ns is a mapping but not a dict? --Guido On Nov 22, 2007 7:40 AM, Neil Toronto wrote: > I have a hack coded up against r59068 in which LOAD_GLOBAL is even > faster than LOAD_FAST. It'll be the same with STORE_GLOBAL and the > *_NAME opcodes after I'm done with it, and it should be fully > transparent to Python code. (That is, you can go ahead and swap out > __builtins__ and crazy junk like that and everything should work as it > did before.) Regression tests all pass, except test_gc on functions - > I've got a refcount bug somewhere. > > Here's the microbenchmark I've been using to test LOAD_GLOBAL and LOAD_FAST: > > import timeit > import dis > > def test_local_get(): > x = 0 > x; x; x; #... and 397 more of them > > if __name__ == '__main__': > print dis.dis(test_local_get.func_code) > print timeit.Timer('test_local_get()', > 'from locals_test import test_local_get').timeit() > > > The globals test puts 'x' in module scope, and the builtins test changes > 'x' to 'len' and doesn't assign it to 0. > > Output right now: > > r59068 locals: 15.57 sec > myhack locals: 15.61 sec (increase is probably insignificant or random) > > r59068 globals: 23.61 sec > myhack globals: 15.14 sec (!) > > r59068 builtins: 28.08 sec > myhack builtins: 15.26 sec (!!) > > Of course, it's no good if it slows everything else way the heck down. > So 10 rounds of pybench says: > > r59068: mean 8.92, std 0.05 > myhack: mean 8.99, std 0.04 > > From what I see in pybench, globals access is severely underrepresented > compared to real programs, so those numbers aren't representative of the > possible difference in real-life performance. > > Jim Jewett gave me the idea here: > > http://mail.python.org/pipermail/python-ideas/2007-November/001207.html > > "Note that weakening the module.__dict__ promise to only meeting the > dict API would make it easier to implement the various speed-up-globals > suggestions." > > I didn't exactly do that, but it did get me thinking. 
The other > proposals for speeding up globals access seemed to do their darndest to > leave PyDictObject alone and ended up hideously complicated because of > it. Here's the main idea for this one: What if a frame could maintain an > array of pointers right into a dictionary's entry table? A global lookup > would then consist of a couple of pointer dereferences, and any value > change would show up immediately to the frame. > > There was a dangerous dangling pointer problem inherent in that, so I > formalized an update mechanism using an observer pattern. > > Here's how it works. Arbitrary objects can register themselves with a > dictionary as "entry observers". The dictionary keeps track of all the > registered observers, and for certain events, makes a call to each one > to tell them that something has changed. The entry observers get > pointers to entries via PyDict_GetEntry, which is just like > PyDict_GetItem, except it returns a PyDictEntry * right from the > dictionary's entry table. > > The dict notifies its observers on delitem, pop, popitem, resize and > clear. Nothing else is necessary - nothing else will change the address > of or invalidate an entry. There are very, very few changes in > PyDictObject. In the general case, the pointer to the list of observers > is NULL, and the only additional slowdown is when delitem, pop, popitem, > resize and clear check that and move on - but those aren't called often. > > So get, set, iter, contains, etc., are all exactly as fast as they were > before. The biggest performance hit is when a highly-observed dict like > __builtin__.__dict__ resizes, but that's rare enough to not worry about. > > To speed up globals access, an auxiliary object to functions and frames > registers itself as an observer to func_globals and __builtins__. It > makes an array of PyDictEntry pointers corresponding to > func_code.co_names. PyEval_EvalFrameEx indexes that array first for > global values, and updates it if there's one it couldn't find when the > function was created. > > That's pretty much it. There are corner cases I still have to address, > like what happens if someone replaces or deletes __builtins__, but it > should be fairly easy to monitor that. > > I'd love to hear your comments, everyone. I've glossed over a lot of > implementation details, but I've tried to make the main ideas clear. > > Neil > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ntoronto at cs.byu.edu Thu Nov 22 17:42:14 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Thu, 22 Nov 2007 09:42:14 -0700 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: References: <4745A301.5090201@cs.byu.edu> Message-ID: <4745B166.3040904@cs.byu.edu> Guido van Rossum wrote: > Cool! Are you willing to show the code yet (bugs and all)? Sure! I stayed up all night doing it and today is Thanksgiving, so I'll probably not get to it for a little while. (I know making a patch shouldn't take long, but I've never done it before.) Should I post the patch here or somewhere else? > Some questions: > > - what's the space & time impact for a dict with no watchers? I think it's almost negligible. Space: There are four bytes extra on every dict for a pointer to the observer list. It may actually be zero or eight or more depending on alignment and malloc block size - I haven't looked. 
Time: On dicts with no observers, dealloc, delitem, pop, popitem, clear, and resize pass through an "if (mp->ma_entryobs_list != NULL)". PyDict_New sets mp->ma_entryobs_list to NULL. Nothing else is affected. > - does this do anything for builtins? It does right now well enough to get them quickly, but setting or deleting them elsewhere won't show up yet in the frame. And it doesn't handle the case where __builtins__ is replaced. That'll take a little doing, but just mentally - it shouldn't affect performance much. Anyway, that part will work properly when I'm done. > - could this be made to work for instance variables? If my brain were thinking in straight lines, maybe I'd come up with something. :) I've got this fuzzy idea that it just might work. The hard part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR applied to something else. Hmm... Something to digest while I'm digesting the Real Other White Meat. :) > - what about exec(src, ns) where ns is a mapping but not a dict? Good question - I don't know, but I think it should work, at least as well as it did before. If there's no observer attached to a frame, it'll default to its previous behavior. Another hmm... Thanks for the prompt reply. Neil P.S. By the way, I'm very pleased with how clean and workable the codebase is. I actually cheered at the lack of warnings. My wife probably thinks I'm nuts. :D From guido at python.org Thu Nov 22 18:07:58 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 22 Nov 2007 09:07:58 -0800 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <4745B166.3040904@cs.byu.edu> References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> Message-ID: [Quick] The best way to post a patch is to put it in the bug tracker at bugs.python.org, and post a link to the issue here. The best way to create a patch is svn diff, assuming you started out with an anonymous svn checkout (see python.org/dev/) and not just with a distro download. Looking forward to it! --Guido On Nov 22, 2007 8:42 AM, Neil Toronto wrote: > Guido van Rossum wrote: > > Cool! Are you willing to show the code yet (bugs and all)? > > Sure! I stayed up all night doing it and today is Thanksgiving, so I'll > probably not get to it for a little while. (I know making a patch > shouldn't take long, but I've never done it before.) Should I post the > patch here or somewhere else? > > > Some questions: > > > > - what's the space & time impact for a dict with no watchers? > > I think it's almost negligible. > > Space: There are four bytes extra on every dict for a pointer to the > observer list. It may actually be zero or eight or more depending on > alignment and malloc block size - I haven't looked. > > Time: On dicts with no observers, dealloc, delitem, pop, popitem, clear, > and resize pass through an "if (mp->ma_entryobs_list != NULL)". > PyDict_New sets mp->ma_entryobs_list to NULL. Nothing else is affected. > > > - does this do anything for builtins? > > It does right now well enough to get them quickly, but setting or > deleting them elsewhere won't show up yet in the frame. And it doesn't > handle the case where __builtins__ is replaced. That'll take a little > doing, but just mentally - it shouldn't affect performance much. > > Anyway, that part will work properly when I'm done. > > > - could this be made to work for instance variables? > > If my brain were thinking in straight lines, maybe I'd come up with > something. :) I've got this fuzzy idea that it just might work. 
The hard > part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR > applied to something else. Hmm... > > Something to digest while I'm digesting the Real Other White Meat. :) > > > - what about exec(src, ns) where ns is a mapping but not a dict? > > Good question - I don't know, but I think it should work, at least as > well as it did before. If there's no observer attached to a frame, it'll > default to its previous behavior. Another hmm... > > Thanks for the prompt reply. > > Neil > > P.S. By the way, I'm very pleased with how clean and workable the > codebase is. I actually cheered at the lack of warnings. My wife > probably thinks I'm nuts. :D > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ntoronto at cs.byu.edu Thu Nov 22 18:16:17 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Thu, 22 Nov 2007 10:16:17 -0700 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> Message-ID: <4745B961.20408@cs.byu.edu> Guido van Rossum wrote: > [Quick] The best way to post a patch is to put it in the bug tracker > at bugs.python.org, and post a link to the issue here. The best way to > create a patch is svn diff, assuming you started out with an anonymous > svn checkout (see python.org/dev/) and not just with a distro > download. Figures. I couldn't get it through svn because my university has a transparent proxy that doesn't like REPORT requests. Is there any chance of getting https enabled at svn.python.org sometime so I don't have to stick to snapshots? Anyway, I'll get to this sometime after I get to bed. > Looking forward to it! Yes sir! :) Neil From facundobatista at gmail.com Thu Nov 22 18:26:05 2007 From: facundobatista at gmail.com (Facundo Batista) Date: Thu, 22 Nov 2007 14:26:05 -0300 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <4745B961.20408@cs.byu.edu> References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> <4745B961.20408@cs.byu.edu> Message-ID: 2007/11/22, Neil Toronto : > Figures. I couldn't get it through svn because my university has a > transparent proxy that doesn't like REPORT requests. Is there any chance > of getting https enabled at svn.python.org sometime so I don't have to > stick to snapshots? I have the same problem in the virtualized Ubuntu at work, but... 1. I create a dynamic tunnel with SSH to the machine at home. 2. Execute svn with tsocks, actually sending the SVN traffic through the tunnel (that acts like a SOCKS proxy). Here is a detailed how to, but in Spanish: http://www.taniquetil.com.ar/plog/post/1/303 If you can access other machine with SSH, feel free to contact me directly if need help to set up something like this. Regards, -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From gnewsg at gmail.com Thu Nov 22 22:04:05 2007 From: gnewsg at gmail.com (Giampaolo Rodola') Date: Thu, 22 Nov 2007 13:04:05 -0800 (PST) Subject: [Python-ideas] os.listdir iteration support Message-ID: Hi to all, I would find very useful having a version of os.listdir returning a generator. If a directory has many files, say 20,000, it could take a long time getting all of them with os.listdir and this could be a problem in asynchronous environments (e.g. asynchronous servers). 
The only solution which comes to my mind in such case is using a thread/fork or having a non-blocking version of listdir() returning an iterator. What do you think about that? From eyal.lotem at gmail.com Thu Nov 22 23:07:10 2007 From: eyal.lotem at gmail.com (Eyal Lotem) Date: Fri, 23 Nov 2007 00:07:10 +0200 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> Message-ID: Hey, I had a very similar idea and implementation back in June (that also passed all regression tests): http://mail.python.org/pipermail/python-ideas/2007-June/000902.html When I read Neil's mail I almost thought it was my old mail :-) Unfortunately, when I posted my optimization, it pretty much got ignored. Maybe I have not worded it properly. The main difference between our implementations, if I understand Neil's explanation correctly, is that you use direct ptrs into the dict and notify the ptr holders of relocations. I used a different method, where you call a new PyDict_ExportKey method and it creates a mediating element. The mediating element has a fixed position so it can be dereferenced directly. Direct access to it is just as fast, but it may be slightly affecting dict performance. I think a hybrid approach similar to Neil's, but with a mediating object to represent the access to the dict and do the observing for its user could be nicer (hell, Neil might already be doing this). P.S: I also had a more ambitious plan, after eliminating globals/builtins dict lookups, to use mro caches more aggressively with this optimization and type-specialization on code objects, to also eliminate class-side dict lookups. The user can also eliminate instance-side dict lookupts via __slots__ - effectively allowing the conversion of virtually _all_ namespace dict lookups in pure Python code to be direct memory dereferences, isn't that exciting? :-) Eyal On Nov 22, 2007 7:07 PM, Guido van Rossum wrote: > [Quick] The best way to post a patch is to put it in the bug tracker > at bugs.python.org, and post a link to the issue here. The best way to > create a patch is svn diff, assuming you started out with an anonymous > svn checkout (see python.org/dev/) and not just with a distro > download. > > Looking forward to it! > > --Guido > > > On Nov 22, 2007 8:42 AM, Neil Toronto wrote: > > Guido van Rossum wrote: > > > Cool! Are you willing to show the code yet (bugs and all)? > > > > Sure! I stayed up all night doing it and today is Thanksgiving, so I'll > > probably not get to it for a little while. (I know making a patch > > shouldn't take long, but I've never done it before.) Should I post the > > patch here or somewhere else? > > > > > Some questions: > > > > > > - what's the space & time impact for a dict with no watchers? > > > > I think it's almost negligible. > > > > Space: There are four bytes extra on every dict for a pointer to the > > observer list. It may actually be zero or eight or more depending on > > alignment and malloc block size - I haven't looked. > > > > Time: On dicts with no observers, dealloc, delitem, pop, popitem, clear, > > and resize pass through an "if (mp->ma_entryobs_list != NULL)". > > PyDict_New sets mp->ma_entryobs_list to NULL. Nothing else is affected. > > > > > - does this do anything for builtins? > > > > It does right now well enough to get them quickly, but setting or > > deleting them elsewhere won't show up yet in the frame. And it doesn't > > handle the case where __builtins__ is replaced. 
That'll take a little > > doing, but just mentally - it shouldn't affect performance much. > > > > Anyway, that part will work properly when I'm done. > > > > > - could this be made to work for instance variables? > > > > If my brain were thinking in straight lines, maybe I'd come up with > > something. :) I've got this fuzzy idea that it just might work. The hard > > part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR > > applied to something else. Hmm... > > > > Something to digest while I'm digesting the Real Other White Meat. :) > > > > > - what about exec(src, ns) where ns is a mapping but not a dict? > > > > Good question - I don't know, but I think it should work, at least as > > well as it did before. If there's no observer attached to a frame, it'll > > default to its previous behavior. Another hmm... > > > > Thanks for the prompt reply. > > > > Neil > > > > P.S. By the way, I'm very pleased with how clean and workable the > > codebase is. I actually cheered at the lack of warnings. My wife > > probably thinks I'm nuts. :D > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From tjreedy at udel.edu Fri Nov 23 00:11:46 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 22 Nov 2007 18:11:46 -0500 Subject: [Python-ideas] Fast global cacheless lookup References: <4745A301.5090201@cs.byu.edu><4745B166.3040904@cs.byu.edu> Message-ID: "Eyal Lotem" wrote in message news:b64f365b0711221407l7564b507p7359c227866fd230 at mail.gmail.com... | Hey, I had a very similar idea and implementation back in June (that | also passed all regression tests): | http://mail.python.org/pipermail/python-ideas/2007-June/000902.html | | When I read Neil's mail I almost thought it was my old mail :-) | | Unfortunately, when I posted my optimization, it pretty much got | ignored. Maybe I have not worded it properly. Rereading your post, I think Neil's was a bit clearer, partly because it had more details. In particular, I see no mention of ... | I used a different method, where you call a new PyDict_ExportKey | method and it creates a mediating element. The mediating element has | a fixed position so it can be dereferenced directly. (which I do not quite get, actually, but that is probably just me.) More important, I think, is timing. Last June, the focus was on defining what 3.0 would consist of. Now that that is mostly done, and the result found to be slower than 2.5, I think more attention is available for speed issues. It will be great if the two of you can come up with a clean lookup speedup that avoids any showstopper issues. This issue has been rumbling around 'in the basement' for several years. Terry Jan Reedy From tjreedy at udel.edu Fri Nov 23 00:25:06 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 22 Nov 2007 18:25:06 -0500 Subject: [Python-ideas] os.listdir iteration support References: Message-ID: "Giampaolo Rodola'" wrote in message news:d827975f-7c1e-471e-bac1-8d55262ab122 at d27g2000prf.googlegroups.com... | I would find very useful having a version of os.listdir returning a generator. 
If there are no technical issues in the way, such a replacement (rather than addition) would be in line with other list -> iterator replacements in 3.0 (range, dict,items, etc). A list could then be obtained with list(os.listdir). tjr From greg.ewing at canterbury.ac.nz Fri Nov 23 00:33:15 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 23 Nov 2007 12:33:15 +1300 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <4745B166.3040904@cs.byu.edu> References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> Message-ID: <474611BB.70608@canterbury.ac.nz> Neil Toronto wrote: > The hard > part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR > applied to something else. Why would you *want* to distinguish that? A decent attribute lookup acceleration mechanism should work for attributes of any object, not just self. Think method calls, which are probably even more common than accesses to globals. -- Greg From guido at python.org Fri Nov 23 02:40:45 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 22 Nov 2007 17:40:45 -0800 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: Message-ID: On Nov 22, 2007 3:25 PM, Terry Reedy wrote: > "Giampaolo Rodola'" wrote > > I would find very useful having a version of os.listdir returning a > > generator. > > If there are no technical issues in the way, such a replacement (rather > than addition) would be in line with other list -> iterator replacements in > 3.0 (range, dict,items, etc). A list could then be obtained with > list(os.listdir). But how common is this use case really? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Fri Nov 23 05:59:02 2007 From: aahz at pythoncraft.com (Aahz) Date: Thu, 22 Nov 2007 20:59:02 -0800 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: Message-ID: <20071123045902.GA4136@panix.com> On Thu, Nov 22, 2007, Giampaolo Rodola' wrote: > > I would find very useful having a version of os.listdir returning a > generator. If a directory has many files, say 20,000, it could take > a long time getting all of them with os.listdir and this could be a > problem in asynchronous environments (e.g. asynchronous servers). > > The only solution which comes to my mind in such case is using a > thread/fork or having a non-blocking version of listdir() returning an > iterator. > > What do you think about that? -1 The problem is that reading a directory requires an open file handle; given a generator context, there's no clear mechanism for determining when to close the handle. Because the list needs to be created in the first place, why bother with a generator? -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith From adam at atlas.st Fri Nov 23 06:54:48 2007 From: adam at atlas.st (Adam Atlas) Date: Fri, 23 Nov 2007 00:54:48 -0500 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: <20071123045902.GA4136@panix.com> References: <20071123045902.GA4136@panix.com> Message-ID: On 22 Nov 2007, at 23:59, Aahz wrote: > The problem is that reading a directory requires an open file handle; > given a generator context, there's no clear mechanism for determining > when to close the handle. Whenever the generator is __del__ed, or whenever the iteration completes, whichever comes first? > Because the list needs to be created in the first place How so? 
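PEP 342 generators already provide the hook: a try/finally around the yield loop runs when iteration completes, when the generator's close() is called (contextlib.closing gives that a with-statement spelling), or when the generator is collected. A sketch, assuming some low-level directory handle with read()/close() methods -- the opendir name here is made up, nothing like it exists in os today:

def iterdir(path):
    handle = opendir(path)            # hypothetical low-level handle
    try:
        while True:
            name = handle.read()      # one filename at a time, '' at the end
            if not name:
                break
            yield name
    finally:
        handle.close()                # on completion, close(), or collection

# When iteration may stop early, close deterministically:
#     with closing(iterdir(somepath)) as names:
#         for name in names:
#             ...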
From greg.ewing at canterbury.ac.nz Fri Nov 23 08:01:43 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 23 Nov 2007 20:01:43 +1300 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: <20071123045902.GA4136@panix.com> Message-ID: <47467AD7.8070702@canterbury.ac.nz> Adam Atlas wrote: > On 22 Nov 2007, at 23:59, Aahz wrote: > >>The problem is that reading a directory requires an open file handle; >>given a generator context, there's no clear mechanism for determining >>when to close the handle. > > Whenever the generator is __del__ed, or whenever the iteration > completes, whichever comes first? Maybe what we really want is the functionality of the C opendir and readdir functions exposed in the os module. Then we could have an explicit method for closing the file handle. -- Greg From ntoronto at cs.byu.edu Fri Nov 23 08:18:37 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Fri, 23 Nov 2007 00:18:37 -0700 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: <20071123045902.GA4136@panix.com> Message-ID: <47467ECD.1090406@cs.byu.edu> Adam Atlas wrote: > On 22 Nov 2007, at 23:59, Aahz wrote: >> Because the list needs to be created in the first place > > How so? It doesn't, actually. On Windows, os.listdir uses FindFirstFile and FindNextFile, on OS2 it's DosFindFirst and DosFindNext, and on everything else it's Posix opendir and readdir. All of these are incremental, so a generator is the most natural way to expose the underlying API. That's just a set of facts and a single opinion. Past that I personally have no preference. Neil From ntoronto at cs.byu.edu Fri Nov 23 09:26:19 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Fri, 23 Nov 2007 01:26:19 -0700 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> Message-ID: <47468EAB.905@cs.byu.edu> Eyal Lotem wrote: > Hey, I had a very similar idea and implementation back in June (that > also passed all regression tests): > http://mail.python.org/pipermail/python-ideas/2007-June/000902.html > > When I read Neil's mail I almost thought it was my old mail :-) > > Unfortunately, when I posted my optimization, it pretty much got > ignored. Maybe I have not worded it properly. > > The main difference between our implementations, if I understand > Neil's explanation correctly, is that you use direct ptrs into the > dict and notify the ptr holders of relocations. > > I used a different method, where you call a new PyDict_ExportKey > method and it creates a mediating element. The mediating element has > a fixed position so it can be dereferenced directly. Direct access to > it is just as fast, but it may be slightly affecting dict performance. Nicely done. :) > I think a hybrid approach similar to Neil's, but with a mediating > object to represent the access to the dict and do the observing for > its user could be nicer (hell, Neil might already be doing this). I am, actually. I originally had the observer be the function object itself, but that presented problems with generators, which create a frame object from a function and then dump the function. I had assumed that a frame would never outlast the function object it was created from and ended up with dangling pointers. D'oh! Anyway, it's correct now and the details are well-abstracted. The mediating object is called PyFastGlobalsAdapter. ("Adapter" because it allows you to getitem/setitem a dict like you do a list - using an index.) 
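In pure-Python terms the index-based access amounts to roughly this (a model of the semantics only -- the real adapter caches PyDictEntry pointers, so it skips the hashing this model still does on every access):

class FastGlobalsModel(object):
    def __init__(self, globals_dict, names):
        self.globals = globals_dict
        self.names = tuple(names)     # think func_code.co_names
    def __getitem__(self, i):
        return self.globals[self.names[i]]
    def __setitem__(self, i, value):
        self.globals[self.names[i]] = value

# adapter[3] stands for globals[co_names[3]]; the C version turns that into a
# couple of pointer dereferences.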
It gets bunted about among functions, frames, and eval code. It basically has the following members: PyObject *globals; /* PyDictObject only */ PyObject *names; /* From func_code->co_names */ PyDictEntry **entries; /* Struct pointers into globals entries */ (I've omitted the details for builtins because I haven't got them totally worked out yet. I'll probably have a PyObject *builtins and a PyDictEntry *builtins_entry pointing at the globals dict so I can detect when __builtins__ is replaced within globals.) On init, it registers itself with the globals dict and starts keeping track of pointers to dict entries in "entries". "entries" is the same length as "names". Getting the value globals[names[i]] is done by just referencing entries[i]->me_value. It's very quick. :) There's a PyFastGlobals_GetItem(PyObject *fg, int index) that does it for you and also does the necessary bookkeeping to update the PyDictEntry pointers when there's a miss. (entries[i] == NULL; happens when a key is anticipated but not in the dict at first.) I agree that the dict observer interface + an adapter is a good way to go. The dict part should be flexible, lean and fast (like dicts themselves), and the simple observer interface does just that. The adapter keeps it all correct and refcount-y, and provides a convenient way to get values by index. Would it be worth it to expose dict adapters as a Python object? Then Python code could do this kind of crazy stuff: d = {} a = dictadapter(d, ('keys', 'i', 'want', 'fast', 'access', 'to')) a[0] == d['keys'] and a[1] == d['i'] #..., etc. => True That could make it a lot easier to experiment with fast cacheless dict lookups in other contexts. The problem is, I have no idea what those contexts might be, at least within Python code. :) > P.S: I also had a more ambitious plan, after eliminating > globals/builtins dict lookups, to use mro caches more aggressively > with this optimization and type-specialization on code objects, to > also eliminate class-side dict lookups. The user can also eliminate > instance-side dict lookupts via __slots__ - effectively allowing the > conversion of virtually _all_ namespace dict lookups in pure Python > code to be direct memory dereferences, isn't that exciting? :-) Extremely! There's no reason it couldn't be done. But I'm not exactly sure what you mean (seeing the idea only from the context of my own hacks), so could you elaborate a bit? :D When you said "mro" I immediately thought of this adapter layout: PyList of MRO class dicts PyTuple of common attribute names (or list of tuples) PyDictEntry * array per MRO class But where would the name index come from? The co_names tuples are different for each method. Neil From ntoronto at cs.byu.edu Fri Nov 23 09:42:51 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Fri, 23 Nov 2007 01:42:51 -0700 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <474611BB.70608@canterbury.ac.nz> References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> <474611BB.70608@canterbury.ac.nz> Message-ID: <4746928B.60101@cs.byu.edu> Greg Ewing wrote: > Neil Toronto wrote: >> The hard >> part may be distinguishing LOAD_ATTR applied to self from LOAD_ATTR >> applied to something else. > > Why would you *want* to distinguish that? A decent attribute > lookup acceleration mechanism should work for attributes of > any object, not just self. Think method calls, which are > probably even more common than accesses to globals. Now that's a durned good point. 
My cute little hack can be used anywhere you have a mostly-static dict (or at least one that grows infrequently) and a tuple of keys for which you want to repeatedly get or set values. As long as lookups start as tuple indexes (like indexes into co_names and such), things go fast. I'm still a bit fuzzy about how it would be used with LOAD_ATTR. Let's restrict it to just accelerating self. lookups for now. The oparg to LOAD_ATTR and STORE_ATTR is the co_names index, so co_names is again the tuple of keys. But it seems like you'd need an adapter (see previous reply to Eyal for terminology) for each pair of (self, method). Is there a better way? Neil From g.brandl at gmx.net Fri Nov 23 09:06:43 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 23 Nov 2007 09:06:43 +0100 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: <47467AD7.8070702@canterbury.ac.nz> References: <20071123045902.GA4136@panix.com> <47467AD7.8070702@canterbury.ac.nz> Message-ID: Greg Ewing schrieb: > Adam Atlas wrote: >> On 22 Nov 2007, at 23:59, Aahz wrote: >> >>>The problem is that reading a directory requires an open file handle; >>>given a generator context, there's no clear mechanism for determining >>>when to close the handle. >> >> Whenever the generator is __del__ed, or whenever the iteration >> completes, whichever comes first? > > Maybe what we really want is the functionality of > the C opendir and readdir functions exposed in the os > module. Then we could have an explicit method for > closing the file handle. What about an os.iterdir() generator which uses opendir/readdir as proposed? The generator's close() could also call closedir(), and you could have a warning in the docs about making sure to have it closed at some point. One could even use an enclosing with closing(os.iterdir()) as d: block. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From greg.ewing at canterbury.ac.nz Fri Nov 23 10:30:57 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 23 Nov 2007 22:30:57 +1300 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <4746928B.60101@cs.byu.edu> References: <4745A301.5090201@cs.byu.edu> <4745B166.3040904@cs.byu.edu> <474611BB.70608@canterbury.ac.nz> <4746928B.60101@cs.byu.edu> Message-ID: <47469DD1.3060004@canterbury.ac.nz> Neil Toronto wrote: > But it seems like you'd need an adapter (see previous > reply to Eyal for terminology) for each pair of (self, method). Is there > a better way? I started writing down some ideas for this, but then I realised that it doesn't really extend to attribute lookup in general. The reason is that only some kinds of attribute have their values stored in dict entries -- mainly just instance variables of user-defined class instances. Bound methods, attributes of built-in objects, etc., would be left out. I think the way to approach this is to have a global cache which is essentially a dictionary mapping (obj, name) pairs to some object that knows how to set or get the attribute value as directly as possible. While this wouldn't eliminate dict lookups entirely, in the case of a cache hit it would just be a single lookup instead of potentially many. 
Some of the ideas behind your adapter might be carried over, such as the idea of callbacks triggered by changes to the underlying objects to help keep the cache up to date. But there would probably have to be a variety of such callback mechanisms for use by different kinds of objects. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 23 12:11:35 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 24 Nov 2007 00:11:35 +1300 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: <20071123045902.GA4136@panix.com> <47467AD7.8070702@canterbury.ac.nz> Message-ID: <4746B567.2020806@canterbury.ac.nz> Georg Brandl wrote: > What about an os.iterdir() generator which uses opendir/readdir as proposed? I was feeling in the mood for a diversion, so I whipped up a Pyrex prototype of an opendir() object that can be used either as a file-like object or an iterator. Here's the docstring: """opendir(pathname) --> an open directory object Opens a directory and provides incremental access to the filenames it contains. May be used as a file-like object or as an iterator. When used as a file-like object, each call to read() returns one filename, or an empty string when the end of the directory is reached. The close() method should be called when finished with the directory. The close() method should also be called when used as an iterator and iteration is stopped prematurely. If iteration proceeds to completion, the directory is closed automatically.""" Source, setup.py and a brief test attached. -- Greg -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: opendir.pyx URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: setup.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test.py URL: From gnewsg at gmail.com Fri Nov 23 15:06:01 2007 From: gnewsg at gmail.com (Giampaolo Rodola') Date: Fri, 23 Nov 2007 06:06:01 -0800 (PST) Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: Message-ID: <85d8d06e-6287-4dbf-9f2b-89bf4dfe662b@w28g2000hsf.googlegroups.com> imho, not so unusual. First examples which come to my mind are HTTP and FTP servers which commonly have to list the content of local directories. FTP servers, in particular, have to do that VERY often. On 23 Nov, 02:40, "Guido van Rossum" wrote: > On Nov 22, 2007 3:25 PM, Terry Reedy wrote: > > > "Giampaolo Rodola'" wrote > > > I would find very useful having a version of os.listdir returning a > > > generator. > > > If there are no technical issues in the way, such a replacement (rather > > than addition) would be in line with other list -> iterator replacements in > > 3.0 (range, dict,items, etc). A list could then be obtained with > > list(os.listdir). > > But how common is this use case really? > > -- > --Guido van Rossum (home page:http://www.python.org/~guido/) > _______________________________________________ > Python-ideas mailing list > Python-id... 
at python.orghttp://mail.python.org/mailman/listinfo/python-ideas From gnewsg at gmail.com Fri Nov 23 15:12:30 2007 From: gnewsg at gmail.com (Giampaolo Rodola') Date: Fri, 23 Nov 2007 06:12:30 -0800 (PST) Subject: [Python-ideas] os.listdir iteration support In-Reply-To: <4746B567.2020806@canterbury.ac.nz> References: <20071123045902.GA4136@panix.com> <47467AD7.8070702@canterbury.ac.nz> <4746B567.2020806@canterbury.ac.nz> Message-ID: On 23 Nov, 12:11, Greg Ewing wrote: > Georg Brandl wrote: > from opendir import opendir > > print "READ" > d = opendir(".") > while 1: > name = d.read() > if not name: > break > print " ", name > print "EOF" > > print "ITERATE" > d = opendir(".") > for name in d: > print " ", name > print "STOP" > > print "TELL/SEEK" > d = opendir(".") > for i in range(3): > name = d.read() > print " ", name > pos = d.tell() > for i in range(3): > name = d.read() > print " ", name > d.seek(pos) > while 1: > name = d.read() > if not name: > break > print " ", name > print "EOF" This is exactly the usage I was talking about. From aahz at pythoncraft.com Fri Nov 23 15:39:39 2007 From: aahz at pythoncraft.com (Aahz) Date: Fri, 23 Nov 2007 06:39:39 -0800 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: <20071123045902.GA4136@panix.com> Message-ID: <20071123143939.GA28219@panix.com> On Fri, Nov 23, 2007, Adam Atlas wrote: > On 22 Nov 2007, at 23:59, Aahz wrote: >> >> The problem is that reading a directory requires an open file handle; >> given a generator context, there's no clear mechanism for determining >> when to close the handle. > > Whenever the generator is __del__ed, or whenever the iteration > completes, whichever comes first? Enh. That is not reliable without work, and getting it reliable is a waste of work. The proposed idea for adding an opendir() function is workable, but it still doesn't solve the need for closing the handle within listdir(). No matter what, changes the semantics of listdir() to leave a handle lying around is going to cause problems for some people. >> Because the list needs to be created in the first place > > How so? If you're going to ask a question, it would be nice to leave the entire original context in place, especially given that it's not a particularly long chunk of text. Anyway, the Windows case aside, if you don't have a reliable close() mechanism, you need to slurp the whole thing into a list in one swell foop so that you can just close the handle. Even in the Windows case, you need a handle, and I don't know what the consequences are of leaving it lying around. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith From guido at python.org Fri Nov 23 21:23:37 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Nov 2007 12:23:37 -0800 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: <85d8d06e-6287-4dbf-9f2b-89bf4dfe662b@w28g2000hsf.googlegroups.com> References: <85d8d06e-6287-4dbf-9f2b-89bf4dfe662b@w28g2000hsf.googlegroups.com> Message-ID: But how many FTP servers are written in Python *and* have directories with 20,000 files in them? --Guido On Nov 23, 2007 6:06 AM, Giampaolo Rodola' wrote: > imho, not so unusual. > First examples which come to my mind are HTTP and FTP servers which > commonly have to list the content of local directories. > FTP servers, in particular, have to do that VERY often. 
> > On 23 Nov, 02:40, "Guido van Rossum" wrote: > > On Nov 22, 2007 3:25 PM, Terry Reedy wrote: > > > > > "Giampaolo Rodola'" wrote > > > > I would find very useful having a version of os.listdir returning a > > > > generator. > > > > > If there are no technical issues in the way, such a replacement (rather > > > than addition) would be in line with other list -> iterator replacements in > > > 3.0 (range, dict,items, etc). A list could then be obtained with > > > list(os.listdir). > > > > But how common is this use case really? > > > > -- > > --Guido van Rossum (home page:http://www.python.org/~guido/) > > _______________________________________________ > > Python-ideas mailing list > > Python-id... at python.orghttp://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From gnewsg at gmail.com Fri Nov 23 22:26:40 2007 From: gnewsg at gmail.com (Giampaolo Rodola') Date: Fri, 23 Nov 2007 13:26:40 -0800 (PST) Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: <85d8d06e-6287-4dbf-9f2b-89bf4dfe662b@w28g2000hsf.googlegroups.com> Message-ID: On 23 Nov, 21:23, "Guido van Rossum" wrote: > But how many FTP servers are written in Python *and* have directories > with 20,000 files in them? > > --Guido I sincerely don't know. Surely it's a rather specific use case, but it is one of the tasks which takes the longest amount of time on an FTP server. 20,000 is probably an exaggerated hypothetical situation, so I did a simple test with a more realistic scenario. On windows a very crowded directory is C:\windows\system32. Currently the C:\windows\system32 of my Windows XP workstation contains 2201 files. I tried to run the code below which is how an FTP server should properly respond to a "LIST" command issued by client. It took 1.70300006866 seconds to complete the first time and 0.266000032425 the second one. I don't know if such specific use case could justify a listdir generators support to have into the stdlib but having something like Greg Ewing's opendirs module could have saved a lot of time in this specific case. -- Giampaolo import os, stat, time from tarfile import filemode try: import pwd, grp except ImportError: pwd = grp = None def format_list(directory): """Return a directory listing emulating "/bin/ls -lA" UNIX command output. 
This is how output appears to client: -rw-rw-rw- 1 owner group 7045120 Sep 02 3:47 music.mp3 drwxrwxrwx 1 owner group 0 Aug 31 18:50 e-books -rw-rw-rw- 1 owner group 380 Sep 02 3:40 module.py """ listing = os.listdir(directory) result = [] for basename in listing: file = os.path.join(directory, basename) # if the file is a broken symlink, use lstat to get stat for # the link try: stat_result = os.stat(file) except (OSError,AttributeError): stat_result = os.lstat(file) perms = filemode(stat_result.st_mode) # permissions nlinks = stat_result.st_nlink # number of links to inode if not nlinks: # non-posix system, let's use a bogus value nlinks = 1 if pwd and grp: # get user and group name, else just use the raw uid/gid try: uname = pwd.getpwuid(stat_result.st_uid).pw_name except KeyError: uname = stat_result.st_uid try: gname = grp.getgrgid(stat_result.st_gid).gr_name except KeyError: gname = stat_result.st_gid else: # on non-posix systems the only chance we use default # bogus values for owner and group uname = "owner" gname = "group" size = stat_result.st_size # file size # stat.st_mtime could fail (-1) if file's last modification # time is too old, in that case we return local time as last # modification time. try: mtime = time.strftime("%b %d %H:%M", time.localtime(stat_result.st_mtime)) except ValueError: mtime = time.strftime("%b %d %H:%M") # if the file is a symlink, resolve it, e.g. "symlink -> real_file" if stat.S_ISLNK(stat_result.st_mode): basename = basename + " -> " + os.readlink(file) # formatting is matched with proftpd ls output result.append("%s %3s %-8s %-8s %8s %s %s\r\n" %( perms, nlinks, uname, gname, size, mtime, basename)) return ''.join(result) if __name__ == '__main__': before = time.time() format_list(r'C:\windows\system32') print time.time() - before From ntoronto at cs.byu.edu Sat Nov 24 13:41:56 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Sat, 24 Nov 2007 05:41:56 -0700 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! Message-ID: <47481C14.90009@cs.byu.edu> I'd post this on Python-dev, but it has more to do with the future of Python, and it directly impacts the fairly-well-received Python-idea I'm working on right now. The current behavior has persisted since revision 9877, nine years ago: http://svn.python.org/view?rev=9877&view=rev "Vladimir Marangozov' performance hack: copy f_builtins from ancestor if the globals are the same." A variant of the behavior has persisted since the age of the dinosaurs, as far as I can tell - or at least ever since Python had stack frames. Here's how the globals/builtins lookup is currently presented as working: 1. If 'name' is in globals, return globals['name'] 2. Return globals['__builtins__']['name'] Glossing over a lot of details, here's how it *actually* worked before the performance hack: 0. A code object gets executed, which creates a stack frame. It sets frame.builtins = globals['__builtins__']. While executing the code: 1. If 'name' is in globals, return globals['name']. 2. Otherwise return frame.builtins['name']. A problem example, which is still a problem today: __builtins__ = {'len': lambda x: 1} print len([1, 2, 3]) # prints: # '3' when run as a script # '1' in interactive mode If running as a script or part of an import, the module's frame caches builtins, so it doesn't matter that it gets reassigned. When 'len' is looked up for the print statement, it's looked up in the cached version. But in interactive mode, each statement is executed in its own frame, so it doesn't have this problem. 
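As a rough illustration of the pre-1998 rule just described (the helper names make_frame and load_global are invented for this sketch; the real lookup lives in C inside the interpreter loop):

import __builtin__

def make_frame(module_globals):
    # step 0: the new frame caches whatever __builtins__ is bound to right now
    return {'globals': module_globals,
            'builtins': module_globals.get('__builtins__', vars(__builtin__))}

def load_global(frame, name):
    if name in frame['globals']:       # step 1: module globals win
        return frame['globals'][name]
    return frame['builtins'][name]     # step 2: fall back to the cached builtins

g = {'__builtins__': vars(__builtin__)}
frame = make_frame(g)                       # the module's frame is created here
g['__builtins__'] = {'len': lambda x: 1}    # rebinding afterwards is ignored...
print load_global(frame, 'len')([1, 2, 3])  # ...so this prints 3, as in the script case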
Well, at least module *functions* will run in their own frames, so they'll see the new builtins, right? But here's how it works now, after the performance hack: 0. A code object gets executed, which creates a stack frame. a. If the stack frame has a parent (think "call site") and the parent has the same globals, it sets frame.builtins = parent.builtins. b. Otherwise it sets frame.builtins = globals['__builtins__']. While executing the code: 1. If 'name' is in globals, return globals['name']. 2. Otherwise return frame.builtins['name']. A problem example: __builtins__ = {'len': lambda x: 1} def f(): print len([1, 2, 3]) f() # prints: # '3' when run as a script # '1' in interactive mode At the call site "f()", frame.builtins is the original, cached builtins. Before the hack, f()'s frame would have recalculated and re-cached it. After the hack, f()'s frame inherits the cached version. But this only happens in a script, which runs its code in a single frame. If you try this in interactive mode, you'll get correct behavior. If function calls stay within a module, builtins is effectively frozen at the value it had when the module started execution. But if outside modules call those same functions, builtins will have its new value! That could be bad: import my_extra_special_builtins as __builtins__ def run_tests_on_extra_special_functions(): if __name__ == '__main__': run_tests_on_extra_special_functions() The special library functions work, but the tests don't. The special builtins module only shows up when functions are called from outside modules (where the call sites have different globals) and the functions' frames are forced to recalculate builtins rather than inheriting it. Here are some ways around the problem: 1. Put all the tests in a different module. 2. Use a unit testing framework, which will call the module functions from outside the module. 3. Call functions using exec with custom globals. 4. Replace functions using types.FunctionType with custom globals. #3 and #4 are decidedly unlikely. :) #1 is generally discouraged (AFAIK) if not annoying, and #2 is encouraged. In the last thread on __builtins__ vs. __builtin__, back in March, it seemed that Guido was open to new ideas for Python 3.0 on the subject. Well, keeping in mind this strange behavior and the length of time it's gone on, here's my recommendation: Kill __builtins__. Take it out of the module dict. Let LOAD_GLOBAL look in "builtins" (currently "__builtin__") for names after it checks globals. If modules want to hack at builtins, they can import it. But they hack it globally or not at all. I honestly can't think of a use case you can handle by replacing a module's __builtins__ that can't be handled without. If there is one, nobody actually does it, because we would have heard them screaming in agony and banging their heads against the walls from thousands of miles away by now. You just can't do it reliably as of February 1998. The regression test suite doesn't even touch things like this. It only goes as far as injecting stuff into __builtin__. Finally, on to my practical problem. I'm working on the fast globals stuff, which is how I got onto this subject in the first place. Here are a few of my options: 1. I can make __builtins__ work like it was always supposed to, at the cost of decreased performance and extra complexity. It would still be much faster than it is now, though. 2. Status quo: I can make __builtins__ work like it does now. I think I can do this, anyway. 
It's actually more complex than #1, and very likely slower. I would rather not take this route. 3. For a given function, I can freeze __builtins__ at the value it was at when the function was defined. 4. I can make it work like I suggested for Python 3.0, but make __builtin__ automatically available to modules as __builtins__. With or without it, I should be posting my patch for fast globals soon. No, don't look at me like that. I'm serious! Wondering-what-to-do-ly, Neil From greg.ewing at canterbury.ac.nz Sun Nov 25 00:09:02 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 25 Nov 2007 12:09:02 +1300 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! In-Reply-To: <47481C14.90009@cs.byu.edu> References: <47481C14.90009@cs.byu.edu> Message-ID: <4748AF0E.6090401@canterbury.ac.nz> Neil Toronto wrote: > Kill __builtins__. Take it out of the module dict. Let LOAD_GLOBAL > look in "builtins" (currently "__builtin__") for names after it > checks globals. What about things like running code sandboxed with a restricted set of builtins? -- Greg From jimjjewett at gmail.com Sun Nov 25 00:22:04 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 24 Nov 2007 18:22:04 -0500 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <4745A301.5090201@cs.byu.edu> References: <4745A301.5090201@cs.byu.edu> Message-ID: On 11/22/07, Neil Toronto wrote: > ... What if a frame could maintain an > array of pointers right into a dictionary's entry table? > The dict notifies its observers on delitem, pop, popitem, resize and > clear. Nothing else is necessary - nothing else will change the address > of or invalidate an entry. I think this isn't quite true, because of DUMMY entries. Insert key1. Insert key2 that wants the same slot. Register an observer that cares about key2 but not key1. Delete key1. The key1 entry is replaced with DUMMY, but the entry for key2 is not affected. Look up key2 (by some other code which hasn't already taken this shortcut) and the lookdict function (as a side effect) moves key2 to the better location that key1 no longer occupies. As described, I think this breaks your cache. Of course you can get around this by just not moving things without a resize, but that is likely to be horrible for the (non-namespace?) dictionaries that do see frequent deletions. Another way around it is to also notify the observers whenever lookdict moves an entry; I'm not sure how that would affect normal lookup performance. A more radical change is to stop exposing the internal structure at all. For example, a typical namespace might instead be represented as an array of values, plus a dict mapping names to indices. The cost would be an extra pointer for each key ever in the dictionary (since you wouldn't reuse positional slots), and the savings would be that most lookups could just grab namespace[i] without having to even check that they got the right key, let alone following a trail of collision resolutions. > To speed up globals access, an auxiliary object to functions and frames > registers itself as an observer to func_globals and __builtins__. Note that func_globals probably *will* be updated again in the future, if only to register this very function with its module. 
You could wait to "seal" a namespace until you think all its names are known, or you could adapt the timestamp solution suggested in http://bugs.python.org/issue1616125 -jJ From aahz at pythoncraft.com Sun Nov 25 01:29:17 2007 From: aahz at pythoncraft.com (Aahz) Date: Sat, 24 Nov 2007 16:29:17 -0800 Subject: [Python-ideas] os.listdir iteration support In-Reply-To: References: <85d8d06e-6287-4dbf-9f2b-89bf4dfe662b@w28g2000hsf.googlegroups.com> Message-ID: <20071125002916.GA12966@panix.com> On Fri, Nov 23, 2007, Giampaolo Rodola' wrote: > > Surely it's a rather specific use case, but it is one of the tasks > which takes the longest amount of time on an FTP server. 20,000 is > probably an exaggerated hypothetical situation, so I did a simple test > with a more realistic scenario. > On windows a very crowded directory is C:\windows\system32. Currently > the C:\windows\system32 of my Windows XP workstation contains 2201 > files. > I tried to run the code below which is how an FTP server should > properly respond to a "LIST" command issued by client. > It took 1.70300006866 seconds to complete the first time and > 0.266000032425 the second one. Your code calls os.stat() on each file. I know from past experience that os.stat() is *extremely* expensive. Because os.listdir() runs at C speed, it only gets slow when run against hundreds of thousands of entries. (One directory on a work server has over 200K entries, and it takes os.listdir() about twenty seconds. I believe that if we switched from ext3 to something more appropriate that would get reduced.) > I don't know if such specific use case could justify a listdir > generators support to have into the stdlib but having something like > Greg Ewing's opendirs module could have saved a lot of time in this > specific case. Doubtful. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith From jimjjewett at gmail.com Sun Nov 25 02:21:37 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 24 Nov 2007 20:21:37 -0500 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! In-Reply-To: <47481C14.90009@cs.byu.edu> References: <47481C14.90009@cs.byu.edu> Message-ID: On 11/24/07, Neil Toronto wrote: [I'm summarizing and paraphrasing] If a name isn't in globals, python looks in globals['__builtins__']['name'] Unfortunately, it may use a stale cached value for globals['__builtins__'] ... > Well, keeping in mind this strange behavior and the length > of time it's gone on, here's my recommendation: > Kill __builtins__. Take it out of the module dict. Let LOAD_GLOBAL > look in "builtins" (currently "__builtin__") for names after it > checks globals. If modules want to hack at builtins, they can > import it. But they hack it globally or not at all. As Greg pointed out, this isn't so good for sandboxes. But as long as you're changing dicts to be better namespaces, why not go a step farther? Instead of using a magic key name (some spelling variant of builtin), make the fallback part of the dict itself. For example: Use a defaultdict and set the __missing__ method to the builtin's __getitem__. Then neither python nor the frame need to worry about tracking the builtin namespace, but the fallback can be reset (even on a per-function basis) by simply replacing the fallback method. 
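As a rough sketch of that fallback idea (using a dict subclass with __missing__ rather than defaultdict proper, since default_factory never sees the missing key; the class name is made up for the example):

import __builtin__

class NamespaceDict(dict):
    """A globals-like dict that falls back to a replaceable builtin namespace."""
    def __init__(self, *args, **kwds):
        dict.__init__(self, *args, **kwds)
        self.fallback = vars(__builtin__)   # replaceable, even per-function
    def __missing__(self, key):
        # only called when normal dict lookup fails
        return self.fallback[key]

ns = NamespaceDict(x=1)
print ns['x']                     # 1, found directly
print ns['len']([1, 2, 3])        # 3, falls through to the real builtin
ns.fallback = {'len': lambda obj: 42}
print ns['len']([1, 2, 3])        # 42, after swapping the fallback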
-jJ From ntoronto at cs.byu.edu Sun Nov 25 03:18:25 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Sat, 24 Nov 2007 19:18:25 -0700 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: References: <4745A301.5090201@cs.byu.edu> Message-ID: <4748DB71.6000601@cs.byu.edu> Jim Jewett wrote: > On 11/22/07, Neil Toronto wrote: > >> ... What if a frame could maintain an >> array of pointers right into a dictionary's entry table? > >> The dict notifies its observers on delitem, pop, popitem, resize and >> clear. Nothing else is necessary - nothing else will change the address >> of or invalidate an entry. > > I think this isn't quite true, because of DUMMY entries. > > Insert key1. > Insert key2 that wants the same slot. > Register an observer that cares about key2 but not key1. > > Delete key1. The key1 entry is replaced with DUMMY, but the entry > for key2 is not affected. > > Look up key2 (by some other code which hasn't already taken this > shortcut) and the lookdict function (as a side effect) moves key2 to > the better location that key1 no longer occupies. As described, I > think this breaks your cache. Good grief old chap, you freaked me out. Turns out it all still works. Whether the lookdict functions used to move entries around I don't know, but now it doesn't. It's probably because deletions are so rare compared to other operations that it's not worth the extra logic in those tight little loops. Mind if I keep rambling, just to make sure I've got it right? :) It's the dummy entries that make lookup work at all. The lookdict functions use them as flags so that it knows to keep skipping around the table looking for an open entry or an entry with the right key. It's basically: "If ep->me_key != key or ep->me_key == dummy, I need to keep trying different ep's. If I reach an empty ep, return the first dummy I found or that ep if I didn't find one. If I reach an ep with the right key, return that." I wasn't completely satisfied by static analysis, so I traced the case you brought up through both lookdict and lookdict_string. Here it is: Assume hash(key1) == hash(key2). Assume (without loss of generality) that for this hash, entries are traversed in order 0, 1, 2... Insert key1: 0: key1 1: NULL 2: ... Insert key2: 0: key1 1: key2 2: ... Delete key1: 0: dummy 1: key2 2: ... Look up key2 (trace): ("freeslot" keeps track of the first dummy found on the traversal; is NULL if none found) start: ep = 0 freeslot = 0 [ep->me_key == dummy] loop: ep = 1 return ep [ep->me_key == key2] In that last bit would have been the part that goes something like this: if (freeslot != NULL) { /* refcount-neutral */ *freeslot = *ep; ep->me_key = dummy; ep->me_value = NULL; return freeslot; } else return ep; It might be a speed improvement if you assume that the key is very likely to be looked up again. But it's extra complexity in a speed-critical code path and you never know whether you lengthened the traversal for other lookups. As long as it's a wash in the end, it might as well be left alone, at least for the fast globals. :D Neil From guido at python.org Mon Nov 26 18:40:59 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Nov 2007 09:40:59 -0800 Subject: [Python-ideas] Fast global cacheless lookup In-Reply-To: <4748DB71.6000601@cs.byu.edu> References: <4745A301.5090201@cs.byu.edu> <4748DB71.6000601@cs.byu.edu> Message-ID: On Nov 24, 2007 6:18 PM, Neil Toronto wrote: > Jim Jewett wrote: > > I think this isn't quite true, because of DUMMY entries. > > > > Insert key1. 
> > Insert key2 that wants the same slot. > > Register an observer that cares about key2 but not key1. > > > > Delete key1. The key1 entry is replaced with DUMMY, but the entry > > for key2 is not affected. > > > > Look up key2 (by some other code which hasn't already taken this > > shortcut) and the lookdict function (as a side effect) moves key2 to > > the better location that key1 no longer occupies. As described, I > > think this breaks your cache. > > Good grief old chap, you freaked me out. > > Turns out it all still works. Whether the lookdict functions used to > move entries around I don't know, but now it doesn't. It's probably > because deletions are so rare compared to other operations that it's not > worth the extra logic in those tight little loops. I don't know where Jim gets his information, but I don't recall that just looking up a key has ever moved entries around. You'd have to delete and re-add it to get it moved. (Or you'd have to hit the "rehash everything to a larger hash table" of course.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Nov 26 18:46:44 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Nov 2007 09:46:44 -0800 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! In-Reply-To: <47481C14.90009@cs.byu.edu> References: <47481C14.90009@cs.byu.edu> Message-ID: The semantics of __builtins__ are an implementation detail used for sandboxing, and assignment to __builtins__ is not supported. Alas, I can't quite figure out what you're after; your post doesn't start with a clear problem statement, so I'm not even sure if this is helpful information. I just hope to discourage you from trying to change the semantics of __builtins__. In 3.0, __builtins__ may well be renamed. --Guido On Nov 24, 2007 4:41 AM, Neil Toronto wrote: > I'd post this on Python-dev, but it has more to do with the future of > Python, and it directly impacts the fairly-well-received Python-idea I'm > working on right now. > > The current behavior has persisted since revision 9877, nine years ago: > > http://svn.python.org/view?rev=9877&view=rev > > "Vladimir Marangozov' performance hack: copy f_builtins from ancestor > if the globals are the same." > > A variant of the behavior has persisted since the age of the dinosaurs, > as far as I can tell - or at least ever since Python had stack frames. > > Here's how the globals/builtins lookup is currently presented as working: > > 1. If 'name' is in globals, return globals['name'] > 2. Return globals['__builtins__']['name'] > > Glossing over a lot of details, here's how it *actually* worked before > the performance hack: > > 0. A code object gets executed, which creates a stack frame. It > sets frame.builtins = globals['__builtins__']. > While executing the code: > 1. If 'name' is in globals, return globals['name']. > 2. Otherwise return frame.builtins['name']. > > A problem example, which is still a problem today: > > __builtins__ = {'len': lambda x: 1} > print len([1, 2, 3]) > # prints: > # '3' when run as a script > # '1' in interactive mode > > If running as a script or part of an import, the module's frame caches > builtins, so it doesn't matter that it gets reassigned. When 'len' is > looked up for the print statement, it's looked up in the cached version. > But in interactive mode, each statement is executed in its own frame, so > it doesn't have this problem. > > Well, at least module *functions* will run in their own frames, so > they'll see the new builtins, right?
But here's how it works now, after > the performance hack: > > 0. A code object gets executed, which creates a stack frame. > a. If the stack frame has a parent (think "call site") and > the parent has the same globals, it sets > frame.builtins = parent.builtins. > b. Otherwise it sets frame.builtins = globals['__builtins__']. > While executing the code: > 1. If 'name' is in globals, return globals['name']. > 2. Otherwise return frame.builtins['name']. > > A problem example: > > __builtins__ = {'len': lambda x: 1} > def f(): print len([1, 2, 3]) > f() > # prints: > # '3' when run as a script > # '1' in interactive mode > > > At the call site "f()", frame.builtins is the original, cached builtins. > Before the hack, f()'s frame would have recalculated and re-cached it. > After the hack, f()'s frame inherits the cached version. But this only > happens in a script, which runs its code in a single frame. If you try > this in interactive mode, you'll get correct behavior. > > If function calls stay within a module, builtins is effectively frozen > at the value it had when the module started execution. But if outside > modules call those same functions, builtins will have its new value! > That could be bad: > > import my_extra_special_builtins as __builtins__ > > > > def run_tests_on_extra_special_functions(): > > > if __name__ == '__main__': > run_tests_on_extra_special_functions() > > The special library functions work, but the tests don't. The special > builtins module only shows up when functions are called from outside > modules (where the call sites have different globals) and the functions' > frames are forced to recalculate builtins rather than inheriting it. > Here are some ways around the problem: > > 1. Put all the tests in a different module. > 2. Use a unit testing framework, which will call the module > functions from outside the module. > 3. Call functions using exec with custom globals. > 4. Replace functions using types.FunctionType with custom globals. > > #3 and #4 are decidedly unlikely. :) #1 is generally discouraged (AFAIK) > if not annoying, and #2 is encouraged. > > In the last thread on __builtins__ vs. __builtin__, back in March, it > seemed that Guido was open to new ideas for Python 3.0 on the subject. > Well, keeping in mind this strange behavior and the length of time it's > gone on, here's my recommendation: > > Kill __builtins__. Take it out of the module dict. Let LOAD_GLOBAL > look in "builtins" (currently "__builtin__") for names after it > checks globals. If modules want to hack at builtins, they can > import it. But they hack it globally or not at all. > > I honestly can't think of a use case you can handle by replacing a > module's __builtins__ that can't be handled without. If there is one, > nobody actually does it, because we would have heard them screaming in > agony and banging their heads against the walls from thousands of miles > away by now. You just can't do it reliably as of February 1998. > > The regression test suite doesn't even touch things like this. It only > goes as far as injecting stuff into __builtin__. > > Finally, on to my practical problem. > > I'm working on the fast globals stuff, which is how I got onto this > subject in the first place. Here are a few of my options: > > 1. I can make __builtins__ work like it was always supposed to, at > the cost of decreased performance and extra complexity. It would > still be much faster than it is now, though. > 2. Status quo: I can make __builtins__ work like it does now. 
I > think I can do this, anyway. It's actually more complex than #1, > and very likely slower. I would rather not take this route. > 3. For a given function, I can freeze __builtins__ at the value it > was at when the function was defined. > 4. I can make it work like I suggested for Python 3.0, but make > __builtin__ automatically available to modules as __builtins__. > > With or without it, I should be posting my patch for fast globals soon. > No, don't look at me like that. I'm serious! > > Wondering-what-to-do-ly, > Neil > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Mon Nov 26 19:43:03 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 26 Nov 2007 13:43:03 -0500 Subject: [Python-ideas] Fwd: Fast global cacheless lookup In-Reply-To: References: <4745A301.5090201@cs.byu.edu> <4748DB71.6000601@cs.byu.edu> Message-ID: gaah ... this should have been sent to the list for archiving. The summary is that my memory was wrong, and items are *not* jostled back to "better" locations. -jJ From ntoronto at cs.byu.edu Mon Nov 26 21:50:53 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Mon, 26 Nov 2007 13:50:53 -0700 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! In-Reply-To: References: <47481C14.90009@cs.byu.edu> Message-ID: <474B31AD.9060004@cs.byu.edu> Guido van Rossum wrote: > The semantics of __builtins__ are an implementation detail used for > sandboxing, and assignment to __builtins__ is not supported. Alas, I > can't quite figure out what you're after; your post doesn't start with > a clear problem statement, so I'm not even sure if this is helpful > information. I just hope to encourage you from trying to change the > semantics of __builtins__. In 3.0, __builtins__ may well be renamed. Sorry - it was very early in the morning when I did my analysis, so I wasn't as clear as I could have been. I had two points: 1. A suggestion for future builtins, which is probably the wrong thing to do. Please disregard this. 2. A question about which semantics fast globals should support, and how different they can be from the current semantics and still be acceptable. I have two problems with the current semantics: 1. They seem very wrong to me, even for an implementation detail. Python developers rely on function behavior being invariant to the call site. (As much as Python developers could be said to rely on any invariance, anyway.) 2. Implementing the current semantics with fast globals seems unnecessary. It no longer helps performance (it hurts it a tiny bit), and the code that does it reads like a pasted-on hack. I've since discovered that it wouldn't be much slower. Here are some times for one of my "builtins get" benchmarks: Current builtins: 3.11 sec Fast builtins, immediate semantics: 1.81 sec Fast builtins, current or pre-1998: 1.64 sec (+ epsilon for hack) "Immediate" semantics (which I find most correct) are a little slower because it has to check whether __builtins__ has changed every time a globals lookup fails, before it does a builtins lookup. In "pre-1998" semantics, a change of __builtins__ is checked only with a new stack frame. Besides those results, fast globals reduces function call overhead by 10%. I haven't measured what effect the hack has on that. 
Personally, I like fast globals with pre-1998 semantics best, though there's still a difference in meaning between script and interactive mode. I can do it that way, the current way, or the immediate way. Or I could make current vs. pre-1998 selectable by macro. Do you have a preference? I swear, though, I'm nearly ready to post a patch. :) Neil From guido at python.org Mon Nov 26 22:40:00 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Nov 2007 13:40:00 -0800 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! In-Reply-To: <474B31AD.9060004@cs.byu.edu> References: <47481C14.90009@cs.byu.edu> <474B31AD.9060004@cs.byu.edu> Message-ID: On Nov 26, 2007 12:50 PM, Neil Toronto wrote: > [...] A question about which semantics fast globals should support, and how > different they can be from the current semantics and still be acceptable. > > I have two problems with the current semantics: > > 1. They seem very wrong to me, even for an implementation detail. Python > developers rely on function behavior being invariant to the call site. > (As much as Python developers could be said to rely on any invariance, > anyway.) Please assume I didn't read your initial post. "Very wrong" is a strong stance. Care to explain what's wrong and why? Without more info I'm not sure I understand what you're saying about call site invariance. > 2. Implementing the current semantics with fast globals seems > unnecessary. It no longer helps performance (it hurts it a tiny bit), > and the code that does it reads like a pasted-on hack. Please provide full context (I'm also behind on the fast globals thread). What exactly do you mean by "the current semantics"? And what's the problem with implementing it with fast globals? > I've since discovered that it wouldn't be much slower. Here are some > times for one of my "builtins get" benchmarks: > > Current builtins: 3.11 sec > Fast builtins, immediate semantics: 1.81 sec > Fast builtins, current or pre-1998: 1.64 sec (+ epsilon for hack) Where's the benchmark source code? > "Immediate" semantics (which I find most correct) Even though I already told you not to care? > are a little slower > because it has to check whether __builtins__ has changed every time a > globals lookup fails, before it does a builtins lookup. In "pre-1998" > semantics, a change of __builtins__ is checked only with a new stack frame. > > Besides those results, fast globals reduces function call overhead by > 10%. I haven't measured what effect the hack has on that. > > Personally, I like fast globals with pre-1998 semantics best, though > there's still a difference in meaning between script and interactive > mode. I can do it that way, the current way, or the immediate way. Or I > could make current vs. pre-1998 selectable by macro. Do you have a > preference? Given that *nobody* should assign to __builtins__ in their current globals, *ever*, I'm fine with pre-1998 semantics if it's fastest. > I swear, though, I'm nearly ready to post a patch. :) Please consider posting it before replying to this post. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Nov 27 00:21:15 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Nov 2007 12:21:15 +1300 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! 
In-Reply-To: References: <47481C14.90009@cs.byu.edu> Message-ID: <474B54EB.9090700@canterbury.ac.nz> Guido van Rossum wrote: > The semantics of __builtins__ are an implementation detail used for > sandboxing, and assignment to __builtins__ is not supported. Perhaps in 3.0 there could be an additional argument to eval and exec for supplying a builtin namespace? Then sandboxing code wouldn't have to make assumptions about the implementation, and the way would be open for optimising it in any way we wanted. -- Greg From guido at python.org Tue Nov 27 00:29:10 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Nov 2007 15:29:10 -0800 Subject: [Python-ideas] __builtins__ behavior and... the FUTURE! In-Reply-To: <474B54EB.9090700@canterbury.ac.nz> References: <47481C14.90009@cs.byu.edu> <474B54EB.9090700@canterbury.ac.nz> Message-ID: On Nov 26, 2007 3:21 PM, Greg Ewing wrote: > Guido van Rossum wrote: > > The semantics of __builtins__ are an implementation detail used for > > sandboxing, and assignment to __builtins__ is not supported. > > Perhaps in 3.0 there could be an additional argument to > eval and exec for supplying a builtin namespace? Then > sandboxing code wouldn't have to make assumptions about > the implementation, and the way would be open for > optimising it in any way we wanted. Good idea. If only I hadn't made a mistake in the signature design... It's kind of awkward to have it be exec(code, globals, locals, builtins), but I'm afraid that changing it to exec(code, locals, globals, builtins) would break too much code in the transition (2to3 notwithstanding). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mark at qtrac.eu Tue Nov 27 09:31:04 2007 From: mark at qtrac.eu (Mark Summerfield) Date: Tue, 27 Nov 2007 08:31:04 +0000 Subject: [Python-ideas] P3k __builtins__ identifiers -> warning Message-ID: <200711270831.04321.mark@qtrac.eu> Here is a nice little Python 3 program, test.py: import string buffer = string.ascii_letters bytes = [] sum = 0 for chr in buffer: int = ord(chr) if 32 <= int < 127: bytes.append(chr) sum += 1 str = "".join(bytes) print(sum, str) If run as: python30a -W all test.py It produces the expected output: 52 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ But unfortunately it uses as identifiers: buffer, bytes, chr, int, sum, and str. None of these are keywords so none of them provokes a SyntaxError. In fact there are over 130 such identifiers; print(dir(__builtins__)) to see them. I think many newcomers to Python will find it difficult to remember 160 identifiers (keywords + __builtins__) and since some of them have appealing names (esp. buffer, bytes, min, max, and sum), they may make use of them without realising that this could cause them problems later on. My python-idea is that if python is run with -W all then it should report uses of __builtins__ as identifiers. -- Mark Summerfield, Qtrac Ltd., www.qtrac.eu From guido at python.org Tue Nov 27 19:43:47 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Nov 2007 10:43:47 -0800 Subject: [Python-ideas] P3k __builtins__ identifiers -> warning In-Reply-To: <200711270831.04321.mark@qtrac.eu> References: <200711270831.04321.mark@qtrac.eu> Message-ID: IMO this is a task for tools like pylint or pychecker (both of which flag this). Also, it's controversial -- especially since you're unlikely to want to use a builtin whose name you can't remember. :-) The builtins were not made keywords for a reason.
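For a rough sense of the kind of check such tools perform (module-level names only; report_shadowed is an illustrative helper invented here, not anything pylint or pychecker actually expose):

import __builtin__

def report_shadowed(namespace, where='<module>'):
    # flag public names that hide a builtin of the same name
    for name in sorted(namespace):
        if not name.startswith('_') and hasattr(__builtin__, name):
            print "%s: %r shadows a builtin" % (where, name)

# e.g. at the bottom of a module, or from a small test script:
# report_shadowed(globals(), __name__)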
--Guido On Nov 27, 2007 12:31 AM, Mark Summerfield wrote: > Here is a nice little Python 3 program, test.py: > > import string > buffer = string.ascii_letters > bytes = [] > sum = 0 > for chr in buffer: > int = ord(chr) > if 32 <= int < 127: > bytes.append(chr) > sum += 1 > str = "".join(bytes) > print(sum, str) > > If run as: > > python30a -W all test.py > > It produces the expected output: > > 52 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ > > But unfortunately it uses as identifiers: buffer, bytes, chr, int, sum, > and str. None of these are keywords so none of them provokes a > SyntaxError. In fact there are over 130 such identifiers; > print(dir(__builtins__)) to see them. > > I think many newcomers to Python will find it difficult to remember 160 > identifiers (keywords + __builtins__) and since some of them have > appealing names (esp. buffer, bytes, min, max, and sum), they may make > use of them without realising that this could cause them problems later > on. > > My python-idea is that if python is run with -W all then it should > report uses of __builtins__ as identifiers. > > -- > Mark Summerfield, Qtrac Ltd., www.qtrac.eu > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From scott+python-ideas at scottdial.com Wed Nov 28 07:11:24 2007 From: scott+python-ideas at scottdial.com (Scott Dial) Date: Wed, 28 Nov 2007 01:11:24 -0500 Subject: [Python-ideas] P3k __builtins__ identifiers -> warning In-Reply-To: <200711270831.04321.mark@qtrac.eu> References: <200711270831.04321.mark@qtrac.eu> Message-ID: <474D068C.1000508@scottdial.com> Mark Summerfield wrote: > My python-idea is that if python is run with -W all then it should > report uses of __builtins__ as identifiers. This could never work as the stdlib violates this rule and would invoke a large number of these warnings. And given the controversy about it, I doubt anyone is that interested in patching the stdlib to avoid these names. As far as I am concerned, I don't really see the point in avoiding these names. As you say, several of them are very attractive and just because there is a built-in with that name doesn't always deter me from using it. The scoping rules of python are fairly simple, so it is not difficult to keep track of the shadowing. And it's pretty easy to recover a built-in by retrieving the object from the __builtins__ module, though not very obvious to newcomers. -Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu
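To make that last point concrete (purely illustrative; the import form is used here because __builtins__ itself may be a module or a dict depending on context):

import __builtin__

len = 5                        # shadow the builtin at module level
print __builtin__.len("abc")   # 3 -- the real builtin is still reachable
del len                        # drop the shadowing name...
print len("abc")               # ...and normal lookup works again: 3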