From rrr at ronadam.com Sun Apr 1 08:14:10 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 01 Apr 2007 01:14:10 -0500 Subject: [Python-ideas] Thread IPC idea: Quipe? Sockqueue? (Re: Python and Concurrency) In-Reply-To: References: <4602354F.1010003@acm.org> <46062BC9.2020208@acm.org> <460B23FE.2010309@canterbury.ac.nz> <460C0E69.9060007@ronadam.com> Message-ID: <460F4DB2.7030600@ronadam.com> Jim Jewett wrote: > On 3/29/07, Ron Adam wrote: >> Jim Jewett wrote: >> > What we really need is a Task object that treats shared memory >> > (perhaps with a small list of specified exceptions) as immutable. > >> * A 'better' task object for easily creating tasks. >> + We have a threading object now. (Needs improving.) > > But the task isn't in any way restricted. Brett's security sandbox > might be a useful starting point, if it is fast enough. Otherwise, > we'll probably need to stick with microthreading to get things small > enough to contain. > >> * Shared memory - >> + Prevent names from being rebound >> + Prevent objects from being altered > > I had thought of the names as being part of a shared dictionary. (Of > course, immutable dictionaries aren't really available out-of-the-box > now, and I'm not sure I would trust the supposed immutability of > anything that wasn't proxied.) Not all that different. The immutable dictionary would be an implantation detail of a locked name space I think. I'm wondering if there might be a way to have an inheritance by container relationship, where certain characteristics can be acquired from parent containers. Not exactly the same as class inheritance. >> frozen: object can't be altered while frozen >> locked: name can't be rebound to another object > > >> 3. Pass mutable "deep" copies back and forth. >> ? Works now. (but not for all objects?) > > Well, anything that can be deep-copied -- unless you also want the > mutations to be collected back into a single location. It would need to be able to make a round trip. >> 4. Pass frozen mutable objects. >> - Needs freezable/unfreezable mutable objects. >> (Not the same as making an immutable copy.) > > And there is where it starts to fall apart. Though if you look at the > pypy dict and interpreter optimizations, they have started to deal > with it through versioning types. I didn't find anything about "versioning" at these links. Did I miss it? > http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#multi-dicts > http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#id23 _Ron From ntoronto at cs.byu.edu Mon Apr 2 06:08:02 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Sun, 01 Apr 2007 22:08:02 -0600 Subject: [Python-ideas] Python and Concurrency In-Reply-To: <4602354F.1010003@acm.org> References: <4602354F.1010003@acm.org> Message-ID: <461081A2.50507@cs.byu.edu> Talin wrote: > One thing that is important to understand is that I'm not talking about > "automatic parallelization" where the compiler automatically figures out > what parts can be parallelized. That would require so much change to the > Python language as to make it no longer Python. > ... > I am not even necessarily talking about changing the Python language > (although certainly the internals of the interpreter will need to be > changed.) The same language can be used to describe the same kinds of > problems and operations, but the implications of those language elements > will change. 
This is analogous to the fact that these massively > multicore CPUs 10 years from now will most likely be using the same > instruction sets as today - but that does not mean that programming as a > discipline will be anything like what it is now. > I'm not convinced you wouldn't have to change Python. After dorking around online for years, I've *finally* found someone who put into math-talk my biggest problem with current programming paradigms and how they relate to concurrency: http://alarmingdevelopment.org/index.php?p=5 I don't agree with everything in the post, but this part I do: "Most programming languages adopt a control flow model of execution, mirroring the hardware, in which there is an execution pointer flowing through the program. The primary reason for this is to permit side-effects to be ordered by the programmer. The problem is that interdependencies between side-effects are naturally a partial order, while control flow establishes a total (linear) order. This means that the actual design exists only in the programmer's mind. It is up to the programmer to mentally compile (by a topological sort) these implicit dependencies into a total order expressed in a control flow. Whenever the program's control flow is to be changed, the implicit interdependencies encoded into it must be remembered or guessed at in order to maintain them. Obviously the language should allow the partial order to be explicitly specified, and a compiler should take care of working out an execution schedule for it." There's an elephant-in-the-living-room UI problem, here: how would one go about extracting a partial order from a programmer? A text editor is fine for a total order, but I can't think of how I'd use one non-messily to define a partial order. How about a Gantt chart for a partial order, or some other kind of dependency diagram? How would you make it as easy to use as a text editor? The funny thing is, once you solve this problem, it may even be *easier* to program this way, because rather than maintaining the partial order in your head (or inferring it from a total order in the code), it'd be right in front of you. There's no reason a program with partial flow control couldn't have very Python-like syntax. After reading this, though, which formalized what I've long felt is the biggest problem with concurrent programming, I'd have to say it'd definitely not be Python itself. For the record, I disagree strongly with the "let's keep concurrency in the libraries" idea. I want to program in *Python*, dangit. Or at least something that feels a lot like it. Neil From jcarlson at uci.edu Mon Apr 2 08:27:34 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 01 Apr 2007 23:27:34 -0700 Subject: [Python-ideas] Python and Concurrency In-Reply-To: <461081A2.50507@cs.byu.edu> References: <4602354F.1010003@acm.org> <461081A2.50507@cs.byu.edu> Message-ID: <20070401223816.049E.JCARLSON@uci.edu> Neil Toronto wrote: (I'm going to rearrange your post so that my reply flows a bit better) > There's no reason a program with partial flow control couldn't have very > Python-like syntax. After reading this, though, which formalized what > I've long felt is the biggest problem with concurrent programming, I'd > have to say it'd definitely not be Python itself. It depends on what operations one wants to support. "Apply this function to all of this data" is easy. To say things like 'line x depends on line x-2, and line x+1 depends on line x-1, and line x+2 depends on line x and x+1', certainly that is not easy.
But I question the purpose of being able to offer up that kind of information (in Python specifically). Presumably it is so that those tasks that don't depend on each other could be executed in parallel; but unless you have a method by which parallel execution is fast (or at least faster than just doing it in series), it's not terribly useful (especially if those operations are data structure manipulations that need to be propagated back to the 'main process'). > There's an elephant-in-the-living-room UI problem, here: how would one > go about extracting a partial order from a programmer? A text editor is > fine for a total order, but I can't think of how I'd use one non-messily > to define a partial order. How about a Gantt chart for a partial order, > or some other kind of dependency diagram? How would you make it as easy > to use as a text editor? The funny thing is, once you solve this > problem, it may even be *easier* to program this way, because rather > than maintaining the partial order in your head (or inferring it from a > total order in the code), it'd be right in front of you. Generally, the standard way of defining a partial order is via dependency graph. Unfortunately, breaking blocks of code into a dependency graph (or partial-order control-flow) tends to make the code hard to understand. I know there are various tools that use this particular kind of method, but those that I have seen leave much to be desired. Alternatively, there is a huge amount of R&D that has gone into C/C++ compilers to extract this information automatically from source code, and even more on the hardware end of things to automatically extract this information from machine code as it executes. Unfortunately, due to Python's dynamic nature, even something as simple as 'i += 0' can lead to all sorts of underlying system changes, and we may not be able to reliably extract this information (though PyPy with the LLVM backend may offer opportunities here). > For the record, I disagree strongly with the "let's keep concurrency in > the libraries" idea. I want to program in *Python*, dangit. Or at least > something that feels a lot like it. And leaving concurrency in a library allows Python to stay Python. For certain tasks, one merely needs parallel variants of currently existing Python functions/operations. Take Google's MapReduce [1], which applies a function to a large number of data elements in parallel, then combines the results of those computations. While it is not universal, it can do certain operations quickly. Other tasks merely require the execution of *some function* while *some other function* is executing. Free threading, and various ways of allowing concurrent thread execution has been offered, but the more I read about the Processing package, the more I like it. These options don't offer a solution to what you seem to be wanting; an easy definition of partial order on code to be executed in Python. However, without language-level support for something like... exec lines in *block in *parallel: i += 1 j += fcn(foo) bar = fcn2(bar) ...I don't see how it is possible. Then again, I'm not sure I completely agree with Mr. Edwards or yourself in that being able to state partial ordering will offer improvements over the status quo. Then again, I tend to not worry about the blocks of 3-4 lines that aren't dependent on one another, as much as the entire function suite returning what I intended it to. 
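For the easy "apply this function to all of this data" case, a rough sketch using nothing but the stdlib threading and Queue modules might look like the following (the helper name, worker count, and example function are arbitrary, and because of the GIL this only pays off when func spends its time doing I/O or in C code that releases the GIL):

    import threading, Queue

    def parallel_map(func, items, workers=4):
        # No ordering constraints between items, so plain worker
        # threads pulling from a shared queue are enough.
        tasks = Queue.Queue()
        results = [None] * len(items)
        for i, item in enumerate(items):
            tasks.put((i, item))

        def worker():
            while True:
                try:
                    i, item = tasks.get_nowait()
                except Queue.Empty:
                    return
                results[i] = func(item)

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results

    print parallel_map(lambda x: x * x, range(10))

Anything finer grained -- real partial orders between individual statements -- needs exactly the dependency information discussed above, and that is the part we have no good way to extract.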
- Josiah [1] http://labs.google.com/papers/mapreduce.html From rrr at ronadam.com Mon Apr 2 11:01:59 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 02 Apr 2007 04:01:59 -0500 Subject: [Python-ideas] Python and Concurrency In-Reply-To: <461081A2.50507@cs.byu.edu> References: <4602354F.1010003@acm.org> <461081A2.50507@cs.byu.edu> Message-ID: <4610C687.7090603@ronadam.com> Neil Toronto wrote: > There's an elephant-in-the-living-room UI problem, here: how would one > go about extracting a partial order from a programmer? Beat him or her with a stick? Just kidding of course. ;-) I think what you mean is how can we make it easier for a programmer to express their intention. One way is to provide a rich set of alternatives in extension modules and letting them choose. The ones that work will bubble up to the top, and the hard to manage and maintain choices will either be improved or forgotten. > A text editor is fine for a total order, but I can't think of how I'd > use one non-messily to define a partial order. How about a Gantt chart > for a partial order, or some other kind of dependency diagram? How would > you make it as easy to use as a text editor? The funny thing is, once > you solve this problem, it may even be *easier* to program this way, > because rather than maintaining the partial order in your head (or > inferring it from a total order in the code), it'd be right in front of > you. Well, you wouldn't want to interleave several tasks instructions together in any fixed (or otherwise) way. That definitely would not be anything I would want to maintain. Being able to define serial blocks of code to execute in a parallel fashion can make some things easier to express because it gives you another way you can group related code together instead of having to split op, or combine unrelated, code together because of ordering dependencies. But addressing your partial order concerns, most likely you will have parallel structures communicating to one another with no apparent predefined order. The ordering could be completely dependent on the data they get and send to each other, and completely dependent on outside events. Think of tasks that can open multiple communication channels to other tasks as needed. What order would these be executed in? Who knows! And you may not need to know. It may be that a partial-order execution order could be considered a subset of indeterminate execution order. > There's no reason a program with partial flow control couldn't have very > Python-like syntax. After reading this, though, which formalized what > I've long felt is the biggest problem with concurrent programming, I'd > have to say it'd definitely not be Python itself. I think it would still be python. From what I see Python will continue to be improved on for quite some time. But these ideas must prove themselves to be pythonic before they get put in python. (My spell checker picked out polyphonic for pythonic... PolyPython?) > For the record, I disagree strongly with the "let's keep concurrency in > the libraries" idea. I want to program in *Python*, dangit. Or at least > something that feels a lot like it. My guess, (if/when this is ever added), it will most likely be a combination of some basic supporting enhancements to names and objects so that they can work with task libraries better, along with one or more tasks/multi-processing libraries. If it turns out that the use of some of these ideas becomes both frequent and common. Then syntax similar to the 'with' statement might find support. 
But all of this is still quite a ways off unless some (yet to be identified) overwhelming need pushes it forward. Just my two cents, _Ron From jason.orendorff at gmail.com Mon Apr 2 15:07:58 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 2 Apr 2007 09:07:58 -0400 Subject: [Python-ideas] Python and Concurrency In-Reply-To: <461081A2.50507@cs.byu.edu> References: <4602354F.1010003@acm.org> <461081A2.50507@cs.byu.edu> Message-ID: On 4/2/07, Neil Toronto wrote: > I don't agree with everything in the post, but this part I do: > > [...] It is up to the > programmer to mentally compile (by a topological sort) these implicit > dependencies into a total order expressed in a control flow. The fancy phrase "topological sort" here obscures that this "compilation" is something humans are good at. We do it all the time. We make plans and carry them out. -j From jimjjewett at gmail.com Mon Apr 2 18:07:30 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 2 Apr 2007 12:07:30 -0400 Subject: [Python-ideas] Thread IPC idea: Quipe? Sockqueue? (Re: Python and Concurrency) In-Reply-To: <460F4DB2.7030600@ronadam.com> References: <4602354F.1010003@acm.org> <46062BC9.2020208@acm.org> <460B23FE.2010309@canterbury.ac.nz> <460C0E69.9060007@ronadam.com> <460F4DB2.7030600@ronadam.com> Message-ID: On 4/1/07, Ron Adam wrote: > Jim Jewett wrote: > > And there is where it starts to fall apart. Though if you look at the > > pypy dict and interpreter optimizations, they have started to deal > > with it through versioning types. > > I didn't find anything about "versioning" at these links. Did I miss it? > http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#multi-dicts > > http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#id23 Sorry; I wasn't pointing to enough of the document. http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html discussed versions under method caching, just above the Interpreter Optimizations section. -jJ From eoghan at qatano.com Wed Apr 11 11:38:43 2007 From: eoghan at qatano.com (Eoghan Murray) Date: Wed, 11 Apr 2007 10:38:43 +0100 Subject: [Python-ideas] Implicit String Concatenation Message-ID: I heard the call for the P3K PEP April deadline, so I thought I better get this sent off! When I was first exposed to Python, I was delighted that I could do the following; >>> "Hello" ' world' 'Hello world' This turned to confusion when I tried; >>> domain = " world" >>> "hello" domain Syntax Error ... Invalid Syntax My proposal for Python3K is to allow string-concatenation via juxtaposition between string-literals, string-variables and expressions that evaluate to strings. Juxtaposition has some precedence in Python (the example above) and also in the awk programming language. If anyone agrees that this is a good idea, then I'd be happy to write up a PEP explaining why I think that implicit string concatenation is better than overloading the plus operator (which this proposal wouldn't deprecate) and more elegant than template strings or string interpolation. Eoghan -------------- next part -------------- An HTML attachment was scrubbed... URL: From thobes at gmail.com Wed Apr 11 13:01:04 2007 From: thobes at gmail.com (Tobias Ivarsson) Date: Wed, 11 Apr 2007 13:01:04 +0200 Subject: [Python-ideas] from __future__ import function_annotations Message-ID: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> I am just curiously wondering about the plans for introducing function annotations (PEP 3107). 
I could not find any information about this in the PEP, neither when I searched the mail archives. The way I see it this feature could be quite interesting to introduce as early as possible since I believe that there are quite a few tools that could benefit from this today. I could for example see Jython using function annotations for declaring methods that are supposed to be accessible from java code. This is done via annotations in the doc string today, and would be a lot clearer using function annotations. Jython could implement this use of function annotations without python supporting it, but that would make the code incompatible between python and Jython, which would be highly unfortunate. Therefore i propose that python adds support for function annotations in version 2.6 via from __future__ import function_annotations This would make the change as compatible as for example @decorators or the with-statement. /Tobias -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Wed Apr 11 16:21:47 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 11 Apr 2007 16:21:47 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: Eoghan Murray schrieb: > I heard the call for the P3K PEP April deadline, so I thought I better > get this sent off! > > When I was first exposed to Python, I was delighted that I could do the > following; >> >> "Hello" ' world' > 'Hello world' > This turned to confusion when I tried; >> >> domain = " world" >> >> "hello" domain > Syntax Error ... Invalid Syntax > > My proposal for Python3K is to allow string-concatenation via > juxtaposition between string-literals, string-variables and expressions > that evaluate to strings. > Juxtaposition has some precedence in Python (the example above) and also > in the awk programming language. No, please! The concatenation of string literals is done in the parser. Your proposal would move that to runtime and introduce a "whitespace operator". How would you spell that? How would you overload it? etc. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From jcarlson at uci.edu Wed Apr 11 17:00:20 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 11 Apr 2007 08:00:20 -0700 Subject: [Python-ideas] from __future__ import function_annotations In-Reply-To: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> References: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> Message-ID: <20070411075823.62AB.JCARLSON@uci.edu> "Tobias Ivarsson" wrote: > I am just curiously wondering about the plans for introducing function > annotations (PEP 3107). I could not find any information about this in the > PEP, neither when I searched the mail archives. Function annotations are a Python 3.0 feature. From what I understand, they have a potential implementation, tests, and documentation. You just have to wait until the Alpha comes out. 
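For reference, what the PEP specifies is just new syntax plus a plain dict stored on the function object, so a tool (Jython, a type checker, whatever) can read the annotations at run time. A Py3k-only sketch -- this will not parse in 2.x, which is the whole point of the __future__ request:

    def greet(name: str, times: int = 1) -> str:
        # Annotation expressions are evaluated at def time and simply
        # stored; the language attaches no semantics to them.
        return ("hello " + name + "\n") * times

    print(greet.__annotations__)
    # a dict mapping parameter names (plus 'return') to the values above,
    # e.g. {'name': str, 'times': int, 'return': str}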
- Josiah From collinw at gmail.com Wed Apr 11 17:01:54 2007 From: collinw at gmail.com (Collin Winter) Date: Wed, 11 Apr 2007 08:01:54 -0700 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> On 4/11/07, Georg Brandl wrote: > Eoghan Murray schrieb: [snip] > > My proposal for Python3K is to allow string-concatenation via > > juxtaposition between string-literals, string-variables and expressions > > that evaluate to strings. > > Juxtaposition has some precedence in Python (the example above) and also > > in the awk programming language. > > No, please! The concatenation of string literals is done in the parser. > Your proposal would move that to runtime and introduce a "whitespace operator". > How would you spell that? How would you overload it? etc. A single-width whitespace operator would just be confusing since PEP 3117 will be using zero-width spaces for the None typedef : ) Collin From collinw at gmail.com Wed Apr 11 17:11:02 2007 From: collinw at gmail.com (Collin Winter) Date: Wed, 11 Apr 2007 08:11:02 -0700 Subject: [Python-ideas] from __future__ import function_annotations In-Reply-To: <20070411075823.62AB.JCARLSON@uci.edu> References: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> <20070411075823.62AB.JCARLSON@uci.edu> Message-ID: <43aa6ff70704110811s36e5d707l634f80783ba7fcf1@mail.gmail.com> On 4/11/07, Josiah Carlson wrote: > "Tobias Ivarsson" wrote: > > I am just curiously wondering about the plans for introducing function > > annotations (PEP 3107). I could not find any information about this in the > > PEP, neither when I searched the mail archives. > > Function annotations are a Python 3.0 feature. From what I understand, > they have a potential implementation, tests, and documentation. You > just have to wait until the Alpha comes out. Backporting annotations to 2.x was discussed in March (http://mail.python.org/pipermail/python-3000/2007-March/006107.html) to generally positive reception. The only possible hiccup would be that annotations wouldn't be supported for tuple parameters, since tuple params won't survive in 3.0 anyway. Collin Winter From tony at PageDNA.com Wed Apr 11 17:24:19 2007 From: tony at PageDNA.com (Tony Lownds) Date: Wed, 11 Apr 2007 08:24:19 -0700 Subject: [Python-ideas] from __future__ import function_annotations In-Reply-To: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> References: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> Message-ID: <133B3E94-4E72-4E59-8D21-2056316C210D@PageDNA.com> On Apr 11, 2007, at 4:01 AM, Tobias Ivarsson wrote: > I am just curiously wondering about the plans for introducing > function annotations (PEP 3107). I could not find any information > about this in the PEP, neither when I searched the mail archives. > The way I see it this feature could be quite interesting to > introduce as early as possible since I believe that there are quite > a few tools that could benefit from this today. > I could for example see Jython using function annotations for > declaring methods that are supposed to be accessible from java > code. This is done via annotations in the doc string today, and > would be a lot clearer using function annotations. > Jython could implement this use of function annotations without > python supporting it, but that would make the code incompatible > between python and Jython, which would be highly unfortunate. 
> Therefore i propose that python adds support for function > annotations in version 2.6 via > from __future__ import function_annotations > This would make the change as compatible as for example @decorators > or the with-statement. > Function annotations PEP is accepted and code has been checked in to p3yk. I don't think there would be much support for the syntax in 2.6, but I could be wrong. A more palatable compatibility strategy may be to introduce a decorator that sets function.__annotations__, so that these two function definitions would have equivalent annotations: >>> @annotate(int, int, returns=int) ... def gcd1(m, n): ... etc >>> def gcd2(m: int, n: int) -> int: ... etc It's easier for a decorator to be compatible with run-time semantics, and more likely to avoid syntax questions, than embedding the annotations in docstrings. Source code conversion (2to3) could change the @annotate decorator form to the in-line function form. Tools could be written to use either annotation set. What do y'all think? -Tony From g.brandl at gmx.net Wed Apr 11 18:10:23 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 11 Apr 2007 18:10:23 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> Message-ID: Collin Winter schrieb: > On 4/11/07, Georg Brandl wrote: >> Eoghan Murray schrieb: > [snip] >> > My proposal for Python3K is to allow string-concatenation via >> > juxtaposition between string-literals, string-variables and expressions >> > that evaluate to strings. >> > Juxtaposition has some precedence in Python (the example above) and also >> > in the awk programming language. >> >> No, please! The concatenation of string literals is done in the parser. >> Your proposal would move that to runtime and introduce a "whitespace operator". >> How would you spell that? How would you overload it? etc. > > A single-width whitespace operator would just be confusing since PEP > 3117 will be using zero-width spaces for the None typedef : ) Thinking in that direction, NO-BREAK SPACE would be a perfect choice for an operator! Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From collinw at gmail.com Wed Apr 11 18:16:02 2007 From: collinw at gmail.com (Collin Winter) Date: Wed, 11 Apr 2007 09:16:02 -0700 Subject: [Python-ideas] from __future__ import function_annotations In-Reply-To: <133B3E94-4E72-4E59-8D21-2056316C210D@PageDNA.com> References: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> <133B3E94-4E72-4E59-8D21-2056316C210D@PageDNA.com> Message-ID: <43aa6ff70704110916odd7df89waf95566f9ea9d19@mail.gmail.com> On 4/11/07, Tony Lownds wrote: > Function annotations PEP is accepted and code has been checked in to > p3yk. I don't think there would be much support for > the syntax in 2.6, but I could be wrong. A more palatable > compatibility strategy may be to introduce a decorator that sets > function.__annotations__, so that these two function definitions > would have equivalent annotations: > > >>> @annotate(int, int, returns=int) > ... def gcd1(m, n): > ... etc > > >>> def gcd2(m: int, n: int) -> int: > ...
etc > > It's easier for a decorator to be compatible with run-time semantics, > and more likely to avoid syntax questions, than embedding > the annotattions in docstrings. Source code conversion (2to3) could > change the @annotate decorator form to the in-line function > form. Tools could be written to use either annotation set. > > What do y'all think? Speaking only to the part about 2to3, that sort of conversion would be a pain in the ass to write. Even if the @annotate decorator were keyword-args only (allowing positional args complicates the implementation more than you would expect), it would still probably be quicker/easier/more accurate just to port the 3.0 annotations implementation to 2.6. Collin Winter From eoghan at qatano.com Wed Apr 11 18:23:46 2007 From: eoghan at qatano.com (Eoghan Murray) Date: Wed, 11 Apr 2007 17:23:46 +0100 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> Message-ID: Hi guys, Thanks for your replies: On 4/11/07, Georg Brandl wrote: > [snip] > > No, please! The concatenation of string literals is done in the parser. Your proposal would move that to runtime [snip...] An implementation detail? [...snip] and introduce a "whitespace operator". > How would you spell that? How would you overload it? etc. This is exactly what I'm proposing. You could spell it __juxta__ short for juxtaposition or __concat__, and overload it as usual :-) On 11/04/07, Collin Winter wrote: > A single-width whitespace operator would just be confusing since PEP > 3117 will be using zero-width spaces for the None typedef : ) 3117 looks cool, but it is in draft stages so needn't factor. Anyone with any positive reactions? Eoghan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcarlson at uci.edu Wed Apr 11 18:51:14 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 11 Apr 2007 09:51:14 -0700 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> Message-ID: <20070411095003.62B1.JCARLSON@uci.edu> "Eoghan Murray" wrote: > Anyone with any positive reactions? Sorry, only negative from me. - Josiah From tony at pagedna.com Wed Apr 11 19:32:24 2007 From: tony at pagedna.com (Tony Lownds) Date: Wed, 11 Apr 2007 10:32:24 -0700 Subject: [Python-ideas] from __future__ import function_annotations In-Reply-To: <43aa6ff70704110916odd7df89waf95566f9ea9d19@mail.gmail.com> References: <9997d5e60704110401k7333a276j3104f69ed0c8ca5a@mail.gmail.com> <133B3E94-4E72-4E59-8D21-2056316C210D@PageDNA.com> <43aa6ff70704110916odd7df89waf95566f9ea9d19@mail.gmail.com> Message-ID: <5745BD18-FFEB-49B2-8E4E-FAC11516F599@pagedna.com> On Apr 11, 2007, at 9:16 AM, Collin Winter wrote: > Speaking only to the part about 2to3, that sort of conversion would be > a pain in the ass to write. Even if the @annotate decorator were > keyword-args only (allowing positional args complicates the > implementation more than you would expect), it would still probably be > quicker/easier/more accurate just to port the 3.0 annotations > implementation to 2.6. Ok. 
+1 backporting the syntax from me, FWIW -Tony From jason.orendorff at gmail.com Wed Apr 11 20:01:59 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Wed, 11 Apr 2007 14:01:59 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> Message-ID: On 4/11/07, Eoghan Murray wrote: > This is exactly what I'm proposing. You could spell it __juxta__ short for > juxtaposition or __concat__, and overload it as usual :-) And if __juxta__ is not defined, it should fall back first on __call__, then __mul__, then __add__. If it binds right-to-left, you could write things like from math import * print (2 sin x + cos x) We might as well make newlines an operator at the same time. There's precedent for this in Haskell, and good synergy--adding the STM monad to Python would solve the GIL problem. You could spell that operator __bind__ or just __>>=__, take your pick. And I think Guido already committed to ripping out the @decorator syntax in Py3k in favor of comment overloading, via __rem__(). Just kidding, of course... > Anyone with any positive reactions? Eoghan, thanks for taking the time to write. I don't think anyone likes the idea, though. It causes many grammatical problems: should a[0] parse as a.__getitem__(0) or a.__juxta__([0])? What about (foo)(bar)? And while "sin x" would of course mean sin.__juxta__(x), "sin -x" would parse as "sin - x", or sin.__sub__(x). A few extra + signs are a small price to pay. -j From g.brandl at gmx.net Wed Apr 11 20:11:51 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 11 Apr 2007 20:11:51 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> Message-ID: Eoghan Murray schrieb: > Hi guys, > > Thanks for your replies: > > > On 4/11/07, Georg Brandl < g.brandl at gmx.net > > wrote: > [snip] > > No, please! The concatenation of string literals is done in the parser. > > Your proposal would move that to runtime [snip...] > > > An implementation detail? A rather involved "detail". > [...snip] and introduce a "whitespace operator". > How would you spell that? How would you overload it? etc. > > > This is exactly what I'm proposing. You could spell it __juxta__ short > for juxtaposition or __concat__, and overload it as usual :-) > > A single-width whitespace operator would just be confusing since PEP > 3117 will be using zero-width spaces for the None typedef : ) > > > 3117 looks cool, but it is in draft stages so needn't factor. This is a joke, isn't it? You're a bit late... Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From adam at atlas.st Wed Apr 11 20:39:00 2007 From: adam at atlas.st (Adam Atlas) Date: Wed, 11 Apr 2007 14:39:00 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> Message-ID: <7B3BA968-4228-49E2-A590-D84E34550EF4@atlas.st> On 11 Apr 2007, at 11.01, Collin Winter wrote: > On 4/11/07, Georg Brandl wrote: >> No, please! The concatenation of string literals is done in the >> parser. 
>> Your proposal would move that to runtime and introduce a >> "whitespace operator". >> How would you spell that? How would you overload it? etc. > > A single-width whitespace operator would just be confusing since PEP > 3117 will be using zero-width spaces for the None typedef : ) I propose we use the ASCII character 0x07 (BEL) as the concatenation operator. It's invisible, so your code still looks nice and clean, but you know it's there because your text editor will beep at you every time you pass it. :) (Speaking of PEP 3117, I will fight it to the death unless the typedef for Exception is changed to Unicode character 2620 (SKULL AND CROSSBONES) or 2623 (BIOHAZARD SIGN). Brilliant choice for frozenset, though. No longer need I wonder why the Unicode Consortium saw fit to include a snowman character!) From jimjjewett at gmail.com Wed Apr 11 22:15:54 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 11 Apr 2007 16:15:54 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: On 4/11/07, Eoghan Murray wrote: > When I was first exposed to Python, I was delighted that I could do the > following; > >>> "Hello" ' world' > 'Hello world' > This turned to confusion when I tried; > >>> domain = " world" > >>> "hello" domain > Syntax Error ... Invalid Syntax I would support a proposal to remove the implicit concatenation entirely. I suspect it would be shot down for backwards compatibility (even in Py3K), but from a readability standpoint ... I have never seen a string concatentation that would look worse because of a "+". I *have* seen some bugs where a comma was forgotten, and two arguments got invisibly jammed together. That's a pain to debug in C; in python with default values, the interpreter may not even gripe sensibly. -jJ From jason.orendorff at gmail.com Wed Apr 11 23:03:10 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Wed, 11 Apr 2007 17:03:10 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: On 4/11/07, Jim Jewett wrote: > I have never seen a string concatentation that would look worse > because of a "+". > > I *have* seen some bugs where a comma was forgotten, and two arguments > got invisibly jammed together. That's a pain to debug in C; in python > with default values, the interpreter may not even gripe sensibly. Oh. I just realized this happens a lot out here. Where I work, we use scons, and each SConscript has a long list of filenames: sourceFiles = [ 'foo.c', 'bar.c', #...many lines omitted... 'q1000x.c'] It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'. This is pretty bewildering behavior even if you *are* a Python programmer, and not everyone here is. -j From g.brandl at gmx.net Wed Apr 11 23:06:54 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 11 Apr 2007 23:06:54 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: Jason Orendorff schrieb: > On 4/11/07, Jim Jewett wrote: >> I have never seen a string concatentation that would look worse >> because of a "+". >> >> I *have* seen some bugs where a comma was forgotten, and two arguments >> got invisibly jammed together. That's a pain to debug in C; in python >> with default values, the interpreter may not even gripe sensibly. > > Oh. I just realized this happens a lot out here. 
Where I work, we > use scons, and each SConscript has a long list of filenames: > > sourceFiles = [ > 'foo.c', > 'bar.c', > #...many lines omitted... > 'q1000x.c'] > > It's a common mistake to leave off a comma, and then scons complains > that it can't find 'foo.cbar.c'. This is pretty bewildering behavior > even if you *are* a Python programmer, and not everyone here is. I think that convinces me to support the removal. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From adam at atlas.st Wed Apr 11 23:09:22 2007 From: adam at atlas.st (Adam Atlas) Date: Wed, 11 Apr 2007 17:09:22 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: On 11 Apr 2007, at 16.15, Jim Jewett wrote: > I would support a proposal to remove the implicit concatenation > entirely. I'd agree with that. The parser can probably just do the same optimization automatically if it gets [string literal] "+" [string literal]. (Or does it already?) Meanwhile, on a similar subject, I have a... strange idea. I'm not sure how easy/hard it would be to parse or how necessary it is, but it's just a thought. Currently, you can do multiline strings a couple of ways: x = '''foo bar baz''' x = 'foo' \ 'bar' \ 'baz' Neither of these seem ideal. Triple-quoting is decent, but it can get ugly if you're using it in an indented block (as you most often will be), since the following lines are considered to start right after the newline, not after the containing block's indentation level. But changing it to the latter behaviour has been discussed before, if I remember correctly, and that didn't seem popular. That's understandable; the current triple-quote multiline behaviour makes sense from a logical point of view, it just doesn't look as nice to have text suddenly fall down to 0 indentation and then jump back to the original indentation level when the quote is over. So anyway, what I'm proposing is the following: x = 'foo 'bar 'baz' In other words, if you start a ' or "-quoted string, and don't close it at the end of the line, you can continue it on the next line. It would be generally equivalent to appending \n, closing the quote, and preceding the physical newline with a backslash. (And inserting a plus sign, if we take Jim's proposal into account.) Not closing a quote and doing something else on the next line (i.e. not starting it with a quote character after any whitespace) would still be a syntax error. This style takes precedent from multi-paragraph quoting style in English: if you end a paragraph without closing a quote, then you continue it by starting the next one with a quote, and you can continue like that until you do have an end-quote. I think it would improve readability/writability for when you need to include multiline text blocks or code blocks. Having to have that \n"+ \ at the end of each line really breaks up the flow, whether of a block of human or computer language text. And having subsequent lines fall to 0 indentation (if you choose to use triple-quotes) breaks up the flow of the surrounding Python code. This seems like a good solution, especially since it has precedent in written English. Any thoughts? 
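To make the equivalence concrete, the proposed form

    x = 'foo
    'bar
    'baz'

would translate to roughly this in today's syntax (note that it leans on the implicit literal concatenation discussed earlier in this thread):

    x = 'foo\n' \
        'bar\n' \
        'baz'
    assert x == 'foo\nbar\nbaz'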
From jcarlson at uci.edu Thu Apr 12 00:19:29 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 11 Apr 2007 15:19:29 -0700 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: <20070411151507.62BA.JCARLSON@uci.edu> Adam Atlas wrote: [snip] > Currently, you can do multiline strings a couple of ways: > x = '''foo > bar > baz''' > x = 'foo' \ > 'bar' \ > 'baz' [snip] > x = 'foo > 'bar > 'baz' That's a horrible idea. It's even worse than the 'space implies concatenation' suggestion made earlier. If you want to get rid of indentation in the case of... x = '''foo bar baz''' use textwrap.dedent and friends. - Josiah From greg.ewing at canterbury.ac.nz Thu Apr 12 00:48:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 12 Apr 2007 10:48:36 +1200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: <461D65C4.4070108@canterbury.ac.nz> Georg Brandl wrote: > Your proposal would move that to runtime and introduce a "whitespace operator". > How would you spell that? How would you overload it? etc. Using the ____() method, obviously. :-) But seriously, there is no way this is going to fly. Python is not Perl or awk (or SNOBOL). -- Greg From tjreedy at udel.edu Wed Apr 11 23:54:11 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 11 Apr 2007 17:54:11 -0400 Subject: [Python-ideas] Implicit String Concatenation References: Message-ID: "Adam Atlas" wrote in message news:BCDF57D3-8555-4230-8ABD-8419A41A8E1C at atlas.st... | | On 11 Apr 2007, at 16.15, Jim Jewett wrote: | > I would support a proposal to remove the implicit concatenation | > entirely. Raymond H. is proposing this for Py3. | I'd agree with that. The parser can probably just do the same | optimization automatically if it gets [string literal] "+" [string | literal]. (Or does it already?) He says it does (not sure which version he meant). | what I'm proposing is the following: | | x = 'foo | 'bar | 'baz' -1 Looks ugly to me ;-) tjr From jan.kanis at phil.uu.nl Thu Apr 12 11:53:58 2007 From: jan.kanis at phil.uu.nl (Jan Kanis) Date: Thu, 12 Apr 2007 11:53:58 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: On Wed, 11 Apr 2007 23:54:11 +0200, Terry Reedy wrote: > | what I'm proposing is the following: > | > | x = 'foo > | 'bar > | 'baz' > > -1 Looks ugly to me ;-) Indeed, I don't really like this syntax. I do like if there'd be a way to spell 'multiline string with indentation chopped off'. The easiest way (syntax-wise) would be to just have tripple quote do that, but that's gonna give backward compat problems. Jan From eoghan at qatano.com Thu Apr 12 13:34:18 2007 From: eoghan at qatano.com (Eoghan Murray) Date: Thu, 12 Apr 2007 12:34:18 +0100 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <7B3BA968-4228-49E2-A590-D84E34550EF4@atlas.st> References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> <7B3BA968-4228-49E2-A590-D84E34550EF4@atlas.st> Message-ID: On 11/04/07, Adam Atlas wrote: > > On 11 Apr 2007, at 11.01, Collin Winter wrote: > > > On 4/11/07, Georg Brandl wrote: > >> No, please! The concatenation of string literals is done in the > >> parser. > >> Your proposal would move that to runtime and introduce a > >> "whitespace operator". > >> How would you spell that? How would you overload it? etc. 
> > > > A single-width whitespace operator would just be confusing since PEP > > 3117 will be using zero-width spaces for the None typedef : ) > > I propose we use the ASCII character 0x07 (BEL) as the concatenation > operator. It's invisible, so your code still looks nice and clean, > but you know it's there because your text editor will beep at you > every time you pass it. :) > LOL, I'll reply to the funniest put down! The rationale for this is that Python should have one definitive way of concatenating strings. I dislike '+' as a string concatenation operator as I think overloading the meaning of '+' for non-numbers is ugly, and I dislike '%s' string formatting as it perpetuates perhaps obscure C syntax, as well as shunting the variables to the end of the line - hard for a human to parse. Given that __juxta__ isn't going to fly, +1 for complete removal of implicit string concatenation in Py3k Eoghan From adam at atlas.st Thu Apr 12 16:11:40 2007 From: adam at atlas.st (Adam Atlas) Date: Thu, 12 Apr 2007 10:11:40 -0400 Subject: [Python-ideas] Regular expression algorithms Message-ID: Has anyone seen this article? http://swtch.com/~rsc/regexp/regexp1.html Are its criticisms of Python's regex algorithm accurate? If so, might it be possible to revise Python's `re` module to use this sort of algorithm? I noticed it says that this approach doesn't work if the pattern contains backreferences, but maybe the module could at least sort of self-optimize by switching to this method when no backrefs are used. From ntoronto at cs.byu.edu Thu Apr 12 17:39:47 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Thu, 12 Apr 2007 09:39:47 -0600 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: <461E52C3.8030907@cs.byu.edu> Jan Kanis wrote: > On Wed, 11 Apr 2007 23:54:11 +0200, Terry Reedy wrote: > >> | what I'm proposing is the following: >> | >> | x = 'foo >> | 'bar >> | 'baz' >> >> -1 Looks ugly to me ;-) >> > > Indeed, I don't really like this syntax. I do like if there'd be a way to > spell 'multiline string with indentation chopped off'. The easiest way > (syntax-wise) would be to just have tripple quote do that, but that's > gonna give backward compat problems. > These cases would be fine: a = """Some text. Some more text.""" def f(x): """"Translates x into Hungarian. Does it quite badly.""" pass This wouldn't: a = """Some text. Some intentionally indented text.""" How often do people rely on those tabs or spaces being preserved? Neil From jcarlson at uci.edu Thu Apr 12 18:04:57 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 12 Apr 2007 09:04:57 -0700 Subject: [Python-ideas] Regular expression algorithms In-Reply-To: References: Message-ID: <20070412084641.62C6.JCARLSON@uci.edu> Adam Atlas wrote: > > Has anyone seen this article? > http://swtch.com/~rsc/regexp/regexp1.html Yes, it has been posted in the sourceforge tracker as a feature request, in python-dev, and now here. > Are its criticisms of Python's regex algorithm accurate? In the worst-case, yes, Python's regular expression runs in O(2^n) time (where n is the length of the string you are searching). However, as stated in the sourceforge entry, and has been stated in other places, one has to write a fairly useless regular expression to get it into the O(2^n) running time. For many cases, Python's regular expression engine is quite competitive with the Thompson NFA. > If so, might > it be possible to revise Python's `re` module to use this sort of > algorithm? 
I noticed it says that this approach doesn't work if the > pattern contains backreferences, but maybe the module could at least > sort of self-optimize by switching to this method when no backrefs > are used. It is possible, but only if someone takes the time to offer a patch. One thing to remember is that as stated in the documentation for Python's re module, certain operators are "greedy", that is, a* will pick up as many a's as it possibly can. Whereas a base Thompson NFA will move on to the next state as early as possible, making a* with Thompson analogous to a*? in the Python (and others') regular expression engine. Yes, the Thompson NFA can be modified to also be greedy, but that is a particular characteristic of Python's engine that a Thompson NFA based engine will have to emulate (along with groups, named references, etc., which are a PITA for non-recursive engines). - Josiah From jimjjewett at gmail.com Thu Apr 12 18:11:01 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 12 Apr 2007 12:11:01 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <461E52C3.8030907@cs.byu.edu> References: <461E52C3.8030907@cs.byu.edu> Message-ID: On 4/12/07, Neil Toronto wrote: > Jan Kanis wrote: > > On Wed, 11 Apr 2007 23:54:11 +0200, Terry Reedy wrote: > > Indeed, I don't really like this syntax. I do like if there'd be a way to > > spell 'multiline string with indentation chopped off'. Most of the time, the extra indents are OK. And if they aren't, it is usually OK to start the string with a blank line. (So everything is aligned to left, at least.) Would textwrap.dedent do what you wanted (if it were added to __all__)? Should it have a mode to skip the first line? Should there be a TextWrapper that exposes it somehow? (My thought would be to optionally call it from within _munge_whitespace.) > a = """Some text. > Some intentionally indented text.""" > How often do people rely on those tabs or spaces being preserved? For doctests, mainly, so a consistent change would be OK ... but triple quoted strings are supposed to be almost exactly WYSIWYG. -jJ From rrr at ronadam.com Thu Apr 12 20:10:52 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 12 Apr 2007 13:10:52 -0500 Subject: [Python-ideas] Thread IPC idea: Quipe? Sockqueue? (Re: Python and Concurrency) In-Reply-To: References: <4602354F.1010003@acm.org> <46062BC9.2020208@acm.org> <460B23FE.2010309@canterbury.ac.nz> <460C0E69.9060007@ronadam.com> <460F4DB2.7030600@ronadam.com> Message-ID: <461E762C.6040100@ronadam.com> Jim Jewett wrote: > On 4/1/07, Ron Adam wrote: >> Jim Jewett wrote: > >> > And there is where it starts to fall apart. Though if you look at the >> > pypy dict and interpreter optimizations, they have started to deal >> > with it through versioning types. >> >> I didn't find anything about "versioning" at these links. Did I miss it? > >> http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#multi-dicts >> > >> > >> http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#id23 >> > > Sorry; I wasn't pointing to enough of the document. > > http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html > > discussed versions under method caching, just above the Interpreter > Optimizations section. Thanks, Found it. (Been busy with other things.) It may also depend on what abstraction level is desired. A high level of abstraction would hide all of this under the covers and it would be done transparently.
A lower level would provide the tools needed to do it with, but also have the property of "if it hurts, don't do that." Ron From greg.ewing at canterbury.ac.nz Fri Apr 13 01:27:44 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 13 Apr 2007 11:27:44 +1200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: <461EC070.9060802@canterbury.ac.nz> For Py3k, how about changing the definition of triple quoted strings so that indentation is stripped up to the level of the line where the string began? In other words, apply an implicit dedent() to it in the parser. -- Greg From adam at atlas.st Fri Apr 13 01:39:05 2007 From: adam at atlas.st (Adam Atlas) Date: Thu, 12 Apr 2007 19:39:05 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <461EC070.9060802@canterbury.ac.nz> References: <461EC070.9060802@canterbury.ac.nz> Message-ID: On 12 Apr 2007, at 19.27, Greg Ewing wrote: > For Py3k, how about changing the definition of triple > quoted strings so that indentation is stripped up > to the level of the line where the string began? I'm almost positive that's been discussed before. Can anyone find a link? From greg.ewing at canterbury.ac.nz Fri Apr 13 02:12:32 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 13 Apr 2007 12:12:32 +1200 Subject: [Python-ideas] Regular expression algorithms In-Reply-To: <20070412084641.62C6.JCARLSON@uci.edu> References: <20070412084641.62C6.JCARLSON@uci.edu> Message-ID: <461ECAF0.1010408@canterbury.ac.nz> Josiah Carlson wrote: > a base Thompson NFA > will move on to the next state as early as possible, making a* with > Thompson analagous to a*? in the Python Are you sure that's an inherent characteristic of a Thompson NFA? As I understood it, using a Thompson NFA is no different from building an NFA and converting it to a DFA, except it does the conversion lazily. And when using a DFA, whether it matches greedily or not depends on how you drive it. If you stop as soon as you reach the first accepting state, it's non-greedy; if you keep going until you can't go any further, it's greedy. -- Greg From greg.ewing at canterbury.ac.nz Fri Apr 13 02:16:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 13 Apr 2007 12:16:42 +1200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: <461E52C3.8030907@cs.byu.edu> Message-ID: <461ECBEA.2050001@canterbury.ac.nz> Jim Jewett wrote: > For doctests, mainly, so a consistent change would be OK ... but > triple quoted strings are supposed to be almost exactly WYSIWYG. But they're *not* WYSIWYG, according to what you naturally "see" when looking at the code. Not sure about anyone else, but what I see is some lines of text that happen to be indented because the're part of a code block. I don't see the indentation as being an intended part of the string. Does anyone have a use case where they *need* the indentation to be preserved? (As opposed to just not caring whether it's there or not.) -- Greg From greg.ewing at canterbury.ac.nz Fri Apr 13 02:46:49 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 13 Apr 2007 12:46:49 +1200 Subject: [Python-ideas] Ideas towards GIL removal Message-ID: <461ED2F9.9020407@canterbury.ac.nz> I've been thinking about some ideas for reducing the amount of refcount adjustment that needs to be done, with a view to making GIL removal easier. 
1) Permanent objects In a typical Python program there are many objects that are created at the beginning and exist for the life of the program -- classes, functions, literals, etc. Refcounting these is a waste of effort, since they're never going to go away. So perhaps there could be a way of marking such objects as "permanent" or "immortal". Any refcount operation on a permanent object would be a no-op, so no locking would be needed. This would also have the benefit of eliminating any need to write to the object's memory at all when it's only being read. 2) Objects owned by a thread Python code creates and destroys temporary objects at a high rate -- stack frames, argument tuples, intermediate results, etc. If the code is executed by a thread, those objects are rarely if ever seen outside of that thread. It would be beneficial if refcount operations on such objects could be carried out by the thread that created them without locking. To achieve this, two extra fields could be added to the object header: an "owning thread id" and a "local reference count". (The existing refcount field will be called the "global reference count" in what follows.) An object created by a thread has its owning thread id set to that thread. When adjusting an object's refcount, if the current thread is the object's owning thread, the local refcount is updated without locking. If the object has no owning thread, or belongs to a different thread, the object is locked and the global refcount is updated. The object is considered garbage only when both refcounts drop to zero. Thus, after a decref, both refcounts would need to be checked to see if they are zero. When decrementing the local refcount and it reaches zero, the global refcount can be checked without locking, since a zero will never be written to it until it truly has zero non-local references remaining. I suspect that these two strategies together would eliminate a very large proportion of refcount-related activities requiring locking, perhaps to the point where those remaining are infrequent enough to make GIL removal practical. -- Greg From brett at python.org Fri Apr 13 04:15:28 2007 From: brett at python.org (Brett Cannon) Date: Thu, 12 Apr 2007 19:15:28 -0700 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <461ED2F9.9020407@canterbury.ac.nz> References: <461ED2F9.9020407@canterbury.ac.nz> Message-ID: On 4/12/07, Greg Ewing wrote: > > I've been thinking about some ideas for reducing the > amount of refcount adjustment that needs to be done, > with a view to making GIL removal easier. > > 1) Permanent objects > > In a typical Python program there are many objects > that are created at the beginning and exist for the > life of the program -- classes, functions, literals, > etc. Refcounting these is a waste of effort, since > they're never going to go away. In reality this is true, but obviously not technically true. You could delete a class if you really wanted to. But obviously it rarely happens. So perhaps there could be a way of marking such > objects as "permanent" or "immortal". Any refcount > operation on a permanent object would be a no-op, > so no locking would be needed. This would also have > the benefit of eliminating any need to write to the > object's memory at all when it's only being read. > > 2) Objects owned by a thread > > Python code creates and destroys temporary objects > at a high rate -- stack frames, argument tuples, > intermediate results, etc. 
If the code is executed > by a thread, those objects are rarely if ever seen > outside of that thread. It would be beneficial if > refcount operations on such objects could be carried > out by the thread that created them without locking. > > To achieve this, two extra fields could be added > to the object header: an "owning thread id" and a > "local reference count". (The existing refcount > field will be called the "global reference count" > in what follows.) > > An object created by a thread has its owning thread > id set to that thread. When adjusting an object's > refcount, if the current thread is the object's owning > thread, the local refcount is updated without locking. > If the object has no owning thread, or belongs to > a different thread, the object is locked and the > global refcount is updated. > > The object is considered garbage only when both > refcounts drop to zero. Thus, after a decref, both > refcounts would need to be checked to see if they > are zero. When decrementing the local refcount and > it reaches zero, the global refcount can be checked > without locking, since a zero will never be written > to it until it truly has zero non-local references > remaining. > > I suspect that these two strategies together would > eliminate a very large proportion of refcount-related > activities requiring locking, perhaps to the point > where those remaining are infrequent enough to make > GIL removal practical. > > I wonder what the overhead is going to be. If for every INCREF or DECREF you have to check that an object is immortal or whether it is a thread-owned object is going to incur at least an 'if' check, if not more. I wonder what the performance hit is going to be. And for the second idea, adding two more fields to every object might be considered expensive by some in terms of memory. Also, how would this scenario be handled: object foo is created in thread A (does it have a local-thread refcount of 1, a global of 1, or are both 1?), is passed to thread B, and then DECREF'ed in thread B as the object is no longer needed by anyone. If the local-thread refcount is 1 then this would not work as it would fail with the global refcount already at 0. But if objects start with a global refcount of 1 but a local refcount of 0 and it is DECREF'ed locally then wouldn't that fail for the same reason? I guess I am wondering how refcounts would be handled when objects cross between threads. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From george.sakkis at gmail.com Fri Apr 13 06:39:41 2007 From: george.sakkis at gmail.com (George Sakkis) Date: Fri, 13 Apr 2007 00:39:41 -0400 Subject: [Python-ideas] iter() on steroids Message-ID: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> I proposed an (admittedly more controversial) version of this a few months back at the py3k list and the reaction was unexpectedly (IMO) negative or indifferent, so I'm wondering if things have changed a bit since. The proposal is to make the the builtin iter() return an object with an API that consists of (most) functions currently at itertools. In addition to saving one "from itertools import chain,islice,..." line in every other module I write these days, an extra bonus of the OO interface is that islice can be replaced with slice syntax and chain with '+' (and/or perhaps the "pipe" character '|'). As a (deliberately involved) example, consider this: # A composite iterator over two files specified as follows: # - each yielded line is right stripped. 
# - the first 3 lines of the first file are yielded. # - the first line of the second file is skipped and its next 4 lines are yielded # - empty lines (after the right stripping) are filtered out. # - the remaining lines are enumerated. f1,f2 = [iter(open(f)).map(str.rstrip) for f in 'foo.txt','bar.txt'] for i,line in (f1[:3] + f2[1:5]).filter(None).enumerate(): print i,line The equivalent itertools version is left as an exercise to the reader. This is actually backwards compatible and could even go in 2.x if accepted, but I'm focusing on py3K here. Comments ? George PS: FYI, a proof of concept implementation is posted as a recipe at: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498272 From jcarlson at uci.edu Fri Apr 13 06:57:05 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 12 Apr 2007 21:57:05 -0700 Subject: [Python-ideas] iter() on steroids In-Reply-To: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> Message-ID: <20070412215622.62E9.JCARLSON@uci.edu> "George Sakkis" wrote: > > I proposed an (admittedly more controversial) version of this a few > months back at the py3k list and the reaction was unexpectedly (IMO) > negative or indifferent, so I'm wondering if things have changed a bit > since. [snip] > Comments ? Still -1. - Josiah From cvrebert at gmail.com Fri Apr 13 07:16:15 2007 From: cvrebert at gmail.com (Chris Rebert) Date: Thu, 12 Apr 2007 22:16:15 -0700 Subject: [Python-ideas] iter() on steroids In-Reply-To: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> Message-ID: <461F121F.2020208@gmail.com> George Sakkis wrote: > I proposed an (admittedly more controversial) version of this a few > months back at the py3k list and the reaction was unexpectedly (IMO) > negative or indifferent, so I'm wondering if things have changed a bit > since. > > The proposal is to make the the builtin iter() return an object with > an API that consists of (most) functions currently at itertools. In > addition to saving one "from itertools import chain,islice,..." line > in every other module I write these days, an extra bonus of the OO > interface is that islice can be replaced with slice syntax and chain > with '+' (and/or perhaps the "pipe" character '|'). As a (deliberately > involved) example, consider this: [snipped] > Comments ? +0 on your proposal I just don't see itertools being used often enough to justify your change, but I can see the utility for those instances where it is used heavily. +1 on adding your Iter class (or something similar) to itertools Less controversial and just as succinct/convenient as your proposal (Iter() vs iter()), save another line for the requisite import. - Chris Rebert From talin at acm.org Fri Apr 13 08:51:11 2007 From: talin at acm.org (Talin) Date: Thu, 12 Apr 2007 23:51:11 -0700 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <461ED2F9.9020407@canterbury.ac.nz> References: <461ED2F9.9020407@canterbury.ac.nz> Message-ID: <461F285F.4060003@acm.org> Greg Ewing wrote: > I've been thinking about some ideas for reducing the > amount of refcount adjustment that needs to be done, > with a view to making GIL removal easier. (omitted) I'm thinking along similar lines, but my approach is to eliminate refcounting entirely. (Note that the incref and decref macros could still exist for backwards compatibility, but would do nothing.) 
Garbage collection for concurrent systems is an active area of research, however it appears that many of the research systems out there have settled on a few basic design parameters. Most of them use a copying collector for the "young" generation (with a separate heap per thread, for exactly the reasons you suggest), and a shared mark-and-sweep heap store for the older generation that uses a traditional free list. For my own amusement and curiosity, I've been playing around with implementing such a collector, using a heap allocator design that's loosely based on the one from dlmalloc, which is an open source malloc implementation with very good overall performance. The idea is to build a stand-alone garbage collection library, similar to the popular Boehm collector, but parallel by design and intended for "cooperative" language interpreters rather than uncooperative languages such as C and C++. Of course, this is purely a hobby-level effort, and I admit that I really have no clue as to what I am doing here - the real point of the exercise is to educate myself about the problem space, not to actually produce a working library, although that might be a possible side effect (equally likely is that I'll never finish it.) I doubt that an untutored amateur such as myself can actually create a robust, efficient parallel implementation, given how hard such programming actually is, and how inexperienced I am. But it's interesting and enjoyable to work on, and that's the only reason I need. (And also produces some interesting side effects - I wasn't happy with the various graphical front-ends to gdb, so I took a couple of days off on wrote one using wxPython :) Now, all that being said, even if such a GC library were to exist, that is a long way from removal of the GIL, although it is a necessary step. For example, take the case of a dictionary in which more than one thread is inserting values. Clearly, that will require a lock or some other mechanism to prevent corruption of the hash table as it is updated. I think we want to avoid the Java situation where every object has its own lock. Instead, we'd have to require the user to provide a lock around that insertion operation. But what about dictionaries that the user isn't aware of, such as class methods and module contents? In a world without a GIL, what kind of steps need to be taken to insure that shared data structures can be updated without creating chaos? -- Talin From jcarlson at uci.edu Fri Apr 13 09:34:33 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 13 Apr 2007 00:34:33 -0700 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: References: <461ED2F9.9020407@canterbury.ac.nz> Message-ID: <20070412224427.62F5.JCARLSON@uci.edu> "Brett Cannon" wrote: > On 4/12/07, Greg Ewing wrote: > > > > I've been thinking about some ideas for reducing the > > amount of refcount adjustment that needs to be done, > > with a view to making GIL removal easier. > > > > 1) Permanent objects > > > > In a typical Python program there are many objects > > that are created at the beginning and exist for the > > life of the program -- classes, functions, literals, > > etc. Refcounting these is a waste of effort, since > > they're never going to go away. > > > > In reality this is true, but obviously not technically true. You could > delete a class if you really wanted to. But obviously it rarely happens. > > > So perhaps there could be a way of marking such > > objects as "permanent" or "immortal". 
Any refcount > > operation on a permanent object would be a no-op, > > so no locking would be needed. This would also have > > the benefit of eliminating any need to write to the > > object's memory at all when it's only being read. > > > > 2) Objects owned by a thread > > > > Python code creates and destroys temporary objects > > at a high rate -- stack frames, argument tuples, > > intermediate results, etc. If the code is executed > > by a thread, those objects are rarely if ever seen > > outside of that thread. It would be beneficial if > > refcount operations on such objects could be carried > > out by the thread that created them without locking. > > > > To achieve this, two extra fields could be added > > to the object header: an "owning thread id" and a > > "local reference count". (The existing refcount > > field will be called the "global reference count" > > in what follows.) > > > > An object created by a thread has its owning thread > > id set to that thread. When adjusting an object's > > refcount, if the current thread is the object's owning > > thread, the local refcount is updated without locking. > > If the object has no owning thread, or belongs to > > a different thread, the object is locked and the > > global refcount is updated. > > > > The object is considered garbage only when both > > refcounts drop to zero. Thus, after a decref, both > > refcounts would need to be checked to see if they > > are zero. When decrementing the local refcount and > > it reaches zero, the global refcount can be checked > > without locking, since a zero will never be written > > to it until it truly has zero non-local references > > remaining. > > > > I suspect that these two strategies together would > > eliminate a very large proportion of refcount-related > > activities requiring locking, perhaps to the point > > where those remaining are infrequent enough to make > > GIL removal practical. > > > > > > I wonder what the overhead is going to be. If for every INCREF or DECREF > you have to check that an object is immortal or whether it is a thread-owned > object is going to incur at least an 'if' check, if not more. I wonder what > the performance hit is going to be. The real question is whether the wasteful parallel if branches will be faster or slower than the locking non-parallel increments. > And for the second idea, adding two more fields to every object might be > considered expensive by some in terms of memory. In the worst case, it would double the size of an object (technically, a minimal Python instance can consist of a refcount and a type pointer). In the case of an integer, it would increase its space use by 2/3. > Also, how would this scenario be handled: object foo is created in thread A > (does it have a local-thread refcount of 1, a global of 1, or are both 1?), I would say global 0, thread 1. > is passed to thread B, and then DECREF'ed in thread B as the object is no > longer needed by anyone. If the local-thread refcount is 1 then this would > not work as it would fail with the global refcount already at 0. But if If the object is still being used in thread A, its thread refcount should be at least 1. If thread B decrefs the global refcount, and it becomes 0, then it can check the thread refcount and notice it is nonzero and not deallocate, or if it notices that it *is* zero, then since it already has the GIL (necessary to have decrefed the global refcount), it can pass the object to the deallocator. 
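To make the bookkeeping concrete, here is a toy sketch of the scheme as I understand it. The thread id is passed in explicitly, a plain Lock stands in for the GIL, and names like Obj/incref/decref/dealloc are purely illustrative -- this is a model of the proposal, not anything resembling the real C implementation:

    import threading

    _gil = threading.Lock()              # a plain lock standing in for the GIL

    class Obj(object):
        """Toy object carrying the proposed split refcount."""
        def __init__(self, owner_id):
            self.owner = owner_id        # owning thread id
            self.local_rc = 1            # touched only by the owning thread, no locking
            self.global_rc = 0           # touched only while the "GIL" is held
            self.freed = False

    def incref(obj, tid):
        if tid == obj.owner:
            obj.local_rc += 1            # owner's fast path: plain increment
        else:
            with _gil:                   # foreign thread: take the lock, bump the global count
                obj.global_rc += 1

    def decref(obj, tid):
        if tid == obj.owner:
            obj.local_rc -= 1
            # zero is only ever *written* to global_rc while the lock is held,
            # so the owner may read it here without locking, per the proposal
            if obj.local_rc == 0 and obj.global_rc == 0:
                dealloc(obj)
        else:
            with _gil:
                obj.global_rc -= 1
                if obj.global_rc == 0 and obj.local_rc == 0:
                    dealloc(obj)

    def dealloc(obj):
        obj.freed = True                 # placeholder for the real deallocator

The hand-off question we keep circling around is exactly which of those two decref branches ends up holding the last reference.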
Now, if thread B is using the object, and the object's thread refcount drops to zero, and thread B passes the object to thread C, then thread C is free to use the thread refcount. It takes a bit of work, but the system could be made to work. However, I am fairly certain that though it would remove the need to have the GIL during some object reference passing, specifically for objects whose whole lifetime is within a single thread, the larger definitions necessary for increfs and decrefs would put more pressure on processor cache, and regardless of locking, cache coherency requirements could ruin performance when two threads were running on different processors (due to cache line alignment). I ran a microbenchmark, but all it seemed to tell me was that dealing with the GIL is slow in multiple threads, but I didn't get conclusive results either way (either positive or negative). - Josiah From jcarlson at uci.edu Fri Apr 13 09:50:11 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 13 Apr 2007 00:50:11 -0700 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <461ECBEA.2050001@canterbury.ac.nz> References: <461ECBEA.2050001@canterbury.ac.nz> Message-ID: <20070412193258.62E6.JCARLSON@uci.edu> Greg Ewing wrote: > Does anyone have a use case where they *need* > the indentation to be preserved? (As opposed > to just not caring whether it's there or not.) Not personally. I think that telling people to use textwrap.dedent() is sufficient. Generally I'm -.5 on the change. - Josiah From ivan at selidor.net Fri Apr 13 09:43:01 2007 From: ivan at selidor.net (Ivan Vilata i Balaguer) Date: Fri, 13 Apr 2007 09:43:01 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <461EC070.9060802@canterbury.ac.nz> References: <461EC070.9060802@canterbury.ac.nz> Message-ID: <20070413074301.GJ16507@tardis.terramar.selidor.net> Greg Ewing (el 2007-04-13 a les 11:27:44 +1200) va dir:: > For Py3k, how about changing the definition of triple > quoted strings so that indentation is stripped up > to the level of the line where the string began? > > In other words, apply an implicit dedent() to it > in the parser. I'd rather make it explicit by using some string prefix a la 'r' or 'u', 'i', for instance: >>> normal_string = """ ... foo ... bar \ ... baz ... """ >>> print repr(normal_string) '\n foo\n bar baz\n' >>> indented_string1 = i""" ... foo ... bar \ ... baz ... """ >>> print repr(indented_string1) 'foo\n bar baz\n' >>> indented_string2 = i"""foo ... bar \ ... baz ... """ >>> print repr(indented_string2) 'foo\n bar baz\n' As you see, strings marked with 'i' are dedented to the outer non-blank character, and their first empty line is ignored. I haven't meditated this much, so some questions come to my mind: * Is it really OK to remove the first empty line? * How would this interact with an 'r' prefix? Should initial space be kept then? (This would effectively disable 'i'.) * Should leading space in a line after a continuation backslash really be removed? Of course the proposal can be made a lot better with some insight. What do you think of the basic idea? :: Ivan Vilata i Balaguer @ Welcome to the European Banana Republic! @ http://www.selidor.net/ @ http://www.nosoftwarepatents.com/ @ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 307 bytes Desc: Digital signature URL: From g.brandl at gmx.net Fri Apr 13 10:40:17 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 13 Apr 2007 10:40:17 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <20070412193258.62E6.JCARLSON@uci.edu> References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> Message-ID: Josiah Carlson schrieb: > Greg Ewing wrote: >> Does anyone have a use case where they *need* >> the indentation to be preserved? (As opposed >> to just not caring whether it's there or not.) > > Not personally. I think that telling people to use textwrap.dedent() is > sufficient. Generally I'm -.5 on the change. I've already suggested at one time that a dedent() method be added to strings, which would make it more obvious, but what is one import... Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From taleinat at gmail.com Fri Apr 13 13:18:32 2007 From: taleinat at gmail.com (Tal Einat) Date: Fri, 13 Apr 2007 14:18:32 +0300 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> Message-ID: <7afdee2f0704130418q3628c57bp9d59677c22b7065b@mail.gmail.com> Georg Brandl wrote: > > > I've already suggested at one time that a dedent() method be added to > strings, > which would make it more obvious, but what is one import... I'm not sure this is the way to go. IMO string methods should be generic manipulations on strings, and personally I find indenting/dedenting multi-line strings doesn't fit in. For me, a stdlib function is just fine. Ivan Vilata i Balaguer wrote: > I'd rather make it explicit by using some string prefix a la 'r' or 'u', > 'i', for instance: This could be a reasonable solution, but it has some downsides: * It's less readable than a well named function * It's harder to understand for a newbie - a function/method has a docstring, this would have to be looked up in the docs * It's easy to miss while reading code - one small letter making a big difference * It paves the road for making more such string prefixes, and then we'd have to memorize all of them... or consult the docs often -1 from me. -------------- next part -------------- An HTML attachment was scrubbed... URL: From erez27 at gmail.com Fri Apr 13 13:39:45 2007 From: erez27 at gmail.com (Erez Sh.) Date: Fri, 13 Apr 2007 13:39:45 +0200 Subject: [Python-ideas] iter() on steroids In-Reply-To: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> Message-ID: <3ee1180704130439r7c5657fdq6dd1689c202a01b5@mail.gmail.com> I think it's a great idea. In a language where everything is meant to be object-oriented and easy-to-use, it's funny that iterators (which are becoming increasingly popular) have such a bumpy interface. We wouldn't want to do "from numtools import nadd,nmul ; nadd(1,2)". The claim that iterators aren't "being used often enough to justify your change" is very disturbing considering that in future pythons most default functions will return iterators if possible (such as key()/items()/values() of dict). 
The WILL be used more than files, and it would be foolish to force the user to do "from filetools import seek, tell, readlines" The specifics of your proposal may need a little improvement here and there, but the idea itself, IMHO, is very good and pythonic. +1 On 4/13/07, George Sakkis wrote: > > I proposed an (admittedly more controversial) version of this a few > months back at the py3k list and the reaction was unexpectedly (IMO) > negative or indifferent, so I'm wondering if things have changed a bit > since. > > The proposal is to make the the builtin iter() return an object with > an API that consists of (most) functions currently at itertools. In > addition to saving one "from itertools import chain,islice,..." line > in every other module I write these days, an extra bonus of the OO > interface is that islice can be replaced with slice syntax and chain > with '+' (and/or perhaps the "pipe" character '|'). As a (deliberately > involved) example, consider this: > > # A composite iterator over two files specified as follows: > # - each yielded line is right stripped. > # - the first 3 lines of the first file are yielded. > # - the first line of the second file is skipped and its next 4 lines > are yielded > # - empty lines (after the right stripping) are filtered out. > # - the remaining lines are enumerated. > > f1,f2 = [iter(open(f)).map(str.rstrip) for f in 'foo.txt','bar.txt'] > for i,line in (f1[:3] + f2[1:5]).filter(None).enumerate(): > print i,line > > The equivalent itertools version is left as an exercise to the reader. > > This is actually backwards compatible and could even go in 2.x if > accepted, but I'm focusing on py3K here. > > Comments ? > > George > > > PS: FYI, a proof of concept implementation is posted as a recipe at: > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498272 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.orendorff at gmail.com Fri Apr 13 16:54:37 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Fri, 13 Apr 2007 10:54:37 -0400 Subject: [Python-ideas] iter() on steroids In-Reply-To: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> Message-ID: On 4/13/07, George Sakkis wrote: > f1,f2 = [iter(open(f)).map(str.rstrip) for f in 'foo.txt','bar.txt'] > for i,line in (f1[:3] + f2[1:5]).filter(None).enumerate(): > print i,line George, you've got to pick a better example next time. This one is terrifying. :) -j From gsakkis at rutgers.edu Fri Apr 13 17:32:41 2007 From: gsakkis at rutgers.edu (George Sakkis) Date: Fri, 13 Apr 2007 11:32:41 -0400 Subject: [Python-ideas] iter() on steroids In-Reply-To: References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> Message-ID: <91ad5bf80704130832y2641a50aj8adb23915e6f4c05@mail.gmail.com> On 4/13/07, Jason Orendorff wrote: > On 4/13/07, George Sakkis wrote: > > f1,f2 = [iter(open(f)).map(str.rstrip) for f in 'foo.txt','bar.txt'] > > for i,line in (f1[:3] + f2[1:5]).filter(None).enumerate(): > > print i,line > > George, you've got to pick a better example next time. This one > is terrifying. 
:) I know, but the equivalent using itertools is at least as terrifying :-) George From george.sakkis at gmail.com Fri Apr 13 17:49:22 2007 From: george.sakkis at gmail.com (George Sakkis) Date: Fri, 13 Apr 2007 11:49:22 -0400 Subject: [Python-ideas] iter() on steroids In-Reply-To: <461F121F.2020208@gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> <461F121F.2020208@gmail.com> Message-ID: <91ad5bf80704130849y5117321ev475acf240b4f086f@mail.gmail.com> On 4/13/07, Chris Rebert wrote: > +0 on your proposal > I just don't see itertools being used often enough to justify your > change, but I can see the utility for those instances where it is used > heavily. In some sense it's a chicken-and-egg problem. My guess is that one reason itertools are not used as much as they could/should is that they are "hidden away" in a module, which makes one think it twice before importing it (let alone newbies that don't even know its existence). As a single data point, I'm a big fan itertools and still I'm often lazy to import it to use, say izip() only once; I just go with zip() instead. George From jan.kanis at phil.uu.nl Fri Apr 13 18:51:18 2007 From: jan.kanis at phil.uu.nl (Jan Kanis) Date: Fri, 13 Apr 2007 18:51:18 +0200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <461F285F.4060003@acm.org> References: <461ED2F9.9020407@canterbury.ac.nz> <461F285F.4060003@acm.org> Message-ID: On Fri, 13 Apr 2007 08:51:11 +0200, Talin wrote: > Now, all that being said, even if such a GC library were to exist, that > is a long way from removal of the GIL, although it is a necessary step. > For example, take the case of a dictionary in which more than one thread > is inserting values. Clearly, that will require a lock or some other > mechanism to prevent corruption of the hash table as it is updated. I > think we want to avoid the Java situation where every object has its own > lock. Instead, we'd have to require the user to provide a lock around > that insertion operation. But what about dictionaries that the user > isn't aware of, such as class methods and module contents? In a world > without a GIL, what kind of steps need to be taken to insure that shared > data structures can be updated without creating chaos? In the case of hashtables, a nonblocking variant could perhaps be an option. There was a nice article on reddit some time ago: http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html , the guy claims that it's competitive in speed to non-lock protected (so thread unsafe) implementations. Nonblocking algorithms don't exist for all data structures, but perhaps they exist for most ones that are used in the python interpreter? - Jan From steven.bethard at gmail.com Fri Apr 13 19:07:22 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 13 Apr 2007 11:07:22 -0600 Subject: [Python-ideas] iter() on steroids In-Reply-To: References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> Message-ID: On 4/13/07, Jason Orendorff wrote: > On 4/13/07, George Sakkis wrote: > > f1,f2 = [iter(open(f)).map(str.rstrip) for f in 'foo.txt','bar.txt'] > > for i,line in (f1[:3] + f2[1:5]).filter(None).enumerate(): > > print i,line > > George, you've got to pick a better example next time. This one > is terrifying. :) Yeah, that's pretty awful. ;-) Maybe a more reasonable example:: # skip the first line for line in iter(fileobj)[1:]: .... where currently you'd write:: # skip the first line fileobj.next() for line in fileobj: ... 
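(For comparison, the plain itertools spelling today -- if you don't mind the import -- is something like the following; 'foo.txt' is just a stand-in file name:)

    from itertools import islice

    fileobj = open('foo.txt')               # whatever file object you have
    for line in islice(fileobj, 1, None):   # skip the first line
        print line.rstrip()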
I'm floating around -0.5 on this one. The itertools functions I use most are chain(), izip(), and count(). None of these are particularly natural as a method of a single iterator object. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From tjreedy at udel.edu Fri Apr 13 20:33:13 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 13 Apr 2007 14:33:13 -0400 Subject: [Python-ideas] iter() on steroids References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com><461F121F.2020208@gmail.com> <91ad5bf80704130849y5117321ev475acf240b4f086f@mail.gmail.com> Message-ID: "George Sakkis" wrote in message news:91ad5bf80704130849y5117321ev475acf240b4f086f at mail.gmail.com... | In some sense it's a chicken-and-egg problem. My guess is that one | reason itertools are not used as much as they could/should is that | they are "hidden away" in a module, which makes one think it twice | before importing it (let alone newbies that don't even know its | existence). As a single data point, I'm a big fan itertools and still | I'm often lazy to import it to use, say izip() only once; I just go | with zip() instead. I personally think there are too many builtins. So I would like some pushed to modules, which means more import statements. Oh, dear. If you have trouble writing 'from itertools import izip' or 'import itertools as it', then I guess it is hard to promote a module. Nonetheless, I think perhaps you should write your own based on iter and itertools. And put it up on PyPI if it works at least for you. Terry Jan Reedy From gsakkis at rutgers.edu Fri Apr 13 22:00:39 2007 From: gsakkis at rutgers.edu (George Sakkis) Date: Fri, 13 Apr 2007 16:00:39 -0400 Subject: [Python-ideas] iter() on steroids In-Reply-To: References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> <461F121F.2020208@gmail.com> <91ad5bf80704130849y5117321ev475acf240b4f086f@mail.gmail.com> Message-ID: <91ad5bf80704131300h359faeeeqbf585e83fdea6b8a@mail.gmail.com> On 4/13/07, Terry Reedy wrote: > > "George Sakkis" > wrote in message > news:91ad5bf80704130849y5117321ev475acf240b4f086f at mail.gmail.com... > | In some sense it's a chicken-and-egg problem. My guess is that one > | reason itertools are not used as much as they could/should is that > | they are "hidden away" in a module, which makes one think it twice > | before importing it (let alone newbies that don't even know its > | existence). As a single data point, I'm a big fan itertools and still > | I'm often lazy to import it to use, say izip() only once; I just go > | with zip() instead. > > I personally think there are too many builtins. I agree, that's why I don't suggest a new builtin but adding features to an existing one. In fact, this can even *reduce* the builtins since map(), zip() and enumerate() could be removed from the builtin namespace. George From jcarlson at uci.edu Fri Apr 13 23:09:16 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 13 Apr 2007 14:09:16 -0700 Subject: [Python-ideas] iter() on steroids In-Reply-To: <91ad5bf80704131300h359faeeeqbf585e83fdea6b8a@mail.gmail.com> References: <91ad5bf80704131300h359faeeeqbf585e83fdea6b8a@mail.gmail.com> Message-ID: <20070413140557.62FE.JCARLSON@uci.edu> "George Sakkis" wrote: > On 4/13/07, Terry Reedy wrote: > > I personally think there are too many builtins. > > I agree, that's why I don't suggest a new builtin but adding features > to an existing one. 
In fact, this can even *reduce* the builtins since > map(), zip() and enumerate() could be removed from the builtin > namespace. Map was already going to be removed because it can be replaced by... [f(x) for x in y] Filter was already going to be removed because it can be replaced by... [x for x in y if f(x)] I can't remember if anything was going to happen to zip or any of the other functional programming functions. - Josiah From brett at python.org Sat Apr 14 00:45:54 2007 From: brett at python.org (Brett Cannon) Date: Fri, 13 Apr 2007 15:45:54 -0700 Subject: [Python-ideas] iter() on steroids In-Reply-To: <20070413140557.62FE.JCARLSON@uci.edu> References: <91ad5bf80704131300h359faeeeqbf585e83fdea6b8a@mail.gmail.com> <20070413140557.62FE.JCARLSON@uci.edu> Message-ID: On 4/13/07, Josiah Carlson wrote: > > > "George Sakkis" wrote: > > On 4/13/07, Terry Reedy wrote: > > > I personally think there are too many builtins. > > > > I agree, that's why I don't suggest a new builtin but adding features > > to an existing one. In fact, this can even *reduce* the builtins since > > map(), zip() and enumerate() could be removed from the builtin > > namespace. > > Map was already going to be removed because it can be replaced by... > [f(x) for x in y] > > Filter was already going to be removed because it can be replaced by... > [x for x in y if f(x)] > > I can't remember if anything was going to happen to zip or any of the > other functional programming functions. zip is staying but replaced underneath the covers with itertools.izip. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sat Apr 14 02:51:04 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 12:51:04 +1200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: References: <461ED2F9.9020407@canterbury.ac.nz> Message-ID: <46202578.9080507@canterbury.ac.nz> Brett Cannon wrote: > In reality this is true, but obviously not technically true. You could > delete a class if you really wanted to. But obviously it rarely happens. And if it does, the worst that will happen is that the original version will hang around, tying up a small amount of memory. > I wonder what the overhead is going to be. If for every INCREF or > DECREF you have to check that an object is immortal or whether it is a > thread-owned object is going to incur at least an 'if' check, if not > more. Clearly, there will be a small increase in overhead. But it may be worth it if it avoids the need for a rather expensive lock/unlock. It was pointed out earlier that, even using the special instructions provided by some processors, this can take a great many times longer than a normal memory access or two. > And for the second idea, adding two more fields to every object might be > considered expensive by some in terms of memory. Again, it's a tradeoff. If it enables removal of the GIL and massive threading on upcoming multi- core CPUs, it might be considered worth the cost. > > Also, how would this scenario be handled: object foo is created in > thread A ... is passed to thread B, and then DECREF'ed in thread B as > the object is no longer needed by anyone. I'll have to think about that. If a thread gives away a reference to another thread, it really needs to be a global reference rather than a local one. The tricky part is telling when this happens. 
> But if objects start with a global refcount of 1 but a > local refcount of 0 and it is DECREF'ed locally then wouldn't that fail > for the same reason? That one's easier -- if the local refcount is 0 on a decref, you need to lock the object and decrement the global refcount. -- Greg From greg.ewing at canterbury.ac.nz Sat Apr 14 03:26:14 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 13:26:14 +1200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <461F285F.4060003@acm.org> References: <461ED2F9.9020407@canterbury.ac.nz> <461F285F.4060003@acm.org> Message-ID: <46202DB6.9000802@canterbury.ac.nz> Talin wrote: > I'm thinking along similar lines, but my approach is to eliminate > refcounting entirely. That's a possibility, although refcounting does have some nice properties -- it's cache-friendly, and it's usually fairly easy to get it to work with other libraries that have their own scheme for managing memory and don't know about Python's one. > For example, take the case of a dictionary in which more than one thread > is inserting values. .. I > think we want to avoid the Java situation where every object has its own > lock. Having to lock dictionaries mightn't be so bad, as long as it can be done using special instructions. It's still a much larger-grained locking unit than an incref or decref. But I'm wondering whether the problem might get solved for us from the hardware end if we wait long enough. If we start seeing massively-multicore CPUs, I expect there will be a lot of pressure to come up with more efficient ways of doing fine- grained locking in order to make effective use of them. Maybe a special lump of high-speed multi-port memory used just for locks, with surrounding hardware designed for using it as such. -- Greg From greg.ewing at canterbury.ac.nz Sat Apr 14 03:51:49 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 13:51:49 +1200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <20070412224427.62F5.JCARLSON@uci.edu> References: <461ED2F9.9020407@canterbury.ac.nz> <20070412224427.62F5.JCARLSON@uci.edu> Message-ID: <462033B5.7030706@canterbury.ac.nz> Josiah Carlson wrote: > If thread B decrefs the global refcount, and it > becomes 0, then it can check the thread refcount and notice it is > nonzero and not deallocate, or if it notices that it *is* zero, then > since it already has the GIL (necessary to have decrefed the global > refcount), it can pass the object to the deallocator. The problem with that is the owning thread needs to be able to manipulate the local refcount without holding any kind of lock. That's the whole point of it. -- Greg From greg.ewing at canterbury.ac.nz Sat Apr 14 03:51:57 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 13:51:57 +1200 Subject: [Python-ideas] iter() on steroids In-Reply-To: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> Message-ID: <462033BD.8040506@canterbury.ac.nz> George Sakkis wrote: > The proposal is to make the the builtin iter() return an object with > an API that consists of (most) functions currently at itertools. The problem with this kind of thing is that it becomes an arbitrary choice what is included as a method. Anything not included in that choice is left out in the cold and has to be applied as a function anyway. 
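(To be fair, a bare-bones wrapper along the lines George describes is only a handful of lines. This is just a sketch using the 2.x itertools names, with an arbitrary choice of methods -- which is rather the point:)

    from itertools import imap, ifilter, islice, chain

    class Iter(object):
        def __init__(self, iterable):
            self._it = iter(iterable)
        def __iter__(self):
            return self._it
        def next(self):
            return self._it.next()
        def map(self, func):
            return Iter(imap(func, self._it))
        def filter(self, pred):
            return Iter(ifilter(pred, self._it))
        def enumerate(self):
            return Iter(enumerate(self._it))
        def __getitem__(self, s):            # slices only, as in the proposal
            return Iter(islice(self._it, s.start, s.stop, s.step))
        def __add__(self, other):
            return Iter(chain(self._it, other))

    # e.g. strip, skip the first line, then number the rest:
    #   for i, line in Iter(open('foo.txt')).map(str.rstrip)[1:].enumerate():
    #       print i, line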
If there were a certain set of iterator algebra functions that were *very* frequently used, there could be an argument for making methods of them. But I think you're overestimating how much the itertools functions are used. Some people may make heavy use of them, but they're not used much in general. If you happen to be a heavy user, there's nothing stopping you from creating your own version of iter() that returns an object with all the methods you want. Let's keep the standard iterator objects clean and simple. -- Greg From greg.ewing at canterbury.ac.nz Sat Apr 14 03:55:51 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 13:55:51 +1200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <20070412193258.62E6.JCARLSON@uci.edu> References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> Message-ID: <462034A7.2070603@canterbury.ac.nz> Josiah Carlson wrote: >>Does anyone have a use case where they *need* >>the indentation to be preserved? > Not personally. I think that telling people to > use textwrap.dedent() is sufficient. But it seems crazy to make people do this all the time, when there's no reason not to do it automatically in the first place. -- Greg From greg.ewing at canterbury.ac.nz Sat Apr 14 04:05:09 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 14:05:09 +1200 Subject: [Python-ideas] iter() on steroids In-Reply-To: <3ee1180704130439r7c5657fdq6dd1689c202a01b5@mail.gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> <3ee1180704130439r7c5657fdq6dd1689c202a01b5@mail.gmail.com> Message-ID: <462036D5.9080606@canterbury.ac.nz> Erez Sh. wrote: > The claim that iterators aren't "being used often enough to justify your > change" is very disturbing The claim wasn't that iterators are used infrequently, but the functions in the itertools module. The vast majority of the time, the only thing people do with iterators is iterate over them. > considering that in future pythons most > default functions will return iterators if possible (such as > key()/items()/values() of dict). This is wrong -- they won't return iterators, they'll return *views* that can be indexed and otherwise used as sequences or mappings. So this has nothing to do with the proposal at hand. -- Greg From greg.ewing at canterbury.ac.nz Sat Apr 14 04:21:30 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 14:21:30 +1200 Subject: [Python-ideas] iter() on steroids In-Reply-To: <91ad5bf80704130849y5117321ev475acf240b4f086f@mail.gmail.com> References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> <461F121F.2020208@gmail.com> <91ad5bf80704130849y5117321ev475acf240b4f086f@mail.gmail.com> Message-ID: <46203AAA.9000802@canterbury.ac.nz> George Sakkis wrote: > I'm often lazy to import it to use, say izip() only once; I just go > with zip() instead. I think that range() and zip() are going to return views or iterators in Py3k, so you won't be needing izip() any more. 
-- Greg From greg.ewing at canterbury.ac.nz Sat Apr 14 04:24:26 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2007 14:24:26 +1200 Subject: [Python-ideas] iter() on steroids In-Reply-To: References: <91ad5bf80704122139h5bb5ec4bmfacd1dc7d685b4eb@mail.gmail.com> <461F121F.2020208@gmail.com> <91ad5bf80704130849y5117321ev475acf240b4f086f@mail.gmail.com> Message-ID: <46203B5A.9080307@canterbury.ac.nz> Terry Reedy wrote: > So I would like some > pushed to modules, which means more import statements. A middle ground would be to move them into modules, but have the modules pre-imported into the builtin namespace. -- Greg From rrr at ronadam.com Sat Apr 14 04:54:44 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 13 Apr 2007 21:54:44 -0500 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <462034A7.2070603@canterbury.ac.nz> References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> <462034A7.2070603@canterbury.ac.nz> Message-ID: <46204274.5000907@ronadam.com> Greg Ewing wrote: > Josiah Carlson wrote: > >>> Does anyone have a use case where they *need* >>> the indentation to be preserved? > >> Not personally. I think that telling people to > > use textwrap.dedent() is sufficient. > > But it seems crazy to make people do this all > the time, when there's no reason not to do > it automatically in the first place. Reminds me of ... http://www.artima.com/weblogs/viewpost.jsp?thread=101968 Note that the optional implementation of this has already been put in Python 2.5 just as it said it would be. How about using indenting along with implicit string endings? def foo(...): ``` Just another foo. message = ``` This is a multi- line string + implicit right stripping. print message Just kidding of course. The back-quotes will never be approved. ;-) I don't know what would be the best solution because just about anything I can think of has some sort of side effects in some situations. Maybe if line based editors are ever completely replaced with folding graphic editors it will no longer be a problem because all our multi-line strings can have nice borders around them. Cheers, Ron From jcarlson at uci.edu Sat Apr 14 09:21:54 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 14 Apr 2007 00:21:54 -0700 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <462033B5.7030706@canterbury.ac.nz> References: <20070412224427.62F5.JCARLSON@uci.edu> <462033B5.7030706@canterbury.ac.nz> Message-ID: <20070414001918.6301.JCARLSON@uci.edu> Greg Ewing wrote: > Josiah Carlson wrote: > > If thread B decrefs the global refcount, and it > > becomes 0, then it can check the thread refcount and notice it is > > nonzero and not deallocate, or if it notices that it *is* zero, then > > since it already has the GIL (necessary to have decrefed the global > > refcount), it can pass the object to the deallocator. > > The problem with that is the owning thread needs to be > able to manipulate the local refcount without holding > any kind of lock. That's the whole point of it. Certainly, but thread B isn't the owning thread, thread A was the owning thread, and by virtue of decrefing its thread count to zero, acquiring the GIL, and checking the global refcount to make sure that either someone else is responsible for its deallocation (global refcount > 0), or that thread A is responsible for its deallocation (global refcount == 0). 
- Josiah From taleinat at gmail.com Sat Apr 14 18:34:32 2007 From: taleinat at gmail.com (Tal Einat) Date: Sat, 14 Apr 2007 19:34:32 +0300 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: Message-ID: <7afdee2f0704140934v410a4f67hb2dec4f8cc682ba@mail.gmail.com> On 4/12/07, Adam Atlas wrote: > > > Meanwhile, on a similar subject, I have a... strange idea. I'm not > sure how easy/hard it would be to parse or how necessary it is, but > it's just a thought. [snip] So anyway, > what I'm proposing is the following: > > x = 'foo > 'bar > 'baz' > > Any > thoughts? -1 on such new syntax. What i usually do is: message = ("yada yada\n" "more yada yada\n" "even more yada.") This works a lot like what you suggest, but with Python's current syntax. If implicit string concatenation were removed, I'd just add a plus sign at the end of each line. This is also a possibility: message = "\n".join([ "yada yada", "more yada yada", "even more yada."]) The latter would work even better with the removal of implicit string concatenation, since forgetting a comma would cause a syntax error instead of skipping a newline. - Tal -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at janc.be Sun Apr 15 04:46:55 2007 From: lists at janc.be (Jan Claeys) Date: Sun, 15 Apr 2007 04:46:55 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> <7B3BA968-4228-49E2-A590-D84E34550EF4@atlas.st> Message-ID: <1176605216.28153.112.camel@localhost> Op donderdag 12-04-2007 om 12:34 uur [tijdzone +0100], schreef Eoghan Murray: > I dislike '+' as a string concatenation operator as I think > overloading the meaning of '+' for non-numbers is ugly, D uses '~' as a string concatenation operator... -- Jan Claeys From adam at atlas.st Sun Apr 15 05:07:56 2007 From: adam at atlas.st (Adam Atlas) Date: Sat, 14 Apr 2007 23:07:56 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <1176605216.28153.112.camel@localhost> References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> <7B3BA968-4228-49E2-A590-D84E34550EF4@atlas.st> <1176605216.28153.112.camel@localhost> Message-ID: <2F132947-A3F1-4622-91D0-F10BDFC229E2@atlas.st> On 14 Apr 2007, at 22.46, Jan Claeys wrote: > D uses '~' as a string concatenation operator... Eh... I like D, but that would be confusing in Python, since it already uses ~ as a unary operator that means something totally different. From greg.ewing at canterbury.ac.nz Sun Apr 15 13:37:15 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 15 Apr 2007 23:37:15 +1200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <20070414001918.6301.JCARLSON@uci.edu> References: <20070412224427.62F5.JCARLSON@uci.edu> <462033B5.7030706@canterbury.ac.nz> <20070414001918.6301.JCARLSON@uci.edu> Message-ID: <46220E6B.5060208@canterbury.ac.nz> Josiah Carlson wrote: > Certainly, but thread B isn't the owning thread, thread A was the owning > thread, and by virtue of decrefing its thread count to zero, acquiring > the GIL, and checking the global refcount to make sure that either > someone else is responsible for its deallocation (global refcount > 0), > or that thread A is responsible for its deallocation (global refcount == > 0). Thread B holding the GIL doesn't help, because the local refcount is not covered by the GIL. 
Thread A must be able to assume it has total ownership of the local refcount, otherwise there's no benefit in the scheme. -- Greg From talin at acm.org Sun Apr 15 19:12:04 2007 From: talin at acm.org (Talin) Date: Sun, 15 Apr 2007 10:12:04 -0700 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <461ED2F9.9020407@canterbury.ac.nz> References: <461ED2F9.9020407@canterbury.ac.nz> Message-ID: <46225CE4.4040207@acm.org> Greg Ewing wrote: > 2) Objects owned by a thread > > Python code creates and destroys temporary objects > at a high rate -- stack frames, argument tuples, > intermediate results, etc. If the code is executed > by a thread, those objects are rarely if ever seen > outside of that thread. It would be beneficial if > refcount operations on such objects could be carried > out by the thread that created them without locking. While we are on the topic of reference counting, I'd like to direct your attention to Recycler, an IBM research project: "The Recycler is a concurrent multiprocessor garbage collector with extremely low pause times (maximum of 6 milliseconds over eight benchmarks) while remaining competitive with the best throughput- oriented collectors in end-to-end execution times. This paper describes the overall architecture of the Recycler, including its use of reference counting and concurrent cycle collection, and presents extensive measurements of the system comparing it to a parallel, stop-the-world mark-and-sweep collector." There are a bunch of research papers describing Recycler which can be found at the following link: http://www.research.ibm.com/people/d/dfb/publications.html I'd start with the papers entitled "Java without the Coffee Breaks: A Non-intrusive Multiprocessor Garbage Collector" and "Concurrent Cycle Collection in Reference Counted Systems". Let me describe a bit about how the Recycler works and how it relates to what you've proposed. The basic idea is that for each thread, there is a set of thread local data (TLD) that contains a pair of "refcount buffers", one buffer for increfs and one buffer for decrefs. Each refcount buffer is a flat array of pointers which starts empty and gradually fills up. The "incref" operation does not actually touch the reference count field of the object. Instead, an "incref" appends a pointer to the object to the end of the incref buffer for that thread. Similarly, a decref operation appends a pointer to the object to the decref buffer. Since the refcount buffers are thread-local, there is no need for locking or synchronization. When one of the buffers gets full, both buffers are swapped out for new ones, and the old buffers are placed on a queue which is processed by the collector thread. The collector thread is the only thread which is allowed to actually touch the reference counts of the individual objects, and its the only thread which is allowed to delete objects. Processing the buffers is relatively simple: First, the incref buffer is processed. The collector thread scans through each pointer in the buffer, and increments the refcount of each object. Then the decref buffer is processed in a similar way, decrementing the refcount. However, it also needs to process the buffers for the other threads before the memory can be reclaimed. Recycler defines a series of "epochs" (i.e. intervals between collections). Within a refcount buffer, each epoch is represented as a contiguous range of values within the array -- all of the increfs and decrefs which occurred during that epoch. 
An auxiliary array records the high water mark for each epoch. Using this information, the collector thread is able to process only those increfs and decrefs for all threads which occurred before the current epoch. This does mean that objects whose refcount falls to zero during the current epoch will not be deleted until the next collection cycle. Recycler also handles cyclic garbage via cycle detection, which is described in the paper. It does not use a "mark and sweep" type of algorithm, but is instead able to detect cycles locally without scanning the entire heap. Thus, the Recycler's use of refcount buffers achieves what you were trying to achieve, which is refcounting without locking. However, it does require access to thread-local data for each incref / release operation. The performance of this scheme will greatly depend on how quickly the code can get access to thread local data. The fastest possible method would be to dedicate a register, but that's infeasible on most systems. Another idea is for large functions to look up the TLD and stuff it in a local variable at the beginning of the function. For older source code the existing, backwards-compatible incref and decref macros could each individually get access to the TLD, but these would be slower than the more optimized methods in which the TLD was supplied as a parameter. -- Talin From jcarlson at uci.edu Sun Apr 15 20:16:53 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 15 Apr 2007 11:16:53 -0700 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <46220E6B.5060208@canterbury.ac.nz> References: <20070414001918.6301.JCARLSON@uci.edu> <46220E6B.5060208@canterbury.ac.nz> Message-ID: <20070415111128.6307.JCARLSON@uci.edu> Greg Ewing wrote: > Josiah Carlson wrote: > > > Certainly, but thread B isn't the owning thread, thread A was the owning > > thread, and by virtue of decrefing its thread count to zero, acquiring > > the GIL, and checking the global refcount to make sure that either > > someone else is responsible for its deallocation (global refcount > 0), > > or that thread A is responsible for its deallocation (global refcount == > > 0). > > Thread B holding the GIL doesn't help, because the > local refcount is not covered by the GIL. Thread A > must be able to assume it has total ownership of the > local refcount, otherwise there's no benefit in > the scheme. I seem to not be explaining myself well enough. What you describe is precisely what I described earlier. I don't believe we have a disagreement about the execution semantics of threads on an object with a local thread count. I was only mentioning A acquiring the GIL if/when it becomes finished with the object, to determine if the object could be sent to the standard Python deallocation rutines, and/or if thread A should send it (as opposed to thread B in the case if thread B was passed the object and was using it beyond the time that A did). - Josiah From brett at python.org Sun Apr 15 21:52:33 2007 From: brett at python.org (Brett Cannon) Date: Sun, 15 Apr 2007 12:52:33 -0700 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <46225CE4.4040207@acm.org> References: <461ED2F9.9020407@canterbury.ac.nz> <46225CE4.4040207@acm.org> Message-ID: On 4/15/07, Talin wrote: > > Greg Ewing wrote: > > 2) Objects owned by a thread > > > > Python code creates and destroys temporary objects > > at a high rate -- stack frames, argument tuples, > > intermediate results, etc. 
If the code is executed > > by a thread, those objects are rarely if ever seen > > outside of that thread. It would be beneficial if > > refcount operations on such objects could be carried > > out by the thread that created them without locking. > > While we are on the topic of reference counting, I'd like to direct your > attention to Recycler, an IBM research project: > > "The Recycler is a concurrent multiprocessor garbage collector with > extremely low pause times (maximum of 6 milliseconds over eight > benchmarks) while remaining competitive with the best throughput- > oriented collectors in end-to-end execution times. This paper describes > the overall architecture of the Recycler, including its use of reference > counting and concurrent cycle collection, and presents extensive > measurements of the system comparing it to a parallel, stop-the-world > mark-and-sweep collector." > > There are a bunch of research papers describing Recycler which can be > found at the following link: > > http://www.research.ibm.com/people/d/dfb/publications.html > > I'd start with the papers entitled "Java without the Coffee Breaks: A > Non-intrusive Multiprocessor Garbage Collector" and "Concurrent Cycle > Collection in Reference Counted Systems". > > Let me describe a bit about how the Recycler works and how it relates to > what you've proposed. > > The basic idea is that for each thread, there is a set of thread local > data (TLD) that contains a pair of "refcount buffers", one buffer for > increfs and one buffer for decrefs. Each refcount buffer is a flat array > of pointers which starts empty and gradually fills up. > > The "incref" operation does not actually touch the reference count field > of the object. Instead, an "incref" appends a pointer to the object to > the end of the incref buffer for that thread. Similarly, a decref > operation appends a pointer to the object to the decref buffer. Since > the refcount buffers are thread-local, there is no need for locking or > synchronization. > > When one of the buffers gets full, both buffers are swapped out for new > ones, and the old buffers are placed on a queue which is processed by > the collector thread. The collector thread is the only thread which is > allowed to actually touch the reference counts of the individual > objects, and its the only thread which is allowed to delete objects. > > Processing the buffers is relatively simple: First, the incref buffer is > processed. The collector thread scans through each pointer in the > buffer, and increments the refcount of each object. Then the decref > buffer is processed in a similar way, decrementing the refcount. > > However, it also needs to process the buffers for the other threads > before the memory can be reclaimed. Recycler defines a series of > "epochs" (i.e. intervals between collections). Within a refcount buffer, > each epoch is represented as a contiguous range of values within the > array -- all of the increfs and decrefs which occurred during that > epoch. An auxiliary array records the high water mark for each epoch. Huh, interesting idea. I downloaded the papers and plan to give them a read. The one isssue I can see with this is that because of these epochs and using a buffer instead of actually manipulating the refcount means a delay. And I know some people love the (mostly) instantaneous garbage collection when the refcount should be at 0. Anyway, I will give the paper a read before I make any more ignorant statements about the design. 
=) -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Apr 16 01:50:12 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 16 Apr 2007 11:50:12 +1200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <20070415111128.6307.JCARLSON@uci.edu> References: <20070414001918.6301.JCARLSON@uci.edu> <46220E6B.5060208@canterbury.ac.nz> <20070415111128.6307.JCARLSON@uci.edu> Message-ID: <4622BA34.7050103@canterbury.ac.nz> Josiah Carlson wrote: > I was only mentioning A acquiring the GIL if/when it becomes finished > with the object, to determine if the object could be sent to the > standard Python deallocation rutines Oh, yes, that part is fine. The problem is what happens if thread A stuffs a reference into another object that lives beyond A's interest in matters. Then another thread can see an object that still has local ref counts, even though the owning thread no longer cares about it and is never going to get rid of those local refcounts itself. I haven't thought of a non-expensive way of fixing that yet. -- Greg From greg.ewing at canterbury.ac.nz Mon Apr 16 01:52:52 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 16 Apr 2007 11:52:52 +1200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: References: <461ED2F9.9020407@canterbury.ac.nz> <46225CE4.4040207@acm.org> Message-ID: <4622BAD4.2020706@canterbury.ac.nz> Brett Cannon wrote: > And I know some people love the (mostly) instantaneous > garbage collection when the refcount should be at 0. Yeah, and even a 6 millisecond pause could be too long in some applications, such as high frame rate animation. 6 milliseconds is a *long* time for today's GHz processors. -- Greg From jimjjewett at gmail.com Mon Apr 16 18:52:04 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 16 Apr 2007 12:52:04 -0400 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: <461ED2F9.9020407@canterbury.ac.nz> References: <461ED2F9.9020407@canterbury.ac.nz> Message-ID: On 4/12/07, Greg Ewing wrote: > I've been thinking about some ideas for reducing the > amount of refcount adjustment that needs to be done, > with a view to making GIL removal easier. > > 1) Permanent objects I have some vague memory (but couldn't find the references) that someone tried and it was too expensive. INCREF and DECREF on something in the header of an object you need anyhow were just too small to beat once you added any logic. (That said, the the experiment was pretty old, and the results may have changed.) > 2) Objects owned by a thread [Create a owner-refcount separate from the global count] Some distributed systems already take advantage of the fact that the actual count is irrelevant. They use weights, so that other stores don't need to be updated until the (local) weight hits zero. While it would be reasonable for a thread to only INCREF once, and then keep its internal refcount elsewhere ... it is really hard to beat "(add1 to/subtract 1 from) an int already at a known location in cache." Also note that Mark Miller and Ping Yee http://www.eros-os.org/pipermail/e-lang/1999-May/002590.html suggested a way to mark objects as "expensive" (==> release as soon as possible). Combining this, today's python looks only at an object's size when determining which memory pool to use. There might be some value in also categorizing types based on their instances typical memory usage. Examples: (1) Default pool, like today. 
(2) Permanent Pool: Expected to be small and permanent. Maybe skip the refcount entirely? Or at least ignore it going to zero, so you don't need to lock for updates? (3) Thread-local. Has an "external refcount" field that would normally be zero. (4) Expensive: Going to a rooted GC won't release it fast enough. -jJ From jimjjewett at gmail.com Mon Apr 16 19:01:50 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 16 Apr 2007 13:01:50 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <462034A7.2070603@canterbury.ac.nz> References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> <462034A7.2070603@canterbury.ac.nz> Message-ID: On 4/13/07, Greg Ewing wrote: > Josiah Carlson wrote: > >>Does anyone have a use case where they *need* > >>the indentation to be preserved? > > Not personally. I think that telling people to > > use textwrap.dedent() is sufficient. > But it seems crazy to make people do this all > the time, when there's no reason not to do > it automatically in the first place. The textwrap methods (including a proposed dedent) might make useful string methods. Short of that (1) Where does this preservation actually hurt? def f(self, arg1): """My DocString ... And I continue here -- which really is what I want. """ I use docstrings online -- and I typically do want them indented like the code. (2) Should literals (or at least strings, or at least docstrings) be decoratable? Anywhere but a docstring, you could just call the function, but ... I suppose it serves the same meta-value is the proposed i(nternational) or t(emplate) strings. def f(...): .... @dedent """ ... ... """ -jJ From greg.ewing at canterbury.ac.nz Tue Apr 17 02:14:00 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 17 Apr 2007 12:14:00 +1200 Subject: [Python-ideas] Ideas towards GIL removal In-Reply-To: References: <461ED2F9.9020407@canterbury.ac.nz> Message-ID: <46241148.2040200@canterbury.ac.nz> Jim Jewett wrote: > I have some vague memory (but couldn't find the references) that > someone tried and it was too expensive. Too expensive compared to what? The question isn't whether it's more expensive than the current scheme, but whether it helps when there's no GIL and you have to lock the object to update the refcount. -- Greg From greg.ewing at canterbury.ac.nz Tue Apr 17 02:19:24 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 17 Apr 2007 12:19:24 +1200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> <462034A7.2070603@canterbury.ac.nz> Message-ID: <4624128C.7010204@canterbury.ac.nz> Jim Jewett wrote: > (1) Where does this preservation actually hurt? It hurts because it places a burden on everyone every time they use a triple quoted string to do something about the indentation which is unwanted 99.999% of the time. > I use docstrings online -- and I typically do want them indented like > the code. I don't understand what you mean by that. Can you give an example where an auto-dedented docstring would give an undesirable result? 
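(For concreteness, the behaviour I have in mind is roughly what textwrap.dedent already does at run time; this is only a sketch, since the real change would happen when the literal is compiled:)

    import textwrap

    def f(x):
        """
        Frobnicate x.

            This block lines up nicely with the code, but the leading
            spaces are rarely wanted in the string itself.
        """

    print repr(f.__doc__)                    # indentation preserved, as today
    print repr(textwrap.dedent(f.__doc__))   # common leading whitespace stripped

    # Note: dedent() only strips a *common* margin, so a docstring whose
    # first line starts right after the opening quotes is left alone;
    # that is why PEP 257 tools carry their own trimming routine.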
-- Greg From rrr at ronadam.com Tue Apr 17 09:50:49 2007 From: rrr at ronadam.com (Ron Adam) Date: Tue, 17 Apr 2007 02:50:49 -0500 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <4624128C.7010204@canterbury.ac.nz> References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> <462034A7.2070603@canterbury.ac.nz> <4624128C.7010204@canterbury.ac.nz> Message-ID: <46247C59.9090305@ronadam.com> Greg Ewing wrote: > Jim Jewett wrote: > >> (1) Where does this preservation actually hurt? > > It hurts because it places a burden on everyone every > time they use a triple quoted string to do something > about the indentation which is unwanted 99.999% of > the time. > >> I use docstrings online -- and I typically do want them indented like >> the code. > > I don't understand what you mean by that. Can you > give an example where an auto-dedented docstring > would give an undesirable result? You didn't specify doc strings earlier, Just triple quoted strings in general. I don't think it would be problem for only doc strings. It could probably be done at compile time too. It's not really that different than the -OO option to remove them. Dedenting triple quoted strings in general would cause some problems in (python 2.x) with existing gui interfaces that use triple quoted strings to define their text. Cheers, Ron From theller at ctypes.org Wed Apr 18 21:02:24 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 18 Apr 2007 21:02:24 +0200 Subject: [Python-ideas] Command line options Message-ID: Sometimes I think it would be great if it were possible to have standard Python command line options that would allow - initialize and configure the logging module - specify requirements for pkg_resources (for eggs installed with --multi-version All this would avoid having to change logging options or requirements in the script, or having to implement a command line parser for this stuff in every script. The idea is to call python in this way: python --require foo==dev --logging level=DEBUG myscript.py I have not been able to implement something like this in sitecustomize.py, because this module is executed when sys.argv is not yet available. Another possible way to implement this would probably be to set environment vars and parse those in sitecustomize.py, you would have to call env option1=foo option2=bar python script.py then; unfortuately windows does not have an 'env' utility. Does this sound like a useful idea? Thomas From taleinat at gmail.com Wed Apr 18 22:09:40 2007 From: taleinat at gmail.com (Tal Einat) Date: Wed, 18 Apr 2007 23:09:40 +0300 Subject: [Python-ideas] Command line options In-Reply-To: References: Message-ID: <7afdee2f0704181309lf9abb28we0cc8c01f861334d@mail.gmail.com> On 4/18/07, Thomas Heller wrote: > > Sometimes I think it would be great if it were possible to have standard > Python command line options that would allow > > - initialize and configure the logging module > - specify requirements for pkg_resources (for eggs installed with > --multi-version > > All this would avoid having to change logging options or requirements in > the > script, or having to implement a command line parser for this stuff in > every script. > The idea is to call python in this way: > > python --require foo==dev --logging level=DEBUG myscript.py > > > > I have not been able to implement something like this in sitecustomize.py, > because this module is executed when sys.argv is not yet available. 
> > Another possible way to implement this would probably be to set > environment vars > and parse those in sitecustomize.py, you would have to call > > env option1=foo option2=bar python script.py > > then; unfortuately windows does not have an 'env' utility. > > Does this sound like a useful idea? > > Thomas -1 on this. IMHO these options are too specific to be part of Python's standard command line options. And using environment variables as a workaround... would cause all sorts of problems, like the one you mentioned. If these are options you often use for different Python scripts, you could create a generic Python script runner which would parse these options, initialize whatever is required (logging, packages, etc.) and finally execfile the given Python script. For example: script_runner.py --require foo==dev --logging level=DEBUG myscript.py - Tal -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Apr 19 00:02:50 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 19 Apr 2007 10:02:50 +1200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <46247C59.9090305@ronadam.com> References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> <462034A7.2070603@canterbury.ac.nz> <4624128C.7010204@canterbury.ac.nz> <46247C59.9090305@ronadam.com> Message-ID: <4626958A.4000809@canterbury.ac.nz> Ron Adam wrote: >> Can you >> give an example where an auto-dedented docstring >> would give an undesirable result? > > You didn't specify doc strings earlier, Just triple quoted strings in > general. Triple quoted strings in general is what I had in mind. I was replying to something that seemed to imply that it would cause trouble with docstrings, without being very clear about what the trouble was. > Dedenting triple quoted strings in general would cause some problems in > (python 2.x) with existing gui interfaces that use triple quoted strings > to define their text. I conjecture that in all such cases, the existing code is already dedenting the string itself. I still haven't seen a real case where a piece of code actually needs the extra indentation. -- Greg From lists at janc.be Thu Apr 19 00:43:28 2007 From: lists at janc.be (Jan Claeys) Date: Thu, 19 Apr 2007 00:43:28 +0200 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <2F132947-A3F1-4622-91D0-F10BDFC229E2@atlas.st> References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> <7B3BA968-4228-49E2-A590-D84E34550EF4@atlas.st> <1176605216.28153.112.camel@localhost> <2F132947-A3F1-4622-91D0-F10BDFC229E2@atlas.st> Message-ID: <1176936208.28153.258.camel@localhost> Op zaterdag 14-04-2007 om 23:07 uur [tijdzone -0400], schreef Adam Atlas: > On 14 Apr 2007, at 22.46, Jan Claeys wrote: > > D uses '~' as a string concatenation operator... > > Eh... I like D, but that would be confusing in Python, since it > already uses ~ as a unary operator that means something totally > different. Python uses '+', '*', ':', '.', etc. for multiple different purposes already, and at least the '+' case is more confusing sometimes than '~' would ever be... 
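To keep the comparison concrete, this is how the current spellings behave (plain Python as it stands today, nothing proposed here):

    a = "Hello, " "world"             # implicit concatenation of adjacent literals
    b = "Hello, " + "world"           # explicit '+', which also means addition
    c = "".join(["Hello, ", "world"])
    assert a == b == c == "Hello, world"

    # The classic trap implicit concatenation invites -- a missing comma:
    names = ["spam", "eggs" "ham"]    # two items; the second is "eggsham"
    assert names == ["spam", "eggsham"]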
-- Jan Claeys From adam at atlas.st Thu Apr 19 01:01:10 2007 From: adam at atlas.st (Adam Atlas) Date: Wed, 18 Apr 2007 19:01:10 -0400 Subject: [Python-ideas] Implicit String Concatenation In-Reply-To: <1176936208.28153.258.camel@localhost> References: <43aa6ff70704110801w2dd29e96j35d087cbee7a384f@mail.gmail.com> <7B3BA968-4228-49E2-A590-D84E34550EF4@atlas.st> <1176605216.28153.112.camel@localhost> <2F132947-A3F1-4622-91D0-F10BDFC229E2@atlas.st> <1176936208.28153.258.camel@localhost> Message-ID: <6FB44457-52D8-4BF9-BDAE-45FE7FC64FA9@atlas.st> On 18 Apr 2007, at 18.43, Jan Claeys wrote: > Op zaterdag 14-04-2007 om 23:07 uur [tijdzone -0400], schreef Adam > Atlas: >> On 14 Apr 2007, at 22.46, Jan Claeys wrote: >>> D uses '~' as a string concatenation operator... >> >> Eh... I like D, but that would be confusing in Python, since it >> already uses ~ as a unary operator that means something totally >> different. > > Python uses '+', '*', ':', '.', etc. for multiple different purposes > already, and at least the '+' case is more confusing sometimes than > '~' > would ever be... Heh, yeah, I actually realized immediately after I sent that email that the exact same thing could be said about +. But I don't know... even if + might be confused with an arithmetic operator sometimes, it's what people are used to, and I think it makes sense intuitively. 'Plus', in a very abstract sense, suggests 'put two things together', whether with numbers or strings or anything else for which we have a concept of 'putting together'. ~ doesn't have that advantage. If a programmer coming from pretty much any language sees "foo"+"bar", they're probably going to be able to guess that it's concatenations. If they see "foo"~"bar", it is really not immediately clear what it's doing. From thobes at gmail.com Thu Apr 19 19:02:28 2007 From: thobes at gmail.com (Tobias Ivarsson) Date: Thu, 19 Apr 2007 19:02:28 +0200 Subject: [Python-ideas] Fwd: Implicit String Concatenation In-Reply-To: <9997d5e60704191001q3ac384f0lb82106556dfa5ff2@mail.gmail.com> References: <461ECBEA.2050001@canterbury.ac.nz> <20070412193258.62E6.JCARLSON@uci.edu> <462034A7.2070603@canterbury.ac.nz> <9997d5e60704191001q3ac384f0lb82106556dfa5ff2@mail.gmail.com> Message-ID: <9997d5e60704191002l40b056fcg428c6a8420f58340@mail.gmail.com> On 4/16/07, Jim Jewett wrote: > > On 4/13/07, Greg Ewing wrote: > > Josiah Carlson wrote: > > > >>Does anyone have a use case where they *need* > > >>the indentation to be preserved? > > > > Not personally. I think that telling people to > > > use textwrap.dedent() is sufficient. > > > But it seems crazy to make people do this all > > the time, when there's no reason not to do > > it automatically in the first place. > > The textwrap methods (including a proposed dedent) might make useful > string methods. Short of that > > > (1) Where does this preservation actually hurt? > > def f(self, arg1): > """My DocString ... > > And I continue here -- which really is what I want. > """ > > I use docstrings online -- and I typically do want them indented like the > code. > > (2) Should literals (or at least strings, or at least docstrings) be > decoratable? Anywhere but a docstring, you could just call the > function, but ... I suppose it serves the same meta-value is the > proposed i(nternational) or t(emplate) strings. > > def f(...): > .... > @dedent > """ ... > ... 
> """ If docstrings is the problem you can always use a function decorator for it: def dedentdoc(func): func.__doc__ = dedent(func.__doc__) return func @dedentdoc def f(...): """ Long and indented docstring. extra indented unindented, phew""" pass /Tobias -jJ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcarlson at uci.edu Thu Apr 19 19:47:32 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 19 Apr 2007 10:47:32 -0700 Subject: [Python-ideas] Fwd: Implicit String Concatenation In-Reply-To: <9997d5e60704191002l40b056fcg428c6a8420f58340@mail.gmail.com> References: <9997d5e60704191001q3ac384f0lb82106556dfa5ff2@mail.gmail.com> <9997d5e60704191002l40b056fcg428c6a8420f58340@mail.gmail.com> Message-ID: <20070419104540.6359.JCARLSON@uci.edu> "Tobias Ivarsson" wrote: > If docstrings is the problem you can always use a function decorator for it: That wasn't the question. Greg was asking "when is dedenting a docstring *not* the right solution?" We all understand and know that any string can be manually dedented, the question is whether automatic dedenting of all triple-quoted strings should be done. - Josiah From brett at python.org Fri Apr 20 05:38:42 2007 From: brett at python.org (Brett Cannon) Date: Thu, 19 Apr 2007 20:38:42 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports Message-ID: Some of you might remember a discussion that took place on this list about not being able to execute a script contained in a package that used relative imports (read the PEP if you don't quite get what I am talking about). The PEP below proposes a solution (along with a counter-solution). Let me know what you think. I especially want to hear which proposal people prefer; the one in the PEP or the one in the Open Issues section. Plus I wouldn't mind suggestions on a title for this PEP. =) ------------------------------------------- PEP: XXX Title: XXX Version: $Revision: 52916 $ Last-Modified: $Date: 2006-12-04 11:59:42 -0800 (Mon, 04 Dec 2006) $ Author: Brett Cannon Status: Draft Type: Standards Track Content-Type: text/x-rst Created: XXX-Apr-2007 Abstract ======== Because of how name resolution works for relative imports in a world where PEP 328 is implemented, the ability to execute modules within a package ceases being possible. This failing stems from the fact that the module being executed as the "main" module replaces its ``__name__`` attribute with ``"__main__"`` instead of leaving it as the actual, absolute name of the module. This breaks import's ability to resolve relative imports from the main module into absolute names. In order to resolve this issue, this PEP proposes to change how a module is delineated as the module that is being executed as the main module. By leaving the ``__name__`` attribute in a module alone and setting a module attribute named ``__main__`` to a true value for the main module (and thus false in all others), proper relative name resolution can occur while still having a clear way for a module to know if it is being executed as the main module. The Problem =========== With the introduction of PEP 328, relative imports became dependent on the ``__name__`` attribute of the module performing the import. 
This is because the dots in a relative import are used to strip away
parts of the calling module's name to calculate where in the package
hierarchy a relative import should fall (prior to PEP 328 relative
imports could fail and would fall back on absolute imports which had a
chance of succeeding).

For instance, consider the import ``from .. import spam`` made from the
``bacon.ham.beans`` module (``bacon.ham.beans`` is not a package
itself, i.e., does not define ``__path__``).  Name resolution of the
relative import takes the caller's name (``bacon.ham.beans``), splits
on dots, and then slices off the last n parts based on the level
(which is 2).  In this example both ``ham`` and ``beans`` are dropped
and ``spam`` is joined with what is left (``bacon``).  This leads to
the proper import of the module ``bacon.spam``.

This reliance on the ``__name__`` attribute of a module when handling
relative imports becomes an issue when executing a script within a
package.  Because the executing script's ``__name__`` is set to
``'__main__'``, import cannot resolve any relative imports.  This
leads to an ``ImportError`` if you try to execute a script in a
package that uses any relative import.

For example, assume we have a package named ``bacon`` with an
``__init__.py`` file containing::

    from . import spam

Also create a module named ``spam`` within the ``bacon`` package (it
can be an empty file).  Now if you try to execute the ``bacon``
package (either through ``python bacon/__init__.py`` or
``python -m bacon``) you will get an ``ImportError`` about trying to
do a relative import from within a non-package.  Obviously the import
is valid, but because of the setting of ``__name__`` to ``'__main__'``
import thinks that ``bacon/__init__.py`` is not in a package since no
dots exist in ``__name__``.  To see how the algorithm works, see
``importlib.Import._resolve_name()`` in the sandbox [#importlib]_.

Currently a work-around is to remove all relative imports in the
module being executed and make them absolute.  This is unfortunate,
though, as one should not be required to rely on a specific form of
import just to make a module in a package executable.


The Solution
============

The solution to the problem is to not change the value of ``__name__``
in modules.  But there still needs to be a way to let executing code
know it is being executed as a script.  This is handled with a new
module attribute named ``__main__``.

When a module is being executed as a script, ``__main__`` will be set
to a true value.  For all other modules, ``__main__`` will be set to a
false value.  This changes the current idiom of::

    if __name__ == '__main__':
        ...

to::

    if __main__:
        ...

The current idiom is not as obvious and could cause confusion for new
programmers.  The proposed idiom, though, does not require explaining
why ``__name__`` is set as it is.

With the proposed solution the convenience of finding out what module
is being executed by examining ``sys.modules['__main__']`` is lost.
To make up for this, the ``sys`` module will gain the ``main``
attribute.  It will contain a string of the name of the module that is
considered the executing module.

A competing solution is discussed in `Open Issues`_.


Transition Plan
===============

Using this solution will not work directly in Python 2.6.  Code is
dependent upon the semantics of having ``__name__`` set to
``'__main__'``.  There is also the issue of pre-existing global
variables in a module named ``__main__``.  To deal with these issues,
a two-step solution is needed.
First, a Py3K deprecation warning will be raised during AST generation when a global variable named ``__main__`` is defined. This will help with the detection of code that would reset the value of ``__main__`` for a module. Without adding a warning when a global variable is injected into a module, though, it is not fool-proof. But this solution should cover the vast majority of variable rebinding problems. Second, 2to3 [#2to3]_ will gain a rule to transform the current ``if __name__ == '__main__': ...`` idiom to the new one. While it will not help with code that checks ``__name__`` outside of the idiom, that specific line of code makes up a large proporation of code that every looks for ``__name__`` set to ``'__main__'``. Open Issues =========== A counter-proposal to introducing the ``__main__`` attribute on modules was to introduce a built-in with the same name. The value of the built-in would be the name of the module being executed (just like the proposed ``sys.main``). This would lead to a new idiom of:: if __name__ == __main__: ... The perk of this idiom over the one proposed earlier is that the general semantics does not differ greatly from the current idiom. The drawback is that the syntactic difference is subtle; the dropping of quotes around "__main__". Some believe that for existing Python programmers bugs will be introduced where the quotation marks will be put on by accident. But one could argue that the bug would be discovered quickly through testing as it is a very shallow bug. The other pro of this proposal over the earlier one is the alleviation of requiring import code to have to set the value of ``__main__``. By making it a built-in variable import does not have to care about ``__main__`` as executing the code itself will pick up the built-in ``__main__`` itself. This simplies the implementation of the proposal as it only requires setting a built-in instead of changing import to set an attribute on every module that has exactly one module have a different value (much like the current implementation has to do to set ``__name__`` in one module to ``'__main__'``). References ========== .. [#2to3] 2to3 tool (http://svn.python.org/view/sandbox/trunk/2to3/) [ViewVC] .. [#importlib] importlib (http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=markup) [ViewVC] Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From selliott4 at austin.rr.com Fri Apr 20 07:30:44 2007 From: selliott4 at austin.rr.com (Steven Elliott) Date: Fri, 20 Apr 2007 00:30:44 -0500 Subject: [Python-ideas] [Python-Dev] Making builtins more efficient In-Reply-To: <45DCF124.6040101@canterbury.ac.nz> References: <1141879691.11091.78.camel@grey> <440FF9CB.5030407@gmail.com> <79990c6b0603090400h25dd2c7ev3d5c379f6529f3c2@mail.gmail.com> <1141915806.11091.127.camel@grey> <4410BE69.3080004@canterbury.ac.nz> <1171984065.22648.47.camel@grey> <45DCF124.6040101@canterbury.ac.nz> Message-ID: <1177047044.16345.79.camel@grey> Thanks for forwarding this. It took me a while to catch on to the thread being moved here from python-dev. On Thu, 2007-02-22 at 14:25 +1300, Greg Ewing wrote: > Steven Elliott wrote: > > > What I have in mind may be close to what you are suggesting above. > > My idea is somewhat more uniform and general than that. 
> > For the module dict, you use a special mapping type that > allows selected items to be accessed by an index as well > as a name. The set of such names is determined when the > module's code is compiled -- it's simply the names used > in that module to refer to globals or builtins. That sounds like an interesting idea. How does it differ from PEP 280?: http://www.python.org/dev/peps/pep-0280 (assuming PEP 280 isn't what you are describing). If there is some place I can read more about this idea, that would be great. > The first time a given builtin is referenced in the module, > it will be unbound in the module dict, so it is looked up > in the usual way and then written into the module dict, > so it can subsequently be retrieved by index. > > The advantages of this scheme over yours are that it speeds > access to module-level names as well as builtins, and it > doesn't require the compiler to have knowledge of a > predefined set of names. How does your scheme speed access to module-level names? Are they referred to by index? With your idea would module level names only be referred to by index internally in the module as global variables (LOAD_GLOBAL)? It seems like referring to attributes inside the module from outside the module (LOAD_ATTR) would require something like a visioning scheme where the compiler knows in advance knows what version the module is so that it can get the index right. Again, if your idea is PEP 280 then my questions in the above paragraph are answered. > It does entail a slight semantic change, as changes made > to a builtin won't be seen by a module that has already > used that builtin for the first time. But from what Guido > has said before, it seems he is willing to accept a change > like that if it will help. I think slight semantic changes like that are worth it if it buys greater performance, but I understand the importance of reverse compatibility as well. After exploring some of the different ideas for making globals/bulitins/attributes more efficient it seems to me that there is an overall tradeoff - How much complexity is justified to avoid hash table lookups or extra levels of indirection? For example, PEP 280 avoids a hash table lookup but adds a level of indirection (the cells point to the value), which is a net performance gain. My idea (my last big email) avoids a level of indirection, but only for a predefined set of names. And it requires new opcodes. And the compiler has to be aware of the predefined set of names. I have a question about PEP 280, but maybe I'll ask it here since it seems relevant. One of the elegant things about the way Python compiles code is that, for the most part, each function can be compiled independently without concern for context. For example, the co_names in the code object for a function has only the names for that function. So the co_names for a given function does not depend on anything outside of that function. What if PEP 280's proposed co_globals, which is currently only has globals referenced by that function (similar to co_names), was instead a pointer to a shared array of globals that was common for the entire module (one to one with the module's dict)? I think doing so could avoid a level of indirection, but at the cost of forcing the compiler to keep track of the indexes of all globals so that it could get the index right (the index being the "" in the proposed LOAD_GLOBAL_CELL). The celldict might then have indexes into co_globals instead of cells. 
So the cost would be making the compiling of functions less independent. -- -- ----------------------------------------------------------------------- | Steven Elliott | selliott4 at austin.rr.com | ----------------------------------------------------------------------- From steven.bethard at gmail.com Fri Apr 20 07:56:09 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 19 Apr 2007 23:56:09 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/19/07, Brett Cannon wrote: > Let me know what you think. I especially want to hear which proposal > people prefer; the one in the PEP or the one in the Open Issues > section. Plus I wouldn't mind suggestions on a title for this PEP. As you've probably already guessed, I prefer the:: if __main__: version. I don't think I've ever used sys.modules['__main__']. > Transition Plan > =============== > > Using this solution will not work directly in Python 2.6. Code is > dependent upon the semantics of having ``__name__`` set to > ``'__main__'``. There is also the issue of pre-existing global > variables in a module named ``__main__``. Could you explain a bit why __main__ couldn't be inserted into modules before the module is actually executed? E.g. something like:: >>> module_text = '''\ ... __main__ = 'foo' ... print __main__ ... ''' >>> import new >>> mod = new.module('mod') >>> mod.__main__ = True >>> exec module_text in mod.__dict__ foo >>> mod.__main__ 'foo' I would have thought that if Python inserted __main__ before any of the module contents got exec'd, it would be backwards compatible because any use of __main__ would just overwrite the default one. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jcarlson at uci.edu Fri Apr 20 09:16:49 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 20 Apr 2007 00:16:49 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: <20070419235504.6374.JCARLSON@uci.edu> "Brett Cannon" wrote: > > Some of you might remember a discussion that took place on this list > about not being able to execute a script contained in a package that > used relative imports (read the PEP if you don't quite get what I am > talking about). The PEP below proposes a solution (along with a > counter-solution). > > Let me know what you think. I especially want to hear which proposal > people prefer; the one in the PEP or the one in the Open Issues > section. Plus I wouldn't mind suggestions on a title for this PEP. > =) About all I can come up with is "Fixing relative imports". > if __name__ == '__main__': > ... > > to:: > > if __main__: > ... According to your PEP, the point of the above is so that __name__ can become something descriptive, so that relative imports can do their thing as per PEP 328 semantics. However, both of your proposals seek to offer a value for __main__ (either as a builtin or module global). While others will probably disagree with me, I'm going to go with your 'open issues' proposal of ... > if __name__ == __main__: > ... As you say, errors arising from the 'subtle' removal of quotes will be quickly discovered (without a 2to3 conversion), and with a 2to3 conversion can be automatically converted. In 2.6, it could result in a warning or exception, depending on how Python 2.6 is run and/or what __future__ statements are used. 
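A toy way to see the proposed semantics is to simulate it with exec. Everything below is made up for illustration (MAIN, the dict literal); in the real proposal import or the builtins would supply __main__, and its value would be the name of the module chosen as the main module:

    import textwrap

    MAIN = 'bacon.spam'              # pretend this module was run as the script

    source = textwrap.dedent('''
        if __name__ == __main__:
            print "I am the main module"
        else:
            print "imported as", __name__
        ''')

    def run(name):
        # import would normally supply both names; a plain dict stands in here
        exec source in {'__name__': name, '__main__': MAIN}

    run('bacon.spam')                # -> I am the main module
    run('bacon.ham')                 # -> imported as bacon.ham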
It also doesn't rely on sticking yet another value in a module's globals (which makes it easier for 3rd parties to handle module loading by hand), while still makeing __main__ accessable. For people who had previously been using sys.modules['__main__'], they can instead use sys.modules[__main__] to get the same effect, which your initial proposal does not allow. - Josiah From lists at cheimes.de Fri Apr 20 15:43:21 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 20 Apr 2007 15:43:21 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: Brett Cannon schrieb: > When a module is being executed as a script, ``__main__`` will be set > to a true value. For all other modules, ``__main__`` will be set to a > false value. This changes the current idiom of:: > > if __name__ == '__main__': > ... > > to:: > > if __main__: > ... > > The current idiom is not as obvious and could cause confusion for new > programmers. The proposed idiom, though, does not require explaining > why ``__name__`` is set as it is. > > With the proposed solution the convenience of finding out what module > is being executed by examining ``sys.modules['__main__']`` is lost. > To make up for this, the ``sys`` module will gain the ``main`` > attribute. It will contain a string of the name of the module that is > considered the executing module. What about import sys if __name__ == sys.main: ... You won't have to introduce a new global module var __name__ and it's easy to understand for newbies and experienced developers. The code is only executed when the name of the current module is equal to the executed main module (sys.main). IMO it's much less PIT...B then introducing __main__. Christian From jimjjewett at gmail.com Fri Apr 20 17:18:51 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 20 Apr 2007 11:18:51 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/19/07, Brett Cannon wrote: > ... By leaving the ``__name__`` attribute in a module alone and > setting a module attribute named ``__main__`` to a true value for the > main module (and thus false in all others) ... Part of me says that you are already proposing the right answer, as these alternatives are just a little too hackish. Still, they are good enough that they should be listed in the PEP, even if only as rejected alternatives. (1) You could add a builtin __main__ that is false. The real main module would mask it, but no other code would need to change. Con: Another builtin, and this one wouldn't even make sense as an independent object. (2) You could special-case the import to use __file__ instead of __name__ when __name__ == "__main__" Con: may be more fragile. (3) You could set __name__ to (an instance of) a funky string subclass that overrides __eq__. Con: may be hard to find exactly the *right* behavior. Examples: What should str(name) do? Maybe __main__ should be the primary value, and split should be overridden? -jJ From grosser.meister.morti at gmx.net Fri Apr 20 17:41:19 2007 From: grosser.meister.morti at gmx.net (=?ISO-8859-15?Q?Mathias_Panzenb=F6ck?=) Date: Fri, 20 Apr 2007 17:41:19 +0200 Subject: [Python-ideas] ordered dict Message-ID: <4628DF1F.3060803@gmx.net> Some kind of ordered dictionary would be nice to have in the standard library. e.g. a AVL tree or something like that. 
It would be nice so we can do things like that: for value in tree[:end_key]: do_something_with(value) del tree[:end_key] A alternative would be just to sort the keys of a dict but that's O( n log n ) for each sort. Depending on what's the more often occurring case (lookup, insert, get key-range, etc.) a other kind of dict object would make sense. What do you think? -panzi From jcarlson at uci.edu Fri Apr 20 18:23:54 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 20 Apr 2007 09:23:54 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <4628DF1F.3060803@gmx.net> References: <4628DF1F.3060803@gmx.net> Message-ID: <20070420091127.637D.JCARLSON@uci.edu> Mathias Panzenb?ck wrote: > > Some kind of ordered dictionary would be nice to have in the > standard library. e.g. a AVL tree or something like that. > It would be nice so we can do things like that: > > for value in tree[:end_key]: > do_something_with(value) > > del tree[:end_key] > > > A alternative would be just to sort the keys of a dict but > that's O( n log n ) for each sort. Depending on what's the more > often occurring case (lookup, insert, get key-range, etc.) a > other kind of dict object would make sense. > > What do you think? This has been brought up many times. The general consensus has been 'you don't get what you think you get'. >>> u'a' < 'b' < () < u'a' True That is to say, there isn't a total ordering on objects that would make sense as a sorted key,value dictionary. In Python 3.0, objects that don't make sense to compare won't be comparable, so list.sort() and/or an AVL tree may make sense again. However, the problem with choosing to use an AVL (Red-Black, 2-3, etc.) tree is deciding semantics. Do you allow duplicate keys? Do you allow insertion and removal by position? Do you allow the fetching of the key/value at position X? Do you allow the fetching of the position for key X? Insertion before/after (bisect_left, bisect_right equivalents). Etcetera. In many cases, using a sorted list gets you what you want, is almost as fast, and has the benefit of using less memory. - Josiah From brett at python.org Fri Apr 20 19:09:55 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 10:09:55 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/19/07, Steven Bethard wrote: > On 4/19/07, Brett Cannon wrote: > > Let me know what you think. I especially want to hear which proposal > > people prefer; the one in the PEP or the one in the Open Issues > > section. Plus I wouldn't mind suggestions on a title for this PEP. > > As you've probably already guessed, I prefer the:: > > if __main__: > > version. I don't think I've ever used sys.modules['__main__']. > Yeah, I figured you would. =) > > Transition Plan > > =============== > > > > Using this solution will not work directly in Python 2.6. Code is > > dependent upon the semantics of having ``__name__`` set to > > ``'__main__'``. There is also the issue of pre-existing global > > variables in a module named ``__main__``. > > Could you explain a bit why __main__ couldn't be inserted into modules > before the module is actually executed? E.g. something like:: > > >>> module_text = '''\ > ... __main__ = 'foo' > ... print __main__ > ... 
''' > >>> import new > >>> mod = new.module('mod') > >>> mod.__main__ = True > >>> exec module_text in mod.__dict__ > foo > >>> mod.__main__ > 'foo' > > I would have thought that if Python inserted __main__ before any of > the module contents got exec'd, it would be backwards compatible > because any use of __main__ would just overwrite the default one. That's right, and that is the problem. That would mean if __main__ was false but then overwritten by a function or something, it suddenly became true. It isn't a problem in terms of whether the code will run, but whether the expected semantics will occur. -Brett From brett at python.org Fri Apr 20 19:11:43 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 10:11:43 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <20070419235504.6374.JCARLSON@uci.edu> References: <20070419235504.6374.JCARLSON@uci.edu> Message-ID: On 4/20/07, Josiah Carlson wrote: > > "Brett Cannon" wrote: > > > > Some of you might remember a discussion that took place on this list > > about not being able to execute a script contained in a package that > > used relative imports (read the PEP if you don't quite get what I am > > talking about). The PEP below proposes a solution (along with a > > counter-solution). > > > > Let me know what you think. I especially want to hear which proposal > > people prefer; the one in the PEP or the one in the Open Issues > > section. Plus I wouldn't mind suggestions on a title for this PEP. > > =) > > About all I can come up with is "Fixing relative imports". > > > > if __name__ == '__main__': > > ... > > > > to:: > > > > if __main__: > > ... > > According to your PEP, the point of the above is so that __name__ can > become something descriptive, so that relative imports can do their > thing as per PEP 328 semantics. However, both of your proposals seek to > offer a value for __main__ (either as a builtin or module global). > > While others will probably disagree with me, I'm going to go with your > 'open issues' proposal of ... > > > if __name__ == __main__: > > ... > Woohoo! It's my preference, but that's just because I think it will be easier to implement. > As you say, errors arising from the 'subtle' removal of quotes will be > quickly discovered (without a 2to3 conversion), and with a 2to3 > conversion can be automatically converted. In 2.6, it could result in a > warning or exception, depending on how Python 2.6 is run and/or what > __future__ statements are used. It also doesn't rely on sticking yet > another value in a module's globals (which makes it easier for 3rd > parties to handle module loading by hand), while still makeing __main__ > accessable. > That's a good point. > For people who had previously been using sys.modules['__main__'], they > can instead use sys.modules[__main__] to get the same effect, which your > initial proposal does not allow. Yep. -Brett From brett at python.org Fri Apr 20 19:15:43 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 10:15:43 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Christian Heimes wrote: > Brett Cannon schrieb: > > When a module is being executed as a script, ``__main__`` will be set > > to a true value. For all other modules, ``__main__`` will be set to a > > false value. This changes the current idiom of:: > > > > if __name__ == '__main__': > > ... 
> > > > to:: > > > > if __main__: > > ... > > > > The current idiom is not as obvious and could cause confusion for new > > programmers. The proposed idiom, though, does not require explaining > > why ``__name__`` is set as it is. > > > > With the proposed solution the convenience of finding out what module > > is being executed by examining ``sys.modules['__main__']`` is lost. > > To make up for this, the ``sys`` module will gain the ``main`` > > attribute. It will contain a string of the name of the module that is > > considered the executing module. > > What about > > import sys > if __name__ == sys.main: > ... > > You won't have to introduce a new global module var __name__ and it's > easy to understand for newbies and experienced developers. The code is > only executed when the name of the current module is equal to the > executed main module (sys.main). > IMO it's much less PIT...B then introducing __main__. > True, but it does introduce an import for a module that may never be used if the module is not being executed. That kind of sucks for minor performance reasons. But what do other people think? -Brett From brett at python.org Fri Apr 20 19:16:48 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 10:16:48 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Jim Jewett wrote: > On 4/19/07, Brett Cannon wrote: > > > ... By leaving the ``__name__`` attribute in a module alone and > > setting a module attribute named ``__main__`` to a true value for the > > main module (and thus false in all others) ... > > Part of me says that you are already proposing the right answer, as > these alternatives are just a little too hackish. Still, they are > good enough that they should be listed in the PEP, even if only as > rejected alternatives. > > (1) You could add a builtin __main__ that is false. The real main > module would mask it, but no other code would need to change. > > Con: Another builtin, and this one wouldn't even make sense as an > independent object. > > (2) You could special-case the import to use __file__ instead of > __name__ when __name__ == "__main__" > > Con: may be more fragile. > > (3) You could set __name__ to (an instance of) a funky string > subclass that overrides __eq__. > > Con: may be hard to find exactly the *right* behavior. Examples: > What should str(name) do? Maybe __main__ should be the primary value, > and split should be overridden? > Yeah, I don't like any of them. =) I will add them to the PEP in a Rejected Ideas section. -Brett From brett at python.org Fri Apr 20 19:22:28 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 10:22:28 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: I realized two things that I didn't mention in the PEP. One is that Python will have to infer the proper package name for a module being executed. Currently Python only knows the name of a module because you asked for something and it tries to find a module that fits that request. But what is being proposed here has to figure out what you would have asked for in order for the import to happen. So I need to spell out the algorithm that will need to be used to figure out ``python bacon/__init__.py`` is the bacon package. Using the '-m' option solves this as the name is given as an argument. Maybe this should only be expected to work with the -m option? 
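(For the ``python bacon/__init__.py`` form, the inference would have to be something along the following lines. This is only a rough sketch with a made-up helper name, not the actual algorithm, and it ignores sys.path entirely:)

    import os

    def guess_module_name(path):
        # Walk up from the file for as long as the parent directory is a
        # package (i.e. contains an __init__.py), collecting the names.
        path = os.path.abspath(path)
        directory, filename = os.path.split(path)
        module = os.path.splitext(filename)[0]
        parts = []
        if module != '__init__':
            parts.append(module)
        while os.path.isfile(os.path.join(directory, '__init__.py')):
            directory, package = os.path.split(directory)
            parts.insert(0, package)
        return '.'.join(parts)

    # guess_module_name('bacon/__init__.py')  -> 'bacon'
    # guess_module_name('bacon/spam.py')      -> 'bacon.spam'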
Would simplify things, but it does restrict the usefulness overall (but not entirely as you would still gain a new feature). The other issue is what to do if the module being executed is above the current directory where Python is executing from (e.g., ``python ../spam.py``). You can't infer the name for that module if the parent directory is not on sys.path. Setting the name to "__main__" might need to stay for instances where the module being executed cannot have it's name inferred. This is another argument to only support '-m' with this. -Brett On 4/19/07, Brett Cannon wrote: > Some of you might remember a discussion that took place on this list > about not being able to execute a script contained in a package that > used relative imports (read the PEP if you don't quite get what I am > talking about). The PEP below proposes a solution (along with a > counter-solution). > > Let me know what you think. I especially want to hear which proposal > people prefer; the one in the PEP or the one in the Open Issues > section. Plus I wouldn't mind suggestions on a title for this PEP. > =) > > ------------------------------------------- > PEP: XXX > Title: XXX > Version: $Revision: 52916 $ > Last-Modified: $Date: 2006-12-04 11:59:42 -0800 (Mon, 04 Dec 2006) $ > Author: Brett Cannon > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: XXX-Apr-2007 > > Abstract > ======== > > Because of how name resolution works for relative imports in a world > where PEP 328 is implemented, the ability to execute modules within a > package ceases being possible. This failing stems from the fact that > the module being executed as the "main" module replaces its > ``__name__`` attribute with ``"__main__"`` instead of leaving it as > the actual, absolute name of the module. This breaks import's ability > to resolve relative imports from the main module into absolute names. > > In order to resolve this issue, this PEP proposes to change how a > module is delineated as the module that is being executed as the main > module. By leaving the ``__name__`` attribute in a module alone and > setting a module attribute named ``__main__`` to a true value for the > main module (and thus false in all others), proper relative name > resolution can occur while still having a clear way for a module to > know if it is being executed as the main module. > > > The Problem > =========== > > With the introduction of PEP 328, relative imports became dependent on > the ``__name__`` attribute of the module performing the import. This > is because the use of dots in a relative import are used to strip away > parts of the calling module's name to calcuate where in the package > hierarchy a relative import should fall (prior to PEP 328 relative > imports could fail and would fall back on absolute imports which had a > chance of succeeding). > > For instance, consider the import ``from .. import spam`` made from the > ``bacon.ham.beans`` module (``bacon.ham.beans`` is not a package > itself, i.e., does not define ``__path__``). Name resolution of the > relative import takes the caller's name (``bacon.ham.beans``), splits > on dots, and then slices off the last n parts based on the level > (which is 2). In this example both ``ham`` and ``beans`` are dropped > and ``spam`` is joined with what is left (``bacon``). This leads to > the proper import of the module ``bacon.spam``. 
> > This reliance on the ``__name__`` attribute of a module when handling > realtive imports becomes an issue with executing a script within a > package. Because the executing script is set to ``'__main__'``, > import cannot resolve any relative imports. This leads to an > ``ImportError`` if you try to execute a script in a package that uses > any relative import. > > For example, assume we have a package named ``bacon`` with an > ``__init__.py`` file containing:: > > from . import spam > > Also create a module named ``spam`` within the ``bacon`` package (it > can be an empty file). Now if you try to execute the ``bacon`` > package (either through ``python bacon/__init__.py`` or > ``python -m bacon``) you will get an ``ImportError`` about trying to > do a relative import from within a non-package. Obviously the import > is valid, but because of the setting of ``__name__`` to ``'__main__'`` > import thinks that ``bacon/__init__.py`` is not in a package since no > dots exist in ``__name__``. To see how the algorithm works, see > ``importlib.Import._resolve_name()`` in the sandbox [#importlib]_. > > Currently a work-around is to remove all relative imports in the > module being executed and make them absolute. This is unfortunate, > though, as one should not be required to use a specific type of > resource in order to make a module in a package be able to be > executed. > > > The Solution > ============ > > The solution to the problem is to not change the value of ``__name__`` > in modules. But there still needs to be a way to let executing code > know it is being executed as a script. This is handled with a new > module attribute named ``__main__``. > > When a module is being executed as a script, ``__main__`` will be set > to a true value. For all other modules, ``__main__`` will be set to a > false value. This changes the current idiom of:: > > if __name__ == '__main__': > ... > > to:: > > if __main__: > ... > > The current idiom is not as obvious and could cause confusion for new > programmers. The proposed idiom, though, does not require explaining > why ``__name__`` is set as it is. > > With the proposed solution the convenience of finding out what module > is being executed by examining ``sys.modules['__main__']`` is lost. > To make up for this, the ``sys`` module will gain the ``main`` > attribute. It will contain a string of the name of the module that is > considered the executing module. > > A competing solution is discussed in `Open Issues`_. > > > Transition Plan > =============== > > Using this solution will not work directly in Python 2.6. Code is > dependent upon the semantics of having ``__name__`` set to > ``'__main__'``. There is also the issue of pre-existing global > variables in a module named ``__main__``. To deal with these issues, > a two-step solution is needed. > > First, a Py3K deprecation warning will be raised during AST generation > when a global variable named ``__main__`` is defined. This will help > with the detection of code that would reset the value of ``__main__`` > for a module. Without adding a warning when a global variable is > injected into a module, though, it is not fool-proof. But this > solution should cover the vast majority of variable rebinding > problems. > > Second, 2to3 [#2to3]_ will gain a rule to transform the current ``if > __name__ == '__main__': ...`` idiom to the new one. 
While it will not > help with code that checks ``__name__`` outside of the idiom, that > specific line of code makes up a large proporation of code that every > looks for ``__name__`` set to ``'__main__'``. > > > Open Issues > =========== > > A counter-proposal to introducing the ``__main__`` attribute on > modules was to introduce a built-in with the same name. The value of > the built-in would be the name of the module being executed (just like > the proposed ``sys.main``). This would lead to a new idiom of:: > > if __name__ == __main__: > ... > > The perk of this idiom over the one proposed earlier is that the > general semantics does not differ greatly from the current idiom. > > The drawback is that the syntactic difference is subtle; the dropping > of quotes around "__main__". Some believe that for existing Python > programmers bugs will be introduced where the quotation marks will be > put on by accident. But one could argue that the bug would be > discovered quickly through testing as it is a very shallow bug. > > The other pro of this proposal over the earlier one is the alleviation > of requiring import code to have to set the value of ``__main__``. By > making it a built-in variable import does not have to care about > ``__main__`` as executing the code itself will pick up the built-in > ``__main__`` itself. This simplies the implementation of the proposal > as it only requires setting a built-in instead of changing import to > set an attribute on every module that has exactly one module have a > different value (much like the current implementation has to do to set > ``__name__`` in one module to ``'__main__'``). > > > References > ========== > > .. [#2to3] 2to3 tool > (http://svn.python.org/view/sandbox/trunk/2to3/) [ViewVC] > > .. [#importlib] importlib > (http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=markup) > [ViewVC] > > > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > From lists at cheimes.de Fri Apr 20 19:32:45 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 20 Apr 2007 19:32:45 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: Brett Cannon schrieb: > True, but it does introduce an import for a module that may never be > used if the module is not being executed. That kind of sucks for > minor performance reasons. Yeah but sys is used by a lot of modules. Probably 95%+ of executable modules are either using sys directly to access sys.argv or os which imports sys. Also sys is a builtin module which is imported ridiculously fast. I assume that the speed penalty for scripts that don't use sys is minor. In my humble opinion it sucks less to force the import of a core module that is already used by most modules than to bind valuable developer time in the __main__ approach. I think it's a Pythonic solution as well. :) Christian From jcarlson at uci.edu Fri Apr 20 19:38:37 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 20 Apr 2007 10:38:37 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: <20070420103528.6380.JCARLSON@uci.edu> "Brett Cannon" wrote: > I realized two things that I didn't mention in the PEP. 
> > One is that Python will have to infer the proper package name for a > module being executed. Currently Python only knows the name of a > module because you asked for something and it tries to find a module > that fits that request. But what is being proposed here has to figure > out what you would have asked for in order for the import to happen. > So I need to spell out the algorithm that will need to be used to > figure out ``python bacon/__init__.py`` is the bacon package. Using > the '-m' option solves this as the name is given as an argument. There's also the rub that if you 'run' the module in /a/b/c/d/e/f.py, but all a-e are packages, the "proper" semantics may state that you need to import a/__init__.py, a/b/__init__.py, etc., prior to the execution of f.py . Of course the only way that you would know that is if you checked the paths .../e/, .../d/, etc. The PEP should probably be changed to state the order of imports in a case similar to this, and whether or not it bothers to check ancestor paths for package information. - Josiah From brett at python.org Fri Apr 20 19:46:46 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 10:46:46 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <20070420103528.6380.JCARLSON@uci.edu> References: <20070420103528.6380.JCARLSON@uci.edu> Message-ID: On 4/20/07, Josiah Carlson wrote: > > "Brett Cannon" wrote: > > I realized two things that I didn't mention in the PEP. > > > > One is that Python will have to infer the proper package name for a > > module being executed. Currently Python only knows the name of a > > module because you asked for something and it tries to find a module > > that fits that request. But what is being proposed here has to figure > > out what you would have asked for in order for the import to happen. > > So I need to spell out the algorithm that will need to be used to > > figure out ``python bacon/__init__.py`` is the bacon package. Using > > the '-m' option solves this as the name is given as an argument. > > There's also the rub that if you 'run' the module in /a/b/c/d/e/f.py, > but all a-e are packages, the "proper" semantics may state that you need > to import a/__init__.py, a/b/__init__.py, etc., prior to the execution > of f.py . > > Of course the only way that you would know that is if you checked the > paths .../e/, .../d/, etc. > > The PEP should probably be changed to state the order of imports in a > case similar to this, and whether or not it bothers to check ancestor > paths for package information. > Good point. It's one of the ways my import implementation differs from the current one as I just import the parent up to the requested module while the current implementation throws an exception. -Brett From adam at atlas.st Fri Apr 20 19:48:36 2007 From: adam at atlas.st (Adam Atlas) Date: Fri, 20 Apr 2007 13:48:36 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: <727CDD37-8C12-490F-93ED-BDBFF5F0E3D3@atlas.st> On 19 Apr 2007, at 23.38, Brett Cannon wrote: > Open Issues > =========== > > A counter-proposal to introducing the ``__main__`` attribute on > modules was to introduce a built-in with the same name. The value of > the built-in would be the name of the module being executed (just like > the proposed ``sys.main``). This would lead to a new idiom of:: > > if __name__ == __main__: > ... I like that one. 
But one thing I've always thought would be handy is a builtin (maybe __this__?) pointing to the current module object itself (instead of its name). Any chance of that happening? In that case, __main__ could globally point to the main module instead of its name. The idiom would then be "if __this__ is __main__:...'. I think that reads pretty well: "If this is [the] main [module, then ...]." From tjreedy at udel.edu Fri Apr 20 20:13:44 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 Apr 2007 14:13:44 -0400 Subject: [Python-ideas] PEP for executing a module in a package containingrelative imports References: Message-ID: "Brett Cannon" wrote in message news:bbaeab100704192038v110b053eqfdcf49f613302f8 at mail.gmail.com... | Let me know what you think. I especially want to hear which proposal | people prefer; the one in the PEP or the one in the Open Issues section. This PEP has two proposals, which I think should be better separated. 1. Leave __name__ alone (without the '__main__' hack) so that relative imports work when executing scripts within packages. My comment here is that I am fuzzy on the difference between __name__ and __file__ and why we would then need both. 2. Fix the 'main' self-knowledge problem introduced by fix 1. The 'counter-proposal' is only an alternative to this second proposal, as it agree with the first. I had the same idea as Christian as a third alternative, but as a user would prefer the simplest invocation possible. I agree with Jim that multiple alternatives should be listed. I think the '__main__' hack was both elegant and a wart, and agree that we should seriously consider a pair of coupled fixes. | Plus I wouldn't mind suggestions on a title for this PEP.| =) Package scripts, relative imports, and main identification. Terry Jan Reedy From grosser.meister.morti at gmx.net Fri Apr 20 20:31:50 2007 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Fri, 20 Apr 2007 20:31:50 +0200 Subject: [Python-ideas] ordered dict In-Reply-To: <20070420091127.637D.JCARLSON@uci.edu> References: <4628DF1F.3060803@gmx.net> <20070420091127.637D.JCARLSON@uci.edu> Message-ID: <46290716.6080504@gmx.net> Josiah Carlson schrieb: > > However, the problem with choosing to use an AVL (Red-Black, 2-3, etc.) > tree is deciding semantics. Do you allow duplicate keys? Does dict? no. so no. > Do you allow > insertion and removal by position? Does dict? no. so no. > Do you allow the fetching of the > key/value at position X? Does dict? no. so no. > Do you allow the fetching of the position for > key X? Does dict? no. so no. > Insertion before/after (bisect_left, bisect_right equivalents). > Etcetera. > Why should all this be relevant? It just has to be some kind of relation between a key and a value, and the keys should be accessible in a sorted way (and you should not to have to sort them every time). So it would be possible to slice such a container. > In many cases, using a sorted list gets you what you want, is almost as > fast, and has the benefit of using less memory. 
> AFAIK does a doubled link list use the same amount of memory as a (very) simple AVL tree: struct tree_node { struct tree_node left; struct tree_node right; void * data; }; struct list_node { struct list_node prev; struct list_node next; void * data; }; > > - Josiah > From tjreedy at udel.edu Fri Apr 20 20:34:58 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 Apr 2007 14:34:58 -0400 Subject: [Python-ideas] ordered dict References: <4628DF1F.3060803@gmx.net> Message-ID: "Mathias Panzenb?ck" wrote in message news:4628DF1F.3060803 at gmx.net... | Some kind of ordered dictionary would be nice to have in the | standard library. This has come up frequently, with 'ordered' having two quite different meanings. 1. Order of entry into the dictionary (for use with class definitions, for instance(though don't ask me why!). When a given key is entered just once, this is relatively easy: just append to a subsidiary list. I believe this is being at least considered for 3.0. 2. Order in the sorting or collation sense, which I presume you mean. To reduce confusion, call this a sorted dictionary, as others have done. Regardless, this has the problem that potential keys are not always comparable. This will become worse when most cross-type comparisons are disallowed in 3.0. So pershaps the __init__ method should require a tuple of allowed key types. | e.g. a AVL tree or something like that. ... | A alternative would be just to sort the keys of a dict but | that's O( n log n ) for each sort. Depending on what's the more | often occurring case (lookup, insert, get key-range, etc.) a | other kind of dict object would make sense. | | What do you think? If not already present in PyPI, someone could code an implementation and add it there. When such has be tested and achieved enough usage, then it might be proposed for addition to the collections module. Terry Jan Reedy From jcarlson at uci.edu Fri Apr 20 21:38:20 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 20 Apr 2007 12:38:20 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <46290716.6080504@gmx.net> References: <20070420091127.637D.JCARLSON@uci.edu> <46290716.6080504@gmx.net> Message-ID: <20070420121603.6387.JCARLSON@uci.edu> Mathias Panzenb?ck wrote: > > Josiah Carlson schrieb: > > > > However, the problem with choosing to use an AVL (Red-Black, 2-3, etc.) > > tree is deciding semantics. Do you allow duplicate keys? > > Does dict? no. so no. > > > Do you allow > > insertion and removal by position? > > Does dict? no. so no. > > > Do you allow the fetching of the > > key/value at position X? > > Does dict? no. so no. > > > Do you allow the fetching of the position for > > key X? > > Does dict? no. so no. > > > Insertion before/after (bisect_left, bisect_right equivalents). > > Etcetera. > > > > Why should all this be relevant? Very few use-cases of trees involve an ordered key/value dictionary. In 90% of the cases where I have needed (and implemented) trees involved one of the following use-cases; sorted keys (but no values), no but fast insertion of value based on position, sorted keys indexed by position or key with (and without) values, etc. Please also understand that the semantics of Python's dictionary is a function of its implementation as an open-addressed hash table. It's useful for 95% of use-cases, but among the remaining 5% (which includes the use-case you have in mind for the structure), there is a huge variety of just as significant uses that shouldn't be discounted. 
> > In many cases, using a sorted list gets you what you want, is almost as > > fast, and has the benefit of using less memory. > > > > AFAIK does a doubled link list use the same amount of memory as a > (very) simple AVL tree: Python lists aren't linked lists. If you didn't know this, then you don't know enough about the underlying implementation to make comments about what should or should not be available in the base language. - Josiah From talin at acm.org Fri Apr 20 21:50:39 2007 From: talin at acm.org (Talin) Date: Fri, 20 Apr 2007 12:50:39 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <20070420091127.637D.JCARLSON@uci.edu> References: <4628DF1F.3060803@gmx.net> <20070420091127.637D.JCARLSON@uci.edu> Message-ID: <4629198F.70200@acm.org> Josiah Carlson wrote: > Mathias Panzenb?ck wrote: >> Some kind of ordered dictionary would be nice to have in the >> standard library. e.g. a AVL tree or something like that. >> It would be nice so we can do things like that: >> >> for value in tree[:end_key]: >> do_something_with(value) >> >> del tree[:end_key] >> >> >> A alternative would be just to sort the keys of a dict but >> that's O( n log n ) for each sort. Depending on what's the more >> often occurring case (lookup, insert, get key-range, etc.) a >> other kind of dict object would make sense. >> >> What do you think? > > This has been brought up many times. The general consensus has been > 'you don't get what you think you get'. > > >>> u'a' < 'b' < () < u'a' > True > > That is to say, there isn't a total ordering on objects that would make > sense as a sorted key,value dictionary. In Python 3.0, objects that > don't make sense to compare won't be comparable, so list.sort() and/or > an AVL tree may make sense again. > > However, the problem with choosing to use an AVL (Red-Black, 2-3, etc.) > tree is deciding semantics. Do you allow duplicate keys? Do you allow > insertion and removal by position? Do you allow the fetching of the > key/value at position X? Do you allow the fetching of the position for > key X? Insertion before/after (bisect_left, bisect_right equivalents). > Etcetera. I generally agree. I also think that the term "ordered dictionary" ought to be avoided. One the one hand, I have no particular objection to someone creating an implementation of RB trees, B+-trees, PATRICIA radix trees and so on - in fact, these might be very useful things to have as standard collection classes. However, 'dict' has a whole set of semantic baggage that goes along with it that may or may not apply to these other container types; And similarly, these other container types may have operations and semantics that don't correspond to the standard Python dictionary. One expects to be able to do certain things with an RB-tree that are either disallowed or very inefficient with a regular dict, and the converse is true as well. You give a number of examples such as fetching the position for a given key. So my feeling is - let trees be trees, and dicts be dicts, and don't attempt to conflate the two. Otherwise, you end up with what I like to call the "overfactoring" anti-pattern - that is, attempt to generalize and unify two disparate systems that have different purposes and design intents into a single uniform interface. 
-- Talin From steven.bethard at gmail.com Sat Apr 21 00:08:16 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 20 Apr 2007 16:08:16 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Brett Cannon wrote: > On 4/19/07, Steven Bethard wrote: > > On 4/19/07, Brett Cannon wrote: > > > Transition Plan > > > =============== > > > > > > Using this solution will not work directly in Python 2.6. Code is > > > dependent upon the semantics of having ``__name__`` set to > > > ``'__main__'``. There is also the issue of pre-existing global > > > variables in a module named ``__main__``. > > > > Could you explain a bit why __main__ couldn't be inserted into modules > > before the module is actually executed? E.g. something like:: > > > > >>> module_text = '''\ > > ... __main__ = 'foo' > > ... print __main__ > > ... ''' > > >>> import new > > >>> mod = new.module('mod') > > >>> mod.__main__ = True > > >>> exec module_text in mod.__dict__ > > foo > > >>> mod.__main__ > > 'foo' > > > > I would have thought that if Python inserted __main__ before any of > > the module contents got exec'd, it would be backwards compatible > > because any use of __main__ would just overwrite the default one. > > That's right, and that is the problem. That would mean if __main__ > was false but then overwritten by a function or something, it suddenly > became true. It isn't a problem in terms of whether the code will > run, but whether the expected semantics will occur. Sure, but I don't see how it's much different from anyone who writes:: list = [foo, bar, baz] and then later wonders why:: list(obj) gives a ``TypeError: 'list' object is not callable``. If someone doesn't understand that the __main__ they defined at the beginning of a module is going to be the same __main__ they use at the end of the module, they're going to need to go do some reading about how name binding works in Python anyway. Of course, I definitely think it would be valuable to have a Py3K deprecation warning to help users identify when they've made a silly mistake like this. (Note that the counter-proposal has the same problem, so this needs to be resolved regardless of which approach gets taken.) I'd really like there to be a way to write Python 3.0 compatible code in Python 2.6 without having to run through 2to3. I think it's clear that __main__ can be defined (at module-level or in the builtins) without introducing any backwards compatibility problems right? Anyone that doesn't want to use the Python 3.0 idiom can still write ``if __name__ == '__main__'`` and it will continue to work in Python 2.X. And anyone who does want to use the Python 3.0 idiom is probably using the Py3K flag anyway, so if they make a stupid mistake, it'll get caught pretty quickly. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From steven.bethard at gmail.com Sat Apr 21 00:14:44 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 20 Apr 2007 16:14:44 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Christian Heimes wrote: > What about > > import sys > if __name__ == sys.main: > ... > > You won't have to introduce a new global module var __name__ and it's > easy to understand for newbies and experienced developers. 
The code is > only executed when the name of the current module is equal to the > executed main module (sys.main). But you have to understand a few things to understand why this works. You have to know that __name__ is the name of the module, and that if you want to find out the name of the main module, you need to look at sys.main. With the idiom:: if __main__: all you need to know is that the main module has __main__ set to true. > IMO it's much less PIT...B then introducing __main__. Could you elaborate? Do you think it would be hard to introduce another module-level attribute (like we already do for __name__)? Or do you think that the code would be hard to maintain? Or something else...? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From aahz at pythoncraft.com Sat Apr 21 00:30:19 2007 From: aahz at pythoncraft.com (Aahz) Date: Fri, 20 Apr 2007 15:30:19 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: <20070420223019.GA12929@panix.com> On Fri, Apr 20, 2007, Brett Cannon wrote: > On 4/20/07, Christian Heimes wrote: >> >> What about >> >> import sys >> if __name__ == sys.main: >> ... >> >> You won't have to introduce a new global module var __name__ and it's >> easy to understand for newbies and experienced developers. The code is >> only executed when the name of the current module is equal to the >> executed main module (sys.main). >> IMO it's much less PIT...B then introducing __main__. > > True, but it does introduce an import for a module that may never be > used if the module is not being executed. That kind of sucks for > minor performance reasons. > > But what do other people think? Looks good to me! sys is essentially guaranteed to be imported, so you're only wasting a few cycles to bring it into the module namespace. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Help a hearing-impaired person: http://rule6.info/hearing.html From brett at python.org Sat Apr 21 03:35:42 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 18:35:42 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Steven Bethard wrote: > On 4/20/07, Brett Cannon wrote: > > On 4/19/07, Steven Bethard wrote: > > > On 4/19/07, Brett Cannon wrote: > > > > Transition Plan > > > > =============== > > > > > > > > Using this solution will not work directly in Python 2.6. Code is > > > > dependent upon the semantics of having ``__name__`` set to > > > > ``'__main__'``. There is also the issue of pre-existing global > > > > variables in a module named ``__main__``. > > > > > > Could you explain a bit why __main__ couldn't be inserted into modules > > > before the module is actually executed? E.g. something like:: > > > > > > >>> module_text = '''\ > > > ... __main__ = 'foo' > > > ... print __main__ > > > ... ''' > > > >>> import new > > > >>> mod = new.module('mod') > > > >>> mod.__main__ = True > > > >>> exec module_text in mod.__dict__ > > > foo > > > >>> mod.__main__ > > > 'foo' > > > > > > I would have thought that if Python inserted __main__ before any of > > > the module contents got exec'd, it would be backwards compatible > > > because any use of __main__ would just overwrite the default one. > > > > That's right, and that is the problem. 
That would mean if __main__ > > was false but then overwritten by a function or something, it suddenly > > became true. It isn't a problem in terms of whether the code will > > run, but whether the expected semantics will occur. > > Sure, but I don't see how it's much different from anyone who writes:: > > list = [foo, bar, baz] > > and then later wonders why:: > > list(obj) > > gives a ``TypeError: 'list' object is not callable``. > Exactly. It's just that 'list' was known about when the code was written while __main__ was not. > If someone doesn't understand that the __main__ they defined at the > beginning of a module is going to be the same __main__ they use at the > end of the module, they're going to need to go do some reading about > how name binding works in Python anyway. > > Of course, I definitely think it would be valuable to have a Py3K > deprecation warning to help users identify when they've made a silly > mistake like this. > > (Note that the counter-proposal has the same problem, so this needs to > be resolved regardless of which approach gets taken.) > Yep. > I'd really like there to be a way to write Python 3.0 compatible code > in Python 2.6 without having to run through 2to3. I think it's clear > that __main__ can be defined (at module-level or in the builtins) > without introducing any backwards compatibility problems right? Anyone > that doesn't want to use the Python 3.0 idiom can still write ``if > __name__ == '__main__'`` and it will continue to work in Python 2.X. > And anyone who does want to use the Python 3.0 idiom is probably using > the Py3K flag anyway, so if they make a stupid mistake, it'll get > caught pretty quickly. Exactly. Python 2.6 will still have __name__ set to '__main__', but also have __main__ set. Python 3.0 will not change __name__ at all. This is why the PEP is a Py3K PEP and not a 2.6 PEP. -Brett From grosser.meister.morti at gmx.net Sat Apr 21 03:37:52 2007 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Sat, 21 Apr 2007 03:37:52 +0200 Subject: [Python-ideas] ordered dict In-Reply-To: <4628DF1F.3060803@gmx.net> References: <4628DF1F.3060803@gmx.net> Message-ID: <46296AF0.7050608@gmx.net> Ok, now. Forget all I said. Just a short question: When you have to store values accosiated with keys and the keys have to be accessible in a sorted manner. What container type would you use? What data structure would you implement? (I just thought a AVL tree would have been a good choice.) Thanks, panzi From jcarlson at uci.edu Sat Apr 21 04:46:17 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 20 Apr 2007 19:46:17 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <46296AF0.7050608@gmx.net> References: <4628DF1F.3060803@gmx.net> <46296AF0.7050608@gmx.net> Message-ID: <20070420193620.6399.JCARLSON@uci.edu> Mathias Panzenb?ck wrote: > Ok, now. Forget all I said. Just a short question: > When you have to store values accosiated with keys and the > keys have to be accessible in a sorted manner. What container > type would you use? What data structure would you implement? > (I just thought a AVL tree would have been a good choice.) If you want to use only things that are available in base Python, use a list and the bisect module. If you need O(logn) insertion and removal, then an AVL/Red-Black/2-3 tree with the semantics you described would also work. 
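For the list-plus-bisect option, a minimal sketch could look like the
following (the class name and the handful of methods shown are purely
illustrative):

    import bisect

    class SortedKeyDict(object):
        """Keys kept in sorted order using a plain list and bisect.

        A sketch of the approach, not a finished container.
        """
        def __init__(self):
            self._keys = []      # always kept sorted
            self._data = {}      # key -> value

        def __setitem__(self, key, value):
            if key not in self._data:
                bisect.insort(self._keys, key)   # O(n) insert, O(log n) search
            self._data[key] = value

        def __getitem__(self, key):
            return self._data[key]

        def __delitem__(self, key):
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

        def iter_keys(self, lo, hi):
            """Iterate over keys with lo <= key < hi, in sorted order."""
            start = bisect.bisect_left(self._keys, lo)
            stop = bisect.bisect_left(self._keys, hi)
            return iter(self._keys[start:stop])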
(I think there is both an AVL and Red-Black tree implementation in the Python package index [1]) If you only need to concern yourself with ordering every once in a while, then x = dct.items();x.sort() works reasonably well. Sometimes a "pair heap" can get you what you are looking for [2]. Data structure choices are tricky. It is usually better to describe the problem and one's approach (why you choose to use a particular algorithm and structure), rather than strictly asking "where can I find data structure X". - Josiah [1] http://www.python.org/pypi/ [2] http://mail.python.org/pipermail/python-dev/2006-November/069845.html From bjourne at gmail.com Sat Apr 21 06:28:18 2007 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Sat, 21 Apr 2007 04:28:18 +0000 Subject: [Python-ideas] ordered dict In-Reply-To: References: <4628DF1F.3060803@gmx.net> Message-ID: <740c3aec0704202128g6537c5bfv94c0f60a5d883d76@mail.gmail.com> On 4/20/07, Terry Reedy wrote: > 2. Order in the sorting or collation sense, which I presume you mean. To > reduce confusion, call this a sorted dictionary, as others have done. > > Regardless, this has the problem that potential keys are not always > comparable. This will become worse when most cross-type comparisons are > disallowed in 3.0. So pershaps the __init__ method should require a tuple > of allowed key types. >>> l = [(), "moo", 123, []] >>> l.sort() >>> l [123, [], 'moo', ()] If it is not a problem for lists it is not a problem for ordered dictionaries. > If not already present in PyPI, someone could code an implementation and > add it there. When such has be tested and achieved enough usage, then it > might be proposed for addition to the collections module. And that is how the currently considered for Python 3.0 ordered dict implementation got into Python? I find it amusing that over the years people have argued against having an ordered dict in Python. But now, when one is considered, only THAT version with THOSE semantics, is good. The rest should go to PyPI. -- mvh Bj?rn From steven.bethard at gmail.com Sat Apr 21 07:19:54 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 20 Apr 2007 23:19:54 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Brett Cannon wrote: > Exactly. Python 2.6 will still have __name__ set to '__main__', but > also have __main__ set. Python 3.0 will not change __name__ at all. That should be Python 3.0 will not change __main__ at all, right? Because __name__ is going to change from being "__main__" in the main module to being the actual module name in Python 3.0, right? Assuming that's right, I think it was unclear to me that you wanted to add __main__ to Python 2.x. Probably chainging: First, a Py3K deprecation warning will be raised... to: First, each module will gain a __main__ attribute and a Py3K deprecation warning will be raised... would make the intent clearer. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From brett at python.org Sat Apr 21 08:18:20 2007 From: brett at python.org (Brett Cannon) Date: Fri, 20 Apr 2007 23:18:20 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Steven Bethard wrote: > On 4/20/07, Brett Cannon wrote: > > Exactly. 
Python 2.6 will still have __name__ set to '__main__', but > > also have __main__ set. Python 3.0 will not change __name__ at all. > > That should be Python 3.0 will not change __main__ at all, right? > Because __name__ is going to change from being "__main__" in the main > module to being the actual module name in Python 3.0, right? Yes. > > Assuming that's right, I think it was unclear to me that you wanted to > add __main__ to Python 2.x. Probably chainging: > First, a Py3K deprecation warning will be raised... > to: > First, each module will gain a __main__ attribute and a Py3K > deprecation warning will be raised... > would make the intent clearer. > Yes, __main__ will be defined in 2.6 and a warning raised if someone defines __main__ later on in the module. -Brett From jcarlson at uci.edu Sat Apr 21 09:10:03 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 21 Apr 2007 00:10:03 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <740c3aec0704202128g6537c5bfv94c0f60a5d883d76@mail.gmail.com> References: <740c3aec0704202128g6537c5bfv94c0f60a5d883d76@mail.gmail.com> Message-ID: <20070421000102.63A5.JCARLSON@uci.edu> "BJ?rn Lindqvist" wrote: > > On 4/20/07, Terry Reedy wrote: > > 2. Order in the sorting or collation sense, which I presume you mean. To > > reduce confusion, call this a sorted dictionary, as others have done. > > > > Regardless, this has the problem that potential keys are not always > > comparable. This will become worse when most cross-type comparisons are > > disallowed in 3.0. So pershaps the __init__ method should require a tuple > > of allowed key types. > > >>> l = [(), "moo", 123, []] > >>> l.sort() > >>> l > [123, [], 'moo', ()] > > If it is not a problem for lists it is not a problem for ordered dictionaries. It's about a total ordering. Without a total ordering, you won't necessarily be able to *find* an object even if it is in there. >>> import random >>> a = ['b', (), u'a'] >>> a.sort() >>> a ['b', (), u'a'] >>> random.shuffle(a) >>> a.sort() >>> a [u'a', 'b', ()] Also, in 3.0, objects will only be orderable if they are of compatible type. str and tuple are not compatible, so will raise an exception when something like "" < () is performed. > > If not already present in PyPI, someone could code an implementation and > > add it there. When such has be tested and achieved enough usage, then it > > might be proposed for addition to the collections module. > > And that is how the currently considered for Python 3.0 ordered dict > implementation got into Python? > > I find it amusing that over the years people have argued against > having an ordered dict in Python. But now, when one is considered, > only THAT version with THOSE semantics, is good. The rest should go to > PyPI. No, the "ordered dict" that is making its way into Python 3.0 is specifically ordered based on insertion order, and is to make more reasonable database interfaces like... class Person(db.table): firstname = str ... Its implementation is also a very simple variant of a dictionary, which isn't the case with any tree implementation. Further, because there are *so many* possible behaviors for a dictionary ordered by keys implemented as a tree, picking one (or even a small set of them) is guaranteed to raise comments of "can't we have one that does X too?" 
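As a rough sketch, the insertion-ordered dict described above is little
more than a dict plus a list remembering the order in which keys first
appeared (illustrative only, not the actual 3.0 implementation):

    class InsertionOrderedDict(dict):
        """dict subclass that remembers insertion order of keys."""

        def __init__(self):
            dict.__init__(self)
            self._order = []

        def __setitem__(self, key, value):
            if key not in self:
                self._order.append(key)
            dict.__setitem__(self, key, value)

        def __delitem__(self, key):
            dict.__delitem__(self, key)
            self._order.remove(key)

        def keys(self):
            return list(self._order)

        def items(self):
            return [(key, self[key]) for key in self._order]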
- Josiah From lists at cheimes.de Sat Apr 21 16:25:57 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 21 Apr 2007 16:25:57 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: Steven Bethard wrote: > But you have to understand a few things to understand why this works. > You have to know that __name__ is the name of the module, and that if > you want to find out the name of the main module, you need to look at > sys.main. With the idiom:: > > if __main__: > > all you need to know is that the main module has __main__ set to true. > >> IMO it's much less PIT...B then introducing __main__. > > Could you elaborate? Do you think it would be hard to introduce > another module-level attribute (like we already do for __name__)? Or > do you think that the code would be hard to maintain? Or something > else...? This is just my humble opinion. I'm new to Python core development. Well, in my opinion a new module level var like __main__ isn't worth to add when it is just boolean flag. With the proposed addition of sys.main the same information is available with just few more characters to type. If I recall correctly Python is trying to get rid of global variables in Python 3000. I don't think it's hard to add - even for me although I know less about the Python core. I'm more worried about the side effect when people have already used __main__ as a function. The problem is in 2to3. If you like to introduce __main__ why not implement http://www.python.org/dev/peps/pep-0299 ? It proposes a __main__(*argv*) module level function that replaced the "if __name__ == '__main__'" idiom. The __main__ function follows the example of other programming languages like C, C# and Java. I'm aware of the fact that the PEP was rejected but I think it's worth to discuss it again. Christian From tjreedy at udel.edu Sat Apr 21 16:59:46 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 21 Apr 2007 10:59:46 -0400 Subject: [Python-ideas] ordered dict References: <4628DF1F.3060803@gmx.net> <740c3aec0704202128g6537c5bfv94c0f60a5d883d76@mail.gmail.com> Message-ID: "BJ?rn Lindqvist" wrote in message news:740c3aec0704202128g6537c5bfv94c0f60a5d883d76 at mail.gmail.com... >On 4/20/07, Terry Reedy wrote: >> 2. Order in the sorting or collation sense, which I presume you mean. >> To >> reduce confusion, call this a sorted dictionary, as others have done. >> Regardless, this has the problem that potential keys are not always >> comparable. Current example: >>> [1, 1j].sort() Traceback (most recent call last): File "", line 1, in -toplevel- [1, 1j].sort() TypeError: no ordering relation is defined for complex numbers >> This will become worse when most cross-type comparisons are >> disallowed in 3.0. > >>> l = [(), "moo", 123, []] > >>> l.sort() > >>> l > [123, [], 'moo', ()] Py 3.0 will raise an exception here as these will all be incomparable. > If it is not a problem for lists it is not a problem for ordered > dictionaries. But it *is* currently a problem for lists that will become much more extensive in the future, so it *is* currently a problem for sorted dicts that will be much more of a problem in the future. Hence, sorted dicts will have to be restricted to one type or one group of truly comparable types. 
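A sketch of what such a restriction could look like (the class name and
the isinstance check are only illustrative, not part of any concrete
proposal):

    class RestrictedSortedDict(object):
        """Reject keys outside the declared group of comparable types, so
        incomparable keys fail at insertion time rather than inside a sort."""

        def __init__(self, key_types):
            self._key_types = tuple(key_types)   # e.g. (int, float) or (str,)
            self._data = {}

        def __setitem__(self, key, value):
            if not isinstance(key, self._key_types):
                raise TypeError("key %r is not an allowed key type" % (key,))
            self._data[key] = value

        def sorted_items(self):
            return sorted(self._data.items())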
Terry Jan Reedy From bjourne at gmail.com Sat Apr 21 17:17:07 2007 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Sat, 21 Apr 2007 15:17:07 +0000 Subject: [Python-ideas] ordered dict In-Reply-To: References: <4628DF1F.3060803@gmx.net> <740c3aec0704202128g6537c5bfv94c0f60a5d883d76@mail.gmail.com> Message-ID: <740c3aec0704210817m3e11a9e5lb3325523e2490348@mail.gmail.com> On 4/21/07, Terry Reedy wrote: > But it *is* currently a problem for lists that will become much more > extensive in the future, so it *is* currently a problem for sorted dicts > that will be much more of a problem in the future. Hence, sorted dicts > will have to be restricted to one type or one group of truly comparable > types. Alternatively, you could require a comparator function to be specified at creation time. -- mvh Bj?rn From brett at python.org Sat Apr 21 19:54:07 2007 From: brett at python.org (Brett Cannon) Date: Sat, 21 Apr 2007 10:54:07 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/21/07, Christian Heimes wrote: [SNIP] > If you like to introduce __main__ why not implement > http://www.python.org/dev/peps/pep-0299 ? It proposes a __main__(*argv*) > module level function that replaced the "if __name__ == '__main__'" > idiom. The __main__ function follows the example of other programming > languages like C, C# and Java. Because I don't like the solution and thus didn't want to do the footwork for it. =) -Brett From steven.bethard at gmail.com Sat Apr 21 20:12:57 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 21 Apr 2007 12:12:57 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/21/07, Christian Heimes wrote: > If you like to introduce __main__ why not implement > http://www.python.org/dev/peps/pep-0299 ? It proposes a __main__(*argv*) > module level function that replaced the "if __name__ == '__main__'" > idiom. The __main__ function follows the example of other programming > languages like C, C# and Java. I don't like the __main__ function signature. There are lots of options, like optparse and argparse_ that are much better than manually parsing sys.argv as the PEP 299 signature would suggest. And if there's nothing to be passed to the function, why make it a function at all? Personally, I thought one of the pluses of the current status quo (as well as what Brett is proposing here) is that it *didn't* follow in the (misplaced IMHO) footsteps of languages like C and Java. I think we're probably best letting dead PEPs lie. .. _argparse: http://argparse.python-hosting.com/ STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jcarlson at uci.edu Sat Apr 21 20:29:44 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 21 Apr 2007 11:29:44 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <740c3aec0704210817m3e11a9e5lb3325523e2490348@mail.gmail.com> References: <740c3aec0704210817m3e11a9e5lb3325523e2490348@mail.gmail.com> Message-ID: <20070421112051.63AB.JCARLSON@uci.edu> "BJ?rn Lindqvist" wrote: > On 4/21/07, Terry Reedy wrote: > > But it *is* currently a problem for lists that will become much more > > extensive in the future, so it *is* currently a problem for sorted dicts > > that will be much more of a problem in the future. 
Hence, sorted dicts > > will have to be restricted to one type or one group of truly comparable > > types. > > Alternatively, you could require a comparator function to be specified > at creation time. You could, but that would imply a total ordering on elements that Python itself is removing because it doesn't make any sense. Including a list of 'acceptable' classes as Terry has suggested would work, but would generally be superfluous. The moment a user first added an object to the sorted dictionary is the moment the type of objects that can be inserted is easily limited (hello Abstract Base Classes PEP!) - Josiah From jh at improva.dk Sat Apr 21 22:27:36 2007 From: jh at improva.dk (Jacob Holm) Date: Sat, 21 Apr 2007 16:27:36 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: <462A73B8.4080406@improva.dk> Steven Bethard wrote: > On 4/21/07, Christian Heimes wrote: > >> If you like to introduce __main__ why not implement >> http://www.python.org/dev/peps/pep-0299 ? It proposes a __main__(*argv*) >> module level function that replaced the "if __name__ == '__main__'" >> idiom. The __main__ function follows the example of other programming >> languages like C, C# and Java. >> > > I don't like the __main__ function signature. There are lots of > options, like optparse and argparse_ that are much better than > manually parsing sys.argv as the PEP 299 signature would suggest. I agree that optparse and argparse are better ways to parse a command line than using sys.argv directly, but nothing in PEP299 would prevent you from using them. In fact, I am pretty sure that with a suitable decorator on __main__ you could make their use even simpler. > And > if there's nothing to be passed to the function, why make it a > function at all? Because you may want to call it from somewhere else, possibly with different arguments? > Personally, I thought one of the pluses of the > current status quo (as well as what Brett is proposing here) is that > it *didn't* follow in the (misplaced IMHO) footsteps of languages like > C and Java. I think we're probably best letting dead PEPs lie. > I find it very sad that PEP299 did in fact die, because I think it is much cleaner solution than the proposal that started this thread. That said, I would like to se a way to remove the __name__=='__main__' weirdness. I am +1 on resurrecting PEP299, but also +1 on adding a "sys.main" that could be used in a new "if __name__=sys.main". I am -1 on adding a builtin/global __main__ as proposed, because that would clash with my own PEP299-like use of that name. Jacob -- Jacob Holm CTO Improva ApS From jcarlson at uci.edu Sat Apr 21 23:09:23 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 21 Apr 2007 14:09:23 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> Message-ID: <20070421135948.63AF.JCARLSON@uci.edu> After reading other posts in the thread, I'm going to put my support into the sys.main variant. It has all of the benefits of the builtin __name__ == __main__, with none of the drawbacks (no builtin!), and only a slight annoyance of 'import sys', which is more or less free. 
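For reference, the idiom being supported here would read as follows
(sys.main is the attribute proposed in this thread; it does not exist in
any released Python):

    import sys

    def main():
        sys.stdout.write("running as a script\n")

    # proposed replacement for "if __name__ == '__main__':"
    if __name__ == sys.main:
        main()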
- Josiah From jimjjewett at gmail.com Sun Apr 22 00:03:03 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 21 Apr 2007 18:03:03 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/20/07, Brett Cannon wrote: > On 4/20/07, Steven Bethard wrote: > > On 4/20/07, Brett Cannon wrote: > > > On 4/19/07, Steven Bethard wrote: > > > > I would have thought that if Python inserted __main__ before any of > > > > the module contents got exec'd, it would be backwards compatible > > > > because any use of __main__ would just overwrite the default one. > > > That's right, and that is the problem. That would mean if __main__ > > > was false but then overwritten by a function or something, it suddenly > > > became true. It isn't a problem in terms of whether the code will > > > run, but whether the expected semantics will occur. If the code is still using a __main__ variable of its own, then presumably it isn't using the new meaning of __main__, and isn't affected by the unexpected semantics. Or are you concerned that some code *outside* a module could check to see whether that module is __main__? > > Sure, but I don't see how it's much different from anyone who writes:: > > list = [foo, bar, baz] > > and then later wonders why:: > > list(obj) > > gives a ``TypeError: 'list' object is not callable``. > Exactly. It's just that 'list' was known about when the code was > written while __main__ was not. In that case, the module itself isn't using (and doesn't care) about the new __main__ semantics. Code external to the module can't rely on either (list or __main__) being unchanged, even today. > > I'd really like there to be a way to write Python 3.0 compatible code > > in Python 2.6 without having to run through 2to3. To me, this is a fairly important requirement that I fear is sometimes being forgotten. 2to3 isn't really a one-time translation unless you stop supporting 2.x after running it. -jJ From rrr at ronadam.com Sun Apr 22 00:50:29 2007 From: rrr at ronadam.com (Ron Adam) Date: Sat, 21 Apr 2007 17:50:29 -0500 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <462A73B8.4080406@improva.dk> References: <462A73B8.4080406@improva.dk> Message-ID: <462A9535.3010600@ronadam.com> Jacob Holm wrote: > Steven Bethard wrote: >> On 4/21/07, Christian Heimes wrote: >> >>> If you like to introduce __main__ why not implement >>> http://www.python.org/dev/peps/pep-0299 ? It proposes a __main__(*argv*) >>> module level function that replaced the "if __name__ == '__main__'" >>> idiom. The __main__ function follows the example of other programming >>> languages like C, C# and Java. >>> >> I don't like the __main__ function signature. There are lots of >> options, like optparse and argparse_ that are much better than >> manually parsing sys.argv as the PEP 299 signature would suggest. > > I agree that optparse and argparse are better ways to parse a command > line than using sys.argv directly, but nothing in PEP299 would prevent > you from using them. > In fact, I am pretty sure that with a suitable decorator on __main__ you > could make their use even simpler. > >> And >> if there's nothing to be passed to the function, why make it a >> function at all? > Because you may want to call it from somewhere else, possibly with > different arguments? 
> >> Personally, I thought one of the pluses of the >> current status quo (as well as what Brett is proposing here) is that >> it *didn't* follow in the (misplaced IMHO) footsteps of languages like >> C and Java. I think we're probably best letting dead PEPs lie. >> > > I find it very sad that PEP299 did in fact die, because I think it is > much cleaner solution than the proposal that started this thread. > > That said, I would like to se a way to remove the __name__=='__main__' > weirdness. I am +1 on resurrecting PEP299, but also +1 on adding a > "sys.main" that could be used in a new "if __name__=sys.main". I am -1 > on adding a builtin/global __main__ as proposed, because that would > clash with my own PEP299-like use of that name. I had at one time (about 4 years ago) thought it was a bit strange. But that was only for a very short while. Python differs from other languages in a very important way. python *always* starts at the top of the file and works it way down until if falls off the bottom. What it does in between the top and the bottom is entirely up to you. It's very dynamic. Other languages *compile* all the code first without executing any of it. Then you are required to tell the the compiler where the program will start, which is why you need to define a main() function. In Python, letting control fall off the bottom in order to start again at some place in the middle doesn't make much sense. It's already started, so you don't need to do that. Cheers, Ron From brett at python.org Sun Apr 22 01:49:47 2007 From: brett at python.org (Brett Cannon) Date: Sat, 21 Apr 2007 16:49:47 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <20070421135948.63AF.JCARLSON@uci.edu> References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/21/07, Josiah Carlson wrote: > > After reading other posts in the thread, I'm going to put my support > into the sys.main variant. It has all of the benefits of the builtin __name__ > == __main__, with none of the drawbacks (no builtin!), and only a slight > annoyance of 'import sys', which is more or less free. > Yeah, I am starting to like it as well. Steven and Jim, what do you think? -Brett From jh at improva.dk Sun Apr 22 02:04:04 2007 From: jh at improva.dk (Jacob Holm) Date: Sat, 21 Apr 2007 20:04:04 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <462A9535.3010600@ronadam.com> References: <462A73B8.4080406@improva.dk> <462A9535.3010600@ronadam.com> Message-ID: <462AA674.1020803@improva.dk> Ron Adam wrote: > > Jacob Holm wrote: >> I find it very sad that PEP299 did in fact die, because I think it is >> much cleaner solution than the proposal that started this thread. >> That said, I would like to se a way to remove the >> __name__=='__main__' weirdness. I am +1 on resurrecting PEP299, but >> also +1 on adding a "sys.main" that could be used in a new "if >> __name__=sys.main". I am -1 on adding a builtin/global __main__ as >> proposed, because that would clash with my own PEP299-like use of >> that name. > > I had at one time (about 4 years ago) thought it was a bit strange. > But that was only for a very short while. > To clarify: By "weirdness" here I meant the fact that the name of a module changes when it is used as the main module. > Python differs from other languages in a very important way. 
python > *always* starts at the top of the file and works it way down until if > falls off the bottom. What it does in between the top and the bottom > is entirely up to you. It's very dynamic. > > Other languages *compile* all the code first without executing any of > it. Then you are required to tell the the compiler where the program > will start, which is why you need to define a main() function. > I know all that. > In Python, letting control fall off the bottom in order to start again > at some place in the middle doesn't make much sense. It's already > started, so you don't need to do that. There are a number of reasons to want to use a function for the main part of the code, instead of putting it in an "if" at the end of the module. Two simple ones are: Keeping the module namespace clean. The ability to call the function from other code, most likely with different args. Since I am usually writing such a function anyway, I would prefer not to have to write the "if" boilerplate at the bottom in order to get it called. Oh, and automatically calling a __main__ function if it exists, does not prevent people who like the current "if" aproach from using that. It would just make *my* life that tiny bit easier. Therefore I would like to keep that door open by *not* adding the proposed __main__ variable at this point. Fortunately, the people that matter here seem to think avoiding the extra variable is a good idea (although for different reasons). Jacob -- Jacob Holm CTO Improva ApS From ironfroggy at gmail.com Sun Apr 22 06:14:59 2007 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sun, 22 Apr 2007 00:14:59 -0400 Subject: [Python-ideas] partial with skipped arguments Message-ID: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> I often wish you could bind to arguments in a partial out of order, skipping some positionals. The solution I came up with is a singleton object located as an attribute of the partial function itself and used like this: def foo(a, b): return a / b pf = partial(foo, partial.skip, 2) assert pf(1.0) == 0.5 -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://ironfroggy-code.blogspot.com/ From tjreedy at udel.edu Sun Apr 22 06:57:06 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 22 Apr 2007 00:57:06 -0400 Subject: [Python-ideas] ordered dict References: <740c3aec0704210817m3e11a9e5lb3325523e2490348@mail.gmail.com> <20070421112051.63AB.JCARLSON@uci.edu> Message-ID: "Josiah Carlson" wrote in message news:20070421112051.63AB.JCARLSON at uci.edu... > Including a list of 'acceptable' classes as Terry has suggested would > work, but would > generally be superfluous. I realized that later. The main use would be to improve the error message, or allow introspection ("Sdict, what can I put in you?"). tjr From steven.bethard at gmail.com Sun Apr 22 07:10:16 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 21 Apr 2007 23:10:16 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/21/07, Brett Cannon wrote: > On 4/21/07, Josiah Carlson wrote: > > > > After reading other posts in the thread, I'm going to put my support > > into the sys.main variant. It has all of the benefits of the builtin __name__ > > == __main__, with none of the drawbacks (no builtin!), and only a slight > > annoyance of 'import sys', which is more or less free. 
> > Yeah, I am starting to like it as well. Steven and Jim, what do you think? Note that the one benefit the sys.main-only variant doesn't have is the lower cognitive load of just having to know about __main__, instead of having to know about __name__, import and sys.main. That said, since the PEP as it stands introduces a sys.main anyway, we might as well start with that. People can then play around with it and see if we need to introduce a __main__ module attribute or builtin as well. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From steven.bethard at gmail.com Sun Apr 22 07:14:01 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 21 Apr 2007 23:14:01 -0600 Subject: [Python-ideas] partial with skipped arguments In-Reply-To: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> References: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> Message-ID: On 4/21/07, Calvin Spealman wrote: > I often wish you could bind to arguments in a partial out of order, > skipping some positionals. The solution I came up with is a singleton > object located as an attribute of the partial function itself and used > like this: > > def foo(a, b): > return a / b > pf = partial(foo, partial.skip, 2) > assert pf(1.0) == 0.5 The other way I've seen this proposed is as:: rpartial(foo, 2) In this particular situation, you could also just write:: partial(foo, b=2) I think the presence of keyword argument support is why rpartial wasn't added originally. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From ironfroggy at gmail.com Sun Apr 22 14:11:55 2007 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sun, 22 Apr 2007 08:11:55 -0400 Subject: [Python-ideas] partial with skipped arguments In-Reply-To: References: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> Message-ID: <76fd5acf0704220511wb7e5026odd3daf21f68b396b@mail.gmail.com> On 4/22/07, Steven Bethard wrote: > On 4/21/07, Calvin Spealman wrote: > > I often wish you could bind to arguments in a partial out of order, > > skipping some positionals. The solution I came up with is a singleton > > object located as an attribute of the partial function itself and used > > like this: > > > > def foo(a, b): > > return a / b > > pf = partial(foo, partial.skip, 2) > > assert pf(1.0) == 0.5 > > The other way I've seen this proposed is as:: > > rpartial(foo, 2) > > In this particular situation, you could also just write:: > > partial(foo, b=2) > > I think the presence of keyword argument support is why rpartial > wasn't added originally. > > Steve > -- > I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a > tiny blip on the distant coast of sanity. > --- Bucky Katt, Get Fuzzy Relying on the names of position arguments is not always a good idea, of course. Also, it doesn't work at all with builtin (and extension?) functions. The design is a little different, but I like it. Also, the rpartial idea just creates multiple names for essentially the same thing and still doesn't allow for skipping middle arguments or specify only middle arguments, etc. I'd like to write a patch, if it would be considered. -- Read my blog! I depend on your acceptance of my opinion! I am interesting! 
http://ironfroggy-code.blogspot.com/ From steven.bethard at gmail.com Sun Apr 22 17:15:58 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 09:15:58 -0600 Subject: [Python-ideas] partial with skipped arguments In-Reply-To: <76fd5acf0704220511wb7e5026odd3daf21f68b396b@mail.gmail.com> References: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> <76fd5acf0704220511wb7e5026odd3daf21f68b396b@mail.gmail.com> Message-ID: On 4/22/07, Calvin Spealman wrote: > On 4/22/07, Steven Bethard wrote: > > On 4/21/07, Calvin Spealman wrote: > > > I often wish you could bind to arguments in a partial out of order, > > > skipping some positionals. The solution I came up with is a singleton > > > object located as an attribute of the partial function itself and used > > > like this: > > > > > > def foo(a, b): > > > return a / b > > > pf = partial(foo, partial.skip, 2) > > > assert pf(1.0) == 0.5 > > > > The other way I've seen this proposed is as:: > > > > rpartial(foo, 2) > > > > In this particular situation, you could also just write:: > > > > partial(foo, b=2) > > Relying on the names of position arguments is not always a good idea, > of course. Also, it doesn't work at all with builtin (and extension?) > functions. The design is a little different, but I like it. Also, the > rpartial idea just creates multiple names for essentially the same > thing and still doesn't allow for skipping middle arguments or specify > only middle arguments, etc. I'd like to write a patch, if it would be > considered. Well, I can pretty much guarantee you'll get the two responses above, so if you post a patch, make sure you let python-dev know that you've already considered these options and don't see them as satisfactory. Your best bet of convincing people is probably to find a few real-world use cases and post the corresponding code. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From collinw at gmail.com Sun Apr 22 18:14:01 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 22 Apr 2007 09:14:01 -0700 Subject: [Python-ideas] partial with skipped arguments In-Reply-To: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> References: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> Message-ID: <43aa6ff70704220914l489de151ta648c485155446f8@mail.gmail.com> On 4/21/07, Calvin Spealman wrote: > I often wish you could bind to arguments in a partial out of order, > skipping some positionals. The solution I came up with is a singleton > object located as an attribute of the partial function itself and used > like this: > > def foo(a, b): > return a / b > pf = partial(foo, partial.skip, 2) > assert pf(1.0) == 0.5 In Python 2.5.0: >>> import functools >>> def f(a, b): ... return a + b ... >>> p = functools.partial(f, b=9) >>> p >>> p(3) 12 >>> Is this what you're looking for? Collin Winter From jimjjewett at gmail.com Sun Apr 22 19:11:52 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 22 Apr 2007 13:11:52 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/21/07, Brett Cannon wrote: > On 4/21/07, Josiah Carlson wrote: > > After reading other posts in the thread, I'm going to put my support > > into the sys.main variant. 
It has all of the benefits of the builtin __name__ > > == __main__, with none of the drawbacks (no builtin!), and only a slight > > annoyance of 'import sys', which is more or less free. > Yeah, I am starting to like it as well. Steven and Jim, what do you think? Better than adding a builtin. I'm not sure I like the idea of another semi-random object in sys either, though. (1) One of the motivations was importing. It looks like __file__ already has sufficient information. I understand that relying on it (or on __package__?) seems a bit hacky, but is it really worse than adding something? (2) Is there a reason the main module can't appear in sys.modules twice, once under the alias "__main__"? # Equivalent to today if __name__ == sys.modules["__main__"].__name__: # Better than today if __name__ is sys.modules["__main__"].__name__: # What I would like (pending PEP I hope to write tonight) if __this_module__ is sys.modules["__main__"]: -jJ From jcarlson at uci.edu Sun Apr 22 19:39:57 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 22 Apr 2007 10:39:57 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: <20070422102818.63B5.JCARLSON@uci.edu> "Jim Jewett" wrote: > > On 4/21/07, Brett Cannon wrote: > > On 4/21/07, Josiah Carlson wrote: > > > > After reading other posts in the thread, I'm going to put my support > > > into the sys.main variant. It has all of the benefits of the builtin __name__ > > > == __main__, with none of the drawbacks (no builtin!), and only a slight > > > annoyance of 'import sys', which is more or less free. > > > Yeah, I am starting to like it as well. Steven and Jim, what do you think? > > Better than adding a builtin. > > I'm not sure I like the idea of another semi-random object in sys > either, though. > > (1) One of the motivations was importing. It looks like __file__ > already has sufficient information. I understand that relying on it > (or on __package__?) seems a bit hacky, but is it really worse than > adding something? > > (2) Is there a reason the main module can't appear in sys.modules > twice, once under the alias "__main__"? While it is unlikely, there may be cleanup issues when the process is ending. > # Equivalent to today > if __name__ == sys.modules["__main__"].__name__: > > # Better than today > if __name__ is sys.modules["__main__"].__name__: The above two should be equivalent unless the importer has a bad habit. > # What I would like (pending PEP I hope to write tonight) > if __this_module__ is sys.modules["__main__"]: While I would also very much like the ability to access *this module*, I don't believe that this necessarily precludes the use of a proper package.module naming scheme for all __name__ values. 
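For comparison, the identity test Jim sketches can already be written with nothing but today's machinery, since the interpreter registers the main module under '__main__' for normal script and interactive runs:

    import sys

    _this = sys.modules[__name__]          # the module object currently executing
    _main = sys.modules.get('__main__')    # the script or interactive session

    if _this is _main:
        # same test as `if __name__ == '__main__':` today, phrased as an
        # identity check on module objects instead of a string comparison
        print "running as the main program"

Under current rules this is equivalent to the __name__ check; the point is only that the test can already be phrased in terms of module objects rather than strings.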
- Josiah From steven.bethard at gmail.com Sun Apr 22 19:42:38 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 11:42:38 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/22/07, Jim Jewett wrote: > # Equivalent to today > if __name__ == sys.modules["__main__"].__name__: > > # Better than today > if __name__ is sys.modules["__main__"].__name__: > > # What I would like (pending PEP I hope to write tonight) > if __this_module__ is sys.modules["__main__"]: Is it just me, or are the proposals starting to look more and more like:: public static void main(String args[]) I think this PEP now needs to explicitly state that keeping the "am I the main module?" idiom as simple as possible is *not* a goal. Because everything I've seen (except for the original proposals in the PEP) are substantially more complicated than the current:: if __name__ == '__main__': I guess I don't understand why we wouldn't be willing to put up with a new module attribute or builtin to minimize the boilerplate in pretty much every Python application out there. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From lists at cheimes.de Sun Apr 22 20:15:43 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 22 Apr 2007 20:15:43 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: Steven Bethard wrote: > I think this PEP now needs to explicitly state that keeping the "am I > the main module?" idiom as simple as possible is *not* a goal. Because > everything I've seen (except for the original proposals in the PEP) > are substantially more complicated than the current:: > > if __name__ == '__main__': > I'm proposing the following changes: * sys.main is added which contains the dotted name of the main script. This allows code like: if __name__ == sys.main: ... main_module = sys.modules[sys.main] * __name__ is never mangled and contains always the dotted name of the current module. It's not set to '__main__' any more. You can get the current module object with this_module = sys.modules[__name__] * I'm against sys.modules['__main__] = main_module because it may cause ugly side effects with reload. The same functionality is available with sys.modules[sys.main]. The Zen Of Python says that there should be one and only one obvious way. > I guess I don't understand why we wouldn't be willing to put up with a > new module attribute or builtin to minimize the boilerplate in pretty > much every Python application out there. Why bother with the second price when you can win the first prize? In my opinion a __main__() function makes live easier than a __main__ module level variable. It's also my opinion that the main code should be in a function and not in the body of the module. I consider it good style because the code is unit testable (is this a word? *g*) and callable from another module while code in the body is not accessable from unit tests and other scripts. I know that some people are against __main__(argv) but I've good reasons to propose the argv syntax. 
Although argv is available via sys.argv I like the see it as an argument for __main__() for the same reasons I like to see __main__. It makes unit testing and calls from another module possible. W/o the argv argument is harder to change the argument in unit tests. Now for some syntactic sugar and a dream of mine: @argumentdecorator(MyOptionParserClass) def __main__(egg, spam=5): pass The argumentdecorator function takes some kind of option parser class that is used to parse argv. This would allow nice code like __main__(('mainscript.py', '--eggs 5', '--no-spam')) Christian From steven.bethard at gmail.com Sun Apr 22 20:32:03 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 12:32:03 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/22/07, Christian Heimes wrote: > Steven Bethard wrote: > > I think this PEP now needs to explicitly state that keeping the "am I > > the main module?" idiom as simple as possible is *not* a goal. Because > > everything I've seen (except for the original proposals in the PEP) > > are substantially more complicated than the current:: > > > > if __name__ == '__main__': > > > > I'm proposing the following changes: > > * sys.main is added which contains the dotted name of the main script. > This allows code like: > > if __name__ == sys.main: > ... Note that this really requires the code:: import sys if __name__ == sys.main: The import statement matters to me because 77% of my modules that use the __main__ idiom *don't* import sys. Hence, for those modules, this new idiom introduces more boilerplate. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From brett at python.org Sun Apr 22 20:39:01 2007 From: brett at python.org (Brett Cannon) Date: Sun, 22 Apr 2007 11:39:01 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/22/07, Jim Jewett wrote: > On 4/21/07, Brett Cannon wrote: > > On 4/21/07, Josiah Carlson wrote: > > > > After reading other posts in the thread, I'm going to put my support > > > into the sys.main variant. It has all of the benefits of the builtin __name__ > > > == __main__, with none of the drawbacks (no builtin!), and only a slight > > > annoyance of 'import sys', which is more or less free. > > > Yeah, I am starting to like it as well. Steven and Jim, what do you think? > > Better than adding a builtin. > > I'm not sure I like the idea of another semi-random object in sys > either, though. > > (1) One of the motivations was importing. It looks like __file__ > already has sufficient information. I understand that relying on it > (or on __package__?) seems a bit hacky, but is it really worse than > adding something? > Yes, because you have no guarantee __file__ will in any way be unique or even defined (look at 'sys'). It's up to the loader to set __file__ and it can do whatever it wants. This doesn't happen with __name__ since it is rather clear what that should be no matter where the module was loaded from (unless it was a Python file specified at the command line in some random directory). 
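Brett's point about __file__ is easy to see interactively; the path shown for os below is just an example from one machine, since the loader is free to put anything (or nothing) there:

    >>> import sys, os
    >>> sys.__name__
    'sys'
    >>> hasattr(sys, '__file__')    # built-in modules get no __file__ at all
    False
    >>> os.__file__                 # whatever the loader chose; not guaranteed unique
    '/usr/lib/python2.5/os.pyc'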
-Brett From brett at python.org Sun Apr 22 20:44:56 2007 From: brett at python.org (Brett Cannon) Date: Sun, 22 Apr 2007 11:44:56 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/22/07, Christian Heimes wrote: > Steven Bethard wrote: > > I think this PEP now needs to explicitly state that keeping the "am I > > the main module?" idiom as simple as possible is *not* a goal. Because > > everything I've seen (except for the original proposals in the PEP) > > are substantially more complicated than the current:: > > > > if __name__ == '__main__': > > > > I'm proposing the following changes: > > * sys.main is added which contains the dotted name of the main script. > This allows code like: > > if __name__ == sys.main: > ... > > main_module = sys.modules[sys.main] > > * __name__ is never mangled and contains always the dotted name of > the current module. It's not set to '__main__' any more. That can't be true. If I am in the directory /spam but I execute the file /bacon/code.py, what is the name of /bacon/code.py supposed to be? It makes absolutely no sense unless sys.path happens to have either / or /bacon. This is why I wondered out loud if setting whatever attribute that is chosen not to __main__ should only be done with '-m' as that keeps it simple and clear instead of having to try to reverse-engineer a file's __name__ attribute. >You can > get the current module object with > > this_module = sys.modules[__name__] > > * I'm against sys.modules['__main__] = main_module because it may > cause ugly side effects with reload. I assume that key is a string? There is a single quote that is not closed off. > The same functionality is > available with sys.modules[sys.main]. The Zen Of Python says that > there should be one and only one obvious way. > > > I guess I don't understand why we wouldn't be willing to put up with a > > new module attribute or builtin to minimize the boilerplate in pretty > > much every Python application out there. > > Why bother with the second price when you can win the first prize? In my > opinion a __main__() function makes live easier than a __main__ module > level variable. It's also my opinion that the main code should be in a > function and not in the body of the module. I consider it good style > because the code is unit testable (is this a word? *g*) and callable > from another module while code in the body is not accessable from unit > tests and other scripts. People can stop wishing for this. I am not going to be writing a PEP supporting this. I don't like it; never have. I like how Python handles things currently in terms of relying on how module are executed linearly. I am totally fine if people propose a competing PEP or try to resurrect PEP 299, but I am not going to be the person who does that leg work. -Brett From lists at cheimes.de Sun Apr 22 20:54:57 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 22 Apr 2007 20:54:57 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: Brett Cannon wrote: >> * __name__ is never mangled and contains always the dotted name of >> the current module. It's not set to '__main__' any more. > > That can't be true. 
If I am in the directory /spam but I execute the > file /bacon/code.py, what is the name of /bacon/code.py supposed to > be? It makes absolutely no sense unless sys.path happens to have > either / or /bacon. This is why I wondered out loud if setting > whatever attribute that is chosen not to __main__ should only be done > with '-m' as that keeps it simple and clear instead of having to try > to reverse-engineer a file's __name__ attribute. I haven't thought of that issue. :( >> * I'm against sys.modules['__main__] = main_module because it may >> cause ugly side effects with reload. > > I assume that key is a string? There is a single quote that is not closed off. Yes, it's a typo. It should say sys.modules['__main__']. > I am totally fine if people propose a competing PEP or try to > resurrect PEP 299, but I am not going to be the person who does that > leg work. Understood! :) Christian From lists at cheimes.de Sun Apr 22 21:00:28 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 22 Apr 2007 21:00:28 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <462AA674.1020803@improva.dk> References: <462A73B8.4080406@improva.dk> <462A9535.3010600@ronadam.com> <462AA674.1020803@improva.dk> Message-ID: Jacob Holm wrote: > Therefore I would like to keep that door open by *not* adding the > proposed __main__ variable at this point. Fortunately, the people that > matter here seem to think avoiding the extra variable is a good idea > (although for different reasons). +1 from me Christian From jimjjewett at gmail.com Sun Apr 22 21:26:43 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 22 Apr 2007 15:26:43 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/22/07, Steven Bethard wrote: > On 4/22/07, Christian Heimes wrote: > > I'm proposing the following changes: > > * sys.main is added which contains the dotted name of the main > > script. This allows code like: > > if __name__ == sys.main: > Note that this really requires the code:: > import sys > if __name__ == sys.main: As long as we're in python-ideas, I'll throw out the radical suggestion of auto-importing sys into builtins, the way os autoimports path. -jJ From steven.bethard at gmail.com Sun Apr 22 22:56:09 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 14:56:09 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: On 4/22/07, Jim Jewett wrote: > On 4/22/07, Steven Bethard wrote: > > On 4/22/07, Christian Heimes wrote: > > > > I'm proposing the following changes: > > > > * sys.main is added which contains the dotted name of the main > > > script. This allows code like: > > > > if __name__ == sys.main: > > > Note that this really requires the code:: > > > import sys > > if __name__ == sys.main: > > As long as we're in python-ideas, I'll throw out the radical > suggestion of auto-importing sys into builtins, the way os autoimports > path. While that would address my concern, I wonder if adding sys to the builtins is really any better than adding __main__ to the builtins. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
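Mechanically, Jim's auto-import suggestion amounts to something the interpreter could do once at startup, roughly as below (a sketch only, spelled with the Python 2 __builtin__ module; whether it is a good idea is exactly what is being debated):

    import sys
    import __builtin__     # Python 2 spelling of the builtins module

    # After this, every module can refer to `sys` without importing it,
    # because name lookup falls back to the builtins after module globals.
    __builtin__.sys = sys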
--- Bucky Katt, Get Fuzzy From lists at cheimes.de Sun Apr 22 23:08:57 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 22 Apr 2007 23:08:57 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: Steven Bethard wrote: > While that would address my concern, I wonder if adding sys to the > builtins is really any better than adding __main__ to the builtins. If I understand the proposal right then __main__ won't be a builtin. Each module would get a new global variable __main__ which is set either to True or False. Also I consider sys kinda reserved for the sys module while the __main__ global var approach would reserve a new name that I like to see used for something else. +0.25 for sys in builtins Christian From greg.ewing at canterbury.ac.nz Mon Apr 23 00:18:22 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 23 Apr 2007 10:18:22 +1200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: <462BDF2E.1070004@canterbury.ac.nz> Steven Bethard wrote: > if there's nothing to be passed to the function, why make it a > function at all? I don't usually like to put big lumps of init code at the module level, because it pollutes the module namespace with local variables. So I typically end up with def main(): ... ... ... if __name__ == "__main__": main() So I'd be quite happy if I could just define a function called __main__() and be done with. I don't understand why there's so much opposition to that idea. -- Greg From george.sakkis at gmail.com Mon Apr 23 00:49:49 2007 From: george.sakkis at gmail.com (George Sakkis) Date: Sun, 22 Apr 2007 18:49:49 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <462BDF2E.1070004@canterbury.ac.nz> References: <462BDF2E.1070004@canterbury.ac.nz> Message-ID: <91ad5bf80704221549g105a61f8p2dca945e1895d2db@mail.gmail.com> On 4/22/07, Greg Ewing wrote: > Steven Bethard wrote: > > if there's nothing to be passed to the function, why make it a > > function at all? > > I don't usually like to put big lumps of init code > at the module level, because it pollutes the module > namespace with local variables. So I typically end > up with > > def main(): > ... > ... > ... > > if __name__ == "__main__": > main() > > So I'd be quite happy if I could just define a > function called __main__() and be done with. I > don't understand why there's so much opposition > to that idea. +1. Although I may start out at the module level, that's typically the idiom I use eventually for any non-trivial (e.g. more than 1-2 lines) main*. George * Only exception is if the module consists essentially of main(), i.e. a small standalone script without classes, functions, etc. From aahz at pythoncraft.com Mon Apr 23 01:09:26 2007 From: aahz at pythoncraft.com (Aahz) Date: Sun, 22 Apr 2007 16:09:26 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: <20070422230926.GB7208@panix.com> On Sat, Apr 21, 2007, Steven Bethard wrote: > > Note that the one benefit the sys.main-only variant doesn't have is > the lower cognitive load of just having to know about __main__, > instead of having to know about __name__, import and sys.main. 
>From my POV that is indeed a lower cognitive load because all I need to remember is to look in the docs for the sys module -- everything else is there. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From aahz at pythoncraft.com Mon Apr 23 01:11:38 2007 From: aahz at pythoncraft.com (Aahz) Date: Sun, 22 Apr 2007 16:11:38 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: <20070422231138.GC7208@panix.com> On Sun, Apr 22, 2007, Steven Bethard wrote: > On 4/22/07, Christian Heimes wrote: >> >> I'm proposing the following changes: >> >> * sys.main is added which contains the dotted name of the main script. >> This allows code like: >> >> if __name__ == sys.main: >> ... > > Note that this really requires the code:: > > import sys > if __name__ == sys.main: > > The import statement matters to me because 77% of my modules that use > the __main__ idiom *don't* import sys. Hence, for those modules, this > new idiom introduces more boilerplate. Does this follow the axiom that 83% of all statistics are made up on the spot? ;-) Seriously, if I'm writing a script that requires __main__, chances are excellent that it already includes sys (because it's probably a command-line script that's graduating to module status). -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From rrr at ronadam.com Mon Apr 23 01:50:17 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 22 Apr 2007 18:50:17 -0500 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: <462BF4B9.40403@ronadam.com> Jim Jewett wrote: > As long as we're in python-ideas, I'll throw out the radical > suggestion of auto-importing sys into builtins, the way os autoimports > path. +1 I thought that this was discussed before and had gotten general approval. Also it makes sense to me to have additional entries in sys to identify the starting main, and the root package modules. I also like the idea of having a way to say this_module. if __module__ is sys.__main__: ... Notice names aren't used this way, which is generally how you would compare any object in python. You wouldn't try to get it's name and then compare that to the name of another object. 
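One small naming wrinkle for the proposals in this thread: __module__ already exists as a plain string attribute on classes and functions, naming (not referencing) the module they were defined in:

    >>> class C(object):
    ...     pass
    ...
    >>> C.__module__
    '__main__'
    >>> def f(): pass
    ...
    >>> f.__module__
    '__main__'

At the prompt or in a script that value is '__main__'; inside an imported module it is the module's dotted name. A proposal that also uses __module__ for the module object itself would be giving an existing dunder a second, different meaning.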
Ron From ntoronto at cs.byu.edu Mon Apr 23 01:51:26 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Sun, 22 Apr 2007 17:51:26 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> Message-ID: <462BF4FE.1050103@cs.byu.edu> Steven Bethard wrote: > On 4/22/07, Jim Jewett wrote: > >> # Equivalent to today >> if __name__ == sys.modules["__main__"].__name__: >> >> # Better than today >> if __name__ is sys.modules["__main__"].__name__: >> >> # What I would like (pending PEP I hope to write tonight) >> if __this_module__ is sys.modules["__main__"]: >> > > Is it just me, or are the proposals starting to look more and more like:: > > public static void main(String args[]) > > I think this PEP now needs to explicitly state that keeping the "am I > the main module?" idiom as simple as possible is *not* a goal. Because > everything I've seen (except for the original proposals in the PEP) > are substantially more complicated than the current:: > > if __name__ == '__main__': > > I guess I don't understand why we wouldn't be willing to put up with a > new module attribute or builtin to minimize the boilerplate in pretty > much every Python application out there. > Agreed - it's getting horrid. As Pythonic as they think this is, they're completely forgetting the newb. So let's look at it from his point of view. Say I'm a Python newb. I've written some modules and some executable Python scripts and I'm somewhat comfy with the language. (Of course, it only took me about two hours to get comfy - this is Python, after all.) I now want to write either: 1) A module that runs unit tests when it is run as a script, but not when it's just imported; or 2) A script that can be imported as a module when I need a few of its functions. (I should really split them into another module, but this is a use case.) Now I have to import sys? Never seen that one... okay. Imported. Now, what's this Greek I have to write to test whether the script is the main script? How am I supposed to remember this? This is worse than fork()! On the other hand, IMNSHO, either of the following two are just about perfect in terms of understandability, and parsimony: def __main__(): # we really don't need args here # stuff if __main__: # stuff Chances are, the first will be very familiar, but refreshing that it's just a plain old, gibberish-free function. Both are easier than what we've got currently. (IMO, the first is better, because 1) the code can be put anywhere in the module; 2) it automatically doesn't pollute the global namespace; and 3) it's less boilerplate for complex modules and no more boilerplate for simple ones.) FWIW, I don't see a problem with a sys.modules['__main__'] - it would even occasionally be useful - but nobody should be *required* to use an abomination like that for what's clearly a newbie task: determining whether a module is run as a script. Neil From adam at atlas.st Mon Apr 23 02:00:44 2007 From: adam at atlas.st (Adam Atlas) Date: Sun, 22 Apr 2007 20:00:44 -0400 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <462BF4B9.40403@ronadam.com> References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> <462BF4B9.40403@ronadam.com> Message-ID: On 22 Apr 2007, at 19.50, Ron Adam wrote: > I also like the idea of > having a way to say this_module. > > if __module__ is sys.__main__: > ... 
Agreed... I suggested something like that a couple of days ago (except assuming __main__ would be a builtin global instead of in sys). I proposed __this__ as the name for accessing the current module. Mainly because I like the Englishlike way it reads: "if __this__ is __main__". 'If this is main' -- couldn't be simpler. Though I'd also be fine with sys.__main__ or sys.main (I'd prefer the latter). I would support having sys be an automatic global. From lists at cheimes.de Mon Apr 23 02:35:14 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 23 Apr 2007 02:35:14 +0200 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <462BF4FE.1050103@cs.byu.edu> References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> <462BF4FE.1050103@cs.byu.edu> Message-ID: Neil Toronto wrote: > On the other hand, IMNSHO, either of the following two are just about > perfect in terms of understandability, and parsimony: > > def __main__(): # we really don't need args here > # stuff I think __main__(*argv) has some benefits over __main__(). It allows you to call the function with different arguments from another script or a unit test. def __main__(argv=None): if argv is None: argv = sys.argv # has the same effect but it is ugly > FWIW, I don't see a problem with a sys.modules['__main__'] - it would > even occasionally be useful - but nobody should be *required* to use an > abomination like that for what's clearly a newbie task: determining > whether a module is run as a script. I see the problem in having the same module under two names in sys.modules. It may lead to issues (reload?). Also it is not necessary to get the main module if we store the dotted name in sys.main. So sys.modules[sys.main] would return the main module. Christian From steven.bethard at gmail.com Mon Apr 23 03:18:51 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 19:18:51 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <20070422231138.GC7208@panix.com> References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> <20070422231138.GC7208@panix.com> Message-ID: On 4/22/07, Aahz wrote: > On Sun, Apr 22, 2007, Steven Bethard wrote: > > On 4/22/07, Christian Heimes wrote: > >> > >> I'm proposing the following changes: > >> > >> * sys.main is added which contains the dotted name of the main script. > >> This allows code like: > >> > >> if __name__ == sys.main: > >> ... > > > > Note that this really requires the code:: > > > > import sys > > if __name__ == sys.main: > > > > The import statement matters to me because 77% of my modules that use > > the __main__ idiom *don't* import sys. Hence, for those modules, this > > new idiom introduces more boilerplate. > > Does this follow the axiom that 83% of all statistics are made up on the > spot? ;-) Seriously, if I'm writing a script that requires __main__, > chances are excellent that it already includes sys (because it's > probably a command-line script that's graduating to module status). No, I actually went and counted in my local repository. There are two main reasons why that's true: (1) Most unittest modules just run unittest.main(), so no import of sys. (2) Most other modules use optparse or argparse, so no import of sys. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From steven.bethard at gmail.com Mon Apr 23 03:24:43 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 19:24:43 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <462BDF2E.1070004@canterbury.ac.nz> References: <462BDF2E.1070004@canterbury.ac.nz> Message-ID: On 4/22/07, Greg Ewing wrote: > Steven Bethard wrote: > > if there's nothing to be passed to the function, why make it a > > function at all? > > I don't usually like to put big lumps of init code > at the module level, because it pollutes the module > namespace with local variables. So I typically end > up with > > def main(): > ... > ... > ... > > if __name__ == "__main__": > main() > > So I'd be quite happy if I could just define a > function called __main__() and be done with. I > don't understand why there's so much opposition > to that idea. I guess I'm just the odd one out here in that I parse my arguments before passing them to module-level functions. So my code normally looks like:: if __name__ == '__main__': ... a few lines of argument parsing code ... some_function_name(args.foo, args.bar, args.baz) That is, I do the argument parsing at the module level, and then call the module functions with more meaningful arguments than sys.argv. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From steven.bethard at gmail.com Mon Apr 23 03:28:19 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 19:28:19 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: <20070422230926.GB7208@panix.com> References: <20070419235504.6374.JCARLSON@uci.edu> <20070421135948.63AF.JCARLSON@uci.edu> <20070422230926.GB7208@panix.com> Message-ID: On 4/22/07, Aahz wrote: > On Sat, Apr 21, 2007, Steven Bethard wrote: > > > > Note that the one benefit the sys.main-only variant doesn't have is > > the lower cognitive load of just having to know about __main__, > > instead of having to know about __name__, import and sys.main. > > From my POV that is indeed a lower cognitive load because all I need to > remember is to look in the docs for the sys module -- everything else is > there. As a newbie, you need to remember to lookup at least two things: __name__ and sys.main. As compared to having to lookup just __main__. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From brett at python.org Mon Apr 23 03:46:39 2007 From: brett at python.org (Brett Cannon) Date: Sun, 22 Apr 2007 18:46:39 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: I revised the PEP to use the sys.main idea and sent it off to python-3000. If you care to participate in the discussion please move it over there. Thanks to everyone who contributed to the discussion. I really appreciate the help! -Brett From steven.bethard at gmail.com Mon Apr 23 04:30:18 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 22 Apr 2007 20:30:18 -0600 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/22/07, Brett Cannon wrote: > I revised the PEP to use the sys.main idea and sent it off to > python-3000. 
Just wanted to say thanks Brett for putting the time into this! Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jimjjewett at gmail.com Mon Apr 23 05:05:14 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 22 Apr 2007 23:05:14 -0400 Subject: [Python-ideas] PEP 30xx: Access to Module/Class/Function Currently Being Defined (this) Message-ID: (Please note that several groups were Cc'd. For now, please limit followups to python-3000. This would *probably* be backported to 2.6, but that wouldn't be decided until the implementation strategy was settled.) PEP: 30XX Title: Access to Module/Class/Function Currently Being Defined (this) Version: $Revision$ Last-Modified: $Date$ Author: Jim J. Jewett Status: Draft Type: Standards Track Content-Type: text/plain Created: 22-Apr-2007 Python-Version: 3.0 Post-History: 22-Apr-2007 Abstract It is common to need a reference to the current module, class, or function, but there is currently no entirely correct way to do this. This PEP proposes adding the keywords __module__, __class__, and __function__. Rationale Many modules export various functions, classes, and other objects, but will perform additional activities (such as running unit tests) when run as a script. The current idiom is to test whether the module's name has been set to magic value. if __name__ == "__main__": ... More complicated introspection requires a module to (attempt to) import itself. If importing the expected name actually produces a different module, there is no good workaround. Proposal: Add a __module__ keyword which refers to the module currently being defined (executed). (But see open issues.) if __module__ is sys.main: ... # assuming PEP 3020, Cannon Class methods are passed the current instance; from this they can determine self.__class__ (or cls, for classmethods). Unfortunately, this reference is to the object's actual class, which may be a subclass of the defining class. The current workaround is to repeat the name of the class, and assume that the name will not be rebound. class C(B): def meth(self): super(C, self).meth() # Hope C is never rebound. class D(C): def meth(self): super(C, self).meth() # ?!? issubclass(D,C), so it "works" Proposal: Add a __class__ keyword which refers to the class currently being defined (executed). (But see open issues.) class C(B): def meth(self): super(__class__, self).meth() Note that super calls may be further simplified by PEP 30XX, Jewett. The __class__ (or __this_class__) attribute came up in attempts to simplify the explanation and/or implementation of that PEP, but was separated out as an independent decision. Note that __class__ (or __this_class__) is not quite the same as the __thisclass__ property on bound super objects. The existing super.__thisclass__ property refers to the class from which the Method Resolution Order search begins. In the above class D, it would refer to (the current reference of name) C. Functions (including methods) often want access to themselves, usually for a private storage location. While there are several workarounds, all have their drawbacks. 
def counter(_total=[0]): # _total shouldn't really appear in the _total[0] += 1 # signature at all; the list wrapping and return _total[0] # [0] unwrapping obscure the code @annotate(total=0) def counter(): counter.total += 1 # Assume name counter is never rebound return counter.total class _wrap(object): # class exists only to provide storage __total=0 def f(self): self.__total += 1 return self.__total accum=_wrap().f # set module attribute to a bound method Proposal: Add a __function__ keyword which refers to the function (or method) currently being defined (executed). (But see open issues.) @annotate(total=0) def counter(): __function__.total += 1 # Always refers to this function obj return __function__.total Backwards Compatibility While a user could be using these names already, __anything__ names are explicitly reserved to the interpreter. It is therefore acceptable to introduce special meaning to these names within a single feature release. Implementation Ideally, these names would be keywords treated specially by the bytecode compiler. Guido has suggested [1] using a cell variable filled in by the metaclass. Michele Simionato has provided a prototype using bytecode hacks [2]. Open Issues - Are __module__, __class__, and __function__ the right names? In particular, should the names include the word "this", either as __this_module__, __this_class__, and __this_function__, (format discussed on the python-3000 and python-ideas lists) or as __thismodule__, __thisclass__, and __thisfunction__ (inspired by, but conflicting with, current usage of super.__thisclass__). - Are all three keywords needed, or should this enhancement be limited to a subset of the objects? Should methods be treated separately from other functions? References [1] Fixing super anyone? Guido van Rossum http://mail.python.org/pipermail/python-3000/2007-April/006671.html [2] Descriptor/Decorator challenge, Michele Simionato http://groups.google.com/group/comp.lang.python/browse_frm/thread/a6010c7494871bb1/62a2da68961caeb6?lnk=gst&q=simionato+challenge&rnum=1&hl=en#62a2da68961caeb6 Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From adam at atlas.st Mon Apr 23 05:22:36 2007 From: adam at atlas.st (Adam Atlas) Date: Sun, 22 Apr 2007 23:22:36 -0400 Subject: [Python-ideas] Object adaptation and interfaces and so forth Message-ID: <0F15F5F9-5104-4D02-A708-EC3FFDE4047F@atlas.st> (Not exactly an idea post, but I don't want to bother python-dev or python-3000 with this.) PEP 246 was rejected a year or so ago, and Guido's rejection note stated "Something much better is about to happen; it's too early to say exactly what, but it's not going to resemble the proposal in this PEP." Does anyone know if anything has gone on with this concept since then? It seems like it has a lot of really interesting potential, although I do see why PEP 246's specific proposal was rejected. It's just the "Something much better is about to happen" that got me curious -- is it happening yet? :) From jimjjewett at gmail.com Mon Apr 23 05:33:57 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 22 Apr 2007 23:33:57 -0400 Subject: [Python-ideas] Object adaptation and interfaces and so forth In-Reply-To: <0F15F5F9-5104-4D02-A708-EC3FFDE4047F@atlas.st> References: <0F15F5F9-5104-4D02-A708-EC3FFDE4047F@atlas.st> Message-ID: I think the rejection was refering to parameter annotations. 
def f(arg1:int, arg2:"woot the bounding main"): ... In combination with decorations, this can provide adaptation. -jJ On 4/22/07, Adam Atlas wrote: > (Not exactly an idea post, but I don't want to bother python-dev or > python-3000 with this.) > > PEP 246 was rejected a year or so ago, and Guido's rejection note > stated "Something much better is about to happen; it's too early to > say exactly what, but it's not going to resemble the proposal in this > PEP." Does anyone know if anything has gone on with this concept > since then? It seems like it has a lot of really interesting > potential, although I do see why PEP 246's specific proposal was > rejected. It's just the "Something much better is about to happen" > that got me curious -- is it happening yet? :) From talin at acm.org Mon Apr 23 05:45:29 2007 From: talin at acm.org (Talin) Date: Sun, 22 Apr 2007 20:45:29 -0700 Subject: [Python-ideas] Object adaptation and interfaces and so forth In-Reply-To: <0F15F5F9-5104-4D02-A708-EC3FFDE4047F@atlas.st> References: <0F15F5F9-5104-4D02-A708-EC3FFDE4047F@atlas.st> Message-ID: <462C2BD9.5060004@acm.org> Adam Atlas wrote: > (Not exactly an idea post, but I don't want to bother python-dev or > python-3000 with this.) > > PEP 246 was rejected a year or so ago, and Guido's rejection note > stated "Something much better is about to happen; it's too early to > say exactly what, but it's not going to resemble the proposal in this > PEP." Does anyone know if anything has gone on with this concept > since then? It seems like it has a lot of really interesting > potential, although I do see why PEP 246's specific proposal was > rejected. It's just the "Something much better is about to happen" > that got me curious -- is it happening yet? :) Reminds me of that scene from 2010: Dave Bowman: You see, something's going to happen. You must leave. Heywood Floyd: What? What's going to happen? Dave Bowman: Something wonderful. Heywood Floyd: What? Dave Bowman: I understand how you feel. You see, it's all very clear to me now. The whole thing. It's wonderful. The answer to your question is "yes", although it's happening in very small stages. Specifically, Python 3000'a argument decorators and abstract base classes are laying the groundwork for an adaption system via generic functions. Argument decorators make declaring of generic functions much less cumbersome than was previously possible. And abstract base classes give the generic functions something to work on that is more general than merely working on concrete types - it provides a way to reason about types in a duck-typing world. What happens next is that there will be various 3rd party implementations of generic function dispatch which will be based on those two things. Phillip J. Eby has already stated that he is interested in creating a kind of reference implementation that incorporates most of the interesting features, however his need not be the only one. These generic function dispatchers, working off of both concrete and abstract types can be used to implement object adaptation in various ways (If anyone wants to supply some concrete examples here, please be my guest.) 
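Taking up the parenthetical invitation: about the smallest concrete example is a registry keyed by (type, protocol), which is really just single dispatch on the type of the first argument. Every name below (register_adapter, adapt, IReadableFile) is invented for the illustration; this is not Phillip Eby's implementation nor anything in the standard library:

    # Invented names for illustration only; not an existing package.
    _adapters = {}

    def register_adapter(from_type, protocol, adapter):
        _adapters[(from_type, protocol)] = adapter

    def adapt(obj, protocol):
        """Return obj if it already provides protocol, else the result of the
        most specific registered adapter, else raise TypeError."""
        if isinstance(obj, protocol):
            return obj
        for klass in type(obj).__mro__:             # most-derived class first
            adapter = _adapters.get((klass, protocol))
            if adapter is not None:
                return adapter(obj)
        raise TypeError('cannot adapt %r to %s' % (obj, protocol.__name__))

    # Toy use: let a plain string be passed where a readable file is expected.
    class IReadableFile(object):
        """Stand-in for an abstract base class used as a protocol marker."""

    from StringIO import StringIO
    register_adapter(str, IReadableFile, StringIO)

    f = adapt("line one\nline two\n", IReadableFile)
    first = f.readline()                            # 'line one\n'

With argument annotations and abstract base classes in place, the same registration could hang off a decorator on the consuming function instead of an explicit adapt() call, which is the direction described above.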
-- Talin From brett at python.org Mon Apr 23 05:58:02 2007 From: brett at python.org (Brett Cannon) Date: Sun, 22 Apr 2007 20:58:02 -0700 Subject: [Python-ideas] PEP for executing a module in a package containing relative imports In-Reply-To: References: Message-ID: On 4/22/07, Steven Bethard wrote: > On 4/22/07, Brett Cannon wrote: > > I revised the PEP to use the sys.main idea and sent it off to > > python-3000. > > Just wanted to say thanks Brett for putting the time into this! > Welcome. I am just glad I got the email off literally 15 minutes or so before my laptop died. So if the hard drive is gone at least I have the latest version still. =) From ironfroggy at gmail.com Mon Apr 23 17:44:48 2007 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 23 Apr 2007 11:44:48 -0400 Subject: [Python-ideas] partial with skipped arguments In-Reply-To: <43aa6ff70704220914l489de151ta648c485155446f8@mail.gmail.com> References: <76fd5acf0704212114y22db7080je41e378c6ceaa994@mail.gmail.com> <43aa6ff70704220914l489de151ta648c485155446f8@mail.gmail.com> Message-ID: <76fd5acf0704230844t39ed2f47oe0df07d6e2915cf1@mail.gmail.com> On 4/22/07, Collin Winter wrote: > On 4/21/07, Calvin Spealman wrote: > > I often wish you could bind to arguments in a partial out of order, > > skipping some positionals. The solution I came up with is a singleton > > object located as an attribute of the partial function itself and used > > like this: > > > > def foo(a, b): > > return a / b > > pf = partial(foo, partial.skip, 2) > > assert pf(1.0) == 0.5 > > In Python 2.5.0: > > >>> import functools > >>> def f(a, b): > ... return a + b > ... > >>> p = functools.partial(f, b=9) > >>> p > > >>> p(3) > 12 > >>> > > Is this what you're looking for? > > Collin Winter > More or less but that posses two problems that I mentioned previously: 1) Relying on the names of position arguments does not feel right. 2) Buitin and extension functions don't work with that because you can't pass positionals to them by name. Besides, its a good excersize for me to finally get into any moderately real hacking of CPython. I'm working on the patch right now, one way or the other. -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://ironfroggy-code.blogspot.com/ From ironfroggy at gmail.com Tue Apr 24 16:04:25 2007 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 24 Apr 2007 10:04:25 -0400 Subject: [Python-ideas] Removing instancemethod in favor of partial? Message-ID: <76fd5acf0704240704m1b38d39fq189bd6e8f1f511ed@mail.gmail.com> Hey, why not? They do basically the same thing, except instancemethod allows only a single argument. Why not allow class and instance methods to be wrapped with a partial instead of their own type? We can rip out 300 lines of C code supporting instance method, at least. The only thorn is the im_class attribute, but few seem to even use it (few meaning just Twisted, according to Google Code Search). Anyway, I figure we don't really need it anyway because if some of the proposals for a way to reliably get the current function, class, or module go through, perhaps we'll have a reference to the class from the function itself. What does anyone think? -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://ironfroggy-code.blogspot.com/ From grosser.meister.morti at gmx.net Tue Apr 24 18:32:24 2007 From: grosser.meister.morti at gmx.net (=?ISO-8859-15?Q?Mathias_Panzenb=F6ck?=) Date: Tue, 24 Apr 2007 18:32:24 +0200 Subject: [Python-ideas] Sandbox? 
Message-ID: <462E3118.1040600@gmx.net> Are there any plans on a sandbox for python 3.0? Just wondering. -panzi From brett at python.org Tue Apr 24 19:54:39 2007 From: brett at python.org (Brett Cannon) Date: Tue, 24 Apr 2007 10:54:39 -0700 Subject: [Python-ideas] Removing instancemethod in favor of partial? In-Reply-To: <76fd5acf0704240704m1b38d39fq189bd6e8f1f511ed@mail.gmail.com> References: <76fd5acf0704240704m1b38d39fq189bd6e8f1f511ed@mail.gmail.com> Message-ID: On 4/24/07, Calvin Spealman wrote: > Hey, why not? They do basically the same thing, except instancemethod > allows only a single argument. Why not allow class and instance > methods to be wrapped with a partial instead of their own type? We can > rip out 300 lines of C code supporting instance method, at least. The > only thorn is the im_class attribute, but few seem to even use it (few > meaning just Twisted, according to Google Code Search). Anyway, I > figure we don't really need it anyway because if some of the proposals > for a way to reliably get the current function, class, or module go > through, perhaps we'll have a reference to the class from the function > itself. What does anyone think? > Huh, interesting idea. In theory it seems fine. I would want to know if any performance penalty exists from this first, though. -Brett From brett at python.org Tue Apr 24 19:56:37 2007 From: brett at python.org (Brett Cannon) Date: Tue, 24 Apr 2007 10:56:37 -0700 Subject: [Python-ideas] Sandbox? In-Reply-To: <462E3118.1040600@gmx.net> References: <462E3118.1040600@gmx.net> Message-ID: On 4/24/07, Mathias Panzenb?ck wrote: > Are there any plans on a sandbox for python 3.0? > Just wondering. Specifically no. I guess my security work is the closest thing to a sandbox in the pipeline (http://sayspy.blogspot.com/2007/04/python-security-paper-online.html). But I don't know if it is going to make it into Python 3 or not. -Brett From talin at acm.org Thu Apr 26 07:59:44 2007 From: talin at acm.org (Talin) Date: Wed, 25 Apr 2007 22:59:44 -0700 Subject: [Python-ideas] The case against static type checking, in detail (long) Message-ID: <46303FD0.1020504@acm.org> (This is a fragment of an email that I sent to Guido earlier, I mention this here so that Guido can skip reading it. Of course, I recognize that most people here already know all this - but since it relates to the recent discussion about the value of type checking, I'd like to post it here as a kind of "manifesto" of why Python is the way it is.) Strongly-typed languages such as C++ and Java require up-front declaration of everything. It is the nature of such languages that there is a lot of cross-checking in the compiler between the declaration of a thing and its use. The idea is to prevent programmer errors by insuring internal consistency. However, we find in practice that much of the programmer's effort is spent in maintaining this cross-checking structure. To use a building analogy, a statically-typed program is like a truss structure, where there's a complex web of cross-braces, and a force applied at any given point is spread out over the whole structure. Each time such a program is modified, the programmer must partially dismantle and then re-assemble the existing structure. This takes time. It also clutters the code. 
Reading the source code of a program written in a statically typed language reveals that a substantial part of the text serves only to support the compile-time checking of definitions, or provides visual redundancy to aid the programmer in connecting two aspects of the program which are defined far apart.

An example of what I mean is the use of variable type declarations - even in a statically typed language, it would be fairly easy for the compiler to automatically infer most variable types if the language were designed that way; the fact that the programmer is required to manually specify these types serves as an additional consistency check on the code. However, time spent serving the needs of these consistency checks is time away from actually serving the functional purpose of the code.

Programmers in Python, on the other hand, not only need not worry about type declarations, they also spend much less time worrying about converting from one type to another in order to meet the constraints of a particular API. This is one of the reasons why I can generally write Python code about 4 times as fast as the C++ equivalent. (Understand that this is coming from someone who loves working in C++ and Java and has used them daily for the last 15 years. At the same time, however, I also enjoy programming in Python and I recognize that each language has its strengths.)

There is also the question of how much static typing helps improve program reliability. In statically typed languages, there are two ways that types are used. In languages such as C and Pascal, the type declarations serve primarily as a consistency check. However, in C++ template metaprogramming, and in languages like Haskell, there is a second use for types, which is to provide a kind of type calculus or type inferencing, which gives additional expressive power to the language. C++ templates can act as powerful code generators, allowing the programmer to program in ever higher levels of abstraction and express the basic ideas even more succinctly and clearly than before. In a rapid-prototyping environment, the second use of types can be a major productivity win; however, I would argue that the first use of types, consistency checking, is less beneficial, and is often more of a distraction to the programmer than a help. Yes, static type checking does detect some errors; but it also causes errors by making the code larger and more wordy, so that the programmer cannot hold large portions of the program in their mind all at once, which can lead to errors in overall design. It means the programmer spends more time thinking about the behavior of individual variables and less about the algorithm as a whole.

At this point, I want to talk about a related matter, another fundamental design aspect of Python which I call "decriminalization of minor errors". An example of this is illustrated by the recent discussion over string slicing. As you know, when you attempt to index a string with a slice object that extends outside of the bounds of the string, the range is silently truncated. Some argued that Python should be more strict, and report an error when this occurs - but instead, it was reaffirmed that the current behavior is correct. I would agree that this current behavior is the more Pythonic, and is part of a general pattern, which I shall attempt to describe: To "decriminalize" an error means to find some alternative interpretation of the programmer's intent that produces a non-error result.
That is, given a syntactical construct, and a choice of several interpretations of what that construct should mean, attempt to pick an interpretation that, when executed, does not produce an error. In the design of the Python language, it is a regular practice to decriminalize minor errors, provided that the alternative interpretation can meet some fairly strict criteria: that it is useful, intuitive, reasonable, and pedagogically sound.

Note that this is a much more conservative rule than that used by languages such as Rexx, Javascript, and Perl, languages which make "heroic efforts" to bend the interpretation of an operation to a non-error result. Python does not do this. Nor is decriminalizing errors the same as ignoring errors. Errors are still, and should be, enforced vigorously and reported. The distinction is that decriminalizing an error results in code that produces a useful, logical result, whereas ignoring errors results in code that produces garbage or nothing. Decriminalization comes about when we broaden our definitions of what is the correct result of a given operation.

A couple of other examples of decriminalization:

1) There are languages in which the only valid argument for an 'if' statement is a boolean. Attempts to say "if 0" are errors. In Python we relax that rule, allowing any type to be used as the argument to an if-statement. We do this by having a broader interpretation of what it means to test an object for 'trueness', and allow 'trueness' to be implied by 'non-emptiness'.

2) Duck-typing is a decriminalization of the error that polymorphic types are required to inherit from a common interface. It also decriminalizes "missing methods", as long as those methods are never called. Again, this is due to having a broader interpretation of 'polymorphism'.

(In fact, this aspect of Python is so fundamental that I think it deserves its own acronym alongside TOOWTDI and others, but I can't think of a short, pithy description of it. Maybe IOANEIR - "Interpret operations as non-errors if reasonable.")

Both dynamic typing and decriminalization serve the same end - telling the programmer "don't sweat the small stuff". Both are very helpful and powerful, because they allow programmers to spend much less time worrying about minor error cases, things that would have to be checked for in C++. Python code is simply more *concise* than the C++ equivalent, yet it achieves this without being terse and cryptic, because the text of a Python program more closely embodies the "essence" of an algorithm, uncluttered by other agendas.

The price we pay for this, of course, is that sometimes errors show up much later (like, after ship) than they would have otherwise. But unit testing can catch a lot of the same errors. And in many cases, the seriousness of such errors depends on what we mean by "ship". It's one thing to discover a fatal error after you've pressed thousands of CDs and shipped them all over the world; it's a much different matter if the program has the ability to automatically update itself, or is downloaded from some kind of subscription model such as a package manager.

In many environments, it is far more important to get something done quickly and validate the general concept, than it is to ensure that the code is 100% correct. In other words, if it would take you 6 months to write it in a statically typed language, but only 2 months to write it in a dynamic language - well, that's 4 extra months you have to write unit tests and make sure it's right!
And in the mean time, you can have real users banging on the code and making sure of something that is far more important, which is whether what you wrote is the right thing at all. -- Talin From terry at jon.es Thu Apr 26 02:52:16 2007 From: terry at jon.es (Terry Jones) Date: Thu, 26 Apr 2007 02:52:16 +0200 Subject: [Python-ideas] Minor suggestion for unittest Message-ID: <17967.63424.413682.569882@terry-jones-computer.local> There's a simple change that could be made to unittest that would make it easier to automate some forms of testing. I want to be able to dynamically add tests to an instance of a class derived from unittest.TestCase. There are occasions when I don't want to write my tests upfront in a Python file. E.g., given a bunch of test/expectedResult data sitting around (below in a variable named MyTestData), it would be nice to be able to do the following (untested here, but I did it earlier for real and it works fine): import unittest class Test(unittest.TestCase): def runTest(): pass suite = unittest.TestSuite() for testFunc, expectedResult in MyTestData: newTestFuncName = 'dynamic-test-' + testFunc.__name__ def tester(): self.assertEqual(testFunc(), expectedResult) test = Test() setattr(test, newTestFuncName, tester) # Set the class instance up so that it will be the one run. test.__init__(newTestFuncName) # ugh! suite.addTest(test) suite.run() The explicit call to __init__ (marked ugh!) is ugly, dangerous, etc. You could also say test._testMethodName = newTestFuncName (and set _testMethodDoc too), but that's also ugly. This would all be very simple though if instead of starting out like: class TestCase: def __init__(self, methodName='runTest'): try: self._testMethodName = methodName testMethod = getattr(self, methodName) self._testMethodDoc = testMethod.__doc__ except AttributeError: raise ValueError, "no such test method in %s: %s" % \ (self.__class__, methodName) unittest.TestCase started out like this: class TestCase: def __init__(self, methodName='runTest'): self.setTestMethod(methodName) def setTestMethod(self, methodName): try: self._testMethodName = methodName testMethod = getattr(self, methodName) self._testMethodDoc = testMethod.__doc__ except AttributeError: raise ValueError, "no such test method in %s: %s" % \ (self.__class__, methodName) That would allow people to create an instance of their Test class, add a method to it using setattr, and then use setTestMethod to set the method to be run. A further improvement would be to have _testMethodName be None or left undefined (and accessed via __getattr__) for as long as possible rather than being set to runTest (and looked up with getattr) immediately. That would allow the removal of the do-nothing runTest method in the above. No old code need be broken as runTest would still be the default. You'd just have a chance to get in there earlier so it never saw the light of day. Programmers like to automate things, especially testing. These changes don't break any existing code but they allow additional test automation. Of course you _could_ achieve the above by writing out a brand new temp.py file, running it, and so on, but that's not very Pythonic, is a bunch more work, needs cleanup (temp.py needs to go away), etc. I have some further thoughts about how to make this a bit more flexible, but I'll save those for later, supposing there's any interest in the above. 
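For what it's worth, here is a rough sketch (added here, not Terry's code) of one way to get a similar effect with unittest as it stands: the generated test methods are attached to the TestCase subclass itself, before any instance exists, so TestCase.__init__ finds them by name and the explicit __init__ call goes away. MyTestData and the generated method names are made up for the example, and the make_tester factory is there so each method binds its own function/result pair rather than the loop variables.

import unittest

# Hypothetical data: pairs of (callable, expected result).
MyTestData = [(lambda: 1 + 1, 2), (lambda: "abc".upper(), "ABC")]

class Test(unittest.TestCase):
    pass

def make_tester(func, expected):
    def tester(self):
        self.assertEqual(func(), expected)
    return tester

for i, (testFunc, expectedResult) in enumerate(MyTestData):
    # Attach to the class, not an instance, so the normal name-based
    # lookup in TestCase.__init__ works unchanged.
    setattr(Test, 'test_dynamic_%d' % i, make_tester(testFunc, expectedResult))

suite = unittest.TestLoader().loadTestsFromTestCase(Test)
unittest.TextTestRunner().run(suite)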
Terry Jones From bjourne at gmail.com Thu Apr 26 15:22:47 2007 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Thu, 26 Apr 2007 06:22:47 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <20070421112051.63AB.JCARLSON@uci.edu> References: <740c3aec0704210817m3e11a9e5lb3325523e2490348@mail.gmail.com> <20070421112051.63AB.JCARLSON@uci.edu> Message-ID: <740c3aec0704260622m46dfd3aav56025918408e3935@mail.gmail.com> On 4/21/07, Josiah Carlson wrote: > > "BJ?rn Lindqvist" wrote: > > On 4/21/07, Terry Reedy wrote: > > > But it *is* currently a problem for lists that will become much more > > > extensive in the future, so it *is* currently a problem for sorted dicts > > > that will be much more of a problem in the future. Hence, sorted dicts > > > will have to be restricted to one type or one group of truly comparable > > > types. > > > > Alternatively, you could require a comparator function to be specified > > at creation time. > > You could, but that would imply a total ordering on elements that Python > itself is removing because it doesn't make any sense. Including a list > of 'acceptable' classes as Terry has suggested would work, but would > generally be superfluous. The moment a user first added an object to > the sorted dictionary is the moment the type of objects that can be > inserted is easily limited (hello Abstract Base Classes PEP!) Where did the "we are all consenting adults here" mantra go? Java doesn't imply any total order on elements either, yet it manages to fit a TreeMap class that does not artificially limit the kind of items you can put in it. Yes, you can "screw up" by overriding the hashCode and equals methods of the items you put in it. Java, in this case, doesn't try to enforce correctness on the language level, instead it documents the contract the programmer is supposed to follow. m1.equals(m2) should imply that m1.hashCode() == m2.hashCode(). Python suffer the same "problem": class Obj: def __eq__(self, o): return 0 o1 = Obj() o2 = Obj() L = [o1, o2] assert L.index(o2) == 1 Similar fuck ups are possible when using dicts. In practice this is not a problem. An ordered dict doesn't need any more safeguards than Python's already existing data structures. Using the natural order of its items are just fine and when you need something more fancy, override the __eq__ method or give the collections sort method a comparator function argument. -- mvh Bj?rn From lists at cheimes.de Thu Apr 26 17:34:18 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 26 Apr 2007 17:34:18 +0200 Subject: [Python-ideas] The case against static type checking, in detail (long) In-Reply-To: <46303FD0.1020504@acm.org> References: <46303FD0.1020504@acm.org> Message-ID: Wow that's a good posting. Can you put it on a website so I can show it to friends when they argue about dynamic typing sucks? 
:] Christian From collinw at gmail.com Thu Apr 26 17:45:01 2007 From: collinw at gmail.com (Collin Winter) Date: Thu, 26 Apr 2007 08:45:01 -0700 Subject: [Python-ideas] Minor suggestion for unittest In-Reply-To: <17967.63424.413682.569882@terry-jones-computer.local> References: <17967.63424.413682.569882@terry-jones-computer.local> Message-ID: <43aa6ff70704260845x3f5f6ef0hde320f5513ce7fde@mail.gmail.com> On 4/25/07, Terry Jones wrote: > import unittest > > class Test(unittest.TestCase): > def runTest(): pass > > suite = unittest.TestSuite() > > for testFunc, expectedResult in MyTestData: > newTestFuncName = 'dynamic-test-' + testFunc.__name__ > def tester(): > self.assertEqual(testFunc(), expectedResult) > test = Test() > setattr(test, newTestFuncName, tester) > # Set the class instance up so that it will be the one run. > test.__init__(newTestFuncName) # ugh! > suite.addTest(test) > > suite.run() It sounds like what you're looking for is FunctionTestCase (http://docs.python.org/lib/unittest-contents.html). Using that, your loop above becomes something like for testFunc, expectedResult in MyTestData: def tester(): self.assertEqual(testFunc(), expectedResult) suite.addTest(FunctionTestCase(tester)) Collin Winter From phd at phd.pp.ru Thu Apr 26 17:47:24 2007 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 26 Apr 2007 19:47:24 +0400 Subject: [Python-ideas] The case against static type checking, in detail (long) In-Reply-To: References: <46303FD0.1020504@acm.org> Message-ID: <20070426154724.GE13988@phd.pp.ru> On Thu, Apr 26, 2007 at 05:34:18PM +0200, Christian Heimes wrote: > Wow that's a good posting. Can you put it on a website so I can show it > to friends when they argue about dynamic typing sucks? :] At least it in the mailing list archive: http://mail.python.org/pipermail/python-ideas/2007-April/000552.html Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From ldlandis at gmail.com Thu Apr 26 18:00:16 2007 From: ldlandis at gmail.com (LD 'Gus' Landis) Date: Thu, 26 Apr 2007 11:00:16 -0500 Subject: [Python-ideas] What would you call such an object (was: ordered dict) Message-ID: Hi, I am wondering if in y'alls opinion the following is "just a dictionary" or is it a different kind of object that has some of the characteristics of a dictionary, and has order. If y'all think that it is "just a dictionary", then how does one override the notion of a "hash" for the key in a dictionary and make it some other ordered structure (e.g. a B-Tree, AVL, etc). (Please no flame toss to some other list -- this is a "use" of an ordered "ordered dict") I don't know what such a critter would be called (in Python). It has the name of "array" language where it is central, but don't want to go into that. 
The object has the following characteristics: - It is indexed by keys which are immutable (like dicts) - Each key has a single value (like dicts) - The keys are ordered (usually a B-Tree underneath) - The keys are "sorted" yielding a hierarchy such that (using Python tuples as an example and pseudo Python): object = { (0,): "value of node", (0,"name") : "name of node", (0,"name",1): "some data about name", (1,): "value of another node", (1,2,3): "some data value", (2,): 2, (2,2,"somekey",1): 32, (3,): 28, ("abc",1,2): 14 } - Introspection of the object allows walking the keys by hierarchy, using the above: key = object.order(None) -> 0 key = object.order(key) -> 1 key = object.order(key) -> 2 key = object.order(key) -> 3 key = object.order(key) -> "abc" key = object.order(key) -> None The first key is fetched when (None) is the initial key (or last key if modifier is -1) Supplying a modifier (-1, where 1 is default of forward, -1 is reverse) in the call traverses the keys in the reverse order from that shown above. - Introspection of the key results in: hasdata = object.data(key) =0 no subkeys no data for 'key' (in the above (39) would have no subkeys, no data) =1 no subkeys has data for 'key' (in the above (3) has no subkeys, but has data) =10 has subkeys no data for 'key' (in the above (2,2) has subkeys but no data) =11 has subkeys has data for 'key' (in the above (2) has subkeys and has data) - Introspection of object can yield "depth first" keys key = object.query(None) -> (0,) key = object.query(key) -> (0,"name") key = object.query(key) -> (0,"name",1) key = object.query(key) -> (1,) key = object.query(key) -> (1,2,3) key = object.query(key) -> (2,) key = object.query(key) -> (2,2,"somekey",1) key = object.query(key) -> (3,) key = object.query(key) -> ("abc",1,2) key = object.query(key) -> None Like object.order(), object.query() has the same "reverse" (using -1) option to walk the keys in a reverse order. - Having an iterator over order/query: for key in object.ordered([start[,end]): for key in object.queryed([start[,end]): (spelling?? other alternative) - Set/get of object[(0,"name")] = "new name of node" print object[(0,"name")] Cheers, --ldl -- LD Landis - N0YRQ - de la tierra del encanto 3960 Schooner Loop, Las Cruces, NM 88012 651/340-4007 N32 21'48.28" W106 46'5.80" "If a thing is worth doing, it is worth doing badly." ?GK Chesterton. An interpretation: For things worth doing: Doing them, even if badly, is better than doing nothing perfectly (on them). From jcarlson at uci.edu Thu Apr 26 18:17:03 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 26 Apr 2007 09:17:03 -0700 Subject: [Python-ideas] ordered dict In-Reply-To: <740c3aec0704260622m46dfd3aav56025918408e3935@mail.gmail.com> References: <20070421112051.63AB.JCARLSON@uci.edu> <740c3aec0704260622m46dfd3aav56025918408e3935@mail.gmail.com> Message-ID: <20070426090200.6427.JCARLSON@uci.edu> "BJ?rn Lindqvist" wrote: > On 4/21/07, Josiah Carlson wrote: > > > > "BJ?rn Lindqvist" wrote: > > > On 4/21/07, Terry Reedy wrote: > > > > But it *is* currently a problem for lists that will become much more > > > > extensive in the future, so it *is* currently a problem for sorted dicts > > > > that will be much more of a problem in the future. Hence, sorted dicts > > > > will have to be restricted to one type or one group of truly comparable > > > > types. > > > > > > Alternatively, you could require a comparator function to be specified > > > at creation time. 
> > > > You could, but that would imply a total ordering on elements that Python > > itself is removing because it doesn't make any sense. Including a list > > of 'acceptable' classes as Terry has suggested would work, but would > > generally be superfluous. The moment a user first added an object to > > the sorted dictionary is the moment the type of objects that can be > > inserted is easily limited (hello Abstract Base Classes PEP!) > > Where did the "we are all consenting adults here" mantra go? Java > doesn't imply any total order on elements either, yet it manages to > fit a TreeMap class that does not artificially limit the kind of items > you can put in it. Yes, you can "screw up" by overriding the hashCode > and equals methods of the items you put in it. Java, in this case, > doesn't try to enforce correctness on the language level, instead it > documents the contract the programmer is supposed to follow. > m1.equals(m2) should imply that m1.hashCode() == m2.hashCode(). At no point has there been discussion over removing the ability for types which don't have a total ordering to be placed into a dictionary (hash table). a = {1:2, None:6, 'hello':0} will always work. The only thing that anyone has talked about is the removal of >, >=, <, <= for types that make no sense to compare. Like unicode and tuple, or int and tuple, or int and list, etc. The removal of a "total ordering" does not imply that 5 != 'hello' will somehow start failing, it means that 5 < 'hello' will begin to raise an exception because it doesn't make any sense. > Similar fuck ups are possible when using dicts. In practice this is > not a problem. An ordered dict doesn't need any more safeguards than > Python's already existing data structures. Using the natural order of > its items are just fine and when you need something more fancy, > override the __eq__ method or give the collections sort method a > comparator function argument. Except this series of posts is about a "sorted dict", with a key,value mapping in which the equivalent .items() are sorted() as an ordering (rather than more or less dependant on hash value as in a standard dictionary). But as I, and others have stated before, which you should read once again because you don't seem to get it: THE EXISTANCE OF A TOTAL ORDERING ON VALUES IN PYTHON TODAY IS A LIE. IN FUTURE PYTHONS WE ARE REMOVING THE LIE BECAUSE IT DOESN'T HELP ANYONE. IF YOU DON'T LIKE IT; TOUGH COOKIES. STANDARD PYTHON DICTIONARIES WILL WORK THE WAY THEY ALWAYS HAVE. ONLY PEOPLE WHO BELIEVE THAT INCOMPATIBLE TYPES SHOULD BE ORDERED IN A PARTICULAR WAY IN THINGS LIKE lst.sort() WILL BE AFFECTED. If you want an actual reference, please see PEP 3100 which says, "Comparisons other than == and != between disparate types will raise an exception unless explicitly supported by the type" ... and references: http://mail.python.org/pipermail/python-dev/2004-June/045111.html If you don't understand this, please ask again without profanity or accusing the Python developers of removing the "consenting adults" requirement. Python is getting smarter. Maybe you just don't understand why this is the case. 
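To make the practical upshot concrete, a minimal added sketch (not Josiah's code, and not a complete mapping implementation) of a "sorted dict" that never relies on cross-type comparisons, because the caller supplies a sort-key function up front:

import bisect

class SortedDict(object):
    """Tiny sketch: items() come back ordered by sortkey(key)."""

    def __init__(self, sortkey):
        self._sortkey = sortkey
        self._skeys = []     # computed sort keys, kept in order
        self._keys = []      # original keys, parallel to _skeys
        self._data = {}

    def __setitem__(self, key, value):
        if key not in self._data:
            sk = self._sortkey(key)
            i = bisect.bisect_right(self._skeys, sk)
            self._skeys.insert(i, sk)
            self._keys.insert(i, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def items(self):
        return [(k, self._data[k]) for k in self._keys]

# Mixed key types are fine, because the caller says how they are ordered.
d = SortedDict(sortkey=repr)
d[3] = 'three'
d['abc'] = 'letters'
d[(1, 2)] = 'a tuple'
print(d.items())    # items ordered by repr() of the keys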
- Josiah

From jcarlson at uci.edu Thu Apr 26 18:22:27 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Thu, 26 Apr 2007 09:22:27 -0700
Subject: [Python-ideas] What would you call such an object (was: ordered dict)
In-Reply-To:
References:
Message-ID: <20070426091746.642A.JCARLSON@uci.edu>

"LD 'Gus' Landis" wrote:
> Hi,
>
> I am wondering if in y'alls opinion the following is "just a dictionary" or
> is it a different kind of object that has some of the characteristics of a
> dictionary, and has order.
>
> If y'all think that it is "just a dictionary", then how does one override the
> notion of a "hash" for the key in a dictionary and make it some other
> ordered structure (e.g. a B-Tree, AVL, etc). (Please no flame toss to
> some other list -- this is a "use" of an ordered "ordered dict")

This is easily implemented as a variant of a treap, in which rather than
choosing a new sub-node based on different characters in a string, you
choose a new sub-node based on different values in a tuple.

There is one small problem with the structure as you have described it;
in order to be able to choose a (sorted) ordering on the portion of a key
as you show here...

> key = object.order(key) -> 3
> key = object.order(key) -> "abc"

...it won't make any sense in future Pythons. 3 < "abc" will raise an
exception.

- Josiah

From tony at PageDNA.com Thu Apr 26 21:39:39 2007
From: tony at PageDNA.com (Tony Lownds)
Date: Thu, 26 Apr 2007 12:39:39 -0700
Subject: [Python-ideas] The case against static type checking, in detail (long)
In-Reply-To: <46303FD0.1020504@acm.org>
References: <46303FD0.1020504@acm.org>
Message-ID: <0DC60652-60EA-4C25-9EF5-89B767EB94FE@PageDNA.com>

On Apr 25, 2007, at 10:59 PM, Talin wrote:

> However, we find in practice that much of the programmer's effort is
> spent in maintaining this cross-checking structure.

How does this effort compare to writing all of the equivalent type tests?
Static type checking subsumes the need to write tests to ensure that for
every operation, the inputs are valid types, and the result type will be a
valid type.

> To use a building
> analogy, a statically-typed program is like a truss structure, where
> there's a complex web of cross-braces, and a force applied at any given
> point is spread out over the whole structure. Each time such a program
> is modified, the programmer must partially dismantle and then
> re-assemble the existing structure.

However, the work to ensure the re-assembled structure is completely valid
is shifted from human inspection and possibly incomplete tests to static
analysis. It's like having a computer check all of the cross-brace
connections. When the modifications are small, dismantle/reassembly costs
can be dominated by the checking costs.

> Yes, static type checking
> does detect some errors; but it also causes errors by making the code
> larger and more wordy, so that the programmer cannot hold large
> portions of the program in their mind all at once, which can lead to
> errors in overall design. It means the programmer spends more time
> thinking about the behavior of individual variables and less about the
> algorithm as a whole.

That's like saying stairs should not have rails because thinking about
where to put your hand gets in the way of thinking about where to put
your feet!

Proposals for static type checking in Python have long included the
concept of optional type checking where programs without declarations
continue to run.
So clearly the desire not to clutter or force work on a type-declaration averse programmer is already taken as a requirement. -Tony From adam at atlas.st Thu Apr 26 21:59:46 2007 From: adam at atlas.st (Adam Atlas) Date: Thu, 26 Apr 2007 15:59:46 -0400 Subject: [Python-ideas] Python package files Message-ID: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> I think it would be useful for Python to accept imports of standalone files representing entire packages, maybe with the extension .pyp. A package file would basically be a ZIP file, so it would follow fairly easily from the current zipimport mechanism... its top-level directory would be the contents of a package named by the outer ZIP file. In other words, suppose we have a ZIP file called "package.pyp", and at its top level, it contains "__init__.py" and "blah.py". Anywhere this can be located, it would be equivalent to a physical directory called "package" containing those two files. So you can simply do "import package" as usual, regardless of whether it's a directory or a .pyp. A while ago I wrote a program called Squisher that does this (it takes a ZIP file and turns it into an importable .pyc file), but it's a huge hack. The hackishness mainly comes from my desire to not require users of Squished packages to install Squisher itself; so each module basically has to bootstrap itself, adding its own import hook and then adding its own path to sys.path and shuffling around a couple of things in sys.modules. All that could be avoided if this were a core feature; I expect a straightforward import hook would suffice. As PEP 302 says, "Distributing lots of source or pyc files around is not always appropriate, so there is a frequent desire to package all needed modules in a single file." It's very useful to be able to download a single file, plop it into a directory, and immediately be able to import it like any .py or .pyc file. Eggs are nice, but having to manually add them to sys.path or install them system-wide with setuptools is not always ideal. From brett at python.org Thu Apr 26 23:29:09 2007 From: brett at python.org (Brett Cannon) Date: Thu, 26 Apr 2007 14:29:09 -0700 Subject: [Python-ideas] Python package files In-Reply-To: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> References: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> Message-ID: On 4/26/07, Adam Atlas wrote: > I think it would be useful for Python to accept imports of standalone > files representing entire packages, maybe with the extension .pyp. A > package file would basically be a ZIP file, so it would follow fairly > easily from the current zipimport mechanism... its top-level > directory would be the contents of a package named by the outer ZIP > file. In other words, suppose we have a ZIP file called > "package.pyp", and at its top level, it contains "__init__.py" and > "blah.py". Anywhere this can be located, it would be equivalent to a > physical directory called "package" containing those two files. So > you can simply do "import package" as usual, regardless of whether > it's a directory or a .pyp. > So basically zipimport, but instead of putting the zip file on sys.path the zip file exists in a directory on sys.path and the file name acts at the top-level package name? I like the idea as making stuff just work more easily by dropping into some common place and not having to muck with the import settings would be nice. 
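As a rough sketch of the kind of hook being discussed (added here; it is not Adam's Squisher code, and the PypFinder name and the ".pyp" handling are assumptions), written against the old find_module/load_module meta-path protocol of PEP 302: it looks for "<name>.pyp" next to each sys.path entry, executes the archive's top-level __init__.py as the package, and points the package's __path__ at the archive so that submodules are resolved by the ordinary zipimport path hook.

import os
import sys
import types
import zipfile

class PypFinder(object):
    """Sketch of a meta-path hook for '<name>.pyp' package archives."""

    def find_module(self, fullname, path=None):
        if '.' in fullname or path is not None:
            return None                       # only handle top-level packages
        for entry in sys.path:
            candidate = os.path.join(entry or os.curdir, fullname + '.pyp')
            if os.path.isfile(candidate):
                self._archive = candidate
                return self
        return None

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        archive = self._archive
        zf = zipfile.ZipFile(archive)
        source = zf.read('__init__.py')       # the archive's top level *is* the package
        zf.close()
        mod = types.ModuleType(fullname)
        mod.__file__ = os.path.join(archive, '__init__.py')
        mod.__loader__ = self
        # Submodules such as package.blah are then found inside the archive
        # by the standard zipimport path hook.
        mod.__path__ = [archive]
        sys.modules[fullname] = mod
        exec(compile(source, mod.__file__, 'exec'), mod.__dict__)
        return mod

sys.meta_path.append(PypFinder())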
-Brett From greg.ewing at canterbury.ac.nz Fri Apr 27 03:32:08 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 27 Apr 2007 13:32:08 +1200 Subject: [Python-ideas] The case against static type checking, in detail (long) In-Reply-To: <0DC60652-60EA-4C25-9EF5-89B767EB94FE@PageDNA.com> References: <46303FD0.1020504@acm.org> <0DC60652-60EA-4C25-9EF5-89B767EB94FE@PageDNA.com> Message-ID: <46315298.7020803@canterbury.ac.nz> Tony Lownds wrote: > Thats like saying stairs should not have rails because thinking about > where > to put your hand gets in the way of thinking about where to put your > feet! Instead of rails, Python stairs have bouncy cushions along the sides and at the bottom to catch you gently if you happen to fall, rather than burden you with having to hold on every time you use the stairs, even though on most occasions you don't fall. Also it provides a lot more escalators. -- Greg From guido at python.org Fri Apr 27 03:47:26 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 26 Apr 2007 18:47:26 -0700 Subject: [Python-ideas] The case against static type checking, in detail (long) In-Reply-To: <46315298.7020803@canterbury.ac.nz> References: <46303FD0.1020504@acm.org> <0DC60652-60EA-4C25-9EF5-89B767EB94FE@PageDNA.com> <46315298.7020803@canterbury.ac.nz> Message-ID: On 4/26/07, Greg Ewing wrote: > Tony Lownds wrote: > > > Thats like saying stairs should not have rails because thinking about > > where > > to put your hand gets in the way of thinking about where to put your > > feet! > > Instead of rails, Python stairs have bouncy cushions > along the sides and at the bottom to catch you gently > if you happen to fall, rather than burden you with > having to hold on every time you use the stairs, > even though on most occasions you don't fall. > > Also it provides a lot more escalators. But beware of the rotating knives! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Sun Apr 29 06:44:29 2007 From: collinw at gmail.com (Collin Winter) Date: Sat, 28 Apr 2007 21:44:29 -0700 Subject: [Python-ideas] Minor suggestion for unittest In-Reply-To: <17971.21160.287099.857675@terry-jones-computer.local> References: <17967.63424.413682.569882@terry-jones-computer.local> <43aa6ff70704260845x3f5f6ef0hde320f5513ce7fde@mail.gmail.com> <17971.21160.287099.857675@terry-jones-computer.local> Message-ID: <43aa6ff70704282144hd6ad9d9q7036253fd9afe211@mail.gmail.com> On 4/28/07, Terry Jones wrote: > | It sounds like what you're looking for is FunctionTestCase > | (http://docs.python.org/lib/unittest-contents.html). Using that, your > | loop above becomes something like > | > | for testFunc, expectedResult in MyTestData: > | def tester(): > | self.assertEqual(testFunc(), expectedResult) > | suite.addTest(FunctionTestCase(tester)) > > I had read about FunctionTestCase but it didn't seem to be what I was > looking for - though it's the closest. FunctionTestCase is intended to > allow people to easily bring a set of pre-existing tests under the umbrella > of unittest. It overrides setUp and tearDown, and doesn't result in the > test being a first-class test like those you get when you write tests for > unittest from scratch (using TestCase directly, or something you write > based on it). I'm not sure why you think the tests produced FunctionTestCase are somehow second-class citizens: unittest treats all test objects equally, so long as they conform to the expected API. 
If the objection to using FunctionTestCase is that the test names don't conform to the same pattern as the statically-defined tests, that's easily solved with a subclass. > I want to dynamically (i.e. at run time) add functions that are treated > equally with those that are added statically in python code. That could be > really simple (and I can hack around it to achieve it), but the current > method unittest uses to set its self._testMethodName prevents me from doing > this in a nice way (because TestCase.__init__ immediately does a hasattr to > look for the named method, and fails if it's absent). If FunctionTestCase is undesirable, you might look at creating your own TestSuite subclass. The API is pretty simple, and that would give you all the control in the world over pulling in tests dynamically. Hope that helps, Collin Winter From terry at jon.es Sat Apr 28 15:56:56 2007 From: terry at jon.es (Terry Jones) Date: Sat, 28 Apr 2007 15:56:56 +0200 Subject: [Python-ideas] Minor suggestion for unittest In-Reply-To: Your message at 08:45:01 on Thursday, 26 April 2007 References: <17967.63424.413682.569882@terry-jones-computer.local> <43aa6ff70704260845x3f5f6ef0hde320f5513ce7fde@mail.gmail.com> Message-ID: <17971.21160.287099.857675@terry-jones-computer.local> Hi Collin Thanks for the reply. | It sounds like what you're looking for is FunctionTestCase | (http://docs.python.org/lib/unittest-contents.html). Using that, your | loop above becomes something like | | for testFunc, expectedResult in MyTestData: | def tester(): | self.assertEqual(testFunc(), expectedResult) | suite.addTest(FunctionTestCase(tester)) I had read about FunctionTestCase but it didn't seem to be what I was looking for - though it's the closest. FunctionTestCase is intended to allow people to easily bring a set of pre-existing tests under the umbrella of unittest. It overrides setUp and tearDown, and doesn't result in the test being a first-class test like those you get when you write tests for unittest from scratch (using TestCase directly, or something you write based on it). I want to dynamically (i.e. at run time) add functions that are treated equally with those that are added statically in python code. That could be really simple (and I can hack around it to achieve it), but the current method unittest uses to set its self._testMethodName prevents me from doing this in a nice way (because TestCase.__init__ immediately does a hasattr to look for the named method, and fails if it's absent). I wonder if I'm being clear... it's pretty simple, but my explanation may not be so good. Regards, Terry From rrr at ronadam.com Mon Apr 30 13:09:08 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 30 Apr 2007 06:09:08 -0500 Subject: [Python-ideas] Python package files In-Reply-To: References: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> Message-ID: <4635CE54.7070403@ronadam.com> Brett Cannon wrote: > On 4/26/07, Adam Atlas wrote: >> I think it would be useful for Python to accept imports of standalone >> files representing entire packages, maybe with the extension .pyp. A >> package file would basically be a ZIP file, so it would follow fairly >> easily from the current zipimport mechanism... its top-level >> directory would be the contents of a package named by the outer ZIP >> file. In other words, suppose we have a ZIP file called >> "package.pyp", and at its top level, it contains "__init__.py" and >> "blah.py". 
Anywhere this can be located, it would be equivalent to a >> physical directory called "package" containing those two files. So >> you can simply do "import package" as usual, regardless of whether >> it's a directory or a .pyp. >> > > So basically zipimport, but instead of putting the zip file on > sys.path the zip file exists in a directory on sys.path and the file > name acts at the top-level package name? I like the idea as making > stuff just work more easily by dropping into some common place and not > having to muck with the import settings would be nice. I like that too. + 1 I really dislike scattering a projects files around. And conversely, I really dislike combining files from different sources. Ron From lists at cheimes.de Mon Apr 30 16:22:31 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 30 Apr 2007 16:22:31 +0200 Subject: [Python-ideas] Python package files In-Reply-To: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> References: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> Message-ID: Adam Atlas wrote: > I think it would be useful for Python to accept imports of standalone > files representing entire packages, maybe with the extension .pyp. A > package file would basically be a ZIP file, so it would follow fairly > easily from the current zipimport mechanism... its top-level > directory would be the contents of a package named by the outer ZIP > file. In other words, suppose we have a ZIP file called > "package.pyp", and at its top level, it contains "__init__.py" and > "blah.py". Anywhere this can be located, it would be equivalent to a > physical directory called "package" containing those two files. So > you can simply do "import package" as usual, regardless of whether > it's a directory or a .pyp. What are the benefits of your proposal over the already established Python eggs? As far as I understand your proposal it's not much different to eggs. In fact eggs + setuptools support more features like dependencies, multiversion installation and many more. Christian From adam at atlas.st Mon Apr 30 17:26:05 2007 From: adam at atlas.st (Adam Atlas) Date: Mon, 30 Apr 2007 11:26:05 -0400 Subject: [Python-ideas] Python package files In-Reply-To: References: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> Message-ID: <9C20E3C1-F11E-4313-987E-E2842E75731B@atlas.st> On 30 Apr 2007, at 10.22, Christian Heimes wrote: > Adam Atlas wrote: >> I think it would be useful for Python to accept imports of standalone >> files representing entire packages, maybe with the extension .pyp. A >> package file would basically be a ZIP file, so it would follow fairly >> easily from the current zipimport mechanism... its top-level >> directory would be the contents of a package named by the outer ZIP >> file. In other words, suppose we have a ZIP file called >> "package.pyp", and at its top level, it contains "__init__.py" and >> "blah.py". Anywhere this can be located, it would be equivalent to a >> physical directory called "package" containing those two files. So >> you can simply do "import package" as usual, regardless of whether >> it's a directory or a .pyp. > > What are the benefits of your proposal over the already established > Python eggs? As far as I understand your proposal it's not much > different to eggs. In fact eggs + setuptools support more features > like > dependencies, multiversion installation and many more. Python eggs use zipimport, which allow them to be elements of sys.path. Then, modules inside them can be imported as usual. 
My proposal is to make .pyp ZIP files importable themselves. You import a .pyp just like a package directory, instead of having to add an egg to sys.path and then import modules contained in it. It's convenient. It is true that eggs do have many benefits for production use, but often while developing something, or using a package that you don't expect to use outside one project, or just trying out a package that you're not sure you'll use, it's simpler to be able to just drop a file into your project directory instead of having to `sudo easy_install` it system-wide. Zero-installation is nice. Though since setuptools is set to be included in Python 2.6 (right?), maybe it could take advantage of those benefits -- perhaps .pyps could optionally include an EGG-INFO directory, and there could be a simple tool to transform those .pyps into eggs and vice versa. That way you can use whichever way is the most practical at the time, but be able to easily switch to the other if need be. From fdrake at acm.org Mon Apr 30 20:10:14 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 30 Apr 2007 14:10:14 -0400 Subject: [Python-ideas] Python package files In-Reply-To: <9C20E3C1-F11E-4313-987E-E2842E75731B@atlas.st> References: <92B05426-D7F6-47A2-B3E5-344421964B15@atlas.st> <9C20E3C1-F11E-4313-987E-E2842E75731B@atlas.st> Message-ID: <200704301410.15350.fdrake@acm.org> On Monday 30 April 2007, Adam Atlas wrote: > It is true that eggs do have many benefits for production use, but > often while developing something, or using a package that you don't > expect to use outside one project, or just trying out a package that > you're not sure you'll use, it's simpler to be able to just drop a > file into your project directory instead of having to `sudo > easy_install` it system-wide. Zero-installation is nice. -1 on adding yet-another-ZIP-thing. Python eggs aren't always convenient, but they're easy enough to work with, and good tools to work with egg-based installations are appearing. Having another way to do this, especially something that will be turned into eggs for deployment, seems like a distraction. Differences between development environments and production environments lead to bugs, not ease-of-use. -Fred -- Fred L. Drake, Jr.